Never mind, it doesn't look like FILE_ROLL supports batching...
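(If FILE_ROLL ignores batching, the batchSize line in the config further down this thread is a no-op and can simply be dropped. A trimmed version of that sink section, using the same names as the config below; `sink.rollInterval` is the stock file_roll setting for how often the output file rolls, and I believe it defaults to 30 seconds, so worth double-checking against the docs:)

```properties
node102.sinks.filesink1.type = FILE_ROLL
node102.sinks.filesink1.channel = fileChannel
node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
# roll the output file every 30 seconds (believed to be the default); 0 disables rolling
node102.sinks.filesink1.sink.rollInterval = 30
```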

On Wed, Sep 12, 2012 at 4:56 PM, Brock Noland <[email protected]> wrote:
> It looks like you have a batch size of 1000, which could mean the sink is
> waiting for 1000 entries...
>
> node102.sinks.filesink1.batchSize = 1000
>
>
>
> On Wed, Sep 12, 2012 at 3:12 PM, Cochran, David M (Contractor)
> <[email protected]> wrote:
>> Putting a copy of hadoop-core.jar in the lib directory did the trick... at
>> least it made the errors go away.
>>
>> Just trying to sort out why nothing is getting written to the sink's files
>> now... when I add entries to the file being tailed, nothing makes it to
>> the sink log file(s). I guess I need to run tcpdump on that port and see if
>> anything is being sent, or if the problem is on the receive side now.
>>
>> Thanks for the help!
>> Dave
>>
>>
>>
>> -----Original Message-----
>> From: Brock Noland [mailto:[email protected]]
>> Sent: Wed 9/12/2012 12:41 PM
>> To: [email protected]
>> Subject: Re: splitting functions
>>
>> Yeah that is my fault. FileChannel uses a few hadoop classes for
>> serialization. I want to get rid of that but it's just not a priority
>> item. You either need the hadoop command in your path or the
>> hadoop-core.jar in your lib directory.
>>
>> On Wed, Sep 12, 2012 at 1:38 PM, Cochran, David M (Contractor)
>> <[email protected]> wrote:
>>> Brock,
>>>
>>> Thanks for the sample!  Starting to see a bit more light and making a 
>>> little more sense now...
>>>
>>> If you wouldn't mind and have a couple of minutes to spare... I'm getting
>>> this error and not sure how to make it go away. I can't use hadoop for
>>> storage, just FILE_ROLL instead (ultimately the logs will need to be
>>> processed further in plain text). I'm just not sure why I'm getting it...
>>>
>>> The error follows and my conf further down.
>>>
>>> 12 Sep 2012 13:18:54,120 INFO  [lifecycleSupervisor-1-0] 
>>> (org.apache.flume.channel.file.FileChannel.start:211)  - Starting 
>>> FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, 
>>> /tmp/flume/data3] }...
>>> 12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] 
>>> (org.apache.flume.channel.file.FileChannel.start:234)  - Failed to start 
>>> the file channel [channel=fileChannel]
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>>         at 
>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>>         at 
>>> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>>         at 
>>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>>         at 
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at 
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         ... 24 more
>>> 12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] 
>>> (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238)  - 
>>> Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, 
>>> /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>>         at 
>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>>         at 
>>> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>>         at 
>>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>>         at 
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at 
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         ... 24 more
>>> 12 Sep 2012 13:18:54,127 INFO  [lifecycleSupervisor-1-0] 
>>> (org.apache.flume.channel.file.FileChannel.stop:249)  - Stopping 
>>> FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, 
>>> /tmp/flume/data3] }...
>>> 12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] 
>>> (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249)  - 
>>> Unsuccessful attempt to shutdown component: {} due to missing dependencies.
>>> Please shutdown the agent or disable this component, or the agent will be in
>>> an undefined state.
>>> java.lang.IllegalStateException: Channel closed[channel=fileChannel]
>>>         at 
>>> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>>         at 
>>> org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
>>>         at 
>>> org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
>>>         at 
>>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
>>>         at 
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at 
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> 12 Sep 2012 13:18:54,622 INFO  [conf-file-poller-0] 
>>> (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141)
>>>   - Starting Sink filesink1
>>> 12 Sep 2012 13:18:54,624 INFO  [conf-file-poller-0] 
>>> (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152)
>>>   - Starting Source avroSource
>>> 12 Sep 2012 13:18:54,626 INFO  [lifecycleSupervisor-1-1] 
>>> (org.apache.flume.source.AvroSource.start:138)  - Starting Avro source 
>>> avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
>>> 12 Sep 2012 13:18:54,641 ERROR 
>>> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
>>> (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver 
>>> event. Exception follows.
>>> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>>>         at 
>>> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>>         at 
>>> org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
>>>         at 
>>> org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>>>         at 
>>> org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
>>>         at 
>>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>         at 
>>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>>
>>>
>>> Using your config this is my starting point... (trying to get it 
>>> functioning on a single host first)
>>>
>>> node105.sources = tailsource
>>> node105.channels = fileChannel
>>> node105.sinks = avroSink
>>>
>>> node105.sources.tailsource.type = exec
>>> node105.sources.tailsource.command = tail -F 
>>> /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
>>> #node105.sources.stressSource.batchSize = 1000
>>> node105.sources.tailsource.channels = fileChannel
>>>
>>> ## Sink sends avro messages to node103.bashkew.com port 9432
>>> node105.sinks.avroSink.type = avro
>>> node105.sinks.avroSink.batch-size = 1000
>>> node105.sinks.avroSink.channel = fileChannel
>>> node105.sinks.avroSink.hostname = localhost
>>> node105.sinks.avroSink.port = 9432
>>>
>>> node105.channels.fileChannel.type = file
>>> node105.channels.fileChannel.checkpointDir = 
>>> /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
>>> node105.channels.fileChannel.dataDirs = 
>>> /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
>>> node105.channels.fileChannel.capacity = 10000
>>> node105.channels.fileChannel.checkpointInterval = 3000
>>> node105.channels.fileChannel.maxFileSize = 5242880
>>>
>>> node102.sources = avroSource
>>> node102.channels = fileChannel
>>> node102.sinks = filesink1
>>>
>>> ## Source listens for avro messages on port 9432 on all ips
>>> node102.sources.avroSource.type = avro
>>> node102.sources.avroSource.channels = fileChannel
>>> node102.sources.avroSource.bind = 0.0.0.0
>>> node102.sources.avroSource.port = 9432
>>>
>>> node102.sinks.filesink1.type = FILE_ROLL
>>> node102.sinks.filesink1.batchSize = 1000
>>> node102.sinks.filesink1.channel = fileChannel
>>> node102.sinks.filesink1.sink.directory = 
>>> /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
>>> node102.channels.fileChannel.type = file
>>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node102.channels.fileChannel.dataDirs = 
>>> /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node102.channels.fileChannel.capacity = 5000
>>> node102.channels.fileChannel.checkpointInterval = 45000
>>> node102.channels.fileChannel.maxFileSize = 5242880
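(Once the channel starts cleanly, one way to smoke-test the whole path is to append a marker line to the tailed test.log and then poll the FILE_ROLL sink.directory for it. A sketch, not anything Flume ships; assumes Python on the host, and the directory argument is whatever sink.directory points at:)

```python
import os
import time

def wait_for_line(sink_dir, needle, timeout=30.0, poll=0.5):
    """Poll every file in sink_dir until one contains `needle`.

    After appending `needle` to the tailed source file, call this against
    the FILE_ROLL sink's output directory to see whether the event made it
    through the pipeline. Returns True if found within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        for name in os.listdir(sink_dir):
            path = os.path.join(sink_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "r", errors="replace") as f:
                if needle in f.read():
                    return True
        time.sleep(poll)
    return False
```

Appending something unique like `echo "marker-$(date +%s)" >> test.log` and then waiting for that marker tells you in one step whether the problem is on the send or the receive side.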
>>>
>>>
>>>
>>> Thanks!
>>> Dave
>>>
>>>
>>> -----Original Message-----
>>> From: Brock Noland [mailto:[email protected]]
>>> Sent: Wed 9/12/2012 9:11 AM
>>> To: [email protected]
>>> Subject: Re: splitting functions
>>>
>>> Hi,
>>>
>>> Below is a config I use to test out the FileChannel. See the comments
>>> "##" for how messages are sent from one host to another.
>>>
>>> node105.sources = stressSource
>>> node105.channels = fileChannel
>>> node105.sinks = avroSink
>>>
>>> node105.sources.stressSource.type = org.apache.flume.source.StressSource
>>> node105.sources.stressSource.batchSize = 1000
>>> node105.sources.stressSource.channels = fileChannel
>>>
>>> ## Sink sends avro messages to node103.bashkew.com port 9432
>>> node105.sinks.avroSink.type = avro
>>> node105.sinks.avroSink.batch-size = 1000
>>> node105.sinks.avroSink.channel = fileChannel
>>> node105.sinks.avroSink.hostname = node102.bashkew.com
>>> node105.sinks.avroSink.port = 9432
>>>
>>> node105.channels.fileChannel.type = file
>>> node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node105.channels.fileChannel.dataDirs =
>>> /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node105.channels.fileChannel.capacity = 10000
>>> node105.channels.fileChannel.checkpointInterval = 3000
>>> node105.channels.fileChannel.maxFileSize = 5242880
>>>
>>> node102.sources = avroSource
>>> node102.channels = fileChannel
>>> node102.sinks = nullSink
>>>
>>>
>>> ## Source listens for avro messages on port 9432 on all ips
>>> node102.sources.avroSource.type = avro
>>> node102.sources.avroSource.channels = fileChannel
>>> node102.sources.avroSource.bind = 0.0.0.0
>>> node102.sources.avroSource.port = 9432
>>>
>>> node102.sinks.nullSink.type = null
>>> node102.sinks.nullSink.batchSize = 1000
>>> node102.sinks.nullSink.channel = fileChannel
>>>
>>> node102.channels.fileChannel.type = file
>>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node102.channels.fileChannel.dataDirs =
>>> /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node102.channels.fileChannel.capacity = 5000
>>> node102.channels.fileChannel.checkpointInterval = 45000
>>> node102.channels.fileChannel.maxFileSize = 5242880
>>>
>>>
>>>
>>> On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor)
>>> <[email protected]> wrote:
>>>> Okay folks, after spending the better part of a week reading the docs and
>>>> experimenting I'm lost.  I have flume 1.3.x working pretty much as expected
>>>> on a single host.  It tails a log file and writes it to another rolling log
>>>> file via flume.  No problem there, seems to work flawlessly.  Where my issue
>>>> is trying to break apart the functions across multiple hosts... a single
>>>> host listening for others to send their logs to.  All of my efforts have
>>>> resulted in little more than headaches.
>>>>
>>>> I can't even see the specified port open on what should be the logging host.
>>>> I've tried the basic examples posted in different docs but can't seem to get
>>>> things working across multiple hosts.
>>>>
>>>> Would someone post a working example of the confs needed to get me started?
>>>> Something simple that works, so I can then pick it apart to gain more
>>>> understanding.  Apparently, I just don't have a firm enough grasp on all the
>>>> pieces yet, but I want to learn!
>>>>
>>>> Thanks in advance!
>>>> Dave
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>>
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
