It looks like you have a batch size of 1000, which could mean the sink is waiting for 1000 entries before it writes anything...
node102.sinks.filesink1.batchSize = 1000

On Wed, Sep 12, 2012 at 3:12 PM, Cochran, David M (Contractor) <[email protected]> wrote:
> Putting a copy of hadoop-core.jar in the lib directory did the trick.. at least it made the errors go away..
>
> Just trying to sort out why nothing is getting written to the sink's files now... when I add entries to the file being tailed, nothing makes it to the sink log file(s). I guess I need to run tcpdump on that port and see whether anything is being sent, or whether the problem is on the receive side now.
>
> Thanks for the help!
> Dave
>
> -----Original Message-----
> From: Brock Noland [mailto:[email protected]]
> Sent: Wed 9/12/2012 12:41 PM
> To: [email protected]
> Subject: Re: splitting functions
>
> Yeah, that is my fault. FileChannel uses a few Hadoop classes for serialization. I want to get rid of that, but it's just not a priority item. You either need the hadoop command in your path or hadoop-core.jar in your lib directory.
>
> On Wed, Sep 12, 2012 at 1:38 PM, Cochran, David M (Contractor) <[email protected]> wrote:
>> Brock,
>>
>> Thanks for the sample! Starting to see a bit more light; it's making a little more sense now...
>>
>> If you wouldn't mind and have a couple of minutes to spare: I'm getting the error below and am not sure how to make it go away. I cannot use Hadoop for storage, just FILE_ROLL (ultimately the logs will need to be processed further in plain text). I'm just not sure why...
>>
>> The error follows, and my conf is further down.
>>
>> 12 Sep 2012 13:18:54,120 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:211) - Starting FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>> 12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:234) - Failed to start the file channel [channel=fileChannel]
>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>     at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>     at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>     at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>     at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>     at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>     ... 24 more
>> 12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238) - Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>     [stack trace and Caused by identical to the one above]
>> 12 Sep 2012 13:18:54,127 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.stop:249) - Stopping FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>> 12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249) - Unsuccessful attempt to shutdown component: {} due to missing dependencies. Please shutdown the agent or disable this component, or the agent will be in an undefined state.
>> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>>     at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>     at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
>>     at org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
>>     at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>> 12 Sep 2012 13:18:54,622 INFO [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141) - Starting Sink filesink1
>> 12 Sep 2012 13:18:54,624 INFO [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152) - Starting Source avroSource
>> 12 Sep 2012 13:18:54,626 INFO [lifecycleSupervisor-1-1] (org.apache.flume.source.AvroSource.start:138) - Starting Avro source avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
>> 12 Sep 2012 13:18:54,641 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160) - Unable to deliver event. Exception follows.
>> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>>     at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>     at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
>>     at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>>     at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
>>     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> Using your config, this is my starting point... (trying to get it functioning on a single host first)
>>
>> node105.sources = tailsource
>> node105.channels = fileChannel
>> node105.sinks = avroSink
>>
>> node105.sources.tailsource.type = exec
>> node105.sources.tailsource.command = tail -F /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
>> #node105.sources.stressSource.batchSize = 1000
>> node105.sources.tailsource.channels = fileChannel
>>
>> ## Sink sends avro messages to node103.bashkew.com port 9432
>> node105.sinks.avroSink.type = avro
>> node105.sinks.avroSink.batch-size = 1000
>> node105.sinks.avroSink.channel = fileChannel
>> node105.sinks.avroSink.hostname = localhost
>> node105.sinks.avroSink.port = 9432
>>
>> node105.channels.fileChannel.type = file
>> node105.channels.fileChannel.checkpointDir = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
>> node105.channels.fileChannel.dataDirs = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
>> node105.channels.fileChannel.capacity = 10000
>> node105.channels.fileChannel.checkpointInterval = 3000
>> node105.channels.fileChannel.maxFileSize = 5242880
>>
>> node102.sources = avroSource
>> node102.channels = fileChannel
>> node102.sinks = filesink1
>>
>> ## Source listens for avro messages on port 9432 on all ips
>> node102.sources.avroSource.type = avro
>> node102.sources.avroSource.channels = fileChannel
>> node102.sources.avroSource.bind = 0.0.0.0
>> node102.sources.avroSource.port = 9432
>>
>> node102.sinks.filesink1.type = FILE_ROLL
>> node102.sinks.filesink1.batchSize = 1000
>> node102.sinks.filesink1.channel = fileChannel
>> node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
>>
>> node102.channels.fileChannel.type = file
>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>> node102.channels.fileChannel.capacity = 5000
>> node102.channels.fileChannel.checkpointInterval = 45000
>> node102.channels.fileChannel.maxFileSize = 5242880
>>
>> Thanks!
>> Dave
>>
>> -----Original Message-----
>> From: Brock Noland [mailto:[email protected]]
>> Sent: Wed 9/12/2012 9:11 AM
>> To: [email protected]
>> Subject: Re: splitting functions
>>
>> Hi,
>>
>> Below is a config I use to test out the FileChannel. See the comments "##" for how messages are sent from one host to another.
>> node105.sources = stressSource
>> node105.channels = fileChannel
>> node105.sinks = avroSink
>>
>> node105.sources.stressSource.type = org.apache.flume.source.StressSource
>> node105.sources.stressSource.batchSize = 1000
>> node105.sources.stressSource.channels = fileChannel
>>
>> ## Sink sends avro messages to node103.bashkew.com port 9432
>> node105.sinks.avroSink.type = avro
>> node105.sinks.avroSink.batch-size = 1000
>> node105.sinks.avroSink.channel = fileChannel
>> node105.sinks.avroSink.hostname = node102.bashkew.com
>> node105.sinks.avroSink.port = 9432
>>
>> node105.channels.fileChannel.type = file
>> node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>> node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>> node105.channels.fileChannel.capacity = 10000
>> node105.channels.fileChannel.checkpointInterval = 3000
>> node105.channels.fileChannel.maxFileSize = 5242880
>>
>> node102.sources = avroSource
>> node102.channels = fileChannel
>> node102.sinks = nullSink
>>
>> ## Source listens for avro messages on port 9432 on all ips
>> node102.sources.avroSource.type = avro
>> node102.sources.avroSource.channels = fileChannel
>> node102.sources.avroSource.bind = 0.0.0.0
>> node102.sources.avroSource.port = 9432
>>
>> node102.sinks.nullSink.type = null
>> node102.sinks.nullSink.batchSize = 1000
>> node102.sinks.nullSink.channel = fileChannel
>>
>> node102.channels.fileChannel.type = file
>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>> node102.channels.fileChannel.capacity = 5000
>> node102.channels.fileChannel.checkpointInterval = 45000
>> node102.channels.fileChannel.maxFileSize = 5242880
>>
>> On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor) <[email protected]> wrote:
>>> Okay folks, after spending the better part of a week reading the docs and experimenting, I'm lost. I have Flume 1.3.x working pretty much as expected on a single host: it tails a log file and writes it to another rolling log file via Flume. No problem there; it seems to work flawlessly. Where my trouble starts is in trying to break the functions apart across multiple hosts, with a single host listening for others to send their logs to. All of my efforts have resulted in little more than headaches.
>>>
>>> I can't even see the specified port open on what should be the logging host. I've tried the basic examples posted in different docs but can't seem to get things working across multiple hosts.
>>>
>>> Would someone post a working example of the confs needed to get me started? Something simple that works, so I can then pick it apart to gain more understanding. Apparently I just don't have a firm enough grasp on all the pieces yet, but I want to learn!
>>>
>>> Thanks in advance!
>>> Dave

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
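For anyone landing here from a search, here is a distilled two-agent config in the same spirit as the ones above. Agent names, hostnames, and paths are illustrative, not the thread's actual hosts. A memory channel is used to sidestep the FileChannel's Hadoop dependency discussed in the thread (at the cost of durability across restarts), and the FILE_ROLL sink's batchSize is set to 1 so events show up in the output file immediately while testing.

```
# Sender: tail a file, forward over Avro (hostnames/paths are examples)
a1.sources = tailsrc
a1.channels = ch1
a1.sinks = avrosink

a1.sources.tailsrc.type = exec
a1.sources.tailsrc.command = tail -F /var/log/test.log
a1.sources.tailsrc.channels = ch1

a1.sinks.avrosink.type = avro
a1.sinks.avrosink.hostname = collector.example.com
a1.sinks.avrosink.port = 9432
a1.sinks.avrosink.channel = ch1

a1.channels.ch1.type = memory
a1.channels.ch1.capacity = 10000

# Receiver: listen on 9432 on all interfaces, write plain-text rolling files
a2.sources = avrosrc
a2.channels = ch2
a2.sinks = rollsink

a2.sources.avrosrc.type = avro
a2.sources.avrosrc.bind = 0.0.0.0
a2.sources.avrosrc.port = 9432
a2.sources.avrosrc.channels = ch2

a2.sinks.rollsink.type = FILE_ROLL
a2.sinks.rollsink.sink.directory = /var/log/flume
a2.sinks.rollsink.batchSize = 1
a2.sinks.rollsink.channel = ch2

a2.channels.ch2.type = memory
a2.channels.ch2.capacity = 10000
```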
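The batching behavior Brock points to at the top of the thread can be sketched in a few lines. This is a toy model, not Flume's actual sink code: a sink that buffers incoming events and only writes once a full batch has accumulated, which is why tailing a handful of new lines produces no output when batchSize is 1000.

```python
class BatchingSink:
    """Toy model of a batching sink (illustration only, not Flume's implementation)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []      # events taken from the channel, not yet written
        self.written = []     # stands in for the rolled output file

    def put(self, event):
        self.buffer.append(event)
        # Nothing is written until a full batch accumulates.
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        self.written.extend(self.buffer)
        self.buffer = []

sink = BatchingSink(batch_size=1000)
for i in range(10):           # tail -F delivers only a handful of lines
    sink.put(f"log line {i}")

print(len(sink.written))      # 0 -- nothing has reached the "file" yet
sink.flush()                  # a smaller batchSize would have flushed already
print(len(sink.written))      # 10
```

With batch_size=1 every put flushes immediately, which is the behavior Dave expected while testing with a few tailed lines.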
