Hadoop 2x Compatibility

2014-08-29 Thread Chris Horrocks
Hi All, I'm pretty new to Flume so forgive the newbish question, but I've been working with Hadoop 2x for a little while. I'm trying to configure Flume (1.5.0) with an HDFS sink; however, the agent won't start, citing the following error: 29 Aug 2014 13:40:13,435 ERROR [conf-file-poller-0] (org.apac
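
The error above is truncated, but two things are worth checking in this situation: the Hadoop 2.x client jars (which Flume does not bundle) must be visible on the agent's classpath, and the sink definition itself only needs a few lines. A minimal sketch, with hypothetical agent/sink/channel names (a1, k1, c1) and an illustrative NameNode URI:

```properties
# Minimal HDFS sink definition; names and the NameNode address are illustrative
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
```

The Hadoop client jars (hadoop-common, hadoop-hdfs, and their dependencies) can be exposed to the agent via FLUME_CLASSPATH in flume-env.sh or by running on a node with a Hadoop client installed.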

Re: Hadoop 2x Compatibility

2014-08-29 Thread Chris Horrocks
e sink > fine and is running without any problem. > > > Let me know if you want to see jars in lib folder of my flume installation. > ------ > From: Chris Horrocks > Sent: ‎29-‎08-‎2014 20:16 > To: user@flume.apache.org > Subject: Hadoop 2x Compata

Re: Hadoop 2x Compatibility

2014-08-30 Thread Chris Horrocks
rmat in flume > hdfs sink setting. Try changing from > hdfs.fileType SequenceFile to > > hdfs.fileType DataStream > > in your flume conf file. > > > On Fri, Aug 29, 2014 at 8:39 PM, Chris Horrocks > mailto:chrisjhorro.
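
For reference, the suggested change in full property syntax (agent and sink names are hypothetical). SequenceFile is the HDFS sink's default hdfs.fileType and wraps events in Hadoop SequenceFile records; DataStream writes the raw event bodies:

```properties
# Write raw event bodies instead of the default SequenceFile wrapping
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = DataStream
```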

Re: Flume/hadoop question

2016-03-20 Thread Chris Horrocks
Does the hadoop slave have the HDFS client config & jars? How are you deploying the flume agent? Are you using a hadoop distribution manager like Cloudera Manager/Ambari/etc or is it a standalone instance? > On 20 Mar 2016, at 15:12, Kartik Vashishta wrote: > > hadoop slave

Re: Priority and Expiry time of flume events

2016-04-15 Thread Chris Horrocks
The Kafka channel would allow you to set event retention within Kafka. > On 15 Apr 2016, at 12:54, Gonzalo Herreros wrote: > > That would depend on the channel. > AFAIK, all the channels provided are FIFO without expiration but technically > you could implement a channel that does that. > > Y
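
A sketch of a Kafka channel definition, with hypothetical names and broker addresses; the property keys follow the current Flume user guide (older releases used brokerList without the kafka. prefix, so check your version). Retention itself is then a Kafka topic setting, not a Flume property:

```properties
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = broker1:9092,broker2:9092
a1.channels.c1.kafka.topic = flume-channel
# Event expiry is governed on the Kafka side by the topic's retention
# settings (e.g. retention.ms), configured with the Kafka tooling
# rather than in this file.
```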

RE: Flume not marking log files as completed and do not process file further

2016-04-20 Thread Chris Horrocks
Are the permissions on the files the same? Does the user running the flume agents have read permissions? Are the files still being written to/locked open by another process? Are there any logs being generated by the flume agent? -- Chris Horrocks On 20 April 2016 at 08:00:14, Saurabh Sharma

Re: Kafka Sink random partition assignment

2016-06-07 Thread Chris Horrocks
ucer groups (yet) to ensure that producers apportion the available partitions between them as this would create a synchronisation issue between what should be entirely independent processes. -- Chris Horrocks On 7 June 2016 at 00:32:29, Jason J. W. Williams (jasonjwwilli...@gmail.com

Re: Kafka Sink random partition assignment

2016-06-07 Thread Chris Horrocks
It's by design of Kafka (and by extension flume). The producers are designed to be many-to-one (producers to partitions) and as such picking a random partition every 10 minutes prevents separate producer instances from all randomly picking the same partition.  --  Chris Horrocks From: 

Re: Kafka Sink random partition assignment

2016-06-07 Thread Chris Horrocks
r use-case. --  Chris Horrocks From: Jason J. W. Williams Reply: user@flume.apache.org Date: 7 June 2016 at 19:03:59 To: user@flume.apache.org Subject:  Re: Kafka Sink random partition assignment Thanks again Chris. I am curious why I see the round-robin behavior I expected when using kaf

Re: Interceptors on a replicated source fan-out

2016-06-16 Thread Chris Horrocks
eptor implementations would allow you to conditionally bind certain events to a specific channel. I suppose you could always write a custom interceptor to do it. --  Chris Horrocks From: Hai Thai Reply: user@flume.apache.org Date: 15 June 2016 at 22:40:07 To: user@flume.apache.org Su
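
For the specific case of routing on an event header, Flume's built-in multiplexing channel selector already covers this without a custom interceptor. A sketch with hypothetical source/channel names, assuming some interceptor has set a "category" header on each event:

```properties
# Route events to c1 or c2 based on the value of the "category" header
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = category
a1.sources.r1.selector.mapping.audit = c1
a1.sources.r1.selector.mapping.metrics = c2
a1.sources.r1.selector.default = c1
```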

Re: File left as OPEN_FOR_WRITE state.

2016-07-20 Thread Chris Horrocks
Have you tried increasing the HDFS sink timeouts? -- Chris Horrocks On Wed, Jul 20, 2016 at 8:03 am, no jihun <'jees...@gmail.com'> wrote: Hi. I found some files on hdfs left as OPEN_FOR_WRITE state. This is flume's log about the file. 01 18 7 2016 16:12:02

Re: File left as OPEN_FOR_WRITE state.

2016-07-20 Thread Chris Horrocks
You could look at tuning either hdfs.idleTimeout, hdfs.callTimeout, or hdfs.retryInterval which can all be found at: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink -- Chris Horrocks On Wed, Jul 20, 2016 at 9:01 am, no jihun <'jees...@gmail.com'> wrote: @ch
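
A sketch of those three properties together, with the defaults documented in the user guide noted in comments (agent/sink names are hypothetical and the values are illustrative starting points, not recommendations):

```properties
a1.sinks.k1.hdfs.callTimeout = 30000   # ms allowed for HDFS open/write/flush/close (default 10000)
a1.sinks.k1.hdfs.idleTimeout = 60      # seconds of inactivity before an idle file is closed (default 0 = disabled)
a1.sinks.k1.hdfs.retryInterval = 180   # seconds between close retries (default 180)
```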

Re: File left as OPEN_FOR_WRITE state.

2016-07-20 Thread Chris Horrocks
In fact looking at your error the timeout looks like the hdfs.callTimeout, so that's where I'd focus. Is your HDFS cluster particularly unperformant? 10s to respond to a call is pretty slow. -- Chris Horrocks On Wed, Jul 20, 2016 at 9:25 am, Chris Horrocks <'chris@hor.r

Re: Is it a good idea to use Flume Interceptor to process data?

2016-07-27 Thread Chris Horrocks
into Spark Streaming and keeping flume as low overhead as possible, particularly if it's monitoring data that's latency sensitive. For storing the calculations variables for consumption by the interceptor I'd go with something like ZooKeeper. -- Chris Horrocks On Wed, Jul 27

Re: Flume JMS message timestamp from IBM MQ => Kafka

2016-09-26 Thread Chris Horrocks
If the time stamp is passed as part of the flume event body you could extract it via an interceptor and only pass the headers to Kafka. -- Chris Horrocks On Mon, Sep 26, 2016 at 12:02 am, Kevin Tran <'kevin...@gmail.com'> wrote: Hi, Does anyone know ho
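
One way to do that extraction is Flume's built-in regex_extractor interceptor. A sketch with hypothetical names, assuming (purely for illustration) that each event body begins with an epoch-milliseconds timestamp; the regex and header name would need adjusting to the real message format:

```properties
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
# Assumes the body starts with a 13-digit epoch-millis value (illustrative)
a1.sources.r1.interceptors.i1.regex = ^(\\d{13})
a1.sources.r1.interceptors.i1.serializers = s1
# Store the captured group in the "timestamp" event header
a1.sources.r1.interceptors.i1.serializers.s1.name = timestamp
```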

Re: how to make KafkaSource consume the existing messages

2016-10-13 Thread Chris Horrocks
Hi, Which version of Kafka are you using? Off the top of my head it should be: tier2.sources.source1.kafka.auto.offset.reset = earliest Of course changing the group ID or if it's an older version of Kafka removing the corresponding offset znode from zookeeper ought to do the trick --
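
Spelled out as a sketch (the tier2/source1 names come from the thread; note that newer Flume releases pass consumer settings through a kafka.consumer.* prefix, so verify the exact keys against your version's user guide):

```properties
tier2.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
# Only takes effect when the consumer group has no committed offset,
# so pair it with a fresh group id to re-read existing messages
tier2.sources.source1.kafka.auto.offset.reset = earliest
```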

Re: Understand JMS source + HDFS sink batch management

2016-11-16 Thread Chris Horrocks
Hi Roberto, Setting the roll intervals to 0 will stop the sink rolling the files in HDFS. Try setting hdfs.rollCount to the number of messages you want to roll the file on (i.e. the number of messages per file). Bear in mind setting this low will result in higher HDFS overhead. -- Chris
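
Put together, rolling purely on event count looks like the following sketch (hypothetical agent/sink names; the count is illustrative):

```properties
a1.sinks.k1.hdfs.rollInterval = 0    # disable time-based rolling
a1.sinks.k1.hdfs.rollSize = 0        # disable size-based rolling
a1.sinks.k1.hdfs.rollCount = 10000   # roll after 10,000 events per file
```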

Re: A puzzling problem about flume Failover Sink Processor

2017-04-01 Thread Chris Horrocks
The following error occurs when your flume agent tries to write to the standby NameNode: "Operation category WRITE is not supported in state standby" What failover mechanism are you using for your NameNodes? -- Chris Horrocks On Sat, Apr 1, 2017 at 11:31 am, hui@wbkit.com wrote
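
With HDFS NameNode HA, one common fix is to point the sink at the HA nameservice ID rather than a specific NameNode host, so writes always follow the active NameNode. A sketch with hypothetical names; this assumes the cluster's client configs (core-site.xml/hdfs-site.xml defining the nameservice) are on the Flume agent's classpath:

```properties
a1.sinks.k1.type = hdfs
# "mycluster" is an illustrative nameservice ID, not a hostname;
# the HDFS client resolves it to whichever NameNode is active
a1.sinks.k1.hdfs.path = hdfs://mycluster/flume/events
```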

Re: Where to put the flume agents within a cluster

2017-06-24 Thread Chris Horrocks
Hi I've seen this before. If you put a flume agent on a worker node that is running an HDFS data node, and assuming you are using flume to write into HDFS, you will find that the worker that has the flume agent on it will be the data node chosen to house the (first replica of the) data. This may s