RE: [spark-streaming] kafka source and flow control

2014-08-12 Thread Gwenhael Pasquiers
issue with file (hdfs) inputs? How can I be sure the input won’t “overflow” the process chain? From: Tobias Pfeiffer [mailto:t...@preferred.jp] Sent: Tuesday, 12 August 2014 02:58 To: Gwenhael Pasquiers Cc: u...@spark.incubator.apache.org Subject: Re: [spark-streaming] kafka source and flow control

RE: [spark-streaming] kafka source and flow control

2014-08-11 Thread Gwenhael Pasquiers
[mailto:t...@preferred.jp] Sent: Monday, 11 August 2014 11:44 To: Gwenhael Pasquiers Subject: Re: [spark-streaming] kafka source and flow control Hi, On Mon, Aug 11, 2014 at 6:19 PM, gpasquiers gwenhael.pasqui...@ericsson.com wrote: I’m using spark-streaming

RE: [spark-streaming] kafka source and flow control

2014-08-11 Thread Gwenhael Pasquiers
Subject: Re: [spark-streaming] kafka source and flow control Hi, On Mon, Aug 11, 2014 at 6:19 PM, gpasquiers gwenhael.pasqui...@ericsson.com wrote: I’m using spark-streaming in a cloudera environment to consume a kafka source and store all data into hdfs. I

Re: [spark-streaming] kafka source and flow control

2014-08-11 Thread Tobias Pfeiffer
Hi, On Mon, Aug 11, 2014 at 9:41 PM, Gwenhael Pasquiers gwenhael.pasqui...@ericsson.com wrote: We intend to apply other operations on the data later in the same spark context, but our first step is to archive it. Our goal is something like this: Step 1: consume kafka Step 2: archive to

Re: [spark-streaming] kafka source and flow control

2014-08-11 Thread Xuri Nagarin
In general (and I am prototyping), I have a better idea :) - Consume kafka in Spark from topic-A - transform data in Spark (normalize, enrich, etc.) - Feed it back to Kafka (in a different topic-B) - Have flume-HDFS (for M/R, Impala, Spark batch) or Spark-streaming or any other compute