Issue with file (HDFS) inputs:
How can I be sure the input won’t “overflow” the process chain?
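One common answer to this kind of question is to put a bounded buffer between the source and the processing stage, so the reader blocks instead of piling up data when downstream falls behind. Here is a minimal sketch of that idea in plain Python (the queue size and the "processing" step are made up for illustration; this is not how Spark Streaming's receiver is implemented, just the flow-control principle):

```python
# Flow control via a bounded queue: put() blocks when the queue is full,
# so the input side can never "overflow" the processing chain.
import queue
import threading

def run_pipeline(records, queue_size=4):
    buf = queue.Queue(maxsize=queue_size)  # bounded buffer
    out = []

    def consumer():
        while True:
            item = buf.get()
            if item is None:          # sentinel: end of input
                break
            out.append(item.upper())  # stand-in for real processing
            buf.task_done()

    t = threading.Thread(target=consumer)
    t.start()
    for rec in records:
        buf.put(rec)  # blocks if the consumer is behind -> backpressure
    buf.put(None)
    t.join()
    return out
```

With a single consumer thread the output preserves input order, e.g. `run_pipeline(["a", "b", "c"])` yields `["A", "B", "C"]`.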
From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: mardi 12 août 2014 02:58
To: Gwenhael Pasquiers
Cc: u...@spark.incubator.apache.org
Subject: Re: [spark-streaming] kafka source and flow control
From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: lundi 11 août 2014 11:44
To: Gwenhael Pasquiers
Subject: Re: [spark-streaming] kafka source and flow control
Hi,
On Mon, Aug 11, 2014 at 6:19 PM, gpasquiers <gwenhael.pasqui...@ericsson.com> wrote:
I’m using spark-streaming in a cloudera environment to consume a kafka
source and store all data into hdfs.
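As a rough illustration of that consume-and-archive step, here is a hedged sketch in plain Python, with lists of records standing in for Kafka batches and local files standing in for HDFS output (in actual Spark Streaming this would be `KafkaUtils.createStream` plus `DStream.saveAsTextFiles`); the function and file names here are invented:

```python
# Archive a stream of batches: one output file per batch, mimicking
# what saveAsTextFiles does with one directory per batch interval.
import os

def archive_batches(batches, out_dir):
    """Write each batch of text records to its own file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, batch in enumerate(batches):
        path = os.path.join(out_dir, f"batch-{i:05d}.txt")
        with open(path, "w") as f:
            f.write("\n".join(batch))
        paths.append(path)
    return paths
```

Writing each batch to its own file keeps the archive append-only, which matches HDFS's write-once model.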
Hi,
On Mon, Aug 11, 2014 at 9:41 PM, Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com> wrote:
We intend to apply other operations on the data later in the same spark
context, but our first step is to archive it.
Our goal is something like this:
Step 1: consume Kafka
Step 2: archive to HDFS
In general (and I am prototyping this), I have a better idea :)
- Consume Kafka in Spark from topic-A
- Transform the data in Spark (normalize, enrich, etc.)
- Feed it back to Kafka (into a different topic-B)
- Have flume-HDFS (for M/R, Impala, Spark batch), Spark Streaming, or any
other compute engine consume topic-B
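The topic-A → transform → topic-B relay described above can be sketched like this, with plain Python lists standing in for the two Kafka topics and a trivial enrich step (all names and the enrichment logic are illustrative, not from the original proposal):

```python
# Relay pattern: consume raw records from one topic, normalize/enrich
# them, and publish the results to a second topic for downstream
# consumers (flume-HDFS, Spark batch, Impala, etc.).
def enrich(record):
    # Stand-in for real normalization/enrichment logic.
    return {"value": record.strip().lower(), "source": "topic-A"}

def relay(topic_a, topic_b):
    """Move every record from topic_a into topic_b, enriched."""
    for record in topic_a:
        topic_b.append(enrich(record))
    return topic_b
```

One nice property of this design is that topic-B acts as a durable buffer: if a downstream consumer is slow or down, the enriched data simply accumulates in Kafka instead of overflowing the pipeline.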