Thank you for your answer. I don't know if I phrased the question
correctly, but your answer helps me.

I'm going to ask the question again to check whether you understood me.

I have this topology:

DataSource1, .... , DataSourceN --> Kafka --> SparkStreaming --> HDFS

DataSource1, .... , DataSourceN --> Flume --> SparkStreaming --> HDFS

All data are going to be pro


2014-11-19 21:50 GMT+01:00 Hari Shreedharan <hshreedha...@cloudera.com>:
> Btw, if you want to write to Spark Streaming from Flume -- there is a sink
> (it is a part of Spark, not Flume). See Approach 2 here:
> http://spark.apache.org/docs/latest/streaming-flume-integration.html
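A minimal sketch of that Approach 2 (the pull-based receiver) might look like the following; the app name, hostname, port, and batch interval are placeholder values, not anything stated in this thread, and the Flume agent must be configured with Spark's custom sink listening on that host:port:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollingExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumePollingExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Pull events from the Spark sink that the Flume agent exposes
    // on sink-host:9999 (placeholder address).
    val stream = FlumeUtils.createPollingStream(ssc, "sink-host", 9999)

    // Each record is a SparkFlumeEvent wrapping an Avro event body.
    stream.map(event => new String(event.event.getBody.array()))
          .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```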
>
>
>
> On Wed, Nov 19, 2014 at 12:41 PM, Hari Shreedharan
> <hshreedha...@cloudera.com> wrote:
>>
>> As of now, you can feed Spark Streaming from both Kafka and Flume.
>> Currently, though, there is no API to write data back to either of the two
>> directly.
>>
>> I sent a PR which should eventually add something like this:
>> https://github.com/harishreedharan/spark/blob/Kafka-output/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaOutputWriter.scala
>> that would allow Spark Streaming to write back to Kafka. This will likely be
>> reviewed and committed after 1.2.
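Until something like that PR lands, a common workaround (a sketch, not the PR's API) is to open a plain Kafka 0.8 producer inside `foreachRDD`/`foreachPartition`; `dstream` is assumed to be a `DStream[String]`, and the broker list and topic name are placeholders:

```scala
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import org.apache.spark.streaming.dstream.DStream

def writeBackToKafka(dstream: DStream[String]): Unit = {
  dstream.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // Create one producer per partition so it is instantiated on the
      // executor rather than serialized from the driver.
      val props = new Properties()
      props.put("metadata.broker.list", "broker1:9092")
      props.put("serializer.class", "kafka.serializer.StringEncoder")
      val producer = new Producer[String, String](new ProducerConfig(props))

      partition.foreach { msg =>
        producer.send(new KeyedMessage[String, String]("output-topic", msg))
      }
      producer.close()
    }
  }
}
```

Creating the producer per partition (rather than per record) keeps connection overhead down while still avoiding the driver-side serialization problem.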
>>
>> I would consider writing something similar to push data to Flume as well,
>> if there is a sufficient use case for it. I have seen people talk about
>> writing back to Kafka quite a bit -- hence the above patch.
>>
>> Which one is better is up to your use case, existing infrastructure, and
>> preference. Both would work as is. Writing back to Flume usually makes
>> sense if you want to write to HDFS/HBase/Solr etc., and Flume adds benefits
>> such as additional buffering -- but you could also write to those systems
>> directly from Spark Streaming itself.
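For the HDFS case mentioned above, writing directly from Spark Streaming is essentially a one-liner; this sketch assumes a `DStream[String]` named `dstream`, and the namenode address and path prefix are placeholders:

```scala
import org.apache.spark.streaming.dstream.DStream

def writeToHdfs(dstream: DStream[String]): Unit = {
  // Each batch is written to a new directory named
  // hdfs://namenode:8020/data/stream-<batch time>.txt
  dstream.saveAsTextFiles("hdfs://namenode:8020/data/stream", "txt")
}
```

Going through Flume instead would add its channel-based buffering between Spark Streaming and HDFS, at the cost of another hop.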
>>
>> But for Kafka, the usual use case is a variety of custom applications
>> reading the same data -- for which it makes a whole lot of sense to write
>> back to Kafka. An example is to sanitize incoming data in Spark Streaming
>> (from Flume or Kafka or something else) and make it available to a variety
>> of apps via Kafka.
>>
>> Hope this helps!
>>
>> Hari
>>
>>
>> On Wed, Nov 19, 2014 at 8:10 AM, Guillermo Ortiz <konstt2...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I'm starting with Spark, and I'm trying to understand whether I should
>>> feed Spark Streaming from Flume or from Kafka. I think there's no
>>> official sink from Flume to Spark Streaming, and it seems that Kafka
>>> fits better since it gives you reliability.
>>>
>>> Could someone give a good scenario for each alternative? When would it
>>> make sense to use Kafka and when Flume for Spark Streaming?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>
>
