With Flume, what would be your sink?
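For questions 1 and 2: each batch interval produces a single RDD, however much data arrived in it. As a rough, untested sketch of the pull-based spark-streaming-flume connector (the host "flume-agent-host", port 9988, batch interval, and partition count 16 are placeholder values I picked for illustration):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumePollingExample")
// One RDD per batch interval; here every 10 seconds, so a 2 GB burst
// still arrives as a single RDD for that batch.
val ssc = new StreamingContext(conf, Seconds(10))

// Pull-based connector: Spark polls Flume's SparkSink and commits the
// Flume transaction (the ACK) only after the received events have been
// stored, so an unacknowledged batch stays in the file channel and is
// replayed if Spark dies before the ACK.
// "flume-agent-host" and 9988 are placeholders for your agent.
val events = FlumeUtils.createPollingStream(ssc, "flume-agent-host", 9988)

events.foreachRDD { rdd =>
  // The whole batch is one RDD; repartition explicitly if you need
  // more parallelism downstream (16 is an arbitrary example value).
  val repartitioned = rdd.repartition(16)
  println(s"Events in this batch: ${repartitioned.count()}")
}

ssc.start()
ssc.awaitTermination()

This assumes the Flume agent is configured with org.apache.spark.streaming.flume.sink.SparkSink as its sink, which is what makes the transactional ACK-after-store behavior possible.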
On Mon, Jan 16, 2017 at 10:44 PM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
> I'm considering using Flume (file channel) with Spark Streaming.
>
> I have some doubts about it:
>
> 1. Is the RDD size all of the data that arrives within the microbatch
> interval you have defined?
>
> 2. If there are 2 GB of data, how many RDDs are generated? Just one,
> which I then have to repartition?
>
> 3. When is the ACK sent back from Spark to Flume?
> I guess that if Flume dies, Flume is going to send the same data again
> to Spark.
> If Spark dies, I have no idea whether Spark will reprocess the same
> data when it is sent again.
> Could it be different if I use the Kafka channel?

--
Best Regards,
Ayan Guha