15, 2015 8:38 AM
To: 'Ilove Data'; 'Tathagata Das'
Cc: 'Akhil Das'; 'user'
Subject: RE: Join between DStream and Periodically-Changing-RDD
Then go for the second option I suggested - simply turn (keep turning) your
HDFS file (Batch RDD) into a stream.
Subject: Re: Join between DStream and Periodically-Changing-RDD
@Akhil Das
Joining two DStreams might not be an option, since I want to join the stream with
historical data in an HDFS folder.
@Tathagata Das & @Evo Eftimov
The Batch RDD to be reloaded is huge compared to the DStream data stream.
>> You can feed your HDFS file into a Message Broker topic and consume it
>> from there in the form of DStream RDDs which you keep aggregating over the
>> lifetime of the spark streaming app instance
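As a rough sketch of that suggestion, the accumulating reference stream and the join
could look roughly like the code below. This is only an illustration: it substitutes
Spark's built-in textFileStream for the message-broker source, and the paths, batch
interval and event source are placeholder assumptions.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamedReferenceJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamedReferenceJoin")
    val ssc  = new StreamingContext(conf, Seconds(20))
    ssc.checkpoint("/tmp/checkpoint")  // required by updateStateByKey

    // New files landing in the HDFS folder become RDDs in this reference DStream
    // (stand-in for consuming the file from a message broker topic).
    val refStream = ssc.textFileStream("/sigmoid/")
      .map(line => (line.split(",")(0), line))

    // Keep aggregating the reference data over the lifetime of the app:
    // remember the latest record seen for each key.
    val refState = refStream.updateStateByKey[String](
      (newValues: Seq[String], current: Option[String]) =>
        newValues.lastOption.orElse(current))

    // The main event stream; the socket source is only a placeholder.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(line => (line.split(",")(0), line))

    // Join each 20s batch of events against the accumulated reference state.
    events.join(refState).print()

    ssc.start()
    ssc.awaitTermination()
  }
}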
> *From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
> *Sent:* Wednesday, June 10, 2015 8:36 AM
> *To:* Ilove Data
> *Cc:* user@spark.apache.org
> *Subject:* Re: Join between DStream and Periodically-Changing-RDD
>
>
>
> RDDs are immutable, why not join two DStreams?
>
To: Ilove Data
Cc: user@spark.apache.org
Subject: Re: Join between DStream and Periodically-Changing-RDD
RDDs are immutable, why not join two DStreams?
Not sure, but you can try something like this also:
kvDstream.foreachRDD(rdd => {
  val file = ssc.sparkContext.textFile("/sigmoid/")
  val kvFile = file.map(x => (x.split(",")(0), x))
  rdd.join(kvFile)
})
Thanks
Best Regards
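One caveat with the snippet above: textFile() re-reads the whole HDFS folder on every
20s batch, and the joined RDD still needs an output action before anything is computed.
Since the Batch RDD is reported to be huge, a variation is to cache the reference RDD
and reload it only on a longer schedule inside transform(). This is only a sketch
reusing ssc and kvDstream from the snippet above, assuming kvDstream is keyed by the
same String field; the 10-minute refresh interval and the /sigmoid/ path are assumptions.

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Reference RDD cached on the cluster, reloaded from HDFS at most every 10 minutes.
var refRDD: RDD[(String, String)] = null
var lastLoad = 0L
val refreshMs = 10 * 60 * 1000L

def refreshedRef(): RDD[(String, String)] = {
  val now = System.currentTimeMillis()
  if (refRDD == null || now - lastLoad > refreshMs) {
    if (refRDD != null) refRDD.unpersist()
    refRDD = ssc.sparkContext.textFile("/sigmoid/")
      .map(x => (x.split(",")(0), x))
      .persist(StorageLevel.MEMORY_AND_DISK)
    lastLoad = now
  }
  refRDD
}

// transform() evaluates this function on the driver when each batch job is
// generated, so the staleness check is cheap and the reload only runs when due.
val joined = kvDstream.transform(rdd => rdd.join(refreshedRef()))
joined.print()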
Hi,
I'm trying to join a DStream with a batch interval of, let's say, 20s against an RDD
loaded from an HDFS folder that changes periodically - let's say a new file arrives
in the folder every 10 minutes.
How should this be done, considering that the HDFS files in the folder are
periodically changing / new files are being added?