Re: Is IndexedRDD available in Spark 1.4.0?
Or Spark on HBase :)
http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/

--
Ruslan Dautkhanov

On Tue, Jul 14, 2015 at 7:07 PM, Ted Yu wrote:
> bq. that is, key-value stores
>
> Please consider HBase for this purpose :-)
Re: Is IndexedRDD available in Spark 1.4.0?
bq. that is, key-value stores

Please consider HBase for this purpose :-)

On Tue, Jul 14, 2015 at 5:55 PM, Tathagata Das wrote:
> Though for the most flexible arbitrary lookups, it's better to use a
> dedicated system that is designed and optimized for long-term storage of
> data, that is, key-value stores, databases, etc.
Re: Is IndexedRDD available in Spark 1.4.0?
I do not recommend using IndexedRDD for state management in Spark Streaming. What it does not solve out of the box is checkpointing of IndexedRDDs, which is important because long-running streaming jobs can lead to an unbounded chain of RDDs. Spark Streaming solves this for the updateStateByKey operation, which you can use and which gives you state management capabilities. Though for the most flexible arbitrary lookups, it's better to use a dedicated system that is designed and optimized for long-term storage of data, that is, key-value stores, databases, etc.

On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu wrote:
> Please take a look at SPARK-2365 which is in progress.
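To make the updateStateByKey suggestion a bit more concrete, here is a minimal sketch of the kind of per-key update function it expects, exercised locally in plain Python without a Spark cluster. The ("put", value) / ("delete", None) event encoding and the names update_state and apply_batch are assumptions made up for this illustration, not part of Spark. In a real job, a function shaped like update_state would be passed to DStream.updateStateByKey after setting a checkpoint directory with StreamingContext.checkpoint; apply_batch only imitates what the streaming engine does once per batch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event encoding for this sketch: ("put", value) or ("delete", None).
Event = Tuple[str, Optional[int]]


def update_state(new_events: List[Event], prev_state: Optional[int]) -> Optional[int]:
    """Fold one batch of events for a single key into that key's state.

    Returning None drops the key, which is how a delete is expressed.
    """
    state = prev_state
    for op, value in new_events:
        if op == "put":
            state = value
        elif op == "delete":
            state = None
    return state


def apply_batch(state: Dict[str, int], batch: List[Tuple[str, Event]]) -> Dict[str, int]:
    """Local stand-in for one streaming batch: group events by key, then call
    update_state for every key that has new events or existing state."""
    grouped: Dict[str, List[Event]] = {}
    for key, event in batch:
        grouped.setdefault(key, []).append(event)

    next_state: Dict[str, int] = {}
    for key in set(state) | set(grouped):
        updated = update_state(grouped.get(key, []), state.get(key))
        if updated is not None:
            next_state[key] = updated
    return next_state


state: Dict[str, int] = {}
state = apply_batch(state, [("a", ("put", 1)), ("b", ("put", 2))])
state = apply_batch(state, [("a", ("put", 10)), ("b", ("delete", None))])
print(state)  # {'a': 10}
```

This covers updates and deletes by key. Point lookups from outside the stream are not addressed by this pattern, which is where a dedicated key-value store comes in.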
Re: Is IndexedRDD available in Spark 1.4.0?
Please take a look at SPARK-2365, which is in progress.

On Tue, Jul 14, 2015 at 5:18 PM, swetha wrote:
> Hi,
>
> Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
> Streaming to do lookups/updates/deletes in RDDs using keys by storing them
> as key/value pairs.
>
> Thanks,
> Swetha
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-IndexedRDD-available-in-Spark-1-4-0-tp23841.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.