Re: Indexed Store for lookup table

2015-07-16 Jem Tucker
Thanks!

On Thu, Jul 16, 2015 at 1:59 PM Vetle Leinonen-Roeim wrote:
> By the way - if you're going this route, see
> https://github.com/datastax/spark-cassandra-connector
>
> On Thu, Jul 16, 2015 at 2:40 PM Vetle Leinonen-Roeim wrote:
>
>> You'll probably have to install it separately.

Re: Indexed Store for lookup table

2015-07-16 Vetle Leinonen-Roeim
By the way - if you're going this route, see https://github.com/datastax/spark-cassandra-connector

On Thu, Jul 16, 2015 at 2:40 PM Vetle Leinonen-Roeim wrote:
> You'll probably have to install it separately.
>
> On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker wrote:
>
>> Hi Vetle,
>>
>> IndexedRDD i

Re: Indexed Store for lookup table

2015-07-16 Vetle Leinonen-Roeim
You'll probably have to install it separately.

On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker wrote:
> Hi Vetle,
>
> IndexedRDD is persisted in the same way RDDs are as far as I am aware. Are
> you aware if Cassandra can be built into my application or has to be a
> standalone database which is ins

Re: Indexed Store for lookup table

2015-07-16 Jem Tucker
Hi Vetle,

IndexedRDD is persisted in the same way RDDs are as far as I am aware. Are you aware if Cassandra can be built into my application or has to be a standalone database which is installed separately?

Thanks,
Jem

On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim wrote:
> Hi,
>
> No

Re: Indexed Store for lookup table

2015-07-16 Vetle Leinonen-Roeim
Hi,

Not sure how IndexedRDD is persisted, but perhaps you're better off using a NoSQL database for lookups (Cassandra, for example, with the Cassandra connector)? That should give you good performance on lookups, but persisting those billion records sounds like something that will take some time
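A minimal sketch of the suggested approach, written in Python for illustration only. SQLite stands in for the NoSQL store here (an assumption; the thread suggests Cassandra), and the table and function names are hypothetical. The point it shows is that each row of the small table becomes one indexed point lookup, so the lookup table is never scanned or shuffled in full.

```python
import sqlite3

def build_lookup_store(records):
    """Load (key, value) pairs into an indexed table.

    SQLite is a stand-in for a keyed NoSQL store (assumption for
    illustration). The INTEGER PRIMARY KEY is backed by a B-tree, so each
    lookup is an index probe rather than a full-table scan.
    """
    conn = sqlite3.connect(":memory:")  # pass a file path instead to persist on disk
    conn.execute("CREATE TABLE lookup (k INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO lookup (k, v) VALUES (?, ?)", records)
    conn.commit()
    return conn

def join_small_with_store(conn, small_rows):
    """Inner-join a small table against the store by probing only its keys."""
    joined = []
    for key, payload in small_rows:
        row = conn.execute("SELECT v FROM lookup WHERE k = ?", (key,)).fetchone()
        if row is not None:  # inner join: drop keys absent from the store
            joined.append((key, payload, row[0]))
    return joined

store = build_lookup_store([(i, f"value-{i}") for i in range(10_000)])
small = [(3, "a"), (42, "b"), (20_000, "c")]
print(join_small_with_store(store, small))
```

The cost of the join then grows with the number of small-table rows rather than with the size of the lookup table; the one-time cost of loading the billion records into the store remains, as noted above.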

Indexed Store for lookup table

2015-07-16 Jem Tucker
Hello,

I have been using IndexedRDD as a large lookup table (1 billion records) to join with small tables (1 million rows). The performance of IndexedRDD is great until it has to be persisted to disk. Are there any alternatives to IndexedRDD, or any changes to how I use it, to improve performance with big
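The workload described above is a lookup join: a small table probes a large key-indexed table. A minimal in-memory sketch of that pattern, in Python for illustration; a plain dict stands in for the key index and `lookup_join` is a hypothetical helper, not IndexedRDD's API. Distinct keys are fetched once in a batch before being attached to every matching row.

```python
from typing import Dict, Iterable, List, Tuple

def lookup_join(index: Dict[int, str],
                small_rows: Iterable[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Inner-join small_rows against a key-indexed lookup table.

    Only the small table's keys are probed, so the work grows with the
    small side (about 1 million rows in the thread), not with the size of
    the lookup table (about 1 billion records).
    """
    rows = list(small_rows)
    # Fetch each distinct key once, in a batch (multiget-style step).
    wanted = {key for key, _ in rows}
    found = {key: index[key] for key in wanted if key in index}
    # Attach the looked-up value to every small-table row whose key was found.
    return [(key, payload, found[key]) for key, payload in rows if key in found]

lookup = {k: f"v{k}" for k in range(1_000)}
rows = [(7, "x"), (7, "y"), (5_000, "z")]
print(lookup_join(lookup, rows))
```

Deduplicating keys first matters when the small table repeats keys: each distinct key costs one probe against the index, however many rows share it.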