You'll probably have to install it separately.

On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker <jem.tuc...@gmail.com> wrote:
> Hi Vetle,
>
> IndexedRDD is persisted in the same way RDDs are, as far as I am aware. Do
> you know whether Cassandra can be built into my application, or does it have
> to be a standalone database that is installed separately?
>
> Thanks,
>
> Jem
>
> On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim <ve...@roeim.net>
> wrote:
>
>> Hi,
>>
>> Not sure how IndexedRDD is persisted, but perhaps you're better off using
>> a NoSQL database for lookups (perhaps using Cassandra, with the Cassandra
>> connector)? That should give you good performance on lookups, but
>> persisting those billion records sounds like something that will take some
>> time in any case.
>>
>> Regards,
>> Vetle
>>
>>
>> On Thu, Jul 16, 2015 at 10:02 AM Jem Tucker <jem.tuc...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have been using IndexedRDD as a large lookup (1 billion records) to
>>> join with small tables (1 million rows). The performance of IndexedRDD is
>>> great until it has to be persisted on disk. Are there any alternatives to
>>> IndexedRDD, or any changes to how I use it, that would improve performance
>>> with big data volumes?
>>>
>>> Kindest Regards,
>>>
>>> Jem
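
For reference, the lookup-based approach suggested in the thread can be sketched roughly as below: drive the join from the small table and do one keyed lookup per row, so the large dataset is never scanned. This is only an illustration under assumptions. The names `lookup_join`, `big_store`, and `small_table` are hypothetical, and a plain dict stands in for the external key-value store; it does not use the Cassandra connector API.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, Tuple


def lookup_join(
    small_rows: Iterable[Tuple[Hashable, object]],
    fetch: Callable[[Hashable], Optional[object]],
) -> Iterator[Tuple[Hashable, object, object]]:
    """Inner-join small_rows against a keyed store via point lookups.

    `fetch` is assumed to return None when the key is absent.
    """
    for key, small_value in small_rows:
        big_value = fetch(key)  # one point lookup per small-table row
        if big_value is not None:
            yield key, small_value, big_value


# Hypothetical stand-in for the large lookup table (a dict, not a real database).
big_store = {1: "alice", 2: "bob", 3: "carol"}

# Hypothetical small table of (key, value) rows; key 99 has no match.
small_table = [(2, 10.0), (3, 7.5), (99, 1.0)]

joined = list(lookup_join(small_table, big_store.get))
print(joined)
```

The cost here scales with the size of the small table (one lookup per row) rather than with the billion-record side, which is the property the external-store suggestion relies on.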