Thanks!
On Thu, Jul 16, 2015 at 1:59 PM Vetle Leinonen-Roeim
wrote:
By the way - if you're going this route, see
https://github.com/datastax/spark-cassandra-connector
On Thu, Jul 16, 2015 at 2:40 PM Vetle Leinonen-Roeim
wrote:
You'll probably have to install it separately.
On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker wrote:
Hi Vetle,
IndexedRDD is persisted in the same way RDDs are, as far as I am aware. Do
you know whether Cassandra can be built into my application, or does it have
to be a standalone database that is installed separately?
Thanks,
Jem
On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim
wrote:
Hi,
Not sure how IndexedRDD is persisted, but perhaps you're better off using a
NoSQL database for lookups (for example Cassandra, with the Cassandra
connector)? That should give you good performance on lookups, but
persisting those billion records sounds like something that will take some
time
Hello,
I have been using IndexedRDD as a large lookup (1 billion records) to join
with small tables (1 million rows). The performance of IndexedRDD is great
until it has to be persisted on disk. Are there any alternatives to
IndexedRDD or any changes to how I use it to improve performance with big