Thanks! We can somewhat approximate the number of rows returned by .where(), and as a result we can approximate the number of partitions, so the repartition approach will work.
Let's say the .where() had resulted in a widely varying number of rows; we would not have been able to approximate the number of partitions, and that would
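The sizing idea above (approximate row count → approximate partition count) can be sketched as plain arithmetic. This is a minimal illustration, not connector API: the class name, the helper `estimatePartitions`, and the target rows-per-partition value are all hypothetical tuning assumptions.

```java
// Sketch: derive a partition count from an approximate row count, so that
// rdd.repartition(estimatePartitions(approxRows)) keeps partitions at a
// manageable size. All names and the target size here are hypothetical.
public class PartitionEstimate {

    // Aim for roughly this many rows per partition (tuning assumption).
    static final long TARGET_ROWS_PER_PARTITION = 100_000L;

    static int estimatePartitions(long approxRowCount) {
        long parts = (approxRowCount + TARGET_ROWS_PER_PARTITION - 1)
                / TARGET_ROWS_PER_PARTITION;     // ceiling division
        return (int) Math.max(1, parts);         // always at least 1 partition
    }

    public static void main(String[] args) {
        // e.g. .where() is expected to return about 3 million rows
        System.out.println(estimatePartitions(3_000_000L)); // prints 30
    }
}
```

If the where() estimate were wildly off, this fixed rows-per-partition target would produce partitions that are far too large or too small, which is exactly the failure mode discussed above.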
Did you try repartitioning? You might end up spending a lot of time on GC, though.
Thanks
Best Regards
On Fri, May 8, 2015 at 11:59 PM, Vijay Pawnarkar vijaypawnar...@gmail.com
wrote:
I am using the Spark Cassandra connector to work with a table with 3 million records, using the .where() API to work with only certain rows in this table. The where clause filters the data down to 1 row.
CassandraJavaUtil.javaFunctions(sparkContext)
    .cassandraTable(KEY_SPACE, MY_TABLE,