Hi, I am working with Cassandra and Spark, and I would like to know which gives better performance: filtering in Cassandra itself on the primary key and clustering key, or filtering with Spark DataFrame transformations.
For example, in Spark:

    val df = sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "test", "table" -> "test"))
      .load()

    df.filter("cdate = '2016-06-07'").filter("country = 'USA'").count()

versus querying Cassandra directly, where cdate is the partition key and country is a clustering column:

    SELECT count(*) FROM test WHERE cdate = '2016-06-07' AND country = 'USA';

When should we use a plain Cassandra query rather than a Spark DataFrame, in terms of performance, with billions of rows?

Thanks,
arun

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/transformation-spark-vs-cassandra-tp26647.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
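P.S. One thing I have been trying in order to compare the two paths is inspecting the physical plan, since (as I understand it) the spark-cassandra-connector can push simple equality predicates down to Cassandra rather than filtering on the Spark side. A minimal sketch, assuming the connector is on the classpath and the same keyspace/table names as above:

```scala
// Sketch: use explain() to see whether the filters are pushed down to
// Cassandra. If pushdown happens, the scan node should show them under
// PushedFilters instead of a Spark-side Filter over a full table scan.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "test"))
  .load()

df.filter("cdate = '2016-06-07'")
  .filter("country = 'USA'")
  .explain()  // check the plan for PushedFilters on cdate and country
```

I am not sure whether what the plan reports fully reflects what happens at runtime, which is part of why I am asking about the billion-row case.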