Hi,

I am working with Cassandra and Spark, and I would like to know which gives
better performance: filtering in Cassandra on the primary key (partition key
plus clustering key), or filtering with Spark DataFrame transformations.

For example, in Spark:

 // load() returns a DataFrame, not an RDD
 val df = sqlContext.read.format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "test", "table" -> "test"))
      .load()

and then df.filter("cdate = '2016-06-07'").filter("country = 'USA'").count()
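
To see whether these filters are pushed down to Cassandra or applied by Spark
after a full scan, I look at the physical plan (a minimal sketch against the
df above, using Spark's standard explain output):

 // The plan shows which predicates reach the data source and which
 // ones Spark applies itself as a separate Filter step.
 val filtered = df.filter("cdate = '2016-06-07'").filter("country = 'USA'")
 filtered.explain(true)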

vs

using Cassandra directly, where cdate is the partition key and country is a
clustering column (both make up the primary key):

SELECT count(*) FROM test WHERE cdate = '2016-06-07' AND country = 'USA';
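
For reference, the table layout I have in mind looks roughly like this (the
id column is hypothetical, just to keep rows unique; the actual schema may
differ):

 CREATE TABLE test.test (
     cdate   text,   -- partition key
     country text,   -- clustering column
     id      uuid,   -- hypothetical extra clustering column for uniqueness
     PRIMARY KEY ((cdate), country, id)
 );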

I would like to know when we should use a plain Cassandra query versus a
Spark DataFrame, in terms of performance, with billions of rows.
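
For completeness, I am also aware of the connector's RDD API, which can push
both the WHERE clause and the count to Cassandra (a sketch, assuming the
DataStax spark-cassandra-connector; cassandraCount runs the count
server-side rather than pulling rows into Spark):

 import com.datastax.spark.connector._

 // Both predicates and the count are executed by Cassandra.
 val cnt = sc.cassandraTable("test", "test")
   .where("cdate = ? AND country = ?", "2016-06-07", "USA")
   .cassandraCount()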

Thanks
arun 


