Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-10 Thread Rumph, Frens Jan
2015 at 18:10, DuyHai Doan doanduy...@gmail.com wrote: First idea to eliminate any issue with regard to stale data: issue the same count query with RF=QUORUM and check whether there are still inconsistencies. On Tue, Mar 10, 2015 at 9:13 AM, Rumph, Frens Jan m...@frensjan.nl wrote: Hi Jens

Re: Inconsistent count(*) and distinct results from Cassandra

2015-03-12 Thread Rumph, Frens Jan
, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan m...@frensjan.nl wrote: Hi, Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... yield inconsistent results between executions even though the table at hand isn't written to? I have a table

RDD partitions per executor in Cassandra Spark Connector

2015-03-02 Thread Rumph, Frens Jan
Hi all, I didn't find the *issues* button on https://github.com/datastax/spark-cassandra-connector/ so I'm posting here. Does anyone have an idea why token ranges are grouped into one partition per executor? I expected at least one per core. Any suggestions on how to work around this? Doing a

Inconsistent count(*) and distinct results from Cassandra

2015-03-04 Thread Rumph, Frens Jan
Hi, Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... yield inconsistent results between executions even though the table at hand isn't written to? I have a table in a keyspace with replication_factor = 1 which is something like: CREATE

PySpark and Cassandra integration

2015-02-20 Thread Rumph, Frens Jan
Hi all, I wanted to let you know I've forked PySpark Cassandra at https://github.com/TargetHolding/pyspark-cassandra. Unfortunately the original code didn't work for me and I couldn't figure out how it could work. But it inspired me, so I rewrote the majority of the project. The rewrite implements

Cassandra time series + Spark

2015-03-23 Thread Rumph, Frens Jan
Hi, I'm working on a system which has to deal with time series data. I've been happy using Cassandra for time series and Spark looks promising as a computational platform. I consider chunking time series in Cassandra necessary, e.g. by 3 weeks as kairosdb does it. This allows an 8 byte chunk
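
The chunking idea mentioned in this message can be illustrated with a small sketch. It assumes that chunks are fixed 3-week windows aligned to the Unix epoch and that timestamps are in milliseconds; both are assumptions made for illustration, and the names `chunk_start` and `offset_in_chunk` are hypothetical, not taken from the thread or from any library.

```python
# Hypothetical sketch: map a timestamp to a fixed-width time chunk.
# Assumption: chunks are 3 weeks wide and aligned to the Unix epoch.
THREE_WEEKS_MS = 3 * 7 * 24 * 60 * 60 * 1000


def chunk_start(timestamp_ms: int, width_ms: int = THREE_WEEKS_MS) -> int:
    """Return the start (ms since epoch) of the chunk containing timestamp_ms."""
    return timestamp_ms - (timestamp_ms % width_ms)


def offset_in_chunk(timestamp_ms: int, width_ms: int = THREE_WEEKS_MS) -> int:
    """Return how far (in ms) timestamp_ms lies past the start of its chunk."""
    return timestamp_ms % width_ms
```

Under these assumptions, the chunk start could serve as part of a partition key and the offset as a clustering value, so that all points in the same 3-week window share one partition.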