Re: [Cassandra-Connector] No Such Method Error despite correct versions

2016-02-22 Thread Jan Algermissen
Doh - minutes after my question I saw the same from a couple of days ago. Indeed, using the C* driver 3.0.0-rc1 seems to solve the issue. Jan
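The fix described above amounts to forcing a single, matching Java driver version onto the classpath so the connector and the driver agree on the `TableMetadata` API. A build.sbt sketch of that pin, using the versions named in the thread (treat the exact artifact coordinates as assumptions):

```scala
// build.sbt fragment (a sketch; coordinates assumed from the thread)
libraryDependencies ++= Seq(
  // The connector release named in the thread:
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-RC1",
  // Explicitly pin the Java driver to the version the reply says works,
  // overriding whatever transitive version would otherwise win:
  "com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0-rc1"
)
```

With a mixed build, a transitively pulled-in older `cassandra-driver-core` can shadow the pinned one; `sbt dependencyTree` (or Maven's `dependency:tree`) is the usual way to confirm which version actually wins.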

[Cassandra-Connector] No Such Method Error despite correct versions

2016-02-22 Thread Jan Algermissen
Hi, I am using Cassandra 2.1.5, Spark 1.5.2, Cassandra java-driver 3.0.0, Cassandra-Connector 1.5.0-RC1, all with Scala 2.11.7. Nevertheless, I get the following error from my Spark job: java.lang.NoSuchMethodError: com.datastax.driver.core.TableMetadata.getIndexes()Ljava/util/List; at
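A `NoSuchMethodError` despite "correct" declared versions usually means the class was loaded from a different jar than expected. A small diagnostic sketch for finding which jar a class actually came from (for the real problem you would query `"com.datastax.driver.core.TableMetadata"`; `java.lang.String` is used here only so the sketch is self-contained):

```scala
// Locate the jar a class was loaded from -- useful when a
// NoSuchMethodError suggests two versions of a library on the classpath.
object WhichJar {
  def locationOf(className: String): String = {
    val cls = Class.forName(className)
    val src = cls.getProtectionDomain.getCodeSource
    // Classes from the JDK's bootstrap classpath report no code source.
    if (src == null) "<bootstrap classpath>" else src.getLocation.toString
  }

  def main(args: Array[String]): Unit =
    println(locationOf("java.lang.String"))
}
```

Running this from inside the failing Spark job (rather than locally) matters, since driver and executor classpaths can differ.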

Problems with too many checkpoint files with Spark Streaming

2016-01-06 Thread Jan Algermissen
Hi, we are running a streaming job that processes about 500 events per 20s batch and uses updateStateByKey to accumulate Web sessions (with a 30-minute live time). The checkpoint interval is set to 20 × the batch interval, that is 400s. Cluster size is 8 nodes. We are having trouble with the
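The session-accumulation part of such a job boils down to a pure update function of the shape `updateStateByKey` expects: (new events for a key, previous state) => next state, where returning `None` drops the key. A plain-Scala sketch with hypothetical names (`Session`, the 30-minute timeout from the thread), testable without Spark:

```scala
// Hypothetical session state: event count plus last-seen timestamp (ms).
case class Session(events: Int, lastSeen: Long)

object SessionUpdate {
  val TimeoutMs: Long = 30 * 60 * 1000L // 30-minute session live time

  // Shape of an updateStateByKey update function; `now` would be the
  // batch time. Returning None expires the key, which is what bounds
  // the amount of state carried between checkpoints.
  def update(now: Long)(newEvents: Seq[Long], state: Option[Session]): Option[Session] = {
    val prev = state.getOrElse(Session(0, now))
    if (newEvents.isEmpty && now - prev.lastSeen > TimeoutMs)
      None // idle past the live time: drop the session
    else {
      val last = if (newEvents.isEmpty) prev.lastSeen else newEvents.max
      Some(Session(prev.events + newEvents.size, last))
    }
  }
}
```

In the real job this function would be passed as `events.updateStateByKey(SessionUpdate.update(batchTime) _)`; expiring idle keys promptly also keeps the checkpointed state (and thus the checkpoint files) small.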

Sharding vs. Per-Timeframe Tables

2015-09-29 Thread Jan Algermissen
Hi, I am using Spark and the Cassandra-connector to save customer events for later batch analysis. The primary access pattern later on will be by time-slice. One way to save the events would be to create a C* row per day, for example, and within that row store the events in decreasing time order.
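The row-per-day idea translates into a day-valued partition key with a timestamp clustering column in descending order. A sketch of the bucket derivation (table and column names are hypothetical, not from the thread):

```scala
import java.time.{Instant, ZoneOffset}

// Hypothetical schema this key targets:
//   CREATE TABLE events (day text, ts timestamp, payload text,
//                        PRIMARY KEY (day, ts))
//   WITH CLUSTERING ORDER BY (ts DESC);
// One partition per day; newest events come first within the partition,
// so a time-slice read is a single-partition range scan.
object DayBucket {
  def bucketFor(ts: Instant): String =
    ts.atZone(ZoneOffset.UTC).toLocalDate.toString // e.g. "2015-09-29"
}
```

The usual caveat with per-timeframe partitions is that a hot day becomes one wide, hot partition; sharding the day key (e.g. appending a small hash suffix) trades that for fan-out reads.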

Re: How to set environment of worker applications

2015-08-29 Thread Jan Algermissen
Finally, I found the solution: on the Spark context you can set spark.executorEnv.[EnvironmentVariableName] and these will be available in the environment of the executors. This is in fact documented, but somehow I missed it.
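Forwarding selected local environment variables through that mechanism is just key construction; the `SparkConf.set` calls themselves are shown as comments so the sketch stays Spark-free (variable names here are hypothetical):

```scala
object ExecutorEnv {
  // Build spark.executorEnv.<Name> -> value pairs for the variables
  // that are actually present in the given environment map.
  def forward(names: Seq[String], env: Map[String, String]): Map[String, String] =
    names.flatMap { n =>
      env.get(n).map(v => s"spark.executorEnv.$n" -> v)
    }.toMap

  // Usage sketch (assumes a SparkConf named conf):
  //   ExecutorEnv.forward(Seq("CASSANDRA_HOST"), sys.env)
  //     .foreach { case (k, v) => conf.set(k, v) }
}
```

Since the driver's environment is not automatically visible to executors, this explicit copy at context-construction time is what makes the variables appear in the executor processes.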

Strange ClassNotFoundException in spark-shell

2015-08-24 Thread Jan Algermissen
Hi, I am using Spark 1.4 M1 with the Cassandra Connector and run into a strange error when using the spark-shell. This works: sc.cassandraTable("events", "bid_events").select("bid", "type").take(10).foreach(println) But as soon as I put a map() in there (or filter()): sc.cassandraTable("events",
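A likely cause (an assumption, since the thread is truncated): the `select(...).take(...)` pipeline ships no user code, while `map()` or `filter()` wraps the lambda in a generated class that every executor must be able to load, so a jar missing from the executor classpath (typically fixed with `spark-shell --jars` or `spark.jars`) only surfaces once a closure is involved. A plain-Scala illustration that a lambda really is its own class:

```scala
object ClosureClass {
  def main(args: Array[String]): Unit = {
    val f = (x: Int) => x + 1
    // Each lambda compiles to its own generated class; on a cluster, that
    // class (and everything it references) must be loadable on the
    // executors, or they throw ClassNotFoundException at task time.
    println(f.getClass.getName)
  }
}
```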

How to set environment of worker applications

2015-08-23 Thread Jan Algermissen
Hi, I am starting a Spark Streaming job in standalone mode with spark-submit. Is there a way to make the UNIX environment variables with which spark-submit is started available to the processes started on the worker nodes? Jan