Running driver app as a daemon

2015-07-21 Thread algermissen1971
Hi, I am trying to start a driver app as a daemon using Linux's start-stop-daemon utility (I need console detaching, unbuffered STDOUT/STDERR to a logfile, and start/stop via a PID file). I am doing it like this (which works great for the other apps we have): /sbin/start-stop-daemon -c $USER --b

Re: Joda Time best practice?

2015-07-20 Thread algermissen1971
, 2015 at 1:19 PM, algermissen1971 > wrote: > Hi Harish, > > On 20 Jul 2015, at 20:37, Harish Butani wrote: > > > Hey Jan, > > > > Can you provide more details on the serialization and cache issues. > > My symptom is that I have a Joda DateTime on wh

Re: Joda Time best practice?

2015-07-20 Thread algermissen1971
ctionality with spark-sql please consider: > https://github.com/SparklineData/spark-datetime It provides a simple way to > combine joda datetime expressions with spark sql. > > regards, > Harish. > > On Mon, Jul 20, 2015 at 7:37 AM, algermissen1971 > wrote: > Hi, >

Joda Time best practice?

2015-07-20 Thread algermissen1971
Hi, I am having trouble with Joda Time in a Spark application and have since seen that I am not the only one (it generally seems to be related to serialization and the internal caches of the Joda Time objects). Is there a known best practice for working around these issues? Jan
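
A commonly suggested workaround, as a minimal sketch (assuming Spark 1.x with Kryo and joda-time on the classpath; all names are illustrative): keep plain epoch millis in the RDD and rebuild the DateTime inside the closure, so no Joda object is ever serialized between JVMs.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.joda.time.{DateTime, DateTimeZone}

    object JodaSafeJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("joda-safe")
          // Kryo sidesteps some Java-serialization surprises with Joda's
          // internal caches.
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        val sc = new SparkContext(conf)

        // Ship only epoch millis; construct DateTime on the executor side.
        val millis = sc.parallelize(Seq(1437000000000L, 1437100000000L))
        val hours = millis.map(ms => new DateTime(ms, DateTimeZone.UTC).getHourOfDay)
        hours.collect().foreach(println)
        sc.stop()
      }
    }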

Re: Sessionization using updateStateByKey

2015-07-15 Thread algermissen1971
ating 'entities' of which a limited number exists (the users of the visits or the products sold). Yes? Jan > > Maybe someone has a better idea, I'd like to hear it. > > On Wed, Jul 15, 2015 at 8:54 AM, algermissen1971 > wrote: > Hi Cody, > > oh ... I

Re: Sessionization using updateStateByKey

2015-07-15 Thread algermissen1971
Hi Cody, oh ... I thought that was one of *the* use cases for it. Do you have a suggestion / best practice for achieving the same thing with better scaling characteristics? Jan On 15 Jul 2015, at 15:33, Cody Koeninger wrote: > I personally would try to avoid updateStateByKey for sessionizati
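
For reference, a minimal sketch of the pattern under discussion, assuming a DStream of (userId, eventTime) pairs and checkpointing enabled; the Session type and the 30-minute timeout are illustrative. The scaling concern is visible in the code: the update function revisits the state of every key on every batch.

    import org.apache.spark.streaming.dstream.DStream

    case class Session(start: Long, lastSeen: Long, events: Int)

    def sessionize(events: DStream[(String, Long)]): DStream[(String, Session)] =
      events.updateStateByKey[Session] { (times: Seq[Long], state: Option[Session]) =>
        val sorted = times.sorted
        state match {
          case None if sorted.isEmpty => None
          case None => Some(Session(sorted.head, sorted.last, sorted.size))
          case Some(s) if sorted.isEmpty =>
            // Expire sessions idle for more than 30 minutes (illustrative).
            if (System.currentTimeMillis - s.lastSeen > 30 * 60 * 1000L) None
            else Some(s)
          case Some(s) =>
            Some(Session(s.start, sorted.last, s.events + sorted.size))
        }
      }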

Re: Master vs. Slave Nodes Clarification

2015-07-14 Thread algermissen1971
You need > C* and Spark workers co-located. The master can be on one of the C* nodes or a > non-C* node. > > Mohammed > > > -Original Message- > From: algermissen1971 [mailto:algermissen1...@icloud.com] > Sent: Sunday, July 12, 2015 12:35 PM > To: Spark User >

Master vs. Slave Nodes Clarification

2015-07-12 Thread algermissen1971
Hi, I have a question that I am having real trouble figuring out for myself: Does the master node in a Spark cluster need to be a machine similar to the slave nodes, or should I rather view it as a coordinating node that does not need much computing or storage power? For example, when using Sp

Re: Spark Streaming and using Swift object store for checkpointing

2015-07-11 Thread algermissen1971
On 10 Jul 2015, at 23:10, algermissen1971 wrote: > Hi, > > initially today, when moving my streaming application to the cluster for the first > time, I ran into the newbie error of using a local file system for checkpointing > and the RDD partition count differences (see exception belo

Spark Streaming and using Swift object store for checkpointing

2015-07-10 Thread algermissen1971
Hi, initially today, when moving my streaming application to the cluster for the first time, I ran into the newbie error of using a local file system for checkpointing, and the RDD partition count differences (see exception below). Having neither HDFS nor S3 (and the Cassandra-Connector not yet supportin
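
For reference, a sketch of checkpointing to Swift through the hadoop-openstack module, which exposes Swift as a swift:// filesystem. The property names follow that module's documentation; the service alias, credentials, and container are placeholders for your deployment.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SwiftCheckpointing {
      def main(args: Array[String]): Unit = {
        // hadoop-openstack must be on the driver and executor classpath.
        val conf = new SparkConf().setAppName("swift-checkpointing")
          .set("spark.hadoop.fs.swift.impl",
               "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem")
          .set("spark.hadoop.fs.swift.service.mystack.auth.url",
               "https://auth.example.com/v2.0/tokens")
          .set("spark.hadoop.fs.swift.service.mystack.username", "user")
          .set("spark.hadoop.fs.swift.service.mystack.password", "secret")
          .set("spark.hadoop.fs.swift.service.mystack.tenant", "tenant")

        val ssc = new StreamingContext(conf, Seconds(10))
        // Checkpoint URL form: swift://<container>.<service-alias>/<path>
        ssc.checkpoint("swift://checkpoints.mystack/streaming-app")
      }
    }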

Starting Spark-Application without explicit submission to cluster?

2015-07-10 Thread algermissen1971
Hi, I am a bit confused about the steps I need to take to start a Spark application on a cluster. So far I had the impression from the documentation that I need to explicitly submit the application using, for example, spark-submit. However, from the SparkContext constructor signature I get the
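
A sketch of what the constructor-based route looks like: a driver launched with plain java can attach itself to a standalone cluster by putting the master URL into its SparkConf and shipping its own jar, something spark-submit would otherwise handle. The host and jar path below are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    object SelfStartingApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("self-starting-app")
          .setMaster("spark://master-host:7077")
          // Distribute the application jar to the executors ourselves;
          // spark-submit normally takes care of this step.
          .setJars(Seq("/path/to/app-assembly.jar"))
        val sc = new SparkContext(conf)
        println(sc.parallelize(1 to 100).sum())
        sc.stop()
      }
    }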

Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy "DatabaseConnection"

2015-06-12 Thread algermissen1971
r about the relationship of partition / executor / stage, but I get the idea.) Jan > On Fri, Jun 12, 2015 at 4:11 PM, algermissen1971 > wrote: > > On 12 Jun 2015, at 22:59, Cody Koeninger wrote: > > > Close. the mapPartitions call doesn't need to do

Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy "DatabaseConnection"

2015-06-12 Thread algermissen1971
kka-HTTP request to store the data), not a DB connection - I presume this does not change the concept? Jan > > On Fri, Jun 12, 2015 at 3:55 PM, algermissen1971 > wrote: > Cody, > > On 12 Jun 2015, at 17:26, Cody Koeninger wrote: > > > There are sever

Re: Spark Streaming, updateStateByKey and mapPartitions() - and lazy "DatabaseConnection"

2015-06-12 Thread algermissen1971
> On Fri, Jun 12, 2015 at 10:07 AM, algermissen1971 > wrote: > Hi, > > I have a scenario with spark streaming, where I need to write to a database > from within updateStateByKey[1]. > > That means that inside my update function I need a connection. > > I have so far unders

Spark Streaming, updateStateByKey and mapPartitions() - and lazy "DatabaseConnection"

2015-06-12 Thread algermissen1971
Hi, I have a scenario with Spark Streaming where I need to write to a database from within updateStateByKey[1]. That means that inside my update function I need a connection. I have so far understood that I should create a new (lazy) connection for every partition. But since I am not working
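
The pattern usually meant here is a connection held lazily in a singleton object, so each executor JVM opens it at most once and nothing is serialized with the closure. A minimal sketch; DatabaseConnection is a hypothetical stand-in for the real client.

    // `DatabaseConnection` is a hypothetical stand-in for the real client.
    class DatabaseConnection(url: String) {
      def write(value: String): Unit = println(s"[$url] wrote: $value") // stub
    }

    object ConnectionHolder {
      // A `lazy val` in a Scala object is initialized at most once per
      // executor JVM, on first use; only a reference to the object's class
      // travels with the closure.
      lazy val conn = new DatabaseConnection("db://host:1234")
    }

    object Updates {
      // Update function for updateStateByKey that flushes through the
      // shared per-JVM connection.
      def update(newValues: Seq[String], state: Option[String]): Option[String] = {
        val latest = (state.toSeq ++ newValues).lastOption
        latest.foreach(ConnectionHolder.conn.write)
        latest
      }
    }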

How to obtain ActorSystem and/or ActorFlowMaterializer in updateStateByKey

2015-06-07 Thread algermissen1971
Hi, I am writing some code inside an update function for updateStateByKey that flushes data to a remote system using akka-http. For the akka-http request I need an ActorSystem and an ActorFlowMaterializer. Can anyone share a pattern or insights that address the following questions: - Where and
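
One plausible pattern, sketched under the same per-JVM-singleton idea as for database connections: hold the ActorSystem and materializer lazily in an object so each executor creates them once. This assumes the akka-http 1.0-era scaladsl API; the pre-1.0 akka-stream previews called the materializer ActorFlowMaterializer, renamed ActorMaterializer in 1.0. The target URL is a placeholder.

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.model.HttpRequest
    import akka.stream.ActorMaterializer

    object HttpFlusher {
      // Created lazily, at most once per executor JVM; never serialized
      // with the Spark closure.
      implicit lazy val system: ActorSystem = ActorSystem("flusher")
      implicit lazy val mat: ActorMaterializer = ActorMaterializer()

      // Fire-and-forget flush of one value (illustrative endpoint).
      def flush(payload: String) =
        Http().singleRequest(HttpRequest(uri = s"http://sink.example.com/put?v=$payload"))
    }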

Re: Roadmap for Spark with Kafka on Scala 2.11?

2015-06-04 Thread algermissen1971
Hi Iulian, On 26 May 2015, at 13:04, Iulian Dragoș wrote: > > On Tue, May 26, 2015 at 10:09 AM, algermissen1971 > wrote: > Hi, > > I am setting up a project that requires Kafka support and I wonder what the > roadmap is for Scala 2.11 support (including Kafka). >

Roadmap for Spark with Kafka on Scala 2.11?

2015-05-26 Thread algermissen1971
Hi, I am setting up a project that requires Kafka support and I wonder what the roadmap is for Scala 2.11 support (including Kafka). Can we expect to see 2.11 support anytime soon? Jan