Re: Spark Streaming and database access (e.g. MySQL)

2014-09-10 Thread Mayur Rustagi
I think she is checking for an empty RDD? But if the RDD is empty, then nothing will happen: no DB connections will be opened, etc.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi

On Mon, Sep 8, 2014 at 1:32 PM, Tobias Pfeiffer t...@preferred.jp

Re: Spark Streaming and database access (e.g. MySQL)

2014-09-08 Thread Sean Owen
That should be OK, since the iterator is definitely consumed, and the connection is therefore actually done with, at the end of a 'foreach' method. You might put the close in a finally block.

On Mon, Sep 8, 2014 at 12:29 AM, Soumitra Kumar kumar.soumi...@gmail.com wrote:
I have the following code:
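In sketch form, the close-in-finally pattern Sean suggests looks like this (initDbConnection and writeToDb are hypothetical stand-ins for the code under discussion, not names from the thread):

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { iterator =>
        val conn = initDbConnection()   // hypothetical helper; opens the JDBC connection on the worker
        try {
          iterator.foreach { record =>
            writeToDb(conn, record)     // hypothetical write helper
          }
        } finally {
          conn.close()                  // runs even if a write throws, so the connection is not leaked
        }
      }
    }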

Re: Spark Streaming and database access (e.g. MySQL)

2014-09-08 Thread Tobias Pfeiffer
Hi,

On Mon, Sep 8, 2014 at 4:39 PM, Sean Owen so...@cloudera.com wrote:

    if (rdd.take(1).size == 1) {
      rdd foreachPartition { iterator =>

I was wondering: since take() is an output operation, isn't it computed twice (once for the take(1), once during the foreachPartition)?
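One possible answer, sketched here as an assumption rather than a fix proposed in the thread: caching the RDD lets the second action reuse the partitions computed for the first.

    stream.foreachRDD { rdd =>
      rdd.cache()                    // keep computed partitions around for the second action
      if (rdd.take(1).size == 1) {   // first action: emptiness check
        rdd.foreachPartition { iterator =>
          // ... open a connection and write the partition to the database ...
        }                            // second action: runs over the cached partitions
      }
      rdd.unpersist()                // release the cached blocks once the batch is handled
    }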

Spark Streaming and database access (e.g. MySQL)

2014-09-07 Thread jchen
Hi,

Has anyone tried using Spark Streaming with MySQL (or any other database/data store)? I can write to MySQL at the beginning of the driver application. However, when I try to write the result of each streaming processing window to MySQL, it fails with the following error:

Re: Spark Streaming and database access (e.g. MySQL)

2014-09-07 Thread Mayur Rustagi
The standard pattern is to initialize the MySQL JDBC driver in your mapPartitions call, update the database, and then close off the driver. A couple of gotchas:
1. A new driver is initialized for every one of your partitions.
2. If the effect (inserts/updates) is not idempotent, then if your server crashes, Spark will replay the writes (see the sketch below).
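A hedged sketch of that pattern with a mitigation for the second gotcha, assuming the RDD holds (String, Long) pairs and using MySQL's upsert syntax so a replayed partition overwrites rather than duplicates rows (the table, columns, URL, and credentials are illustrative, not from the thread):

    import java.sql.DriverManager

    rdd.foreachPartition { records =>
      // One connection per partition (gotcha 1); URL and credentials are placeholders
      val conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "pass")
      try {
        // An upsert keyed on the primary key makes a replayed partition idempotent (gotcha 2)
        val stmt = conn.prepareStatement(
          "INSERT INTO word_counts (word, cnt) VALUES (?, ?) " +
          "ON DUPLICATE KEY UPDATE cnt = VALUES(cnt)")
        records.foreach { case (word, cnt) =>
          stmt.setString(1, word)
          stmt.setLong(2, cnt)
          stmt.executeUpdate()
        }
      } finally {
        conn.close()
      }
    }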

Re: Spark Streaming and database access (e.g. MySQL)

2014-09-07 Thread Sean Owen
... I'd call out that last bit as actually tricky: "close off the driver". See this message for the right-est way to do that, along with the right way to open DB connections remotely instead of trying to serialize them:
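The pitfall behind that advice, in sketch form: a JDBC connection created on the driver is not serializable, so it cannot be captured by a closure that ships to the executors; it has to be opened inside the closure instead (initDbConnection and writeToDb are hypothetical helpers):

    // Problematic: conn lives on the driver and is captured by the closure;
    // connections are not serializable, so task serialization fails.
    val conn = initDbConnection()
    rdd.foreach { record => writeToDb(conn, record) }

    // Better: open the connection inside the closure, on the executor,
    // once per partition rather than once per record.
    rdd.foreachPartition { records =>
      val conn = initDbConnection()
      try records.foreach(writeToDb(conn, _)) finally conn.close()
    }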

Re: Spark Streaming and database access (e.g. MySQL)

2014-09-07 Thread Soumitra Kumar
I have the following code:

    stream foreachRDD { rdd =>
      if (rdd.take(1).size == 1) {
        rdd foreachPartition { iterator =>
          val connection = initDbConnection()
          iterator foreach { record =>
            writeToDb(connection, record)  // "write to db" in the original; hypothetical helper
          }
          connection.close()
        }
      }
    }