I think she is checking for empty RDDs?
But if the RDD is empty then nothing will happen, no DB connections etc.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Mon, Sep 8, 2014 at 1:32 PM, Tobias Pfeiffer t...@preferred.jp wrote:
That should be OK, since the iterator is definitely consumed, and
therefore the connection actually done with, at the end of a 'foreach'
method. You might put the close in a finally block.
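
A minimal sketch of the "close in a finally block" suggestion, written as plain Scala so it can be tried without a cluster. The DbConnection trait, the PartitionWriter object and the open factory are hypothetical names for illustration and are not part of Spark or of the code quoted in this thread. Skipping the connection when the iterator has no elements is an extra assumption added here.

```scala
// Hypothetical stand-in for a real connection type such as java.sql.Connection.
trait DbConnection {
  def write(record: String): Unit
  def close(): Unit
}

object PartitionWriter {
  // Writes every record of one partition and returns how many were written.
  // `open` is only called here, i.e. on the executor, so no connection
  // object has to be serialized and shipped from the driver.
  def writePartition(records: Iterator[String], open: () => DbConnection): Int =
    if (!records.hasNext) 0 // empty partition: do not open a connection at all
    else {
      val conn = open()
      var written = 0
      try {
        records.foreach { r =>
          conn.write(r)
          written += 1
        }
        written
      } finally {
        conn.close() // runs even if one of the writes throws
      }
    }
}
```

With Spark Streaming this could be wired up roughly as stream.foreachRDD { rdd => rdd.foreachPartition { it => PartitionWriter.writePartition(it, () => openMyDb()) } }, where openMyDb() is again a hypothetical factory.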
On Mon, Sep 8, 2014 at 12:29 AM, Soumitra Kumar
kumar.soumi...@gmail.com wrote:
I have the following code:
Hi,
On Mon, Sep 8, 2014 at 4:39 PM, Sean Owen so...@cloudera.com wrote:
if (rdd.take(1).size == 1) {
    rdd foreachPartition { iterator =>
I was wondering: Since take() is an output operation, isn't it computed
twice (once for the take(1), once during the
Hi,
Has someone tried using Spark Streaming with MySQL (or any other
database/data store)? I can write to MySQL at the beginning of the driver
application. However, when I am trying to write the result of every
streaming processing window to MySQL, it fails with the following error:
Standard pattern is to initialize the MySQL JDBC driver in your
mapPartitions call, update the database, then close off the driver.
Couple of gotchas:
1. A new driver is initiated for each of your partitions
2. If the effect (inserts, updates) is not idempotent, so if your server
crashes, Spark will replay
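
To illustrate the idempotency gotcha, here is a small sketch in plain Scala that uses in-memory collections in place of a database table. IdempotentSink, append and upsert are hypothetical names, and the "window:word" key format is only an assumption for the example. In MySQL the keyed variant would roughly correspond to INSERT ... ON DUPLICATE KEY UPDATE on a table with a unique key.

```scala
import scala.collection.mutable

object IdempotentSink {
  // Plain append: if the same batch is written again after a failure,
  // every row ends up in the table twice.
  def append(table: mutable.Buffer[(String, Long)], batch: Seq[(String, Long)]): Unit =
    table ++= batch

  // Keyed upsert: writing the same batch again overwrites the same keys
  // with the same values, so the table is unchanged by a replay.
  def upsert(table: mutable.Map[String, Long], batch: Seq[(String, Long)]): Unit =
    batch.foreach { case (key, value) => table(key) = value }
}
```

Calling append twice with the same two-row batch leaves four rows in the buffer, while calling upsert twice leaves two entries in the map.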
... I'd call out that last bit as actually tricky: close off the driver
See this message for the right-est way to do that, along with the
right way to open DB connections remotely instead of trying to
serialize them:
I have the following code:
stream foreachRDD { rdd =>
    if (rdd.take(1).size == 1) {
        rdd foreachPartition { iterator =>
            initDbConnection()
            iterator foreach {
                write to db