On 12 Jun 2015, at 23:19, Cody Koeninger wrote:
> A. No, it's called once per partition. Usually you have more partitions than
> executors, so it will end up getting called multiple times per executor. But
> you can use a lazy val, singleton, etc to make sure the setup only takes
> place o
A. No, it's called once per partition. Usually you have more partitions
than executors, so it will end up getting called multiple times per
executor. But you can use a lazy val, singleton, etc. to make sure the
setup only takes place once per JVM.
B. I can't speak to the specifics there ... but a
On 12 Jun 2015, at 22:59, Cody Koeninger wrote:
> Close. The mapPartitions call doesn't need to do anything at all to the iter.
>
> mapPartitions { iter =>
>   SomeDb.conn.init
>   iter
> }
Yes, thanks!
Maybe you can confirm two more things, and then you will have helped me make a
giant leap today:
a
Close. The mapPartitions call doesn't need to do anything at all to the
iter.
mapPartitions { iter =>
  SomeDb.conn.init
  iter
}
On Fri, Jun 12, 2015 at 3:55 PM, algermissen1971 wrote:
> Cody,
>
> On 12 Jun 2015, at 17:26, Cody Koeninger wrote:
>
> > There are several database APIs that use
Cody,
On 12 Jun 2015, at 17:26, Cody Koeninger wrote:
> There are several database APIs that use a thread local or singleton
> reference to a connection pool (we use ScalikeJDBC currently, but there are
> others).
>
> You can use mapPartitions earlier in the chain to make sure the connection
There are several database APIs that use a thread local or singleton
reference to a connection pool (we use ScalikeJDBC currently, but there are
others).
You can use mapPartitions earlier in the chain to make sure the connection
pool is set up on that executor, then use it inside updateStateByKey
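
To make the idea concrete, here is a minimal sketch in plain Scala, without Spark, so it can be run on its own. It assumes a hypothetical SomeDb object whose pool is a lazy val (standing in for a real connection pool such as one managed by ScalikeJDBC), and it simulates partitions with ordinary iterators. The names SomeDb, pool, initCount and setupThenPassThrough are illustrative only and not part of any library. In a real job the body of setupThenPassThrough would be the function passed to mapPartitions.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a real connection pool holder.
object SomeDb {
  // Counts how often the expensive setup actually runs.
  val initCount = new AtomicInteger(0)

  // A lazy val is initialized on first access, at most once per JVM,
  // and Scala guards that initialization against concurrent access.
  lazy val pool: String = {
    initCount.incrementAndGet()
    "pool-ready" // placeholder for a real pool handle
  }
}

object PartitionSetupSketch {
  // Mirrors the function handed to mapPartitions: force the pool,
  // then return the iterator untouched.
  def setupThenPassThrough[A](iter: Iterator[A]): Iterator[A] = {
    val handle = SomeDb.pool // first access triggers the one-time setup
    iter
  }

  def main(args: Array[String]): Unit = {
    // Three simulated partitions processed inside the same JVM.
    val partitions = Seq(Iterator(1, 2), Iterator(3), Iterator(4, 5, 6))
    val records = partitions.flatMap(p => setupThenPassThrough(p).toList)
    println(s"records=$records, initCount=${SomeDb.initCount.get}")
  }
}
```

The function is called once per simulated partition, but the counter ends at 1 because the lazy val runs its initializer only on first access. Code that runs later in the same JVM, such as an update function, can then refer to SomeDb.pool directly.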
Hi,
I have a scenario with Spark Streaming, where I need to write to a database
from within updateStateByKey[1].
That means that inside my update function I need a connection.
So far I have understood that I should create a new (lazy) connection for every
partition. But since I am not working