Instead of foreach, try foreachPartition; that will initialize the
connector once per partition rather than once per record.
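To make the suggestion concrete, here is a minimal sketch of the foreachPartition pattern, written in Java. FakeConnector is a hypothetical stand-in for a non-serializable data-store client, and plain lists stand in for RDD partitions; on a real RDD the equivalent call is rdd.foreachPartition(iter -> { ... }) with the same body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForeachPartitionSketch {
    static int opens = 0;                             // how many connections were created
    static List<Integer> written = new ArrayList<>(); // what was "written"

    // Hypothetical client: built on the worker, never serialized.
    static class FakeConnector {
        FakeConnector() { opens++; }
        void write(int v) { written.add(v); }
        void close() {}
    }

    public static void main(String[] args) {
        // Two "partitions" standing in for an RDD's partitions.
        List<List<Integer>> partitions =
            Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5));
        for (List<Integer> partition : partitions) {
            FakeConnector conn = new FakeConnector(); // one connector per partition...
            for (int v : partition) conn.write(v);    // ...reused for every record in it
            conn.close();
        }
        System.out.println("opens=" + opens + " written=" + written);
    }
}
```

Running the sketch prints `opens=2 written=[1, 2, 3, 4, 5]`: two connections for two partitions, instead of one per record as a plain per-element foreach would create.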
Thanks
Best Regards
On Fri, Aug 14, 2015 at 1:13 PM, Dawid Wysakowicz <
wysakowicz.da...@gmail.com> wrote:
No, the connector does not need to be serializable, because it is constructed
on the worker. Only objects shuffled across partitions need to be
serializable.
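One common way to apply that idea is to keep the non-serializable client behind a static holder that creates it lazily, once per JVM, so the closure shipped to workers carries no client instance at all. A sketch in Java follows; FakeClient is a hypothetical stand-in for something like a Cassandra Session.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectorHolder {
    // Hypothetical client; real drivers are typically not Serializable.
    static class FakeClient {
        final List<Integer> writes = new ArrayList<>();
        void write(int v) { writes.add(v); }
    }

    private static FakeClient client;            // one instance per JVM

    // Lazy, thread-safe initialisation on first use.
    static synchronized FakeClient client() {
        if (client == null) client = new FakeClient();
        return client;
    }

    public static void main(String[] args) {
        // On a cluster, each executor JVM runs this and builds its own client;
        // only the record values themselves ever cross the wire.
        for (int v : new int[] {1, 2, 3}) ConnectorHolder.client().write(v);
        System.out.println("writes=" + ConnectorHolder.client().writes);
    }
}
```

Because the client is reached through a static method rather than captured in the closure, Spark's closure serializer never sees it.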
2015-08-14 9:40 GMT+02:00 mark :
> I guess I'm looking for a more general way to use complex graphs of
> objects that cannot be serialized in
-- Forwarded message --
From: Dawid Wysakowicz
Date: 2015-08-14 9:32 GMT+02:00
Subject: Re: Using unserializable classes in tasks
To: mark
I am not an expert, but first of all check whether a ready-made connector
already exists (you mentioned Cassandra - check: spark-cassandra-connector).
I have a Spark job that computes some values and needs to write those
values to a data store. The classes that write to the data store are not
serializable (e.g., Cassandra session objects etc.).
I don't want to collect all the results at the driver, I want each worker
to write the data - what is the right way to do this?