---------- Forwarded message ----------
From: Dawid Wysakowicz <wysakowicz.da...@gmail.com>
Date: 2015-08-14 9:32 GMT+02:00
Subject: Re: Using unserializable classes in tasks
To: mark <manwoodv...@googlemail.com>


I am not an expert, but first check whether a ready-made connector already
exists (you mentioned Cassandra - see the spark-cassandra-connector
<https://github.com/datastax/spark-cassandra-connector> ).

If you really want to do something on your own, any object constructed
inside the function you pass will be allocated on the worker.
For example:

sc.parallelize(1 to 100).foreach(x => new Connector().save(x))

but this way you allocate the resource anew for every element, which is
expensive.
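
A minimal sketch (not from the original thread) of the usual way to amortize
that cost: foreachPartition builds one connection per partition instead of
one per element. Connector is a hypothetical stand-in for any
non-serializable client (e.g. a Cassandra session wrapper), and sc is the
SparkContext.

sc.parallelize(1 to 100).foreachPartition { partition =>
  val connector = new Connector() // constructed on the worker, never serialized
  try {
    partition.foreach(x => connector.save(x))
  } finally {
    connector.close() // hypothetical close(); release the resource once per partition
  }
}

This way only the closure is shipped to the workers; the connection itself
never crosses the driver/worker boundary.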

2015-08-14 9:05 GMT+02:00 mark <manwoodv...@googlemail.com>:

> I have a Spark job that computes some values and needs to write those
> values to a data store. The classes that write to the data store are not
> serializable (e.g., Cassandra session objects, etc.).
>
> I don't want to collect all the results at the driver, I want each worker
> to write the data - what is the suggested approach for using code that
> can't be serialized in a task?
>
