---------- Forwarded message ----------
From: Dawid Wysakowicz <wysakowicz.da...@gmail.com>
Date: 2015-08-14 9:32 GMT+02:00
Subject: Re: Using unserializable classes in tasks
To: mark <manwoodv...@googlemail.com>
I am not an expert, but first of all check whether a ready-made connector already exists (you mentioned Cassandra - see the spark-cassandra-connector <https://github.com/datastax/spark-cassandra-connector>). If you really want to do something on your own: all objects constructed inside the passed function will be allocated on the worker, not serialized from the driver. For example:

    sc.parallelize(1 to 100).foreach(x => new Connector().save(x))

but this way you allocate resources frequently - a new Connector for every element.

2015-08-14 9:05 GMT+02:00 mark <manwoodv...@googlemail.com>:

> I have a Spark job that computes some values and needs to write those
> values to a data store. The classes that write to the data store are not
> serializable (eg, Cassandra session objects etc).
>
> I don't want to collect all the results at the driver, I want each worker
> to write the data - what is the suggested approach for using code that
> can't be serialized in a task?
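To illustrate the per-element allocation problem and how it is usually amortized, here is a minimal sketch. It uses Spark's real `foreachPartition` API; the `Connector` class is a hypothetical placeholder for any non-serializable client (a Cassandra session, a database connection, etc.), not part of any library:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical non-serializable client - stands in for e.g. a Cassandra
// session. It is constructed on the worker, so it never needs to be
// serialized and shipped from the driver.
class Connector {
  def save(x: Int): Unit = println(s"saving $x")
  def close(): Unit = ()
}

object WritePerPartition {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[*]"))

    // One Connector per partition instead of one per element: the closure
    // passed to foreachPartition receives an iterator over the whole
    // partition, so setup/teardown cost is paid once per partition.
    sc.parallelize(1 to 100).foreachPartition { xs =>
      val conn = new Connector()
      xs.foreach(conn.save)
      conn.close()
    }

    sc.stop()
  }
}
```

The key design point is that everything constructed inside the function passed to `foreach`/`foreachPartition` lives only on the worker; only the closure itself must be serializable, which is why the non-serializable `Connector` causes no problem here.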