Persistent Local Node variables

2014-06-22 Thread Daedalus
*TL;DR:* I want to run a pre-processing step on the data in each partition (such as parsing) and retain the parsed objects on each node for future processing calls, to avoid repeated parsing. /More detail:/ I have a server and two nodes in my cluster, with data partitioned using HDFS. I am trying...

Re: Persistent Local Node variables

2014-06-22 Thread Daedalus
Will using mapPartitions and creating a new RDD of ParsedData objects avoid parsing the data multiple times?
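
A minimal sketch of that approach, assuming the Spark 2.x Java API; ParsedData and the comma-split parsing are hypothetical stand-ins for the real parsing logic. mapPartitions parses each partition once, and persisting the resulting RDD is what keeps the parsed objects on the executors across later jobs:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ParseOnce {
        // Hypothetical parsed form of one raw record.
        public static class ParsedData implements java.io.Serializable {
            final String key;
            final int value;
            ParsedData(String key, int value) { this.key = key; this.value = value; }
        }

        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[*]", "parse-once");

            // Stand-in for the HDFS input from the original question.
            JavaRDD<String> raw = sc.parallelize(Arrays.asList("a,1", "b,2", "c,3"), 2);

            // Parse each partition exactly once, yielding an RDD of parsed objects.
            JavaRDD<ParsedData> parsed = raw.mapPartitions((Iterator<String> it) -> {
                List<ParsedData> out = new ArrayList<>();
                while (it.hasNext()) {
                    String[] parts = it.next().split(",");
                    out.add(new ParsedData(parts[0], Integer.parseInt(parts[1])));
                }
                return out.iterator();
            });

            // Pin the parsed objects in executor memory so later jobs reuse them.
            parsed.cache();

            long first = parsed.count();   // triggers parsing, then caches
            long second = parsed.count();  // served from the cache, no re-parse
            sc.stop();
        }
    }

Note that mapPartitions alone only amortizes per-partition setup within a single pass; without the cache()/persist() call, every subsequent action would re-read and re-parse the input.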

Re: Serialization problem in Spark

2014-06-19 Thread Daedalus
I'm not sure if this is a Hadoop-centric issue or not. I had similar issues with non-serializable external library classes. I used a Kryo config (as illustrated at https://spark.apache.org/docs/latest/tuning.html#data-serialization) and registered the one troublesome class. It seemed to work...
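
For reference, the registration step from the tuning guide looks roughly like this in Java; TroublesomeClass is a placeholder for whatever external library class refused to serialize, and registerKryoClasses requires Spark 1.2 or later:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class KryoExample {
        // Placeholder for the non-serializable external library class.
        public static class TroublesomeClass {}

        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("kryo-example")
                // Replace default Java serialization with Kryo.
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

            // Registering the class lets Kryo write a small numeric ID instead
            // of the full class name with every serialized instance.
            conf.registerKryoClasses(new Class<?>[] { TroublesomeClass.class });

            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... RDD operations that shuffle or cache TroublesomeClass instances ...
            sc.stop();
        }
    }

One caveat: Kryo covers data being shuffled or cached (and can handle classes that don't implement Serializable), but task closures are still serialized with plain Java serialization, so a non-serializable object captured in a closure needs a different fix.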

Repeated Broadcasts

2014-06-19 Thread Daedalus
I'm trying to use Spark (Java) for an optimization algorithm that needs repeated server-node exchanges of information (the ADMM algorithm, for those familiar with it). In each iteration, I need to update a set of values on the nodes and collect them on the server, which will update its own set of...
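
Since a Broadcast is read-only once created, the usual pattern for this kind of loop is to ship a fresh broadcast each iteration and unpersist the stale one. A rough Java sketch, with hypothetical localUpdate/globalUpdate steps standing in for the ADMM x- and z-updates:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.broadcast.Broadcast;

    public class IterativeBroadcast {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[*]", "admm-sketch");

            // Stand-in for the partitioned data blocks held on the nodes.
            List<double[]> blocks = Arrays.asList(new double[10], new double[10]);
            JavaRDD<double[]> data = sc.parallelize(blocks, 2);

            // Hypothetical global state the driver ("server") refines each iteration.
            double[] globalZ = new double[10];

            for (int iter = 0; iter < 20; iter++) {
                // A Broadcast is immutable, so each iteration ships a fresh one.
                Broadcast<double[]> zBc = sc.broadcast(globalZ);

                // Node-side update against the broadcast value, then aggregate
                // the partial results back on the driver.
                double[] summed = data
                    .map(x -> localUpdate(x, zBc.value()))
                    .reduce(IterativeBroadcast::vectorAdd);

                // Driver-side ("server") update of the global variable.
                globalZ = globalUpdate(summed);

                // Drop the stale broadcast from the executors.
                zBc.unpersist();
            }
            sc.stop();
        }

        // Hypothetical per-block x-update step.
        static double[] localUpdate(double[] x, double[] z) { return x; }

        // Elementwise sum of two equal-length vectors.
        static double[] vectorAdd(double[] a, double[] b) {
            double[] c = new double[a.length];
            for (int i = 0; i < a.length; i++) c[i] = a[i] + b[i];
            return c;
        }

        // Hypothetical z-update on the driver.
        static double[] globalUpdate(double[] s) { return s; }
    }

The node-to-server direction is just the action that ends each iteration: reduce (or collect) pulls the updated values back to the driver before the next broadcast goes out.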