*TL;DR:* I want to run a pre-processing step on the data from each partition
(such as parsing) and retain the parsed object on each node for future
processing calls to avoid repeated parsing.
/More detail:/
I have a server and two nodes in my cluster, and the data partitioned using
HDFS. I am trying to parse the data once and keep the parsed objects resident
on each node for later processing calls, rather than re-parsing on every pass.
Will using mapPartitions and creating a new RDD of ParsedData objects avoid
parsing the data multiple times?
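
For concreteness, here is a minimal sketch of what I have in mind; ParsedData
and its parse method are placeholders for my real parser, not working code
from my project:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Placeholder for my real parsed-record class.
class ParsedData {
    static ParsedData parse(String line) { return new ParsedData(); }
}

public class ParseOnceSketch {
    static JavaRDD<ParsedData> parseAndCache(JavaSparkContext sc, String path) {
        JavaRDD<String> raw = sc.textFile(path);

        // Run the parser once per record, partition by partition.
        JavaRDD<ParsedData> parsed = raw.mapPartitions(it -> {
            List<ParsedData> out = new ArrayList<>();
            while (it.hasNext()) {
                out.add(ParsedData.parse(it.next()));
            }
            return out.iterator();
        });

        // Cache so later actions reuse the parsed objects instead of re-parsing.
        return parsed.cache();
    }
}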
--
I'm not sure if this is a Hadoop-centric issue or not. I had similar issues
with non-serializable external library classes.
I used a Kryo config (as illustrated at
https://spark.apache.org/docs/latest/tuning.html#data-serialization) and
registered the one troublesome class. It seemed to work.
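
Roughly, the setup looked like the following; MyTroublesomeClass stands in
for the external-library class that wouldn't serialize, and this is a
from-memory sketch rather than my exact code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Placeholder for the non-serializable external-library class.
class MyTroublesomeClass {}

public class KryoSetupSketch {
    static JavaSparkContext create() {
        SparkConf conf = new SparkConf()
                .setAppName("kryo-example")
                // Use Kryo instead of the default Java serialization.
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                // Register the one troublesome class with Kryo.
                .registerKryoClasses(new Class<?>[]{MyTroublesomeClass.class});
        return new JavaSparkContext(conf);
    }
}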
I'm trying to use Spark (Java) for an optimization algorithm that needs
repeated server-node exchanges of information (the ADMM algorithm, for those
who are familiar with it). In each iteration, I need to update a set of values
on the nodes and collect them on the server, which will then update its own
set of values and send them back to the nodes.
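
Schematically, the loop I have in mind looks like this; nodeUpdate and
serverUpdate are placeholders, not the real ADMM steps:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

import java.util.List;

public class AdmmLoopSketch {
    static void run(JavaSparkContext sc, JavaRDD<double[]> nodeBlocks, int iterations) {
        double[] serverValues = new double[8];  // placeholder for the server's variables

        for (int i = 0; i < iterations; i++) {
            // Ship the server's current values to every node.
            Broadcast<double[]> shared = sc.broadcast(serverValues);

            // Each node updates its local values against the shared ones,
            // then the results are gathered back on the server.
            List<double[]> nodeResults = nodeBlocks
                    .map(block -> nodeUpdate(block, shared.value()))
                    .collect();

            // The server updates its own values from the collected results.
            serverValues = serverUpdate(serverValues, nodeResults);
            shared.unpersist();  // free the stale broadcast before the next round
        }
    }

    // Placeholder update steps, not the real ADMM math.
    static double[] nodeUpdate(double[] block, double[] shared) { return block; }
    static double[] serverUpdate(double[] values, List<double[]> results) { return values; }
}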