any word on this one? I would like to get this done as well.
Although, my real use case is to do something on each executor right up in the
beginning - and I was trying to hack it using broadcasts by broadcasting an
object of my own and do whatever I want in the readObject method.
Any other way out?
On Oct 4, 2014, at 7:36 PM, Peng Cheng wrote:
> While Spark already offers support for asynchronous reduce (collect data from
> workers, while not interrupting execution of a parallel transformation)
> through accumulator, I have made little progress on making this process
> reciprocal, namely, to broadcast data from driver to workers to be used by
> all executors in the middle of a transformation. This primarily intended to
> be used in downpour SGD/adagrad, a non-blocking concurrent machine learning
> optimizer that performs better than existing synchronous GD in MLlib, and
> have vast application in training of many models.
>
> My attempt so far is to stick to out-of-the-box, immutable broadcast, open a
> new thread on driver, in which I broadcast a thin data wrapper that when
> deserialized, will insert into a mutable singleton that is already
> replicated to all workers in the fat jar, this customized deserialization is
> not hard, just overwrite readObject like this:
>
> class AutoInsert(var value: Int) extends Serializable{
>
> WorkerReplica.last = value
>
> private def readObject(in: ObjectInputStream): Unit = {
>in.defaultReadObject()
>WorkerContainer.last = this.value
> }
> }
>
> Unfortunately it looks like the deserializtion is called lazily and won't do
> anything before a worker use it (Broadcast[AutoInsert]), this is impossible
> without waiting for workers' stage to be finished and broadcast again. I'm
> wondering if I can 'hack' this thing into working? Or I'll have to write a
> serious extension to broadcast component to enable changing the value.
>
> Hope I can find like-minded on this forum because ML is a selling point of
> Spark.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Asynchronous-Broadcast-from-driver-to-workers-is-it-possible-tp15758.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org