Re: Asynchronous Broadcast from driver to workers, is it possible?

2014-10-21 Thread Peng Cheng
Looks like the only way is to implement that feature; there is no way to hack
it into working.





Re: Asynchronous Broadcast from driver to workers, is it possible?

2014-10-21 Thread Vipul Pandey
Any word on this one? I would like to get this done as well.
My real use case, though, is to do something on each executor right at the
beginning - and I was trying to hack it using broadcasts: broadcasting an
object of my own and doing whatever I want in its readObject method.

Any other way out?
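
For illustration, one common workaround for one-time, per-executor setup that
avoids broadcasts entirely: a lazy singleton in the application jar, touched
from a cheap warm-up job. A minimal sketch, all names hypothetical and
untested; with numSlices well above the executor count it should reach every
executor, though Spark gives no hard guarantee of that:

import org.apache.spark.SparkContext

// Hypothetical holder for one-time per-executor setup. A lazy val body
// runs at most once per JVM - i.e. once per executor - when first touched.
object ExecutorSetup {
  lazy val init: Unit = {
    // ... per-executor initialization goes here ...
    println("initialized on " + java.net.InetAddress.getLocalHost.getHostName)
  }
}

// Force the setup up front with a trivial job that touches every partition.
def warmUp(sc: SparkContext, numSlices: Int): Unit =
  sc.parallelize(1 to numSlices, numSlices)
    .foreachPartition(_ => ExecutorSetup.init)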


On Oct 4, 2014, at 7:36 PM, Peng Cheng wrote:

> While Spark already offers support for asynchronous reduce (collecting data
> from workers without interrupting execution of a parallel transformation)
> through accumulators, I have made little progress on making this process
> reciprocal, namely broadcasting data from the driver to the workers to be
> used by all executors in the middle of a transformation. This is primarily
> intended for downpour SGD/AdaGrad, a non-blocking concurrent machine
> learning optimizer that performs better than the existing synchronous GD in
> MLlib and has broad applications in training many kinds of models.
> 
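
(For context, the asynchronous reduce described above is just an accumulator
polled from a second driver thread while a job runs; a minimal sketch against
the Spark 1.x API, names hypothetical:)

import org.apache.spark.SparkContext

// Workers add to the accumulator mid-transformation; the driver thread
// polls partial results without waiting for the job to finish.
def asyncReduceDemo(sc: SparkContext): Unit = {
  val acc = sc.accumulator(0L, "partial sum")   // Spark 1.x accumulator API
  val job = new Thread(new Runnable {
    def run(): Unit = sc.parallelize(1L to 1000000L, 100).foreach(acc += _)
  })
  job.start()
  while (job.isAlive) { println("partial: " + acc.value); Thread.sleep(200) }
  println("final: " + acc.value)
}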
> My attempt so far sticks to the out-of-the-box, immutable broadcast: open a
> new thread on the driver and broadcast a thin data wrapper that, when
> deserialized, inserts its value into a mutable singleton that is already
> replicated to all workers in the fat jar. This customized deserialization is
> not hard - just override readObject like this:
> 
> import java.io.ObjectInputStream
> 
> class AutoInsert(var value: Int) extends Serializable {
> 
>   // Runs on the driver at construction time, updating its local copy.
>   WorkerContainer.last = value
> 
>   // Runs on each worker when the broadcast wrapper is deserialized,
>   // pushing the new value into the worker-side mutable singleton.
>   private def readObject(in: ObjectInputStream): Unit = {
>     in.defaultReadObject()
>     WorkerContainer.last = this.value
>   }
> }
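
For reference, a guess at the shape of the worker-side singleton the snippet
assumes - it is not shown in the original mail, so this is only a sketch:

// Hypothetical mutable holder, shipped in the fat jar; each executor JVM
// gets its own independent copy, which readObject above writes into.
object WorkerContainer {
  @volatile var last: Int = 0
}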
> 
> Unfortunately, it looks like the deserialization is invoked lazily and won't
> do anything until a worker actually uses the Broadcast[AutoInsert], which
> cannot happen without waiting for the workers' current stage to finish and
> broadcasting again. I'm wondering if I can 'hack' this thing into working,
> or whether I'll have to write a serious extension to the broadcast component
> to enable changing the value.
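
One conceivable hack, assuming the cluster has spare task slots while the main
stage runs: have the rebroadcasting driver thread also submit a throwaway job
that touches the broadcast value on every partition, forcing eager
deserialization. A sketch only, untested:

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Touch bc.value in a trivial concurrent job so each executor deserializes
// the wrapper (firing readObject) now, not when the next stage uses it.
def forceDeserialization(sc: SparkContext,
                         bc: Broadcast[AutoInsert],
                         slices: Int): Unit =
  sc.parallelize(1 to slices, slices).foreach(_ => bc.value)

This only reaches executors that happen to run one of the throwaway tasks, and
it stalls if every core is busy under the default FIFO scheduler.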
> 
> Hope I can find like-minded people on this forum, because ML is a selling
> point of Spark.
> 





Re: Asynchronous Broadcast from driver to workers, is it possible?

2014-10-06 Thread Peng Cheng
Any suggestions? I'm thinking of submitting a feature request for mutable
broadcast. Is it doable?
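
In the meantime, the standard workaround - and the baseline an asynchronous or
mutable broadcast would improve on - is to rebroadcast a fresh immutable value
each round and unpersist the old one between stages. A sketch with made-up
names and placeholder math:

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Synchronous per-round rebroadcast: each iteration ships a new immutable
// snapshot of the weights and frees the previous copy on the executors.
def iterate(sc: SparkContext, rounds: Int): Unit = {
  var weights = Array.fill(10)(0.0)
  for (round <- 1 to rounds) {
    val bc: Broadcast[Array[Double]] = sc.broadcast(weights)
    val summed = sc.parallelize(1 to 100, 10)
      .map(x => bc.value.map(_ + x))                  // read-only on workers
      .reduce((a, b) => a.zip(b).map { case (p, q) => p + q })
    weights = summed.map(_ / 100)
    bc.unpersist(blocking = false)   // drop the stale copy from executors
  }
}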


