What do you suggest? Should I send you the script so you can run it yourself? Yes, my broadcast variables are fairly large (1.7 MBytes).
On Wed, Jan 21, 2015 at 8:20 PM, Davies Liu <dav...@databricks.com> wrote:
> Because you have a large broadcast variable, it needs to be loaded into
> the Python worker for each task if the worker is not reused.
>
> We would really appreciate it if you could provide a short script that
> reproduces the freeze, so we can investigate the root cause and fix it.
> Also, please file a JIRA for it, thanks!
>
> On Wed, Jan 21, 2015 at 4:56 PM, Tassilo Klein <tjkl...@gmail.com> wrote:
> > I set spark.python.worker.reuse = false and now it seems to run longer
> > than before (it has not crashed yet). However, it is very, very slow.
> > How should I proceed?
> >
> > On Wed, Jan 21, 2015 at 2:21 AM, Davies Liu <dav...@databricks.com> wrote:
> >> Could you try to disable the new reused-worker feature by setting:
> >> spark.python.worker.reuse = false
> >>
> >> On Tue, Jan 20, 2015 at 11:12 PM, Tassilo Klein <tjkl...@bwh.harvard.edu> wrote:
> >> > Hi,
> >> >
> >> > It's a somewhat longer script that runs some deep learning training,
> >> > so it is a bit hard to wrap up easily.
> >> >
> >> > Essentially I have a loop in which a gradient is computed on each node
> >> > and collected (this is where it freezes at some point):
> >> >
> >> > grads = zipped_trainData.map(distributed_gradient_computation).collect()
> >> >
> >> > The distributed_gradient_computation function mainly contains a
> >> > Theano-derived function. The Theano function itself is a broadcast
> >> > variable.
> >> >
> >> > Let me know if you need more information.
> >> >
> >> > Best,
> >> > Tassilo
> >> >
> >> > On Wed, Jan 21, 2015 at 1:17 AM, Davies Liu <dav...@databricks.com> wrote:
> >> >> Could you provide a short script to reproduce this issue?
> >> >>
> >> >> On Tue, Jan 20, 2015 at 9:00 PM, TJ Klein <tjkl...@gmail.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I just recently tried to migrate from Spark 1.1 to Spark 1.2, using
> >> >> > PySpark. Initially I was delighted, noticing that Spark 1.2 is much
> >> >> > faster than Spark 1.1. However, the initial joy faded quickly when I
> >> >> > noticed that my jobs no longer ran to completion, although they
> >> >> > still work perfectly fine under Spark 1.1. Specifically, execution
> >> >> > freezes at some point without any error output when calling a
> >> >> > combined map() and collect() statement (after it has been called
> >> >> > many times successfully before in a loop).
> >> >> >
> >> >> > Any clue? Or do I have to wait for the next version?
> >> >> >
> >> >> > Best,
> >> >> > Tassilo
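For anyone following along: a minimal sketch of how the setting Davies
suggests can be applied programmatically in PySpark, assuming a standard
Spark 1.2 setup (it can equally be set in conf/spark-defaults.conf or
passed to spark-submit via --conf):

    from pyspark import SparkConf, SparkContext

    # Disable the Python worker reuse introduced in Spark 1.2; this must
    # be set before the SparkContext is created.
    conf = SparkConf().set("spark.python.worker.reuse", "false")
    sc = SparkContext(conf=conf)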
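Since the thread asks for a short reproduction script, here is a
hypothetical minimal sketch of the pattern described above, not the
original code: the names big_model and compute_gradient are invented, and
the broadcast payload is sized to roughly the 1.7 MB mentioned at the top
of the thread.

    # Hypothetical repro sketch: a large broadcast variable read inside a
    # map()/collect() loop, the shape of job reported to freeze on 1.2.
    import numpy as np
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("broadcast-freeze-repro")
            # Toggle this line to compare reused vs. fresh workers.
            .set("spark.python.worker.reuse", "false"))
    sc = SparkContext(conf=conf)

    # ~1.7 MB of state: 220,000 float64 values at 8 bytes each.
    big_model = sc.broadcast(np.random.rand(220000))

    def compute_gradient(x):
        # Stand-in for the Theano-derived gradient function; any function
        # that reads the broadcast value exercises the same code path.
        return float(np.dot(big_model.value[:1000], np.arange(1000) * x))

    data = sc.parallelize(range(1000), 8)

    # Repeated map()/collect() calls, as in the training loop described
    # in the thread.
    for i in range(500):
        grads = data.map(compute_gradient).collect()

With worker reuse enabled (the 1.2 default) this mirrors the setup
reported to freeze; with it disabled, every task reloads the broadcast
into a fresh worker, which is consistent with the slowdown Tassilo
reports.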