What do you suggest? Should I send you the script so you can run it
yourself?
Yes, my broadcast variables are fairly large (1.7 MB).
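
In case it helps, this is roughly how I check the serialized size (a
sketch; `value` stands for the object I pass to sc.broadcast):

    import pickle
    # Size of the pickled broadcast payload in MB; the actual
    # on-the-wire size may differ slightly.
    print(len(pickle.dumps(value, pickle.HIGHEST_PROTOCOL)) / 1e6)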

On Wed, Jan 21, 2015 at 8:20 PM, Davies Liu <dav...@databricks.com> wrote:

> Because you have a large broadcast variable, it needs to be loaded into
> the Python worker for each task if the worker is not reused.
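>
> For example (a minimal sketch, not your actual code; `large_object`
> and `f` are placeholders): every task that touches bc.value on a
> fresh worker has to deserialize the whole broadcast first:
>
>     bc = sc.broadcast(large_object)  # ~1.7 MB payload in your case
>     # With spark.python.worker.reuse disabled, each task runs in a new
>     # Python process and loads bc.value from scratch.
>     rdd.map(lambda x: f(x, bc.value)).collect()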
>
> We would really appreciate it if you could provide a short script that
> reproduces the freeze, so we can investigate the root cause and fix
> it. Please also file a JIRA for it, thanks!
>
> On Wed, Jan 21, 2015 at 4:56 PM, Tassilo Klein <tjkl...@gmail.com> wrote:
> > I set spark.python.worker.reuse = false and now it seems to run longer
> > than before (it has not crashed yet). However, it is very, very slow.
> > How should I proceed?
> >
> > On Wed, Jan 21, 2015 at 2:21 AM, Davies Liu <dav...@databricks.com>
> > wrote:
> >>
> >> Could you try disabling the new worker-reuse feature by setting:
> >> spark.python.worker.reuse = false
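> >>
> >> For example, if you create the SparkContext yourself (a sketch; adapt
> >> it to however you launch your job):
> >>
> >>     from pyspark import SparkConf, SparkContext
> >>     conf = SparkConf().set("spark.python.worker.reuse", "false")
> >>     sc = SparkContext(conf=conf)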
> >>
> >> On Tue, Jan 20, 2015 at 11:12 PM, Tassilo Klein <tjkl...@bwh.harvard.edu>
> >> wrote:
> >> > Hi,
> >> >
> >> > It's a rather long script that runs some deep-learning training, so
> >> > it is hard to boil down to a short example.
> >> >
> >> > Essentially I have a loop in which a gradient is computed on each
> >> > node and collected (this is where it freezes at some point):
> >> >
> >> >     grads = zipped_trainData.map(distributed_gradient_computation).collect()
> >> >
> >> > The distributed_gradient_computation function mainly wraps a
> >> > Theano-derived function. The Theano function itself is a broadcast
> >> > variable.
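> >> >
> >> > A stripped-down sketch of the pattern (the names here are made up;
> >> > the real code is more involved):
> >> >
> >> >     theano_fn_bc = sc.broadcast(theano_fn)  # broadcast Theano function
> >> >     for i in range(num_iterations):
> >> >         grads = zipped_trainData.map(
> >> >             lambda batch: theano_fn_bc.value(batch)).collect()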
> >> >
> >> > Let me know if you need more information.
> >> >
> >> > Best,
> >> >  Tassilo
> >> >
> >> > On Wed, Jan 21, 2015 at 1:17 AM, Davies Liu <dav...@databricks.com>
> >> > wrote:
> >> >>
> >> >> Could you provide a short script to reproduce this issue?
> >> >>
> >> >> On Tue, Jan 20, 2015 at 9:00 PM, TJ Klein <tjkl...@gmail.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I recently tried to migrate from Spark 1.1 to Spark 1.2, using
> >> >> > PySpark. Initially I was delighted, since Spark 1.2 is much faster
> >> >> > than Spark 1.1. However, the joy faded quickly when I noticed that
> >> >> > my jobs no longer terminate successfully; with Spark 1.1 they still
> >> >> > work perfectly fine.
> >> >> > Specifically, at some point the execution just freezes, without any
> >> >> > error output, on a combined map() and collect() statement, after
> >> >> > the same call has succeeded many times in a loop.
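> >> >> >
> >> >> > A hypothetical minimal repro of the call pattern (rdd and
> >> >> > some_function are placeholders, not my real code):
> >> >> >
> >> >> >     for i in range(1000):
> >> >> >         # runs fine for many iterations, then freezes silently
> >> >> >         result = rdd.map(some_function).collect()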
> >> >> >
> >> >> > Any clue? Or do I have to wait for the next version?
> >> >> >
> >> >> > Best,
> >> >> >  Tassilo
> >> >> >
> >> >
> >> >
> >>
> >>
> >
> >
>
