Dear Liu:

Thank you for your reply. I will set up an experimental environment for
spark-1.1 and test it.
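
For reference, here is a minimal sketch of the size arithmetic discussed in this thread. It assumes the array holds float64 values (the default dtype of np.random.rand) and that the payload is encoded with standard padded base64. The helper names raw_size_bytes and base64_size_bytes are only illustrative and are not part of Spark or py4j. It does not model pickling overhead, so it is not meant to explain the exception itself, only to check the sizes quoted.

```python
import struct

FLOAT64_BYTES = struct.calcsize("d")  # 8 bytes per float64 element
INT32_MAX = 2**31 - 1                 # Java int maximum (array lengths are ints)


def raw_size_bytes(n_elements):
    """Bytes used by the data buffer of a float64 array with n_elements."""
    return n_elements * FLOAT64_BYTES


def base64_size_bytes(n_bytes):
    """Length of standard padded base64 text that encodes n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    for n in (30000000, 35000000):
        raw = raw_size_bytes(n)
        encoded = base64_size_bytes(raw)
        # Report raw size in bytes and MiB, the base64 size, and whether
        # both still fit in a signed 32-bit length.
        print(n, raw, round(raw / 2**20, 1), encoded, encoded < INT32_MAX)
```

The 30M-element case comes to about 229 MiB, in line with the 230MB the log reported, and the 35M-element case to about 267 MiB, so the raw data alone is well below 2G in both cases.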

On Wed, Nov 12, 2014 at 2:30 PM, Davies Liu-2 [via Apache Spark User List] <
ml-node+s1001560n1868...@n3.nabble.com> wrote:

> Yes, your broadcast should be about 300M, much smaller than 2G; I
> didn't read your post carefully.
>
> Broadcast in Python has been much improved since 1.1. I think it
> will work in 1.1 or the upcoming 1.2 release. Could you upgrade to 1.1?
>
> Davies
>
> On Tue, Nov 11, 2014 at 8:37 PM, bliuab <[hidden email]> wrote:
>
> > Dear Liu:
> >
> > Thank you very much for your help. I will apply that patch. By the way,
> > I have succeeded in broadcasting an array of size 30M, and the log said
> > that this array takes around 230MB of memory. As a result, I think the
> > numpy array that leads to the error is much smaller than 2G.
> >
> > On Wed, Nov 12, 2014 at 12:29 PM, Davies Liu-2 [via Apache Spark User List]
> > <[hidden email]> wrote:
> >>
> >> This PR fixes the problem: https://github.com/apache/spark/pull/2659
> >>
> >> cc @josh
> >>
> >> Davies
> >>
> >> On Tue, Nov 11, 2014 at 7:47 PM, bliuab <[hidden email]> wrote:
> >>
> >> > In spark-1.0.2, I came across an error when trying to broadcast a quite
> >> > large numpy array (35M elements). The error is a
> >> > java.lang.NegativeArraySizeException; the details are listed below.
> >> > Broadcasting a somewhat smaller numpy array (30M elements) works fine,
> >> > and that 30M-element array takes 230M of memory, which in my opinion
> >> > is not very large.
> >> > As far as I can tell, the problem seems related to py4j, but I have no
> >> > idea how to fix it. I would appreciate any hints.
> >> > ------------
> >> > py4j.protocol.Py4JError: An error occurred while calling o23.broadcast.
> >> > Trace:
> >> > java.lang.NegativeArraySizeException
> >> >         at py4j.Base64.decode(Base64.java:292)
> >> >         at py4j.Protocol.getBytes(Protocol.java:167)
> >> >         at py4j.Protocol.getObject(Protocol.java:276)
> >> >         at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:81)
> >> >         at py4j.commands.CallCommand.execute(CallCommand.java:77)
> >> >         at py4j.GatewayConnection.run(GatewayConnection.java:207)
> >> > -------------
> >> > And the test code is as follows:
> >> > import numpy as np
> >> > from pyspark import SparkConf, SparkContext
> >> >
> >> > # configure the Spark environment
> >> > conf = SparkConf().setAppName('brodyliu_LR').setMaster('spark://10.231.131.87:5051')
> >> > conf.set('spark.executor.memory', '4000m')
> >> > conf.set('spark.akka.timeout', '100000')
> >> > conf.set('spark.ui.port', '8081')
> >> > conf.set('spark.cores.max', '150')
> >> > #conf.set('spark.rdd.compress', 'True')
> >> > conf.set('spark.default.parallelism', '300')
> >> > sc = SparkContext(conf=conf, batchSize=1)
> >> >
> >> > # broadcasting a 35M-element float64 array triggers the error
> >> > vec = np.random.rand(35000000)
> >> > a = sc.broadcast(vec)
> >> >
> >> > --
> >> > View this message in context:
> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662.html
> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [hidden email]
> >> > For additional commands, e-mail: [hidden email]
> >> >
> >
> >
> >
> >
> > --
> > My Homepage: www.cse.ust.hk/~bliuab
> > MPhil student in Hong Kong University of Science and Technology.
> > Clear Water Bay, Kowloon, Hong Kong.
> > Profile at LinkedIn.



-- 
My Homepage: www.cse.ust.hk/~bliuab
MPhil student in Hong Kong University of Science and Technology.
Clear Water Bay, Kowloon, Hong Kong.
Profile at LinkedIn <http://www.linkedin.com/pub/liu-bo/55/52b/10b>.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662p18695.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
