Sorry for the delay. I'll try to add some more details on Monday.

Unfortunately, I don't have a script to reproduce the error. The problem
seemed to depend more on the data set than on the script: the same code on
different data sets led to different results, and only larger data sets, on
the order of 40 GB, triggered the described error. Also, I believe our
cluster was recently updated to CDH 5.2, which uses Spark 1.1. I'll check
whether the issue has been resolved.
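One plausible reading of the traceback quoted below (`bounds.append(samples[index])` inside `sortByKey`) is that the sort's split points are chosen by indexing into a sorted sample of keys. If the sampling fraction shrinks as the data set grows, a very large input could yield an empty sample, and indexing an empty list raises exactly this `IndexError`. The sketch below illustrates that pattern in plain Python without Spark. It is an assumption, not PySpark's code: `range_bounds` and `safe_range_bounds` are hypothetical names, and the index formula is a guess modeled on the traceback.

```python
def range_bounds(samples, num_partitions):
    """Pick num_partitions - 1 split points from a sorted sample of keys.

    Hypothetical illustration of the indexing pattern seen in the
    traceback; the exact formula is an assumption.
    """
    samples = sorted(samples)
    bounds = []
    for i in range(num_partitions - 1):
        # With an empty sample, (0 - 1) * (i + 1) // n == -1, and
        # samples[-1] on an empty list raises IndexError.
        index = (len(samples) - 1) * (i + 1) // num_partitions
        bounds.append(samples[index])
    return bounds


def safe_range_bounds(samples, num_partitions):
    """Same as range_bounds, but falls back to no split points
    (a single key range) when the sample is empty."""
    if not samples:
        return []
    return range_bounds(samples, num_partitions)


if __name__ == "__main__":
    print(range_bounds(list(range(100)), 4))  # [24, 49, 74]
    print(safe_range_bounds([], 4))           # []
    try:
        range_bounds([], 4)
    except IndexError as exc:
        print("IndexError:", exc)             # IndexError: list index out of range
```

The empty-sample guard in `safe_range_bounds` is only one possible workaround; it is not necessarily how later Spark releases handle the case.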

On Fri, Nov 7, 2014 at 6:03 PM, Davies Liu-2 [via Apache Spark User List] <
ml-node+s1001560n18393...@n3.nabble.com> wrote:

> Could you tell us how large the data set is? It will help us debug this
> issue.
>
> On Thu, Nov 6, 2014 at 10:39 AM, skane <[hidden email]> wrote:
>
> > I don't have any insight into this bug, but on Spark version 1.0.0 I ran
> > into the same bug running the 'sort.py' example. On a smaller data set, it
> > worked fine. On a larger data set I got this error:
> >
> > Traceback (most recent call last):
> >   File "/home/skane/spark/examples/src/main/python/sort.py", line 30, in <module>
> >     .sortByKey(lambda x: x)
> >   File "/usr/lib/spark/python/pyspark/rdd.py", line 480, in sortByKey
> >     bounds.append(samples[index])
> > IndexError: list index out of range
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-issue-with-sortByKey-IndexError-list-index-out-of-range-tp16445p18288.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
>
