Thanks for the thoughts. I've been testing on Spark 1.1 and haven't seen the IndexError yet. I've run into some other errors ("too many open files"), but these issues seem to have been discussed already. The dataset, by the way, was about 40 GB and 188 million lines; I'm running a sort on 3 worker nodes with a total of about 80 cores.
Thanks again for the tips!

On Fri, Nov 7, 2014 at 6:03 PM, Davies Liu-2 [via Apache Spark User List] <ml-node+s1001560n18393...@n3.nabble.com> wrote:

> Could you tell how large is the data set? It will help us to debug this
> issue.
>
> On Thu, Nov 6, 2014 at 10:39 AM, skane <[hidden email]> wrote:
>
> > I don't have any insight into this bug, but on Spark version 1.0.0 I ran into
> > the same bug running the 'sort.py' example. On a smaller data set, it worked
> > fine. On a larger data set I got this error:
> >
> > Traceback (most recent call last):
> >   File "/home/skane/spark/examples/src/main/python/sort.py", line 30, in <module>
> >     .sortByKey(lambda x: x)
> >   File "/usr/lib/spark/python/pyspark/rdd.py", line 480, in sortByKey
> >     bounds.append(samples[index])
> > IndexError: list index out of range
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-issue-with-sortByKey-IndexError-list-index-out-of-range-tp16445p18871.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.