I came across this issue while running the program on my laptop with a small data set (around 3.5 MB).
The code is straightforward:

    data = sc.textFile("inputfile.txt")
    mappedRdd = data.map(mapFunction).cache()
    model = ALS.train(mappedRdd, 10, 15)
    ...

mapFunction is a simple map function in which I split each line and create a Rating(). (A minimal sketch of such a function, with an assumed field layout, is appended after the quoted thread below.)

Regards,
Surendran

On Thu, Dec 17, 2015 at 9:35 AM, Vijay Gharge <vijay.gha...@gmail.com> wrote:
> Can you elaborate on your problem further? From the error, it looks like
> you are running on a cluster. Please also share the relevant code for a
> better understanding.
>
> On Wednesday 16 December 2015, Surendran Duraisamy <surendra...@gmail.com>
> wrote:
>> Hi,
>>
>> I am running ALS to train a data set of around 150000 lines on my local
>> machine. When I call train(), I get the following exception:
>>
>>     print mappedRDDs.count()  # this prints the correct RDD count
>>     model = ALS.train(mappedRDDs, 10, 15)
>>
>> 15/12/16 18:43:18 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
>> java.net.SocketException: Connection reset by peer: socket write error
>>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
>>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:121)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>         at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>         at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
>>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>         at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>>         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
>>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
>>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>>         at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
>>
>> I get this error only when I use MLlib with PySpark. How do I resolve
>> this issue?
>>
>> Regards,
>> Surendran
>
> --
> Regards,
> Vijay Gharge
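
For reference, here is a minimal sketch of the kind of mapFunction described above. It assumes each input line is a comma-separated "user,item,rating" triple; the delimiter and field order are illustrative assumptions, not details taken from the original mail.

    # Minimal sketch of mapFunction; assumes comma-separated "user,item,rating" lines
    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    def mapFunction(line):
        # Split the line into its three fields and build a Rating object
        user, item, rating = line.split(",")
        return Rating(int(user), int(item), float(rating))

    sc = SparkContext(appName="als-example")
    data = sc.textFile("inputfile.txt")
    mappedRdd = data.map(mapFunction).cache()
    model = ALS.train(mappedRdd, 10, 15)  # rank=10, iterations=15, as in the mail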