I came across this issue while running the program on my laptop with a small
data set (around 3.5 MB).

The code is straightforward, as follows:

from pyspark.mllib.recommendation import ALS

data = sc.textFile("inputfile.txt")
mappedRdd = data.map(mapFunction).cache()
model = ALS.train(mappedRdd, 10, 15)  # rank=10, iterations=15

...

mapFunction is a simple map function in which I split each line and
create a Rating().
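
Roughly, it looks like this (a sketch only; the comma delimiter and the
user/product/rating column order here are illustrative, not the exact code):

from pyspark.mllib.recommendation import Rating

def mapFunction(line):
    # assumed input format: one "user,product,rating" record per line
    fields = line.split(",")
    return Rating(int(fields[0]), int(fields[1]), float(fields[2]))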


Regards,

Surendran


On Thu, Dec 17, 2015 at 9:35 AM, Vijay Gharge <vijay.gha...@gmail.com>
wrote:

> Can you elaborate on your problem further? Looking at the error, it seems
> you are running on a cluster. Also, please share the relevant code for
> better understanding.
>
>
> On Wednesday 16 December 2015, Surendran Duraisamy <surendra...@gmail.com>
> wrote:
>
>> Hi,
>>
>>
>>
>> I am running ALS to train on a data set of around 150,000 lines on my
>> local machine. When I call train() I get the following exception.
>>
>>
>>
>> print mappedRDDs.count()  # this prints the correct RDD count
>> model = ALS.train(mappedRDDs, 10, 15)
>>
>>
>>
>> 15/12/16 18:43:18 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
>> java.net.SocketException: Connection reset by peer: socket write error
>>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:121)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>     at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:413)
>>     at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>>     at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:425)
>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>     at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>>     at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
>>     at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
>>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>>     at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
>>
>>
>>
>> I am getting this error only when I use MLlib with PySpark. How do I
>> resolve this issue?
>>
>>
>>
>> Regards,
>>
>> Surendran
>>
>>
>>
>
>
> --
> Regards,
> Vijay Gharge
>
