I have a dataset comprised of ~200k labeled points whose features are
SparseVectors with ~20M features. I take 5% of the data for a training set. 

> model = LogisticRegressionWithSGD.train(training_set)

fails with 

ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
  File
"/cluster/home/roskarr/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
line 472, in send_command
    self.socket.sendall(command.encode('utf-8'))
  File "/cluster/home/roskarr/miniconda/lib/python2.7/socket.py", line 224,
in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe

I'm at a loss as to where to begin to debug this... any suggestions? Thanks,

Rok



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/using-LogisticRegressionWithSGD-train-in-Python-crashes-with-Broken-pipe-tp18182.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to