I have a dataset of ~200k labeled points whose features are SparseVectors with ~20M dimensions each. I take 5% of the data as a training set.
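For a sense of scale, here is a rough back-of-the-envelope sketch in plain Python (no Spark needed). The point and feature counts are the approximate figures above. The 8 bytes per value is an assumption: it supposes the trained model holds one dense double-precision weight per feature, which I have not verified against the MLlib internals. The helper names are made up for illustration.

```python
# Approximate figures from the description above.
N_POINTS = 200000        # ~200k labeled points
N_FEATURES = 20000000    # ~20M features per SparseVector
TRAIN_FRACTION = 0.05    # 5% of the data used for training

# Assumption: one 8-byte double per feature in a dense weight vector.
BYTES_PER_DOUBLE = 8


def training_set_size(n_points, fraction):
    """Number of points expected in the sampled training set."""
    return int(round(n_points * fraction))


def dense_vector_megabytes(n_features, bytes_per_value=BYTES_PER_DOUBLE):
    """Size in MB (10^6 bytes) of one dense vector with n_features entries."""
    return n_features * bytes_per_value / 1e6


train_points = training_set_size(N_POINTS, TRAIN_FRACTION)
weights_mb = dense_vector_megabytes(N_FEATURES)

print("training points: %d" % train_points)
print("one dense weight vector: %.0f MB" % weights_mb)
```

If that assumption holds, the training set is small in row count, but anything dense in the feature dimension is large, which may matter for whatever gets shipped between the Python process and the JVM.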
> model = LogisticRegressionWithSGD.train(training_set)

fails with

ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
  File "/cluster/home/roskarr/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 472, in send_command
    self.socket.sendall(command.encode('utf-8'))
  File "/cluster/home/roskarr/miniconda/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe

I'm at a loss as to where to begin to debug this... any suggestions?

Thanks,
Rok