Looking for the method executors use to write to HDFS

2015-11-04 Thread Tóth Zoltán
Hi, I'd like to write a Parquet file from the driver. I could use the HDFS API, but I'm worried that it won't work on a secure cluster. I assume that the method the executors use to write to HDFS takes care of managing Hadoop security. However, I can't find the place where the HDFS write happens in
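
A minimal sketch of the usual workaround: build the rows on the driver and hand them to Spark's own DataFrame writer, so the write goes through Spark's normal output path and reuses the application's Hadoop credentials. The path, app name, and columns below are placeholders, and the behaviour on a Kerberized cluster is an assumption, not something confirmed in the thread.

```python
# Sketch (Spark 1.5-era API): write driver-side data via Spark's own
# Parquet writer instead of the raw HDFS client.
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="driver-parquet-write")  # hypothetical app name
sqlContext = SQLContext(sc)

# Rows built on the driver; parallelize() turns them into an RDD so the
# write uses Spark's standard (security-aware) output machinery.
rows = [Row(id=u'id_1', value=1.0), Row(id=u'id_2', value=2.0)]
df = sqlContext.createDataFrame(sc.parallelize(rows))

# Placeholder path; replace with a real HDFS location.
df.write.parquet("hdfs:///tmp/example_parquet")
```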

Using ML KMeans without hardcoded feature vector creation

2015-09-15 Thread Tóth Zoltán
Hi, I'm wondering if there is a concise way to run ML KMeans on a DataFrame when the features are spread across multiple numeric columns, e.g. as in the Iris dataset: (a1=5.1, a2=3.5, a3=1.4, a4=0.2, id=u'id_1', label=u'Iris-setosa', binomial_label=1). I'd like to use KMeans without recreating the DataSet
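
One concise approach, assuming the Spark 1.5 ML API: use VectorAssembler to collect the numeric columns into a single vector column, then run pyspark.ml.clustering.KMeans on the result. The toy rows and k=2 below are illustrative, not from the thread.

```python
# Sketch: assemble feature columns into one vector, then cluster.
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

sc = SparkContext(appName="kmeans-columns")  # hypothetical app name
sqlContext = SQLContext(sc)

# Toy rows shaped like the Iris example above.
df = sqlContext.createDataFrame([
    Row(a1=5.1, a2=3.5, a3=1.4, a4=0.2, id=u'id_1'),
    Row(a1=4.9, a2=3.0, a3=1.4, a4=0.2, id=u'id_2'),
    Row(a1=6.2, a2=3.4, a3=5.4, a4=2.3, id=u'id_3'),
    Row(a1=5.9, a2=3.0, a3=5.1, a4=1.8, id=u'id_4'),
])

# VectorAssembler builds the single "features" vector column KMeans
# expects, so the DataFrame never has to be rebuilt by hand.
assembler = VectorAssembler(inputCols=["a1", "a2", "a3", "a4"],
                            outputCol="features")
assembled = assembler.transform(df)

kmeans = KMeans(k=2, featuresCol="features")
model = kmeans.fit(assembled)
clustered = model.transform(assembled)  # adds a "prediction" column
```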

Re: OutOfMemory error with Spark ML 1.5 logreg example

2015-09-07 Thread Tóth Zoltán
Unfortunately I'm getting the same error. The other interesting things are that:
- the Parquet files actually got written to HDFS (also with .write.parquet())
- the application stays stuck in the RUNNING state for good even after the error is thrown
15/09/07 10:01:10 INFO spark.ContextCleaner:
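
Not a fix from the thread, but a common guard against the stuck-in-RUNNING symptom: stop the SparkContext in a finally block so the cluster manager can move the application out of RUNNING even when the driver throws. The structure and app name here are hypothetical.

```python
# Sketch: always stop the SparkContext, even on driver-side errors.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="logreg-example")  # hypothetical app name
try:
    sqlContext = SQLContext(sc)
    # ... run the ML logistic regression example here ...
finally:
    sc.stop()  # without this, a thrown error can leave the app RUNNING
```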