Hi,
I'd like to write a Parquet file from the driver. I could use the HDFS API,
but I am worried that it won't work on a secure cluster. I assume that the
method the executors use to write to HDFS takes care of managing Hadoop
security; however, I can't find the place in the Spark source where the HDFS
write actually happens.
Hi,
I'm wondering if there is a concise way to run MLlib KMeans on a DataFrame
when the features are spread across multiple numeric columns,
e.g. as in the Iris dataset:
(a1=5.1, a2=3.5, a3=1.4, a4=0.2, id=u'id_1', label=u'Iris-setosa',
binomial_label=1)
I'd like to use KMeans without rebuilding the DataFrame from scratch.
Unfortunately I'm getting the same error:
The other interesting things are that:
- the parquet files actually got written to HDFS (also with
.write.parquet() )
- the application stays stuck in the RUNNING state indefinitely, even after
the error is thrown
15/09/07 10:01:10 INFO spark.ContextCleaner: