An RDD cannot contain elements of type RDD; that is, you can't nest RDDs within
RDDs (and it wouldn't make much sense to do so).
I suggest that rather than having an RDD of file names, you collect those file
name strings back to the driver as a Scala array, and then from there make an
array of RDDs, one per file.
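The suggestion above might look roughly like this (a sketch, not code from the original message; `sc` is assumed to be an existing SparkContext and `fileNamesRDD` an assumed `RDD[String]` of paths — run this inside a Spark application):

```scala
import org.apache.spark.rdd.RDD

// Bring the file name strings back to the driver as a plain Scala array...
val fileNames: Array[String] = fileNamesRDD.collect()

// ...then build one RDD per file on the driver side, instead of nesting RDDs.
val perFileRDDs: Array[RDD[String]] =
  fileNames.map(path => sc.textFile(path))
```

This keeps all RDD construction on the driver, which is the only place an RDD can be created.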
Oh, I've only seen SVMWithSGD; I hadn't realized LBFGS was implemented. I'll
try it out when I have time. Thanks!
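For reference, MLlib's L-BFGS-based training can be invoked roughly like this (a sketch, assuming you already have a prepared `RDD[LabeledPoint]`; exact API details vary by Spark version):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Assumed: an existing RDD of labeled feature vectors.
val training: RDD[LabeledPoint] = ???

// Train a binary logistic regression model using the L-BFGS optimizer.
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(training)
```

The lower-level `org.apache.spark.mllib.optimization.LBFGS` optimizer is also available if you want to plug in your own gradient and updater.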
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-LIBLINEAR-tp5546p17240.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Just wondering, any update on this? Is there a plan to integrate CJ's work
with MLlib? I'm asking since the SVM implementation in MLlib did not give us
good results, and we had to resort to training our SVM classifier serially on
the driver node with LIBLINEAR.
Also, it looks like CJ Lin is coming to
What do people usually do for this?
It looks like YARN might be the simplest option, since the Cloudera
distribution already installs it for you when you install Hadoop.
Are there any advantages to using Mesos instead?
Thanks.
I'm just wondering what the general recommendation is for data pipeline
automation.
Say I want to run Spark job A, then B, then invoke script C, then do D; if D
fails, do E, and if job A fails, send email F, etc.
It looks like Oozie might be the best choice, but I'd like some
advice/suggestions.
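That kind of conditional chaining can be sketched in plain Scala with `Try` (a toy illustration of the control flow only — the step functions are placeholders, not a real orchestration tool like Oozie):

```scala
import scala.util.{Try, Success, Failure}
import scala.collection.mutable.ListBuffer

val log = ListBuffer[String]()

// Placeholder steps; in practice these would launch Spark jobs or scripts.
def jobA(): Try[Unit]    = Try { log += "A" }
def jobB(): Try[Unit]    = Try { log += "B" }
def scriptC(): Try[Unit] = Try { log += "C" }
def stepD(): Try[Unit]   = Try { log += "D"; throw new RuntimeException("D failed") }
def stepE(): Unit        = log += "E (fallback)"
def sendEmailF(): Unit   = log += "F (alert email)"

// Run A -> B -> C -> D; if D fails, run E; if A itself fails, send email F.
jobA() match {
  case Failure(_) => sendEmailF()
  case Success(_) =>
    for (_ <- jobB(); _ <- scriptC())
      stepD().recover { case _ => stepE() }
}
```

A dedicated workflow engine adds scheduling, retries, and monitoring on top of this kind of dependency logic, which is why Oozie (or similar tools) is usually preferred over hand-rolled chaining.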
I see, thanks.
I've read through that thread, and it seems that in his case he needed to add
a particular hadoop-client dependency.
However, I don't think I should be required to do that, since I'm not reading
from HDFS.
I'm just running a straight-up minimal example, in local mode, out of the box.
Here's an example m
Oh, I missed that thread. Thanks!
I'm trying to save an RDD as a Parquet file through the saveAsParquetFile()
API, with code that looks something like:
val sc = ...
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val someRDD: RDD[SomeCaseClass] = ...
someRDD.saveAsParquetFile("someRDD.parquet")
How