Spark random forest - string data

2015-01-16 Thread Asaf Lahav
Hi, I have been playing around with the new version of the Spark MLlib random forest implementation and, in the process, tried it with a file containing string features. While training, it fails with: java.lang.NumberFormatException: For input string. Is MLlib Random forest adapted to run on top of
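A NumberFormatException of this kind usually suggests that string values are being parsed as doubles somewhere in the loading step. One common workaround is to map each distinct string to a numeric index before training. The sketch below is plain Scala with no Spark dependency; `StringFeatureEncoding`, `buildIndex`, and `encode` are hypothetical names introduced here for illustration, not part of MLlib. The per-column category counts it returns have the `Map[Int, Int]` shape that MLlib's `RandomForest.trainClassifier` accepts as `categoricalFeaturesInfo`, on the assumption that this is how the encoded features would be declared as categorical.

```scala
// Hypothetical helper, not part of MLlib: encodes string features as numeric indices.
object StringFeatureEncoding {

  // Maps each distinct value in a column to a zero-based index,
  // in order of first appearance.
  def buildIndex(values: Seq[String]): Map[String, Int] =
    values.distinct.zipWithIndex.toMap

  // Encodes rows of string features as rows of doubles, using one index map
  // per column. Returns the encoded rows and, for each column index,
  // the number of distinct categories seen in that column.
  def encode(rows: Seq[Seq[String]]): (Seq[Seq[Double]], Map[Int, Int]) = {
    val numCols = rows.headOption.map(_.length).getOrElse(0)
    val indexes = (0 until numCols).map(c => buildIndex(rows.map(r => r(c))))
    val encoded = rows.map { r =>
      r.zipWithIndex.map { case (value, c) => indexes(c)(value).toDouble }
    }
    val arity = indexes.zipWithIndex.map { case (m, c) => c -> m.size }.toMap
    (encoded, arity)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Seq("red", "small"),
      Seq("blue", "large"),
      Seq("red", "large"))
    val (encoded, arity) = encode(rows)
    encoded.foreach(r => println(r.mkString(",")))
    println(arity)
  }
}
```

In a Spark job the same mapping would typically be built from the distinct values of each column and applied while constructing the training vectors; that wiring is left out here.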

Using a Database to persist and load data from

2014-10-30 Thread Asaf Lahav
Hi Ladies and Gents, I would like to know what options I have for making Spark code I have already written use a DB (Vertica) as its store/data source. The data is tabular, so essentially any relational DB can be used. Do I need to develop a context? If yes,

Spark clustered client

2014-07-22 Thread Asaf Lahav
Hi Folks, I have been trying to dig up some information on the possibilities for deploying more than one client process that consumes Spark. Let's say I have a Spark cluster of 10 servers and would like to set up 2 additional servers which are sending requests to it

Re: Executing spark jobs with predefined Hadoop user

2014-04-12 Thread Asaf Lahav
will be the user to communicate with HDFS. val sparkUser = Option { Option(System.getProperty("user.name")).getOrElse(System.getenv("SPARK_USER")) }.getOrElse { SparkContext.SPARK_UNKNOWN_USER } Thanks Jerry From: Asaf Lahav
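The fallback order in the quoted snippet can be exercised outside Spark with a small sketch. It assumes the lookups behave like `System.getProperty` and `System.getenv` (returning null when a key is missing). `SparkUserResolution`, `resolveUser`, and `UnknownUser` are hypothetical names, and the `"<unknown>"` value is only an assumed stand-in for `SparkContext.SPARK_UNKNOWN_USER`.

```scala
// Hypothetical stand-alone sketch of the user lookup quoted above.
object SparkUserResolution {

  // Assumed placeholder standing in for SparkContext.SPARK_UNKNOWN_USER.
  val UnknownUser = "<unknown>"

  // Same fallback chain as the quoted snippet: the "user.name" system property
  // first, then the SPARK_USER environment variable, then the placeholder.
  // The lookups are passed in as functions so the logic can be checked
  // without changing the JVM's real properties or environment.
  def resolveUser(
      getProperty: String => String,
      getEnv: String => String): String =
    Option {
      Option(getProperty("user.name")).getOrElse(getEnv("SPARK_USER"))
    }.getOrElse(UnknownUser)

  def main(args: Array[String]): Unit = {
    // Resolve against the real JVM properties and environment.
    println(resolveUser(k => System.getProperty(k), k => System.getenv(k)))
  }
}
```

Read this way, the `user.name` property takes precedence, so SPARK_USER only matters when that property is absent; on a typical JVM it is set to the operating-system user.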