spark.ml : eval model outside sparkContext

2016-03-15 Thread Emmanuel
Hello, in MLlib with Spark 1.4 I was able to eval a model by loading it and calling `predict` on a vector of features: I would train on Spark but use my model in my own workflow. In `spark.ml` it seems the only way to eval is `transform`, which only takes a DataFrame. To build a DataFrame
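For what it's worth, a minimal sketch of the DataFrame-wrapping approach the post alludes to, assuming Spark 1.6 and a model saved with the 1.6 ML persistence API (model path and feature values are placeholders). Note it still needs a SparkContext, which is exactly the limitation being raised:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.ml.classification.LogisticRegressionModel

    val sc = new SparkContext(new SparkConf().setAppName("score-one-vector"))
    val sqlContext = new SQLContext(sc)

    // Load a previously saved model (placeholder path).
    val model = LogisticRegressionModel.load("/path/to/saved/model")

    // spark.ml's transform() only accepts DataFrames, so wrap the single
    // feature vector in a one-row DataFrame with the expected "features" column.
    val df = sqlContext.createDataFrame(Seq(Tuple1(Vectors.dense(0.5, 1.2, -0.3))))
      .toDF("features")

    val prediction = model.transform(df).select("prediction").first().getDouble(0)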

bug? using withColumn with colName with dot can't replace column

2016-03-15 Thread Emmanuel
In Spark 1.6, if I do this (the column name has a dot in it, but is not a nested column):

scala> df = df.withColumn("raw.hourOfDay", df.col("`raw.hourOfDay`"))
org.apache.spark.sql.AnalysisException: cannot resolve 'raw.minOfDay' given input colu
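One workaround sketch, assuming Spark 1.6 and that the goal is simply to overwrite the dotted column (the cast below is just a stand-in transformation): withColumnRenamed appears to match the top-level attribute name literally rather than parsing dots as nested fields, so renaming to a dot-free name first sidesteps the resolution problem that withColumn hits:

    import org.apache.spark.sql.functions.col

    // Rename away the dot, transform, then (optionally) rename back.
    val fixed = df
      .withColumnRenamed("raw.hourOfDay", "raw_hourOfDay")
      .withColumn("raw_hourOfDay", col("raw_hourOfDay").cast("double"))
      .withColumnRenamed("raw_hourOfDay", "raw.hourOfDay")

Backticks (`raw.hourOfDay`) are still needed anywhere the dotted name is resolved as a Column expression.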

Spark-submit, Spark 1.6, how to get status of Job?

2016-03-14 Thread Emmanuel
Hello, when I used to submit a job with Spark 1.4, it would return a job ID and a status (RUNNING, FAILED, or the like). I just upgraded to 1.6 and spark-submit returns no status. Is there a way to get this information back? When I submit a job I want to know which one it is
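One option, sketched below under the assumption that Spark 1.6's launcher package is available (jar path, main class, and master URL are placeholders): submit programmatically via SparkLauncher.startApplication, which returns a SparkAppHandle exposing the application ID and state, instead of parsing spark-submit's console output:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("/path/to/app.jar")   // placeholder jar
      .setMainClass("com.example.Main")     // placeholder main class
      .setMaster("spark://master:7077")
      .startApplication()

    // Poll until the application reaches a terminal state.
    while (!handle.getState.isFinal) {
      println(s"app ${handle.getAppId}: ${handle.getState}") // e.g. RUNNING
      Thread.sleep(1000)
    }

For standalone or Mesos cluster mode, spark-submit --status <submissionId> may also be a route to the same information.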

spark-submit returns nothing with spark 1.6

2016-03-12 Thread Emmanuel
Hello, when I used to submit a job with Spark 1.4, it would return a job ID and a status (RUNNING, FAILED, or the like). I just upgraded to 1.6 and spark-submit returns no status. Is there a way to get this information back? When I submit a job I want to know which one it is

Re: Spark Streaming Checkpointing solutions

2015-07-21 Thread Emmanuel Fortin
uffice.
>
> My $0.02,
> dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://p

Spark Streaming Checkpointing solutions

2015-07-21 Thread Emmanuel
Hi, I'm working on a Spark Streaming application and I would like to know what the best storage to use for checkpointing is. For testing purposes we're using NFS between the worker, the master, and the driver program (in client mode), but we have some issues with the CheckpointWriter (1 thread
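For context, a minimal sketch of pointing checkpoints at a fault-tolerant store such as HDFS instead of NFS (the URI and batch interval below are placeholders); the same directory is passed to getOrCreate so the driver can recover from the checkpoint on restart:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs://namenode:8020/spark/checkpoints" // placeholder URI

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(new SparkConf().setAppName("stream"), Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... define the DStream graph here, before returning ...
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()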

Re: Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
It did the job. Thanks. :)

On 19 Aug 2014, at 10:20, Sean Owen wrote:
> In that case, why not collectAsMap() and have the whole result as a simple Map in memory? Then lookups are trivial. RDDs aren't distributed maps.
>
> On Tue, Aug 19, 2014 at 9:17 AM, Emmanue
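A sketch of the collectAsMap() approach suggested above, with key/value types assumed for illustration; it only makes sense when the RDD is small enough to fit in driver memory:

    import org.apache.spark.rdd.RDD

    // Materialize the pair RDD once as a local map; subsequent lookups are
    // in-memory hash lookups instead of a full-RDD filter + collect per key.
    def toLookupTable(pairs: RDD[(String, Int)]): collection.Map[String, Int] =
      pairs.collectAsMap()

    // usage: val table = toLookupTable(myRdd); table.get(param)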

Re: Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
> It will never be efficient like a database lookup, since this is implemented by scanning through all of the data. There is no index or anything.
>
> On Tue, Aug 19, 2014 at 8:43 AM, Emmanuel Castanier wrote:
>> Hi all,
>>
>> I'm totally new to Spark,

Performance problem on collect

2014-08-19 Thread Emmanuel Castanier
this: myRdd.filter(t => t._1.equals(param)). If I do a collect to get the single matching tuple, it takes about 12 seconds to execute; I imagine that's because Spark is meant to be used differently... Best regards, Emmanuel