[MLlib] Extensibility of MLlib classes (Word2VecModel etc.)

2015-09-09 Thread Maandy
Hey, I'm trying to implement doc2vec (http://cs.stanford.edu/~quocle/paragraph_vector.pdf), mainly for sport/research purposes given all its limitations, so I would probably not even try to PR it into MLlib itself. To do that, though, it would be highly useful to have access to MLlib's Word2VecModel
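
For reference, the trained vectors are at least reachable through the public getVectors accessor, even without subclassing the model. A minimal sketch (this just averages word vectors into a crude document embedding, not the actual paragraph-vector training from the paper; "docs.txt" is a hypothetical input):

    import org.apache.spark.mllib.feature.Word2Vec

    val docs = sc.textFile("docs.txt").map(_.split(" ").toSeq)
    val model = new Word2Vec().fit(docs)
    val vectors: Map[String, Array[Float]] = model.getVectors  // word -> vector

    // Average the word vectors of each document (broadcast `vectors` for real data).
    val docVectors = docs.map { words =>
      val vecs = words.flatMap(vectors.get)
      val dim = vecs.headOption.map(_.length).getOrElse(0)
      val sum = new Array[Float](dim)
      for (v <- vecs; i <- 0 until dim) sum(i) += v(i)
      if (vecs.nonEmpty) sum.map(_ / vecs.size) else sum
    }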

Re: Did the 1.5 release complete?

2015-09-09 Thread Reynold Xin
The dev/user announcement was made just now. As for Maven, I published it this afternoon (so it's been a few hours). If it is still not there tomorrow morning, I will look into it. On Wed, Sep 9, 2015 at 2:42 AM, Sean Owen wrote: > I saw the end of the RC3 vote:

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread Yu Ishikawa
Great work, everyone! -- Yu Ishikawa

Re: looking for a technical reviewer to review a book on Spark

2015-09-09 Thread Gurumurthy Yeleswarapu
Hi Mohammed: I'm interested. Thanks, Guru Yeleswarapu From: Mohammed Guller To: "dev@spark.apache.org" Sent: Wednesday, September 9, 2015 8:36 AM Subject: looking for a technical reviewer to review a book on Spark Hi Spark developers,

Re: looking for a technical reviewer to review a book on Spark

2015-09-09 Thread Gurumurthy Yeleswarapu
My apologies for the broadcast! That email was meant for Mohammed. From: Gurumurthy Yeleswarapu To: Mohammed Guller ; "dev@spark.apache.org" Sent: Wednesday, September 9, 2015 8:50 AM Subject: Re: looking for a

RE: (Spark SQL) partition-scoped UDF

2015-09-09 Thread Eron Wright
Follow-up: I solved this problem by overriding the model's `transform` method and using `mapPartitions` to produce a new DataFrame rather than a `udf`. Source
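
A sketch of that pattern against the 1.x APIs (ExpensiveModel and loadModel here are hypothetical stand-ins for per-partition state that a plain `udf` would otherwise rebuild on every row):

    import org.apache.spark.sql.{DataFrame, Row, SQLContext}
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

    class ExpensiveModel extends Serializable { def score(r: Row): Double = r.length.toDouble }
    def loadModel(): ExpensiveModel = new ExpensiveModel

    def withScore(df: DataFrame, sqlContext: SQLContext): DataFrame = {
      val schema = StructType(df.schema.fields :+ StructField("score", DoubleType, nullable = false))
      val rows = df.rdd.mapPartitions { iter =>
        val model = loadModel()  // initialized once per partition, not once per row
        iter.map(row => Row.fromSeq(row.toSeq :+ model.score(row)))
      }
      sqlContext.createDataFrame(rows, schema)
    }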

Re: Deserializing JSON into Scala objects in Java code

2015-09-09 Thread Kevin Chen
Marcelo and Christopher, thanks for your help! The problem turned out to arise from a different part of the code (we have multiple ObjectMappers), but because I am not very familiar with Jackson I had thought there was a problem with the Scala module. Thank you again, Kevin From: Christopher
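
For anyone hitting the same symptom: the Scala module has to be registered on every ObjectMapper instance separately, which is easy to miss when several mappers exist. A minimal reminder:

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule

    // A mapper without the module will mishandle Scala case classes,
    // Options, and collections, whether called from Java or Scala.
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)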

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread Jerry Lam
Hi Spark Developers, I'm eager to try it out! However, I ran into problems resolving dependencies: [warn] [NOT FOUND ] org.apache.spark#spark-core_2.10;1.5.0!spark-core_2.10.jar (0ms) [warn] jcenter: tried When will the package be available? Best Regards, Jerry On Wed, Sep 9, 2015 at
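
The 1.5.0 artifacts go out through Maven Central, so a build that resolves only against jcenter will presumably lag until that mirror syncs. A build.sbt sketch that points at Central explicitly:

    resolvers += "Maven Central" at "https://repo1.maven.org/maven2"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0"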

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread andy petrella
You can try it out really quickly by "building" a Spark Notebook from http://spark-notebook.io/. Just choose the master branch and 1.5.0, plus the right Hadoop version (it defaults to 2.2.0 though), and there you go :-) On Wed, Sep 9, 2015 at 6:39 PM Ted Yu wrote: > Jerry: > I just

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread Ted Yu
Jerry: I just tried building the hbase-spark module with 1.5.0 and I see:

    ls -l ~/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0
    total 21712
    -rw-r--r-- 1 tyu staff      196 Sep  9 09:37 _maven.repositories
    -rw-r--r-- 1 tyu staff 11081542 Sep  9 09:37 spark-core_2.10-1.5.0.jar
    -rw-r--r--

Re: Code generation for GPU

2015-09-09 Thread lonikar
I am already looking at the DataFrame APIs and the implementation. In fact, the columnar representation (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala) is what gave me the idea for my talk proposal. It is ideally suited for
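
For context on why a columnar layout is attractive for GPU offload: each column's values sit in one contiguous primitive array, so a whole column can be handed to a device (or a vectorized CPU loop) in a single transfer. An illustrative sketch, not Spark's internal API:

    // A column as a flat primitive array: amenable to one-shot copies
    // to a GPU buffer and to tight, vectorizable loops.
    val column: Array[Double] = Array.tabulate(1 << 20)(_.toDouble)
    var i = 0
    var sum = 0.0
    while (i < column.length) { sum += column(i); i += 1 }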

Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

2015-09-09 Thread lonikar
The Tungsten, codegen, etc. options are enabled by default, but I am not able to get execution through UnsafeRow/TungstenProject; it still executes using InternalRow/Project. I see this in SparkStrategies.scala: If unsafe mode is enabled and we support these data types in Unsafe, use
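
One way to check which operator is being chosen is to print the physical plan; in 1.5 the relevant switch is spark.sql.tungsten.enabled (on by default). A small sketch:

    import org.apache.spark.sql.functions.col

    sqlContext.setConf("spark.sql.tungsten.enabled", "true")
    // With supported expressions and data types this should show
    // TungstenProject; otherwise it falls back to Project.
    val df = sqlContext.range(0, 1000).select(col("id") + 1)
    println(df.queryExecution.executedPlan)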

Re: Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

2015-09-09 Thread Ted Yu
Here is the example from Reynold (http://search-hadoop.com/m/q3RTtfvs1P1YDK8d):

    scala> val data = sc.parallelize(1 to size, 5).map(x =>
      (util.Random.nextInt(size / repetitions), util.Random.nextDouble)).toDF("key", "value")
    data: org.apache.spark.sql.DataFrame = [key: int, value: double]
    scala>
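
The preview cuts off before size and repetitions are defined; a self-contained variant with placeholder values (my own, not Reynold's):

    import sqlContext.implicits._

    val size = 1000000
    val repetitions = 100
    val data = sc.parallelize(1 to size, 5)
      .map(x => (util.Random.nextInt(size / repetitions), util.Random.nextDouble))
      .toDF("key", "value")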