RandomForest evaluator for grid search

2015-07-13 Thread Olivier Girardot
Hi everyone, Using spark-ml there seems to be only BinaryClassificationEvaluator and RegressionEvaluator, is there any way or plan to provide a ROC-based or PR-based or F-Measure based for multi-class, I would be interested especially in evaluating and doing a grid search for a RandomForest model.

Contribution and choice of language

2015-07-13 Thread srinivasraghavansr71
Hello everyone, I am interested in contributing to Apache Spark. I am more inclined towards algorithms and computational methods for matrices, etc. I took a course on edX where Spark was taught through the Python interface. So my questions are as follows: 1. Place from where I can

Re: pyspark.sql.tests: is test_time_with_timezone a flaky test?

2015-07-13 Thread Cheolsoo Park
Thank you! On Sun, Jul 12, 2015 at 10:59 PM, Davies Liu dav...@databricks.com wrote: Will be fixed by https://github.com/apache/spark/pull/7363 On Sun, Jul 12, 2015 at 7:45 PM, Davies Liu dav...@databricks.com wrote: Thanks for reporting this, I'm working on it. It turned out that it's a

RDD checkpoint

2015-07-13 Thread 牛兆捷
The checkpointed RDD is computed twice. Why not checkpoint the RDD once it is computed? Is there any special reason for this? -- *Regards,* *Zhaojie*

Spark Core and ways of talking to it for enhancing application language support

2015-07-13 Thread Vasili I. Galchin
Hello, So far I think there are at least two ways (maybe more) to interact with the Spark Core from various programming languages: the PySpark API and the R API. From reading the code it seems that the PySpark approach and the R approach are very disparate ... with the latter using the R-Java bridge.

Re: jenkins downtime 7/13/15, 7am PDT

2015-07-13 Thread shane knapp
this is happening now. On Sun, Jul 12, 2015 at 8:49 PM, shane knapp skn...@berkeley.edu wrote: reminder: this is happening tomorrow morning! On Thu, Jul 9, 2015 at 1:07 PM, shane knapp skn...@berkeley.edu wrote: i'll be taking jenkins down for system and jenkins app updates. this should be

Re: How to Read Excel file in Spark 1.4

2015-07-13 Thread Sandy Ryza
Hi Su, Spark can't read Excel files directly. Your best bet is probably to export the contents as a CSV and use the csvFile API. -Sandy On Mon, Jul 13, 2015 at 9:22 AM, spark user spark_u...@yahoo.com.invalid wrote: Hi I need your help to save excel data in hive . 1. how to read

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
There is MulticlassMetrics in MLlib; unfortunately a pipelined version hasn't yet been made for spark-ml. SPARK-7690 https://issues.apache.org/jira/browse/SPARK-7690 is tracking work on this if you are interested in following the development. On Mon, Jul 13, 2015 at 2:16 AM, Olivier Girardot
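For readers unfamiliar with the F-measure asked about in this thread, the following is a minimal pure-Python sketch of a macro-averaged F1 score over several classes. It is not the MLlib MulticlassMetrics API and does not use Spark; the function names `per_class_f1` and `macro_f1` and the sample labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def per_class_f1(labels, predictions):
    """Return {class: F1} computed from paired true labels and predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for y, p in zip(labels, predictions):
        if y == p:
            tp[y] += 1          # correct prediction for class y
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[y] += 1          # y was the truth but was missed
    scores = {}
    for c in sorted(set(labels) | set(predictions)):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(labels, predictions):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(labels, predictions)
    return sum(scores.values()) / len(scores)


# Hypothetical three-class example (illustrative data only).
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
print(per_class_f1(labels, predictions))
print(macro_f1(labels, predictions))
```

A grid search would call a scoring function like this once per parameter combination and keep the combination with the highest score; macro averaging is only one possible choice and treats every class equally regardless of its frequency.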

RE: Model parallelism with RDD

2015-07-13 Thread Ulanov, Alexander
Below are the average timings for one iteration of model update with RDD (with cache, as Shivaram suggested). Model size (RDD[Double].count) / time, s: 10M / 0.585336926; 100M / 1.767947506; 1B / 125.6078817. There is a ~100x increase in time while 10x increase in model size (from 100 million to 1
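The scaling in the timings quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the three figures from the message; the helper name `growth_factors` is hypothetical.

```python
# Average time per iteration in seconds, keyed by model size (element count),
# copied from the message above.
timings = {
    10_000_000: 0.585336926,
    100_000_000: 1.767947506,
    1_000_000_000: 125.6078817,
}


def growth_factors(timings):
    """Return (size_ratio, time_ratio) for each pair of consecutive sizes."""
    sizes = sorted(timings)
    return [
        (large / small, timings[large] / timings[small])
        for small, large in zip(sizes, sizes[1:])
    ]


for size_ratio, time_ratio in growth_factors(timings):
    print(f"size x{size_ratio:.0f} -> time x{time_ratio:.1f}")
```

With these figures the step from 10M to 100M costs roughly 3x more time, while the step from 100M to 1B costs roughly 71x more, which is the superlinear jump the message describes as "~100x".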

Re: How to Read Excel file in Spark 1.4

2015-07-13 Thread Reynold Xin
What Sandy meant was there was no out-of-the-box support in Spark for reading excel files. However, you can still read excel: If you are using Python, you can use Pandas to load an excel file and then convert it into a Spark DataFrame. If you are using the JVM, you can find any excel library for

Re: Should spark-ec2 get its own repo?

2015-07-13 Thread Shivaram Venkataraman
I think moving the repo-location and re-organizing the python code to handle dependencies, testing etc. sounds good to me. However, I think there are a couple of things which I am not sure about 1. I strongly believe that we should preserve existing command-line in ec2/spark-ec2 (i.e. the shell

Re: jenkins downtime 7/13/15, 7am PDT

2015-07-13 Thread shane knapp
this is still ongoing... 30 minutes after applying the system updates and running init 6, the jenkins master hasn't yet recovered from the reboot. i've opened a ticket w/our sysadmin group and am crossing my fingers that this doesn't mean a trip down to the datacenter. more updates as they

How to Read Excel file in Spark 1.4

2015-07-13 Thread spark user
Hi, I need your help to save Excel data in Hive. - How to read an Excel file in Spark using Spark 1.4 - How to save using a data frame. If you have some sample code please send it. Thanks, su

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
Joseph may be already working on it; I would continue the discussion on JIRA and first ask if anyone is working on it before starting your own PR. On Mon, Jul 13, 2015 at 1:47 PM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Thanks ! that's great ! any way to help on that ?

Re: RandomForest evaluator for grid search

2015-07-13 Thread Olivier Girardot
thx for the info. I'd be interested in getting the full predict_proba like in scikit learn ( http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.predict_proba) for the random forest model. There doesn't seem to be a

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
That is currently tracked by SPARK-3727 https://issues.apache.org/jira/browse/SPARK-3727. On Mon, Jul 13, 2015 at 1:16 PM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: thx for the info. I'd be interested in getting the full predict_proba like in scikit learn (

Re: Joining Apache Spark

2015-07-13 Thread Animesh Tripathy
Awesome, thank you for the swift reply! I have been thinking of this for the longest time while using Apache Spark. Do you think the future holds any hope for Swift, C#, or Ruby users to be able to integrate Apache Spark? I am really excited about the newest 1.4 release; finally Apache Spark supports

Joining Apache Spark

2015-07-13 Thread Animesh Tripathy
I would like to join the Apache Spark Development Team in order to contribute code for further improvement of Apache Spark. I was referred here from EECS professor Anthony after the completion of Big Data with Apache Spark. Sincerely, Animesh Tripathy

Re: Joining Apache Spark

2015-07-13 Thread Marcelo Vanzin
Hello, welcome, and please start by going through the web site ( http://spark.apache.org/), especially the Contributors section at the bottom. On Mon, Jul 13, 2015 at 3:58 PM, Animesh Tripathy a.tripathy...@gmail.com wrote: I would like to join the Apache Spark Development Team in order to

Re: Joining Apache Spark

2015-07-13 Thread Josh Rosen
Also, check out https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Mon, Jul 13, 2015 at 4:08 PM, Marcelo Vanzin van...@cloudera.com wrote: Hello, welcome, and please start by going through the web site ( http://spark.apache.org/), especially the Contributors section at

Re: Should spark-ec2 get its own repo?

2015-07-13 Thread Nicholas Chammas
At a high level I see the spark-ec2 scripts as an effort to provide a reference implementation for launching EC2 clusters with Apache Spark. On a side note, this is precisely how I used spark-ec2 for a personal project that does something similar: reference implementation. Nick On Mon, Jul 13, 2015

BlockMatrix multiplication

2015-07-13 Thread Ulanov, Alexander
Dear Spark developers, I am trying to perform BlockMatrix multiplication in Spark. My test is as follows: 1) create a matrix of N blocks, so that each row of the block matrix contains only 1 block and each block resides in a separate partition on a separate node, 2) transpose the block matrix and
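The layout described above (one block per block row) can be illustrated without Spark. The snippet is truncated, so which product the test computes is not stated; the sketch below assumes, purely for illustration, the product of the transposed matrix with the original, which for a single column of blocks B_1..B_N reduces to the sum of B_i^T B_i. This is plain Python on nested lists, not the MLlib BlockMatrix API, and all function names are hypothetical.

```python
def transpose(m):
    """Transpose a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Dense matrix product of two list-of-rows matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gram_of_block_column(blocks):
    """Compute A^T A where A is the vertical stack of `blocks`.

    Each block contributes B_i^T B_i independently, so the per-block work
    could run where the block lives and only the small results are summed.
    """
    total = None
    for block in blocks:
        partial = matmul(transpose(block), block)
        total = partial if total is None else add(total, partial)
    return total


# Two hypothetical 2x2 blocks stacked into a 4x2 matrix.
blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
stacked = [row for block in blocks for row in block]
print(gram_of_block_column(blocks))
print(matmul(transpose(stacked), stacked))
```

Both print statements produce the same matrix, which shows that the blockwise computation agrees with multiplying the full stacked matrix directly.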

Re: ./dev/run-tests fail on master

2015-07-13 Thread Xiaoyu Ma
Hi Ted, Seems the maven build/test part works fine for me. Thanks! Forgot to provide more info: I’m using python 2.7.6, MacOS 10.10.3, JDK 1.7.0_79, Maven 3.3.1 马晓宇 / Xiaoyu Ma hzmaxia...@corp.netease.com On Jul 13, 2015, at 11:34 AM, Ted Yu yuzhih...@gmail.com wrote: When I ran

[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-13 Thread Patrick Wendell
This vote passes with 14 +1 (7 binding) votes and no 0 or -1 votes. +1 (14): Patrick Wendell Reynold Xin Sean Owen Burak Yavuz Mark Hamstra Michael Armbrust Andrew Or York, Brennon Krishna Sankar Luciano Resende Holden Karau Tom Graves Denny Lee Sean McNamara - Patrick On Wed, Jul 8, 2015 at