Re: One element per node

2015-09-18 Thread Feynman Liang
Thank you! How can I guarantee that I have only one element per executor (per worker, or per physical node)?
> *From:* Feynman Liang [mailto:fli...@databricks.com]
> *Sent:* Friday, September 18, 2015 4:06 PM
> *To:* Ulanov, Alexander
> *Cc:* dev@spark.apache.org

Re: One element per node

2015-09-18 Thread Feynman Liang
rdd.mapPartitions(x => x.take(1))
On Fri, Sep 18, 2015 at 3:57 PM, Ulanov, Alexander wrote:
> Dear Spark developers,
>
> Is it possible (and how to do it if possible) to pick one element per physical node from an RDD? Let’s say the first element of
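A minimal sketch of the per-partition selection, under stated assumptions: `mapPartitions` hands each partition to the function as a plain Scala `Iterator`, which cannot be created with `new` and has no `head` method, so `take(1)` is used to keep the first element; an empty partition simply yields nothing. The `FirstPerPartition` object and `firstElement` helper are hypothetical names for illustration, and the partitions are simulated with ordinary collections so the snippet runs without a cluster. Note that this gives one element per partition, which is not necessarily one per physical node.

```scala
object FirstPerPartition {
  // Keeps at most the first element of a partition's iterator.
  // Safe on empty partitions: take(1) on an empty iterator is empty.
  def firstElement[T](partition: Iterator[T]): Iterator[T] = partition.take(1)

  def main(args: Array[String]): Unit = {
    // Simulated partitions (assumption: these stand in for an RDD's partitions).
    val partitions = Seq(Seq(1, 2, 3), Seq.empty[Int], Seq(7, 8))
    val picked = partitions.flatMap(p => firstElement(p.iterator))
    println(picked) // one element per non-empty partition
  }
}
```

With a real RDD the same logic would be written as `rdd.mapPartitions(_.take(1))`.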

Re: [MLlib] BinaryLogisticRegressionSummary on test set

2015-09-17 Thread Feynman Liang
We have kept that private because we need to decide on a name for the method which evaluates on a test set (see the TODO comment); perhaps you could push for this to happen by creating a JIRA and pinging

Re: Enum parameter in ML

2015-09-14 Thread Feynman Liang
Since PipelineStages are serializable, the params must also be serializable. We also have to keep the Java API in mind. Introducing a new enum Param type may work, but we will have to ensure that Java users can use it without dealing with ClassTags (I believe Scala will create new types for each

Re: Enum parameter in ML

2015-09-14 Thread Feynman Liang
for suggestion. How can I ensure that there will be no problems for Java users? (I only use Scala API)
Best regards, Alexander
> *From:* Feynman Liang [mailto:fli...@databricks.com]
> *Sent:* Monday, September 14, 2015 5:27 PM
> *To:* Ulanov, Alexander

Re: Data frame with one column

2015-09-14 Thread Feynman Liang
For an example, see the ml-feature word2vec user guide <https://spark.apache.org/docs/latest/ml-features.html#word2vec>
On Mon, Sep 14, 2015 at 11:03 AM, Feynman Liang <fli...@databricks.com> wrote:
> You could use `Tuple1(x)` instead of `Hack`
>
> On Mon, Sep 14, 201

Re: Data frame with one column

2015-09-14 Thread Feynman Liang
You could use `Tuple1(x)` instead of `Hack`
On Mon, Sep 14, 2015 at 10:50 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
> Dear Spark developers,
>
> I would like to create a dataframe with one column. However, the createDataFrame method accepts at least a Product:
>
> val
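A small sketch of why the `Tuple1` suggestion works: `Tuple1` is a case class in the Scala standard library and therefore a `Product`, so wrapping each value satisfies a `Product`-typed API without a hand-written wrapper class. The `SingleColumn` object and `wrap` helper are hypothetical names; the `createDataFrame` call appears only in a comment because it assumes a `SQLContext` named `sqlContext` is in scope.

```scala
object SingleColumn {
  // Wraps each value in Tuple1 so every row is a Product with exactly one field.
  def wrap[T](values: Seq[T]): Seq[Tuple1[T]] = values.map(Tuple1(_))

  def main(args: Array[String]): Unit = {
    val rows = wrap(Seq("a", "b", "c"))
    // Each row exposes its single field through the Product interface.
    rows.foreach(r => println(s"arity=${r.productArity} value=${r._1}"))
    // With Spark available (assumption: a SQLContext named sqlContext):
    //   val df = sqlContext.createDataFrame(rows).toDF("value")
  }
}
```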

Re: ML: embed a transformer

2015-09-14 Thread Feynman Liang
Where did you read that it should be public? The traits in ml.param.shared are meant to be used across internal spark.ml transformer implementations. If your transformer could be included in spark.ml, then I would recommend implementing it there so these package-private traits can be reused.

Re:

2015-08-05 Thread Feynman Liang
qualifying_function() will be executed on each partition in parallel; stopping all parallel execution after the first instance satisfying qualifying_function() would effectively make the computation sequential.
On Wed, Aug 5, 2015 at 9:05 AM, Sandeep Giri
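To illustrate the point, here is a toy model in plain Scala, not Spark itself: each simulated partition searches independently for its own first match, and nothing in one partition's search can stop the others. `FirstMatch`, `firstMatchPerPartition`, and the `qualifies` predicate are hypothetical names, with `qualifies` standing in for `qualifying_function()`.

```scala
object FirstMatch {
  // Each partition stops at its own first match (Iterator.find short-circuits),
  // but partitions know nothing about matches found in other partitions.
  def firstMatchPerPartition[T](partitions: Seq[Seq[T]])(qualifies: T => Boolean): Seq[Option[T]] =
    partitions.map(_.iterator.find(qualifies))

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 4, 6), Seq(3, 5), Seq(8, 10))
    val perPartition = firstMatchPerPartition(partitions)(_ % 2 == 0)
    // The combining step picks the first match in partition order.
    println(perPartition.flatten.headOption)
  }
}
```

In Spark itself, `rdd.filter(p).take(1)` is one way to limit wasted work, since `take` scans partitions incrementally instead of launching the predicate on all of them at once; whether that fits depends on the job.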

Re: Contribution and choice of language

2015-07-14 Thread Feynman Liang
I would suggest starting with some starter tasks

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
There is MulticlassMetrics in MLlib; unfortunately a pipelined version hasn't yet been made for spark-ml. SPARK-7690 <https://issues.apache.org/jira/browse/SPARK-7690> is tracking work on this if you are interested in following the development.
On Mon, Jul 13, 2015 at 2:16 AM, Olivier Girardot

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
-07-13 22:39 GMT+02:00 Feynman Liang <fli...@databricks.com>:
That is currently tracked by SPARK-3727 <https://issues.apache.org/jira/browse/SPARK-3727>.
On Mon, Jul 13, 2015 at 1:16 PM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
thx for the info. I'd be interested in getting

Re: RandomForest evaluator for grid search

2015-07-13 Thread Feynman Liang
, Feynman Liang <fli...@databricks.com> wrote:
There is MulticlassMetrics in MLlib; unfortunately a pipelined version hasn't yet been made for spark-ml. SPARK-7690 <https://issues.apache.org/jira/browse/SPARK-7690> is tracking work on this if you are interested in following the development

Re: Are These Issues Suitable for our Senior Project?

2015-07-09 Thread Feynman Liang
Exciting, thanks for the contribution! I'm currently aware of:
- SPARK-8499 is currently in progress (in a duplicate issue); I updated the JIRA to reflect that.
- SPARK-5992 has a spark package <http://spark-packages.org/package/mrsqueeze/spark-hash> linked but I'm unclear on whether