[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 I have been too busy recently to fix the failing R tests. Anyone with spare time is welcome to take over this PR, and I will help review. Thanks!

2017-12-19 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19621 I think we need to address that too. It sounds to me like these tests weren’t stable before.

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung Another failing test case is spark.mlp in SparkR. It also uses `RFormula`, so it also produces nondeterministic results; see class `MultilayerPerceptronClassifierWrapper`, line 78:

2017-12-15 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19621 You can change the dataset used in testing. It would be good if you could test with the same data before and after your change to make sure it’s not broken.

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84955/ Test FAILed.

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84955/testReport)** for PR 19621 at commit

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84955/testReport)** for PR 19621 at commit

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung "iris" is a built-in dataset in R and is used in many algorithm tests, so is it appropriate to change it?

2017-12-07 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19621 Maybe we could also change the test itself to make it more deterministic? We could first create a new test dataset that avoids having frequency values, run it through the original

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung Yes, the spark.mlp test result changed because the indexer order changed. That's because StringIndexer has no definite rule for index order when item frequencies are equal. And, in

2017-12-06 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19621 I think I understand what you are saying, but the latest test failure I see is from spark.mlp instead, and the results are different from the existing ones.

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @felixcheung There is no breaking change. But we ran into some trouble with nondeterministic behavior: when frequencies are equal, the indexer result is nondeterministic. I already fixed those in

2017-12-05 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19621 StringIndexer is set automatically for the index column. Are we introducing a breaking API change here?

2017-12-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 Can anyone provide suggestions for fixing the SparkR glm test failure here?

2017-11-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 I checked the failed tests in SparkR. There's some trouble with the failed `glm` SparkR tests. These tests compare SparkR glm and R-lib glm results on the test data "iris", but, what's the

2017-11-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84125/ Test FAILed.

2017-11-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84125/testReport)** for PR 19621 at commit

2017-11-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84125/testReport)** for PR 19621 at commit

2017-11-23 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @viirya @MLnick Code updated. Thanks!

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @MLnick Ah, I didn't express it precisely. For the first case, what I mean is: sort by frequency, but when frequencies are equal, sort alphabetically.

2017-11-22 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19621 The first case you mention wouldn’t actually end up sorting by freq, no? It would have to be the other way around? For the second case, yes, equality must mean it is the same string / key

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @MLnick How about this: for the case "frequencyAsc/Desc", sort first by frequency and then alphabetically; for the case "alphabetAsc/Desc", sort alphabetically (and if alphabetically equal, the
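
The tie-breaking rule proposed in this comment can be sketched in plain Python. This is only an illustration, not Spark's implementation: the function name `ordered_labels` is hypothetical, the four option names are assumed to be `frequencyDesc`, `frequencyAsc`, `alphabetDesc`, and `alphabetAsc`, and breaking frequency ties in ascending alphabetical order for both frequency options is an assumption.

```python
from collections import Counter


def ordered_labels(values, order="frequencyDesc"):
    """Return the distinct labels in a deterministic order (hypothetical helper)."""
    counts = Counter(values)
    if order == "frequencyDesc":
        # Most frequent first; ties broken alphabetically (assumed ascending).
        return sorted(counts, key=lambda label: (-counts[label], label))
    if order == "frequencyAsc":
        # Least frequent first; ties broken alphabetically (assumed ascending).
        return sorted(counts, key=lambda label: (counts[label], label))
    if order == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order == "alphabetAsc":
        return sorted(counts)
    raise ValueError(f"unknown order: {order}")


# "a" and "b" both occur twice, so only the alphabetical tie-break separates them.
print(ordered_labels(["b", "a", "c", "a", "b"]))
print(ordered_labels(["a", "b", "b", "a", "c"]))
```

Because the sort key includes the label itself, the result depends only on which values occur and how often, not on the order in which they arrive.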

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test PASSed.

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84101/ Test PASSed.

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84101/testReport)** for PR 19621 at commit

2017-11-22 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19621 It won't be deterministic in the case of different RDDs / partitions / shuffle etc. For a given input RDD it _should_ be deterministic? But perhaps we could ensure it by first sorting

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @MLnick Will the RDD "count by value" aggregation be deterministic? E.g., given 2 RDDs with the same elements but with different element order and a different number of partitions, will
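
The question can be explored with a small plain-Python simulation. It does not use Spark, and Spark's actual behavior depends on its own shuffle and map implementations; the helpers `count_by_value` and `frequency_desc` are hypothetical stand-ins. The sketch suggests that the merged counts are the same for any partitioning, while an ordering that sorts on frequency alone can still differ for tied labels.

```python
from collections import Counter


def count_by_value(partitions):
    """Merge per-partition counts in the order the partitions are given."""
    total = Counter()
    for part in partitions:
        total.update(Counter(part))
    return total


def frequency_desc(counts):
    # Stable sort on frequency only: tied labels keep the order in which
    # they were first inserted into the merged counter.
    return sorted(counts, key=lambda label: -counts[label])


# Same elements (two "x", two "y"), different element order and partitioning.
a = count_by_value([["x", "y"], ["y", "x"]])
b = count_by_value([["y"], ["x", "x", "y"]])

print(a == b)             # the counts themselves agree
print(frequency_desc(a))  # tie order follows first appearance in a
print(frequency_desc(b))  # tie order follows first appearance in b
```

In this simulation the counts compare equal, but the two frequency orderings list the tied labels in opposite order, which is the kind of instability a secondary sort key would remove.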

2017-11-22 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19621 @WeichenXu123 with reference to https://github.com/apache/spark/pull/19621#issuecomment-344530228 - the sort is

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84101/testReport)** for PR 19621 at commit

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 Jenkins retest this please.

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84093/testReport)** for PR 19621 at commit

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84093/ Test FAILed.

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84093/testReport)** for PR 19621 at commit

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84066/ Test FAILed.

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84066/testReport)** for PR 19621 at commit

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #84066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84066/testReport)** for PR 19621 at commit

2017-11-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19621 It seems that in the frequency-based string orders, the order of labels with the same frequency is non-deterministic.

2017-11-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 I want to ask: for the option `StringIndexer.frequencyDesc`, when two labels have the same frequency, which of them will be put first? If this is not specified,

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83878/ Test FAILed.

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #83878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83878/testReport)** for PR 19621 at commit

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #83878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83878/testReport)** for PR 19621 at commit

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83872/ Test FAILed.

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #83872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83872/testReport)** for PR 19621 at commit

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19621 @WeichenXu123 I will try to look into this today.

2017-11-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @viirya @MLnick Thanks!

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #83872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83872/testReport)** for PR 19621 at commit

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83396/ Test FAILed.

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83392/ Test FAILed.

2017-11-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 Jenkins, test this please.

2017-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83323/ Test FAILed.

2017-11-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 @viirya Code updated. Thanks!

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83265/ Test FAILed.

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83263/ Test FAILed.

2017-10-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #83263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83263/testReport)** for PR 19621 at commit

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

2017-10-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19621 **[Test build #83263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83263/testReport)** for PR 19621 at commit