[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
I am too busy recently to fix those failed R tests. Anyone who has spare 
time can take over this PR and I will help review. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-19 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19621
  
I think we need to address that too. Sounds to me these tests aren’t 
stable before.





---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@felixcheung Another failed testcase, spark.mlp in sparkR, it also use 
`RFormula` and it will also generate indeterministic result, see class 
`MultilayerPerceptronClassifierWrapper` line 78:
```
val rFormula = new RFormula()
  .setFormula(formula)
  .setForceIndexLabel(true)
  .setHandleInvalid(handleInvalid)
```
It can not set the string order and the default `frequencyDesc` order will 
bring indeterministic result.

If I only modify the testcase in `spark.mlp`, in the future if the 
`StringIndexer` implementation being changed, those tests will probably be 
broken again. What do you think of this ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19621
  
You can change the dataset used in testing.
Will be good if you could test with the same data before and after your 
change to make sure that’s not broken.







---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84955/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84955/testReport)**
 for PR 19621 at commit 
[`bb209c8`](https://github.com/apache/spark/commit/bb209c80395cce84466a8fb8e0c58ca151b791ab).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84955/testReport)**
 for PR 19621 at commit 
[`bb209c8`](https://github.com/apache/spark/commit/bb209c80395cce84466a8fb8e0c58ca151b791ab).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@felixcheung "iris" is a built-in dataset in R, used in many algo testing, 
so is it proper to change it ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19621
  
maybe we could also change the test itself to make it more deterministic?
we could first create a new test dataset that avoid having frequency 
values, run it through the original implementation, then run through this new 
change to make sure they match.

@actuaryzhang do you have any thought on this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-06 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@felixcheung Yes, the spark.mlp test result changed because of indexer 
order changed. That's because, StringIndexer when item frequency equal, there's 
no definite rule for index order. And, in this PR, I change the code logic in 
StringIndexer, **but it cannot make sure to generate exactly the same indexer 
order, because when item frequency equal, there's no definite rule for index 
order.**


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19621
  
I think I understand what you are saying but the latest test failure I see 
it from spark.mlp instead and be results are different from the existing ones.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-05 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@felixcheung There is no breaking change. But, we meet some trouble thing 
about indeterministic behavior. When frequency equal, the indexer result is 
indeterministic. I already fix those in RFormula test. But, I don't know how to 
fix the sparkR `glm` test. Because it depends on R-glm library and I don't know 
how to set the indexer order for R-glm library. Do you know anyone who is 
familiar with this ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-05 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19621
  
stringindexer is set automatically for index column. are we having breaking 
API change here?

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala#L216



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-12-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
Any one can provide some suggestion ? for fixing sparkR glm test failure 
here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-24 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
I checked the failed tests in sparkR. There's some trouble in the failed 
`glm` sparkR tests.
These tests compare sparkR glm and R-lib glm results on test data "iris", 
but, what's the string indexer order for R-lib glm ? I check the dataset 
"iris", the "Species" column has three value "setosa", "versicolor", 
"virginica", **their frequency are all 50**, and only when `RFormula` index 
them as: "setosa"->2, "versicolor"->0, "virginica"->1, the result will be the 
same with R-lib glm. This is a strange indexer order.
How to set string indexer order for R-lib glm ?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84125/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84125 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84125/testReport)**
 for PR 19621 at commit 
[`66d054a`](https://github.com/apache/spark/commit/66d054a7daca8a82fd1022fe05e766e7f7285028).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84125 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84125/testReport)**
 for PR 19621 at commit 
[`66d054a`](https://github.com/apache/spark/commit/66d054a7daca8a82fd1022fe05e766e7f7285028).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-23 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@viirya @MLnick Code updated. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@MLnick Ah, I don't express it exactly, the first case, what I mean is, 
sort by frequency, but if the case frequency equal, sort by alphabet.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19621
  
The first case you mention wouldn’t actually end up sorting by freq, no? 
It
would have to be the other way around?

For second case, yes equality must mean it is the same string / key so
shouldn’t actually occur
On Wed, 22 Nov 2017 at 16:01, WeichenXu  wrote:

> @MLnick  How about this way:
> The case "fequencyAsc/Desc", sort first by frequency and then by alphabet,
> The case "alphabetAsc/Desc", sort by alphabet (and if alphabetically
> equal, the two label should be the same ?)
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@MLnick How about this way:
The case "fequencyAsc/Desc", sort first by frequency and then by alphabet,
The case "alphabetAsc/Desc", sort by alphabet (and if alphabetically equal, 
the two label should be the same ?)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84101/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84101 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84101/testReport)**
 for PR 19621 at commit 
[`031f53f`](https://github.com/apache/spark/commit/031f53fbd1c112d8f0b37bb29e847cd3184498c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19621
  
It won't be deterministic in the case of different RDDs / partitions / 
shuffle etc. For a given input RDD it _should_ be deterministic? 

But perhaps we could ensure it by first sorting alphabetically and then by 
frequency?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@MLnick Will RDD "count by value" aggregation be deterministic ? e.g., 2 
RDD with the same elements, but with different element order and different 
partition number, will `rdd.countByValue().toSeq` keep deterministic ? The 
shuffling in `countByValue` seems also possible to break determinacy.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/19621
  
@WeichenXu123 with reference to 
https://github.com/apache/spark/pull/19621#issuecomment-344530228  - the sort 
is 
[stable](https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/SeqLike.scala#L627)
 with respect to the input collection. So as long as the result of the "count 
by value" aggregation is deterministic so will the sort order be in the case of 
equalities. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84101/testReport)**
 for PR 19621 at commit 
[`031f53f`](https://github.com/apache/spark/commit/031f53fbd1c112d8f0b37bb29e847cd3184498c6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
Jenkins retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84093 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84093/testReport)**
 for PR 19621 at commit 
[`031f53f`](https://github.com/apache/spark/commit/031f53fbd1c112d8f0b37bb29e847cd3184498c6).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84093/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84093/testReport)**
 for PR 19621 at commit 
[`031f53f`](https://github.com/apache/spark/commit/031f53fbd1c112d8f0b37bb29e847cd3184498c6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84066/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84066 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84066/testReport)**
 for PR 19621 at commit 
[`e5db190`](https://github.com/apache/spark/commit/e5db190f9b76e22ea8f665456cba60fd31cc9cf0).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #84066 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84066/testReport)**
 for PR 19621 at commit 
[`e5db190`](https://github.com/apache/spark/commit/e5db190f9b76e22ea8f665456cba60fd31cc9cf0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19621
  
Seems in the frequency-based string orders, the order of labels with same 
frequency is non-deterministic. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
I want to ask, for option `StringIndexer.frequencyDesc`, in the case 
existing two labels which have the same frequency, which of them will be put in 
the front ?
If this is not specified, then we should modify some test cases in 
`RFomular`, which will generate nondeterministic result.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83878/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #83878 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83878/testReport)**
 for PR 19621 at commit 
[`77bea32`](https://github.com/apache/spark/commit/77bea32984b167894be79736f56601a44b99).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #83878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83878/testReport)**
 for PR 19621 at commit 
[`77bea32`](https://github.com/apache/spark/commit/77bea32984b167894be79736f56601a44b99).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83872/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #83872 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83872/testReport)**
 for PR 19621 at commit 
[`b0b14b0`](https://github.com/apache/spark/commit/b0b14b0971a7b941abbadf52d03dbb7d77e93adc).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19621
  
@WeichenXu123 I will try to look into this today.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@viirya @MLnick Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #83872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83872/testReport)**
 for PR 19621 at commit 
[`b0b14b0`](https://github.com/apache/spark/commit/b0b14b0971a7b941abbadf52d03dbb7d77e93adc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83396/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83392/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
Jenkins, test this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83323/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-02 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@viirya Code updated. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83265/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83263/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #83263 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83263/testReport)**
 for PR 19621 at commit 
[`faa8390`](https://github.com/apache/spark/commit/faa839052ebe0949882b7b8c0ca3557f26beb8e8).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19621
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19621
  
**[Test build #83263 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83263/testReport)**
 for PR 19621 at commit 
[`faa8390`](https://github.com/apache/spark/commit/faa839052ebe0949882b7b8c0ca3557f26beb8e8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org