[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-13 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18902
  
@MLnick Thanks for pinging me.

I go through this quickly. The basic idea is the same, performing the 
operations on multiple inputs columns at one single Dataset/DataFrame operation.

Unlike `Bucketizer`, `Imputer` has no compatibility concern because it 
already supports multiple input columns (`HasInputCols`). In `Bucketizer`, we 
don't want to break its current API so it makes thing more complicated a bit.

Actually I'm noticed by `ImputerModel` which also applies `withColumn` 
sequentially on each input column. I'd like to address this part with the 
`withColumns` API proposed in #17819. What do you think @MLnick?







---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-13 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18902
  
cc @viirya on the multt-column generation issue - could be similar general 
solution to #17819?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-13 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged into master. Thanks for all.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
Any more comments on this PR? It have been about one month since the last 
modification.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-04 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/18902
  
Sure. I will create JIRA after this perf gap is confirmed. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-04 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18902
  
Seems fine to me to use the DF version even though it's slower. But we 
should open a JIRA issue to track where the gap is on the SQL side of things 
and try to improve the performance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/18902
  
hmm... that's interesting. So I found performance gap between dataframe 
codegen aggregation and the simple RDD aggregation. I will discuss with SQL 
team for this later. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@WeichenXu123 No, I only cache the DataFrame. And the RDD-Version is 
[here](https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465).
 
I use the same testsuit above to test those impls.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/18902
  
+1 for using Dataframe-based version code.

@zhengruifeng One thing I want to confirm is that, I check your testing 
code, both RDD-based version and Dataframe-based version code will both cost on 
deserialization:
```
...
val df = spark.createDataFrame(rows, struct)
df.persist()
df.count()
...
// do `imputer.fit`
```
when running `imputer.fit`, it will extract the required columns from the 
cached input dataframe, and then you compare the perf between `RDD.aggregate` 
and `dataframe avg`, they both need to deserialize data from input and then do 
computation, and `dataframe avg` will take advantage of codegen and should be 
faster. But here the test show that RDD version is slower than Dataframe 
version, it is not very reasonable, so I want to confirm:
in your RDD version testing, do you cache again when get `RDD` from the 
input `Dataframe`?
If not, your testing has no problem, I will guess there exists other 
performance issue in SQL layer and cc @cloud-fan to take a look.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-28 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@yanboliang Although dispointed by DF's performance, I also approve the 
choice of DF just for less code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-25 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18902
  
@zhengruifeng DataFrame-based operation is 2~3x slower than RDD-based 
operation is a known issue, because of the deserialization cost. If we switch 
to RDD-based method, we need to implement our own aggregator to calculate 
_mean_ and _median_, this need much more code than calling DataFrame API. BTW, 
DF using more compact structure that can reduce memory footprint.
From my perspective, I'd suggest to keep the current DF-based solution. As 
it will 5~10 faster than the original implementation. @hhbyyh @MLnick What do 
you think about it? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@yanboliang RDD-based impl the (former 
commit)[https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465]
 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18902
  
@zhengruifeng What _the RDD-based one_ means? It's the code on master or 
the code in your former commit? Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@MLnick  @yanboliang  I update the performance comparison. 
The DF-based impl is a little slower than the RDD-based one when num of 
column is small. 
When num of column is large (100), DF-based impl is about 2~3 X slower than 
RDD-based one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80780 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80780/testReport)**
 for PR 18902 at commit 
[`495d701`](https://github.com/apache/spark/commit/495d70127b31f111362a9774da0eefba2b657e63).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80780/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18902
  
@zhengruifeng  Could you verify & compare the performance of this new 
DF-based approach vs your original RDD-based one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80780 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80780/testReport)**
 for PR 18902 at commit 
[`495d701`](https://github.com/apache/spark/commit/495d70127b31f111362a9774da0eefba2b657e63).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-17 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18902
  
@hhbyyh @zhengruifeng I'm ok with the _convert to null_ method, I think 
there is no extra pass for data if we handle it with this way, and the 
DataFrame/RDD functions to compute _mean/median_ will ignore _null_ . Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-16 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
I test on dataframes containing `null`, both `avg` and 
`stat.approxQuantile` will ignore `null`. And if one column only contain 
`null`, `null` and `Array.empty[Double]` will be returned respectively.
Agree that we add more tests for this dependency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-16 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/18902
  
Thanks for the quick update. The implementation may be improved on some 
details. But first I'd want to confirm the "convert to null" method does not 
have any defect. 
@MLnick @srowen @yanboliang 

And we may need more unit tests to constantly monitor  the SQL behavior 
(avg and stat) on null. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@hhbyyh  I rewrite the impl, and now all `NaN` and `missingValue` will be 
transform to `null` at first, then current methods are used.
For columns only containing `null`,  `null` is returned for `avg(col)`, and 
`Array.empty[Double]` is returned for `median`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80675/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80675/testReport)**
 for PR 18902 at commit 
[`5921f51`](https://github.com/apache/spark/commit/5921f514390420c331929c91fe7c8e89b708f7db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80675 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80675/testReport)**
 for PR 18902 at commit 
[`5921f51`](https://github.com/apache/spark/commit/5921f514390420c331929c91fe7c8e89b708f7db).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80667 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80667/testReport)**
 for PR 18902 at commit 
[`df7a0a3`](https://github.com/apache/spark/commit/df7a0a33002e9341c15820d6856b436df3e5ede2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80667/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80666/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80666 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80666/testReport)**
 for PR 18902 at commit 
[`2cca623`](https://github.com/apache/spark/commit/2cca623702e599d4b96ab093dfad22d228cfb6d9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80663/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80663/testReport)**
 for PR 18902 at commit 
[`fd1eb43`](https://github.com/apache/spark/commit/fd1eb43d26bb08806ec2deefd86014caf7dcefdd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80667 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80667/testReport)**
 for PR 18902 at commit 
[`df7a0a3`](https://github.com/apache/spark/commit/df7a0a33002e9341c15820d6856b436df3e5ede2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80666 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80666/testReport)**
 for PR 18902 at commit 
[`2cca623`](https://github.com/apache/spark/commit/2cca623702e599d4b96ab093dfad22d228cfb6d9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80663 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80663/testReport)**
 for PR 18902 at commit 
[`fd1eb43`](https://github.com/apache/spark/commit/fd1eb43d26bb08806ec2deefd86014caf7dcefdd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
Jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80660/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80660 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80660/testReport)**
 for PR 18902 at commit 
[`fd1eb43`](https://github.com/apache/spark/commit/fd1eb43d26bb08806ec2deefd86014caf7dcefdd).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80660 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80660/testReport)**
 for PR 18902 at commit 
[`fd1eb43`](https://github.com/apache/spark/commit/fd1eb43d26bb08806ec2deefd86014caf7dcefdd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-15 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@hhbyyh Good Idea! We can also use this trick to compute median, because 
method 
`multipleApproxQuantiles`[https://github.com/apache/spark/blob/0e80ecae300f3e2033419b2d98da8bf092c105bb/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala#L65]
  can handle both `null` and `NaN`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/18902
  
Eh, I meant that it may be possible to get the mean values purely using 
DataFrame API. (convert missingValue/NaN to null) in one pass, so we may need 
to check the performance comparison. But I guess it looks a little hack.

For the median value, it may be harder so we can use the RDD API. (to be 
confirmed).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80653/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80653 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80653/testReport)**
 for PR 18902 at commit 
[`8283411`](https://github.com/apache/spark/commit/82834117d7e587e335a599f8d5153e751b524862).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80653 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80653/testReport)**
 for PR 18902 at commit 
[`8283411`](https://github.com/apache/spark/commit/82834117d7e587e335a599f8d5153e751b524862).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
I test the performance on a small data, the value in the following table is 
the average duration in seconds:

|numColums| Old Mean | Old Median | New Mean | New Median |
|--|--||--||
|1|0.0771394713|0.0658712813|0.080779802|0.04816598149996|
|10|0.723434063099|0.5954440414|0.0867935197|0.1326342865998|
|100|7.3756451568|6.2196631259|0.1911931552|0.862537681701|

We can see that, even on a small data, the speedup is significant.
On big dataset that do not fit in memory, we should obtain better speedup.

and the test code is here:

```
import org.apache.spark.ml.feature._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import spark.implicits._
import scala.util.Random

val seed = 123l
val random = new Random(seed)
val n = 1
val m = 100
val rows = sc.parallelize(1 to n).map(i=> 
Row(Array.fill(m)(random.nextDouble): _*))
val struct = new StructType(Array.range(0,m,1).map(i => 
StructField(s"c$i",DoubleType,true)))
val df = spark.createDataFrame(rows, struct)
df.persist()
df.count()

for (strategy <- Seq("mean", "median"); k <- Seq(1,10,100)) {
val imputer = new 
Imputer().setStrategy(strategy).setInputCols(Array.range(0,k,1).map(i=>s"c$i")).setOutputCols(Array.range(0,k,1).map(i=>s"o$i"))
var duration = 0.0
for (i<- 0 until 10) {
val start = System.nanoTime()
imputer.fit(df)
val end = System.nanoTime()
duration += (end - start) / 1e9
}
println((strategy, k, duration/10))
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
@hhbyyh Yes, I will test the performance. 
Btw, the median computation by call `stat.approxQuantile` will also 
transform df into rdd before aggregation. see 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala#L102


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/18902
  
Hi @zhengruifeng Thanks for the idea and implementation. Definitely 
something worth exploring.
 
As I understand, the new implementation improves the locality yet it 
leverages RDD API instead of Dataset API. Since overall this targets a 
performance improvement, I'd be interested to see the performance comparison. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80479/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80479/testReport)**
 for PR 18902 at commit 
[`660c2db`](https://github.com/apache/spark/commit/660c2dbc3e800a8f8fe4bc1b36a72ccdc37a778e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80479/testReport)**
 for PR 18902 at commit 
[`660c2db`](https://github.com/apache/spark/commit/660c2dbc3e800a8f8fe4bc1b36a72ccdc37a778e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/18902
  
Jenkis, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80478/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80478 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80478/testReport)**
 for PR 18902 at commit 
[`660c2db`](https://github.com/apache/spark/commit/660c2dbc3e800a8f8fe4bc1b36a72ccdc37a778e).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80477 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80477/testReport)**
 for PR 18902 at commit 
[`f6f166f`](https://github.com/apache/spark/commit/f6f166fef4e17db7e36ccecf41aebe3443e9fef5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80478 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80478/testReport)**
 for PR 18902 at commit 
[`660c2db`](https://github.com/apache/spark/commit/660c2dbc3e800a8f8fe4bc1b36a72ccdc37a778e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80477/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18902
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18902
  
**[Test build #80477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80477/testReport)**
 for PR 18902 at commit 
[`f6f166f`](https://github.com/apache/spark/commit/f6f166fef4e17db7e36ccecf41aebe3443e9fef5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org