date:20161203

[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #69618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69618/consoleFull)**
 for PR 16129 at commit 
[`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16030
  
The failure seems unrelated to this PR?


[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16129
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #69618 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69618/consoleFull)**
 for PR 16129 at commit 
[`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16129
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69618/
Test FAILed.


[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16120
  
**[Test build #69615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69615/consoleFull)**
 for PR 16120 at commit 
[`a5594f7`](https://github.com/apache/spark/commit/a5594f7ffcbdc9ab2e83008a99d5878fa9fae2b8).


[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16120
  
retest this please.


[GitHub] spark pull request #16102: [SPARK-18586][BUILD] netty-3.8.0.Final.jar has vu...

2016-12-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16102


[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16037
  
Yes, I'm pretty OK with merging this. If you can dig up any results, all the
better. Will check in with you next week.


[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69616/
Test PASSed.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69617 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69617/consoleFull)**
 for PR 16030 at commit 
[`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16114
  
**[Test build #69620 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69620/consoleFull)**
 for PR 16114 at commit 
[`f381ac2`](https://github.com/apache/spark/commit/f381ac26cfd14420dbe21b1d58be54c201542357).


[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16098
  
retest this please


[GitHub] spark pull request #16116: [SPARK-18685][TESTS] Fix URI and release resource...

2016-12-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16116


[GitHub] spark issue #16116: [SPARK-18685][TESTS] Fix URI and release resources after...

2016-12-03 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16116
  
Thank you !!


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69617/consoleFull)**
 for PR 16030 at commit 
[`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).


[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #69616 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69616/consoleFull)**
 for PR 13909 at commit 
[`b29d7cf`](https://github.com/apache/spark/commit/b29d7cf11a6b13f979ad96e1f1879409daf3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Merged build finished. Test PASSed.


[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #69616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69616/consoleFull)**
 for PR 13909 at commit 
[`b29d7cf`](https://github.com/apache/spark/commit/b29d7cf11a6b13f979ad96e1f1879409daf3).


[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16098
  
**[Test build #69619 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69619/consoleFull)**
 for PR 16098 at commit 
[`4804862`](https://github.com/apache/spark/commit/48048622067f092ed247bc555e5461c073894a9c).


[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Merged build finished. Test PASSed.


[GitHub] spark pull request #16031: [SPARK-18606][HISTORYSERVER]remove useless elemen...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16031#discussion_r90754812
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/historypage.js 
---
@@ -78,6 +78,12 @@ jQuery.extend( jQuery.fn.dataTableExt.oSort, {
 }
 } );
 
+jQuery.extend( jQuery.fn.dataTableExt.ofnSearch, {
+    "appid-numeric": function ( a ) {
+        return a.replace(/[\r\n]/g, " ").replace(/<.*?>/g, "");
--- End diff --

@WangTaoTheTonic does that make sense / do you have time to look into this 
alternative?
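To make the search normalization in this diff concrete, here is a rough Python transliteration of the `appid-numeric` filter: it turns every carriage return or newline into a space and then strips anything that looks like an HTML tag, so only the visible app ID text is searched. The function name is hypothetical and this is an illustration, not the DataTables hook itself; like the JS regex, `<.*?>` is a naive tag matcher and would mis-handle a `>` inside an attribute value.

```python
import re


def normalize_for_search(cell_html: str) -> str:
    """Hypothetical Python equivalent of the "appid-numeric" search filter.

    Mirrors the JS: newlines become spaces, then HTML tags are stripped.
    """
    # Replace every CR or LF character with a single space (JS /[\r\n]/g).
    no_newlines = re.sub(r"[\r\n]", " ", cell_html)
    # Remove HTML tags using a non-greedy match (JS /<.*?>/g).
    return re.sub(r"<.*?>", "", no_newlines)


print(normalize_for_search('<a href="history/app-1">app-1</a>\n'))
```

Replacing newlines first matters: in both Python and JS, `.` does not match a newline by default, so a tag split across lines would otherwise survive the second substitution.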


[GitHub] spark pull request #16103: [SPARK-18374][ML]Incorrect words in StopWords/eng...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16103#discussion_r90754782
  
--- Diff: mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/english.txt ---
@@ -149,5 +149,58 @@ shan
 shouldn
 wasn
 weren
-won
 wouldn
--- End diff --

You would then remove the other stems like "wasn", "weren", etc., right?


[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90754922
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,31 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
     logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    receiver.addRecords(shardId, batch)
+    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+    receiver.setCheckpointer(shardId, checkpointer)
+  }
+
+  /**
+   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
+   * control the number of aggregated records to be fetched even if we set `MaxRecords`
+   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
+   * in a worker and a producer aggregates two records into one message, the worker possibly
+   * 20 records every callback function called.
+   */
+  private def processRecordsWithLimit(
+      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    val maxRecords = receiver.getCurrentLimit
+    if (batch.size() <= maxRecords) {
+      addRecords(batch, checkpointer)
--- End diff --

Aha, I see. I'll fix, thanks!
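The diff excerpt stops at the `batch.size() <= maxRecords` branch, so the over-limit path is not shown here. A minimal Python sketch of one way such a limit could be enforced, assuming an oversized batch is simply handed to the receiver in slices of at most `max_records` (the function names and the `add_records` callback are hypothetical stand-ins for `processRecordsWithLimit` and `addRecords`, not the patch's actual code):

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(batch: Sequence[T], max_records: int) -> List[Sequence[T]]:
    """Split batch into consecutive slices of at most max_records items."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    return [batch[i:i + max_records] for i in range(0, len(batch), max_records)]


def process_records_with_limit(
    batch: Sequence[T],
    max_records: int,
    add_records: Callable[[Sequence[T]], None],
) -> None:
    """Hypothetical analogue of processRecordsWithLimit (chunking is assumed)."""
    if len(batch) <= max_records:
        # Small enough: store the whole batch in one call, as in the diff.
        add_records(batch)
    else:
        # Assumption: store an oversized batch slice by slice so that no
        # single call exceeds the current limit.
        for part in chunk(batch, max_records):
            add_records(part)


# The scenario from the doc comment: limit 10, but 20 de-aggregated records.
stored = []
process_records_with_limit(list(range(20)), 10, stored.append)
print([len(part) for part in stored])  # → [10, 10]
```

Slicing preserves record order within the shard. Whether the actual patch also waits between slices to flatten the input rate is not visible in this excerpt.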


[GitHub] spark pull request #16069: [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven...

2016-12-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16069


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16114
  
@srowen Do you know any qualified maintainers for this component?


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16114
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69620/
Test PASSed.


[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16098
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69614/
Test FAILed.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69611/
Test FAILed.


[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69610/
Test FAILed.


[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16098
  
Merged build finished. Test FAILed.


[GitHub] spark pull request #16129: [SPARK-18678][ML] Skewed feature subsampling in R...

2016-12-03 Thread srowen

GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/16129

[SPARK-18678][ML] Skewed feature subsampling in Random forest

## What changes were proposed in this pull request?

Fix reservoir sampling bias for small k. An off-by-one error meant that the
probability of replacement was slightly too high: k/(l-1) after l elements
instead of k/l, which matters for small k.
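As a rough illustration of the bias described here, the following minimal Python sketch puts standard reservoir sampling (Algorithm R) next to the same loop with the off-by-one. It is not Spark's Scala code; the function names and the `inclusion_counts` helper are hypothetical. It only shows why drawing the replacement index from `l - 1` values instead of `l` values skews the sample.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: the l-th element (1-based) is kept with probability k/l."""
    if k <= 0:
        return []
    reservoir: List[T] = []
    for l, item in enumerate(items, start=1):
        if l <= k:
            reservoir.append(item)
        else:
            # randrange(l) is uniform over 0..l-1, so P(j < k) = k/l.
            j = rng.randrange(l)
            if j < k:
                reservoir[j] = item
    return reservoir


def biased_reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Same loop with the off-by-one: replacement probability k/(l-1)."""
    if k <= 0:
        return []
    reservoir: List[T] = []
    for l, item in enumerate(items, start=1):
        if l <= k:
            reservoir.append(item)
        else:
            j = rng.randrange(l - 1)  # off by one: uniform over 0..l-2
            if j < k:
                reservoir[j] = item
    return reservoir


def inclusion_counts(sampler, n: int, k: int, trials: int, seed: int = 0) -> List[int]:
    """Count how often each element of range(n) ends up in the sample."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        for x in sampler(range(n), k, rng):
            counts[x] += 1
    return counts


print(inclusion_counts(reservoir_sample, n=4, k=1, trials=20000))
print(inclusion_counts(biased_reservoir_sample, n=4, k=1, trials=20000))
```

With `k = 1` the biased loop always replaces the reservoir when it sees the second element, so the first element can never be selected. The ratio between the two replacement probabilities is `l/(l-1)`, and since `l` starts at `k + 1`, that ratio is largest when `k` is small.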

## How was this patch tested?

Existing test plus new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-18678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16129


commit 8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec
Author: Sean Owen 
Date:   2016-12-03T09:32:00Z

Fix reservoir sampling bias for small k




[GitHub] spark issue #16102: [SPARK-18586][BUILD] netty-3.8.0.Final.jar has vulnerabi...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16102
  
Merged to master, though as I say I don't think the CVE actually impacted 
Spark to begin with.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69617/
Test FAILed.



[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-12-03 Thread steveloughran

Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/12004
  
Test failure due to new artifacts
```
+++ b/dev/pr-deps/spark-deps-hadoop-2.7
@@ -16,8 +16,6 @@ arpack_combined_all-0.1.jar
 avro-1.7.7.jar
 avro-ipc-1.7.7.jar
 avro-mapred-1.7.7-hadoop2.jar
-aws-java-sdk-1.7.4.jar
-azure-storage-2.0.0.jar
 base64-2.3.8.jar
 bcprov-jdk15on-1.51.jar
 bonecp-0.8.0.RELEASE.jar
@@ -63,8 +61,6 @@ guice-3.0.jar
 guice-servlet-3.0.jar
 hadoop-annotations-2.7.3.jar
 hadoop-auth-2.7.3.jar
-hadoop-aws-2.7.3.jar
-hadoop-azure-2.7.3.jar
 hadoop-client-2.7.3.jar
 hadoop-common-2.7.3.jar
 hadoop-hdfs-2.7.3.jar
@@ -73,7 +69,6 @@ hadoop-mapreduce-client-common-2.7.3.jar
 hadoop-mapreduce-client-core-2.7.3.jar
 hadoop-mapreduce-client-jobclient-2.7.3.jar
 hadoop-mapreduce-client-shuffle-2.7.3.jar
-hadoop-hadoop-openstack-2.7.3.jar
 hadoop-yarn-api-2.7.3.jar
 hadoop-yarn-client-2.7.3.jar
 hadoop-yarn-common-2.7.3.jar
```



[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...

2016-12-03 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11105#discussion_r90755993
  
--- Diff: 
core/src/test/scala/org/apache/spark/DataPropertyAccumulatorSuite.scala ---
@@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import scala.concurrent.ExecutionContext.Implicits.global
+import scala.ref.WeakReference
+
+import org.scalatest.Matchers
+
+import org.apache.spark.scheduler._
+import org.apache.spark.util.{AccumulatorContext, AccumulatorMetadata, AccumulatorV2, LongAccumulator}
+
+
+class DataPropertyAccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContext {
--- End diff --

That sounds like a good plan. I'll try to give the tests more descriptive
names (or, where that isn't enough, explain in comments more about the
functionality they are testing).



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-12-03 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/11105
  
I'm down with the idea of having `add` and `merge` not be final, with huge
warning signs, and we could switch them to final in 3.x.



[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-03 Thread eyalfa

Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r90752975
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
 ---
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field extraction
+      case GetStructField( createNamedStructLike : CreateNamedStructLike, ordinal, _ ) =>
+        createNamedStructLike.valExprs(ordinal)
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
+      // push down field selection (array of structs)
+      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+        def getStructField( elem : Expression ) = {
+          GetStructField( elem, ordinal, Some(field.name) )
+        }
+        CreateArray( elems.map(getStructField) )
+      // push down item selection.
+      case ga @ GetArrayItem( CreateArray(elems), IntegerLiteral( idx ) ) =>
+        if ( idx >= 0 && idx < elems.size ) {
+          elems(idx)
+        } else {
+          Cast( Literal( null), ga.dataType )
+        }
+    }
+  }
+}
+
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformExpressionsUp{
--- End diff --

@gatorsmile I ran a small regex over the Spark source tree:
`git grep -En '[a-zA-Z][{]' -- *.scala`

It returns 277 places where the space before `{` is missing. Am I missing
anything?
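To make two of the `ComplexTypes.scala` rewrites concrete outside Catalyst, here is a self-contained toy version. It uses hypothetical case classes (`Lit`, `ArrayOf`, `StructOf`, `GetItem`, `GetField`) rather than Spark's expression types. Field access on a struct built in place collapses to that field's expression. A constant index into an array built in place collapses to the element, or to null when the index is out of bounds.

```scala
// Hypothetical toy expression tree; these are not Catalyst's classes.
sealed trait Expr
final case class Lit(value: Any) extends Expr
final case class ArrayOf(elems: Seq[Expr]) extends Expr
final case class StructOf(fields: Seq[(String, Expr)]) extends Expr
final case class GetItem(array: Expr, index: Expr) extends Expr
final case class GetField(struct: Expr, ordinal: Int) extends Expr

object SimplifyComplex {
  // Simplify children first, then the node itself (bottom-up).
  def simplify(e: Expr): Expr = {
    val withSimplifiedChildren = e match {
      case ArrayOf(elems)      => ArrayOf(elems.map(simplify))
      case StructOf(fields)    => StructOf(fields.map { case (name, v) => (name, simplify(v)) })
      case GetItem(arr, idx)   => GetItem(simplify(arr), simplify(idx))
      case GetField(struct, i) => GetField(simplify(struct), i)
      case leaf                => leaf
    }
    rewrite(withSimplifiedChildren)
  }

  private def rewrite(e: Expr): Expr = e match {
    // Field access on a struct built in place: return that field's expression.
    case GetField(StructOf(fields), ordinal) => fields(ordinal)._2
    // Constant index into an array built in place: return the element,
    // or a null literal when the index is out of bounds.
    case GetItem(ArrayOf(elems), Lit(idx: Int)) =>
      if (idx >= 0 && idx < elems.size) elems(idx) else Lit(null)
    case other => other
  }
}
```

Children are simplified before their parent, in the spirit of `transformExpressionsUp`, so an access nested inside a constructor is simplified before the outer access is. The rule in the diff wraps the null in `Cast(Literal(null), ga.dataType)` to keep the result typed, which this untyped toy does not model.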



[GitHub] spark issue #16116: [SPARK-18685][TESTS] Fix URI and release resources after...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16116
  
Merged to master/2.1/2.0



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16120
  
**[Test build #69615 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69615/consoleFull)**
 for PR 16120 at commit 
[`a5594f7`](https://github.com/apache/spark/commit/a5594f7ffcbdc9ab2e83008a99d5878fa9fae2b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90754731
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala
 ---
@@ -56,6 +56,31 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
     logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    receiver.addRecords(shardId, batch)
+    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+    receiver.setCheckpointer(shardId, checkpointer)
+  }
+
+  /**
+   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
+   * control the number of aggregated records to be fetched even if we set `MaxRecords`
+   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
+   * in a worker and a producer aggregates two records into one message, the worker possibly
+   * 20 records every callback function called.
+   */
+  private def processRecordsWithLimit(
+      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    val maxRecords = receiver.getCurrentLimit
+    if (batch.size() <= maxRecords) {
+      addRecords(batch, checkpointer)
--- End diff --

I think the for loop even takes care of this case, but no big deal either 
way. It seems like a reasonable change.
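The observation above can be sketched as follows, assuming the elided else-branch loops over chunks of at most `maxRecords` records. `RecordLimiter.processWithLimit` and its `store` callback are hypothetical stand-ins for `processRecordsWithLimit` and `addRecords`, not the Kinesis receiver API.

```scala
// Hypothetical stand-in for processRecordsWithLimit; not Spark's Kinesis code.
object RecordLimiter {
  // Hands the batch to `store` in chunks of at most maxRecords records.
  def processWithLimit[T](batch: Seq[T], maxRecords: Int)(store: Seq[T] => Unit): Unit = {
    require(maxRecords > 0, "maxRecords must be positive")
    // grouped yields a single chunk when batch.size <= maxRecords,
    // so the small-batch case needs no separate branch.
    batch.grouped(maxRecords).foreach(chunk => store(chunk))
  }
}
```

One difference worth checking: for an empty batch, `grouped` yields no chunks, so `store` is never called. The `if` branch in the diff, by contrast, still calls `addRecords` and therefore `setCheckpointer`.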



[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16120
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69615/
Test PASSed.



[GitHub] spark issue #16069: [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven plugin...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16069
  
Merged to master. It's a build change and probably fine for 2.1 but it's 
non-trivial.



[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69621 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69621/consoleFull)**
 for PR 16030 at commit 
[`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).



[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16114
  
**[Test build #69620 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69620/consoleFull)**
 for PR 16114 at commit 
[`f381ac2`](https://github.com/apache/spark/commit/f381ac26cfd14420dbe21b1d58be54c201542357).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16114
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16122
  
**[Test build #69622 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69622/consoleFull)**
 for PR 16122 at commit 
[`f8955df`](https://github.com/apache/spark/commit/f8955dfc966ae41fbe2086168d62d44d61e15576).



[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15995
  
**[Test build #69623 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69623/consoleFull)**
 for PR 15995 at commit 
[`b5f4394`](https://github.com/apache/spark/commit/b5f43946fd72932f7e23ac1f1b3866b150fe745b).



[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-03 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16068#discussion_r90756326
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala 
---
@@ -144,7 +144,7 @@ private[hive] case class HiveGenericUDF(
   @transient
   private lazy val isUDFDeterministic = {
     val udfType = function.getClass.getAnnotation(classOf[HiveUDFType])
-    udfType != null && udfType.deterministic()
+    udfType != null && udfType.deterministic() && !udfType.stateful()
--- End diff --

An unrelated question: what's the difference between
`udfType.deterministic` and `udfType.stateful`?



[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...

2016-12-03 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16068#discussion_r90763121
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala 
---
@@ -487,6 +488,29 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
     assert(count4 == 1)
     sql("DROP TABLE parquet_tmp")
   }
+
+  test("Hive Stateful UDF") {
+    sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'")
+    sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'")
+    val testData = spark.sparkContext.parallelize(
+      (0 until 10) map(x => IntegerCaseClass(1)), 2).toDF()
+    testData.createOrReplaceTempView("inputTable")
+    val max1 =
+      sql("SELECT MAX(s) FROM (" +
+        "SELECT statefulUDF() as s FROM (SELECT i from inputTable DISTRIBUTE by i) a" +
+        ") b").head().getLong(0)
--- End diff --

I will rewrite it after gathering feedback from others.



[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-03 Thread zhzhan

Github user zhzhan commented on the issue:

https://github.com/apache/spark/pull/16068
  
@gatorsmile we cannot rely on deterministic = true/false alone, as there are
existing UDFs with deterministic set to true but stateful also set to true.
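The condition in the `hiveUDFs.scala` diff, `udfType != null && udfType.deterministic() && !udfType.stateful()`, amounts to a small truth table. The sketch below models it with a hypothetical `UdfFlags` case class standing in for Hive's `UDFType` annotation. As in the diff, a missing annotation is treated as non-deterministic.

```scala
object UdfDeterminism {
  // Hypothetical stand-in for the two flags on Hive's UDFType annotation.
  final case class UdfFlags(deterministic: Boolean, stateful: Boolean)

  // None models a UDF class without the annotation (udfType == null in the diff).
  def isDeterministic(flags: Option[UdfFlags]): Boolean =
    flags.exists(f => f.deterministic && !f.stateful)
}
```

The row `deterministic = true, stateful = true` is the case that the deterministic flag alone would misclassify.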



[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #3467 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3467/consoleFull)**
 for PR 16129 at commit 
[`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operator...

2016-12-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16046



[GitHub] spark pull request #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread nchammas

GitHub user nchammas opened a pull request:

https://github.com/apache/spark/pull/16130

Update location of Spark YARN shuffle jar

Looking at the distributions provided on spark.apache.org, I see that the 
Spark YARN shuffle jar is under `yarn/` and not `lib/`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nchammas/spark yarn-doc-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16130.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16130


commit 979a8a1811f471cd333bdde459649974626e612e
Author: Nicholas Chammas 
Date:   2016-12-03T20:11:18Z

update location of Spark shuffle jar





[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread nchammas

Github user nchammas commented on the issue:

https://github.com/apache/spark/pull/16130
  
cc @vanzin?



[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...

2016-12-03 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16119
  
Since the current tests pass without this change, I'd say we should add a
test for the behaviour we plan to support that isn't currently supported
(it would also make the purpose of the change a bit clearer).



[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16130
  
**[Test build #69628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69628/consoleFull)**
 for PR 16130 at commit 
[`979a8a1`](https://github.com/apache/spark/commit/979a8a1811f471cd333bdde459649974626e612e).



[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16068
  

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L1373-L1378

Copied the code from Hive `FunctionRegistry.java`:
```java
  /**
   * Returns whether a GenericUDF is deterministic or not.
   */
  public static boolean isDeterministic(GenericUDF genericUDF) {
    if (isStateful(genericUDF)) {
      // stateful implies non-deterministic, regardless of whatever
      // the deterministic annotation declares
      return false;
    }
    ...

  }
```




[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16130
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69628/
Test PASSed.



[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16130
  
**[Test build #69628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69628/consoleFull)**
 for PR 16130 at commit 
[`979a8a1`](https://github.com/apache/spark/commit/979a8a1811f471cd333bdde459649974626e612e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16130
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #16103: [SPARK-18374][ML]Incorrect words in StopWords/eng...

2016-12-03 Thread hhbyyh

Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/16103#discussion_r90765451
  
--- Diff: 
mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/english.txt ---
@@ -149,5 +149,58 @@ shan
 shouldn
 wasn
 weren
-won
 wouldn
--- End diff --

I'm fine with both options, leaving them or removing them.


[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-03 Thread zhzhan

Github user zhzhan commented on the issue:

https://github.com/apache/spark/pull/16068
  
My understanding is that a non-deterministic UDF does not need to be 
stateful, but a stateful UDF has to be non-deterministic. 

Here are the comments in Hive regarding this property:

/**
If a UDF stores state based on the sequence of records it has processed, it
is stateful. A stateful UDF cannot be used in certain expressions such as
case statement and certain optimizations such as AND/OR short circuiting
don't apply for such UDFs, as they need to be invoked for each record.
row_sequence is an example of stateful UDF. A stateful UDF is considered to
be non-deterministic, irrespective of what deterministic() returns.
*
@return true
*/
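A minimal plain-Python sketch of the point in the Hive comment: a stateful function's result depends on how many times it has already been invoked, so skipping invocations (for example through AND/OR short-circuiting) changes the values that later rows see. `RowSequence` and `run` are hypothetical stand-ins for Hive's `row_sequence`, not Hive or Spark code.

```python
class RowSequence:
    """Hypothetical stateful UDF: returns 1, 2, 3, ... on successive calls."""

    def __init__(self):
        self.counter = 0

    def evaluate(self):
        # The result depends on how many times the UDF was called before,
        # so two calls with identical (empty) arguments return different values.
        self.counter += 1
        return self.counter


def run(flags, short_circuit):
    """For each row, evaluate `flag AND row_sequence()` and record the sequence value.

    With short_circuit=True the UDF is skipped when `flag` is False, as an
    AND short-circuit optimization would do; None marks a skipped call.
    """
    udf = RowSequence()
    seen = []
    for flag in flags:
        if short_circuit and not flag:
            seen.append(None)
        else:
            seen.append(udf.evaluate())
    return seen


print(run([True, False, True], short_circuit=False))  # [1, 2, 3]
print(run([True, False, True], short_circuit=True))   # [1, None, 2]
```

The third row gets 2 instead of 3 once the second call is skipped, which is why such a function has to be treated as non-deterministic regardless of what `deterministic()` returns.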


[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16068
  
Could we directly use `@UDFType(deterministic = true/false)`?


[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...

2016-12-03 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16068
  
Found the link: [HIVE-1994: Support new annotation @UDFType(stateful = true)](https://issues.apache.org/jira/browse/HIVE-1994)


[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16129
  
@felixcheung maybe you can advise me on this. I think this is a correct 
fix, but ends up changing the results of decision forests a little bit. The 
SparkR test you wrote fails:

```
Failed 
-
1. Failure: spark.randomForest (@test_mllib.R#937) 
-
predictions$prediction not equal to c(...).
16/16 mismatches (average diff: 0.108)
[1] 60.3 - 60.4 == -0.0508
[2] 61.2 - 61.1 ==  0.1272
[3] 60.7 - 60.6 ==  0.0543
[4] 62.1 - 62.3 == -0.1473
[5] 63.5 - 63.7 == -0.2044
[6] 64.1 - 64.3 == -0.2413
[7] 65.1 - 64.9 ==  0.2591
[8] 64.3 - 64.3 ==  0.0045
[9] 66.7 - 66.7 ==  0.0001
...
```

Of course I can just paste in the new values, as I expect a small change in 
the result, but wanted to sense-check it. The new answers are closer to the 
answers in the nearly-identical case above with 1 tree, which seems a little 
positive.
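The PR title refers to skewed feature subsampling. As a reference point when sense-checking new expected values, here is a minimal textbook reservoir sampler (Algorithm R) in plain Python; it is not Spark's sampling code and does not claim to show what this PR changed. Its key invariant is that the i-th item (1-based) enters the reservoir with probability k/i, which makes every item equally likely to end up in the sample; an off-by-one in the range of the random index skews which items are kept.

```python
import random


def reservoir_sample(items, k, rng):
    """Return k items chosen uniformly at random from `items` (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(items):  # i is 0-based
        if i < k:
            reservoir.append(item)
        else:
            # Item number i + 1 (1-based) should be kept with probability k / (i + 1),
            # so draw j uniformly from [0, i] inclusive and keep the item if j < k.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Empirical check: each of 10 items should be picked about 3/10 of the time.
rng = random.Random(42)
counts = [0] * 10
trials = 20000
for _ in range(trials):
    for x in reservoir_sample(range(10), 3, rng):
        counts[x] += 1
print([round(c / trials, 2) for c in counts])
```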


[GitHub] spark issue #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operators allow...

2016-12-03 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16046
  
Merging to master/2.1/2.0. Thanks!


[GitHub] spark pull request #16094: [SPARK-18541][Python]Add metadata parameter to py...

2016-12-03 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16094#discussion_r90764328
  
--- Diff: python/pyspark/sql/column.py ---
@@ -298,19 +299,34 @@ def isin(self, *cols):
 isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")
 
 @since(1.3)
-def alias(self, *alias):
+def alias(self, *alias, **kwargs):
 """
 Returns this column aliased with a new name or names (in the case of expressions that
 return more than one column, such as explode).
 
+Optional ``metadata`` keyword argument can be passed when aliasing a single column.
--- End diff --

2.2 is probably right, although the current 2.1 RC is more of a strawman, 
so it is possible (but it is up to @davies / @marmbrus whether this warrants 
going into 2.1).
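The diff changes `alias(self, *alias)` to `alias(self, *alias, **kwargs)` so that an optional `metadata` keyword can follow the variadic names, presumably because Python 2, which PySpark still supported, has no keyword-only arguments. A minimal standalone sketch of that argument-handling pattern follows; `parse_alias_args` and its error messages are hypothetical, not PySpark's implementation.

```python
def parse_alias_args(*alias, **kwargs):
    """Split variadic alias names from an optional `metadata` keyword argument."""
    metadata = kwargs.pop("metadata", None)
    if kwargs:
        # Reject typos such as `metdata=` instead of silently ignoring them.
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    if not alias:
        raise ValueError("at least one alias name is required")
    if metadata is not None and len(alias) != 1:
        # Metadata is only meaningful when aliasing a single column.
        raise ValueError("metadata can only be provided for a single column")
    return list(alias), metadata


print(parse_alias_args("age2", metadata={"max": 99}))  # (['age2'], {'max': 99})
print(parse_alias_args("a", "b"))                      # (['a', 'b'], None)
```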


[GitHub] spark issue #16121: [SPARK-16589][PYTHON] Chained cartesian produces incorre...

2016-12-03 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16121
  
I was hesitant with the previous PR since it seemed like we didn't fully 
understand why we were changing what we were changing at the time. I can try to 
take a closer look at this over the next few days if it is in a good place for 
that.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16030
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69621/
Test PASSed.


[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69623/
Test PASSed.


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16114
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16114
  
**[Test build #69627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69627/consoleFull)**
 for PR 16114 at commit 
[`8cc24ec`](https://github.com/apache/spark/commit/8cc24ec516978931335b0b585a6dd2a7aff99663).


[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16122
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69625/
Test FAILed.


[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16122
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16098
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69619/
Test PASSed.


[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16098
  
**[Test build #69619 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69619/consoleFull)**
 for PR 16098 at commit 
[`4804862`](https://github.com/apache/spark/commit/48048622067f092ed247bc555e5461c073894a9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16098
  
Merged build finished. Test PASSed.


[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...

2016-12-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16043#discussion_r90757729
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Cast, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+* push down operations into [[CreateNamedStructLike]].
+*/
+object SimplifyCreateStructOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+plan.transformExpressionsUp{
+  // push down field extraction
+  case GetStructField( createNamedStructLike : CreateNamedStructLike, ordinal, _ ) =>
+createNamedStructLike.valExprs(ordinal)
+}
+  }
+}
+
+/**
+* push down operations into [[CreateArray]].
+*/
+object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+plan.transformExpressionsUp{
+  // push down field selection (array of structs)
+  case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
+def getStructField( elem : Expression ) = {
+  GetStructField( elem, ordinal, Some(field.name) )
+}
+CreateArray( elems.map(getStructField) )
+  // push down item selection.
+  case ga @ GetArrayItem( CreateArray(elems), IntegerLiteral( idx ) ) =>
+if ( idx >= 0 && idx < elems.size ) {
+  elems(idx)
+} else {
+  Cast( Literal( null), ga.dataType )
+}
+}
+  }
+}
+
+/**
+* push down operations into [[CreateMap]].
+*/
+object SimplifyCreateMapOps extends Rule[LogicalPlan]{
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+plan.transformExpressionsUp{
--- End diff --

Oh @eyalfa, I understand it might come down to personal preference since it 
is not documented and there are some instances like this, but I believe the 
space between them is more common. Maybe you could put `[WIP]` in the title to 
prevent reviews while you are working on this.
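The rules in this diff rewrite an extraction applied directly to a constructor: for example, `GetArrayItem(CreateArray(elems), IntegerLiteral(idx))` becomes `elems(idx)`, or a typed null when `idx` is out of range. A toy plain-Python model of that bottom-up rewrite is sketched below; the node classes are hypothetical stand-ins, not Catalyst's, and the typed `Cast(Literal(null), ...)` is reduced to an untyped `Literal(None)`.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class CreateArray:
    elems: Tuple[Any, ...]


@dataclass(frozen=True)
class GetArrayItem:
    child: Any
    ordinal: Any


def transform_up(expr, rule):
    """Apply `rule` to children first, then to the node itself (like transformExpressionsUp)."""
    if isinstance(expr, CreateArray):
        expr = CreateArray(tuple(transform_up(e, rule) for e in expr.elems))
    elif isinstance(expr, GetArrayItem):
        expr = GetArrayItem(transform_up(expr.child, rule), transform_up(expr.ordinal, rule))
    return rule(expr)


def simplify_create_array_ops(expr):
    """Push item selection into CreateArray when the index is an integer literal."""
    if (isinstance(expr, GetArrayItem)
            and isinstance(expr.child, CreateArray)
            and isinstance(expr.ordinal, Literal)
            and isinstance(expr.ordinal.value, int)):
        idx, elems = expr.ordinal.value, expr.child.elems
        # In range: select the element directly; out of range: the result is null.
        return elems[idx] if 0 <= idx < len(elems) else Literal(None)
    return expr


expr = GetArrayItem(CreateArray((Literal(1), Literal(2))), Literal(1))
print(transform_up(expr, simplify_create_array_ops))  # Literal(value=2)
```

Because children are rewritten before their parents are matched, a nested extraction over nested arrays collapses in a single pass.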


[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16122
  
**[Test build #69625 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69625/consoleFull)**
 for PR 16122 at commit 
[`19c7611`](https://github.com/apache/spark/commit/19c7611d07d63abefc221e551874ca630597c5c7).


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16114
  
**[Test build #69624 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69624/consoleFull)**
 for PR 16114 at commit 
[`b625b8f`](https://github.com/apache/spark/commit/b625b8f590756311993086ede07d1fb2f3295bf1).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16114
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69624/
Test FAILed.


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16114
  
Merged build finished. Test FAILed.


[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90758322
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
 logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+receiver.addRecords(shardId, batch)
+logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+receiver.setCheckpointer(shardId, checkpointer)
--- End diff --

Yea, you're right: this code overwrites `checkpointer` every time the 
callback function is called (maybe every 1 sec.). I'm not sure what the original 
author intended, but this seems like wasted work. However, I'm also not sure it 
is worth fixing, and that fix is out of scope for this JIRA. If necessary, I'm 
happy to fix it in a follow-up.


[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #3466 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3466/consoleFull)**
 for PR 16129 at commit 
[`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90756693
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
 logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+receiver.addRecords(shardId, batch)
+logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+receiver.setCheckpointer(shardId, checkpointer)
+  }
+
+  /**
+   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
+   * control the number of aggregated records to be fetched even if we set `MaxRecords`
+   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
+   * in a worker and a producer aggregates two records into one message, the worker possibly
+   * 20 records every callback function called.
+   */
+  private def processRecordsWithLimit(
+  batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+val maxRecords = receiver.getCurrentLimit
+for (start <- 0 until batch.size by maxRecords) {
--- End diff --

Hm, it just occurred to me that you would have a problem here if batch.size 
and maxRecords were both over Int.MaxValue / 2, and maxRecords were a bit 
smaller than batch.size. The addition below overflows.

It seems like a corner case but I note above you already defensively capped 
the maxRecords at Int.MaxValue so maybe it's less unlikely than it sounds.

You can fix it by letting the addition and min comparison take place over 
longs and then convert back to int.

Alternatively I think this is even simpler in Scala, though I imagine 
there's some extra overhead here:

```
batch.grouped(maxRecords).foreach(batch => addRecords(batch, checkpointer))
```

I don't know of a good reviewer for this component but I think I'm 
comfortable merging a straightforward change like this.
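The overflow can be reproduced outside the JVM. The plain-Python sketch below emulates 32-bit `Int` addition with an explicit wrap-around (Python integers themselves never overflow), assuming the loop computes each chunk's end as `min(start + maxRecords, batch.size)`; that expression is an assumption, since the line after the quoted diff is not shown. It compares this with doing the addition and `min` in a wider type, and then shows `grouped`-style chunking, which never computes an end index that could overflow.

```python
def to_int32(x):
    """Wrap an arbitrary Python int to a signed 32-bit value, like JVM Int arithmetic."""
    return (x + 2**31) % 2**32 - 2**31


def chunk_end_int32(start, max_records, size):
    # Addition done in 32 bits: it can wrap around to a negative number.
    return min(to_int32(start + max_records), size)


def chunk_end_wide(start, max_records, size):
    # Addition and min done in a wider type, then narrowed; the result is <= size,
    # so it always fits back into 32 bits.
    return to_int32(min(start + max_records, size))


def grouped(items, n):
    """Split `items` into consecutive chunks of at most n, like Scala's `grouped(n)`."""
    return [items[i:i + n] for i in range(0, len(items), n)]


size, max_records, start = 2_000_000_000, 1_500_000_000, 1_500_000_000
print(chunk_end_int32(start, max_records, size))  # -1294967296 (wrapped)
print(chunk_end_wide(start, max_records, size))   # 2000000000
print(grouped(list(range(5)), 2))                 # [[0, 1], [2, 3], [4]]
```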


[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16030
  
**[Test build #69621 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69621/consoleFull)**
 for PR 16030 at commit 
[`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-03 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/16122
  
This patch fails because Hive 0.12 and Hive 0.13 don't have the `getMetaConf` 
method; see [HIVE-7532](https://issues.apache.org/jira/browse/HIVE-7532).
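One common way to cope with a metastore method that exists only in newer Hive versions is to gate the call on a version or feature check and fall back otherwise. The sketch below illustrates that idea in plain Python; the client classes, the `get_meta_conf` helper, the `example.key` setting, and the fallback behavior are all hypothetical, and the eventual fix for this PR may take a different approach.

```python
class OldHiveClient:
    """Hypothetical client for a metastore version without getMetaConf."""


class NewHiveClient:
    """Hypothetical client whose metastore supports getMetaConf."""

    def __init__(self, conf):
        self._conf = dict(conf)

    def getMetaConf(self, key):
        return self._conf.get(key)


def get_meta_conf(client, key, default):
    """Call getMetaConf when the client supports it; otherwise fall back to `default`."""
    method = getattr(client, "getMetaConf", None)
    if method is None:
        # Older metastores: the method does not exist, so use the caller's default.
        return default
    value = method(key)
    return default if value is None else value


print(get_meta_conf(OldHiveClient(), "example.key", "fallback"))
print(get_meta_conf(NewHiveClient({"example.key": "v"}), "example.key", "fallback"))
```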


[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...

2016-12-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16114
  
Jenkins, retest this please.


[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90758182
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
 logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+receiver.addRecords(shardId, batch)
+logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+receiver.setCheckpointer(shardId, checkpointer)
+  }
+
+  /**
+   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
+   * control the number of aggregated records to be fetched even if we set `MaxRecords`
+   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
+   * in a worker and a producer aggregates two records into one message, the worker possibly
+   * 20 records every callback function called.
+   */
+  private def processRecordsWithLimit(
+  batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+val maxRecords = receiver.getCurrentLimit
+for (start <- 0 until batch.size by maxRecords) {
--- End diff --

Actually, since each Kinesis shard has strict read throughput limits (http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html), `batch.size` will hardly ever exceed `Int.MaxValue / 2`. Still, I like your idea in terms of code clarity, so I fixed it.
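For illustration only, a minimal sketch of the chunking that the `processRecordsWithLimit` doc comment describes. This is not the PR's code: it assumes a generic element type in place of the KCL `Record` class and a plain `maxRecords: Int` standing in for `receiver.getCurrentLimit`, and `ChunkSketch` and `chunk` are hypothetical names. Using the standard `grouped` method instead of stepping an index with `0 until batch.size by maxRecords` avoids any arithmetic on `start`, which is one way to get the code clarity discussed above.

```scala
// Hypothetical sketch, not the Spark implementation: split the records delivered in one
// KCL callback into chunks of at most `maxRecords` elements.
// Assumption: a generic T stands in for the KCL `Record` type.
object ChunkSketch {
  def chunk[T](batch: List[T], maxRecords: Int): List[List[T]] = {
    // `grouped` throws on a non-positive size, so fail early with a clearer message.
    require(maxRecords > 0, "maxRecords must be positive")
    // Consecutive sublists of size maxRecords; the last one may be shorter.
    // No `start + maxRecords` index arithmetic is involved.
    batch.grouped(maxRecords).toList
  }

  def main(args: Array[String]): Unit = {
    // The doc comment's example: a limit of 10, with the producer packing two user
    // records into each message, so a single callback can deliver 20 records.
    val records = (1 to 20).toList
    val chunks = chunk(records, 10)
    println(chunks.map(_.size)) // prints List(10, 10)
  }
}
```

With the doc comment's numbers, the 20 records from one callback come back as two chunks of 10; in the receiver, each chunk would then be handed to something like `addRecords` in turn.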


[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15995
  
**[Test build #69623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69623/consoleFull)** for PR 15995 at commit [`b5f4394`](https://github.com/apache/spark/commit/b5f43946fd72932f7e23ac1f1b3866b150fe745b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...

2016-12-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16114#discussion_r90756702
  
--- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
@@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
     logInfo(s"Initialized workerId $workerId with shardId $shardId")
   }
 
+  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
+    receiver.addRecords(shardId, batch)
+    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
+    receiver.setCheckpointer(shardId, checkpointer)
--- End diff --

BTW, is this supposed to be called on every batch or only once at the end? I don't know how it works.


[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...

2016-12-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16129
  
**[Test build #3466 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3466/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).


[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...

2016-12-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16122
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69622/
Test FAILed.

