[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129 **[Test build #69618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69618/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 Jenkins, retest this please.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 The failure seems unrelated to this PR?
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16129 Merged build finished. Test FAILed.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129 **[Test build #69618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69618/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16129 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69618/ Test FAILed.
[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16120 **[Test build #69615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69615/consoleFull)** for PR 16120 at commit [`a5594f7`](https://github.com/apache/spark/commit/a5594f7ffcbdc9ab2e83008a99d5878fa9fae2b8).
[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16120 retest this please.
[GitHub] spark pull request #16102: [SPARK-18586][BUILD] netty-3.8.0.Final.jar has vu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16102
[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16037 Yes I'm pretty OK with merging this. If you can dig up any results, that's all the better. Will check in with you next week.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69616/ Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69617/consoleFull)** for PR 16030 at commit [`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16114 **[Test build #69620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69620/consoleFull)** for PR 16114 at commit [`f381ac2`](https://github.com/apache/spark/commit/f381ac26cfd14420dbe21b1d58be54c201542357).
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16098 retest this please
[GitHub] spark pull request #16116: [SPARK-18685][TESTS] Fix URI and release resource...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16116
[GitHub] spark issue #16116: [SPARK-18685][TESTS] Fix URI and release resources after...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16116 Thank you !!
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69617/consoleFull)** for PR 16030 at commit [`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #69616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69616/consoleFull)** for PR 13909 at commit [`b29d7cf`](https://github.com/apache/spark/commit/b29d7cf11a6b13f979ad96e1f1879409daf3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Merged build finished. Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #69616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69616/consoleFull)** for PR 13909 at commit [`b29d7cf`](https://github.com/apache/spark/commit/b29d7cf11a6b13f979ad96e1f1879409daf3).
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16098 **[Test build #69619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69619/consoleFull)** for PR 16098 at commit [`4804862`](https://github.com/apache/spark/commit/48048622067f092ed247bc555e5461c073894a9c).
[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16120 Merged build finished. Test PASSed.
[GitHub] spark pull request #16031: [SPARK-18606][HISTORYSERVER]remove useless elemen...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16031#discussion_r90754812

    --- Diff: core/src/main/resources/org/apache/spark/ui/static/historypage.js ---
    @@ -78,6 +78,12 @@ jQuery.extend( jQuery.fn.dataTableExt.oSort, {
         }
     } );

    +jQuery.extend( jQuery.fn.dataTableExt.ofnSearch, {
    +    "appid-numeric": function ( a ) {
    +        return a.replace(/[\r\n]/g, " ").replace(/<.*?>/g, "");
    --- End diff --

    @WangTaoTheTonic does that make sense / do you have time to look into this alternative?
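As a reading aid for the quoted line, here is a minimal Python sketch of what the two chained `replace` calls appear to do: line breaks become spaces, then anything that looks like an HTML tag is removed. The function name `searchable_text` is hypothetical and the code is not part of the patch; it only mirrors the two regular expressions shown in the diff.

```python
import re


def searchable_text(cell_html: str) -> str:
    """Mirror the two replace() calls from the quoted diff (illustrative only)."""
    # Step 1: turn carriage returns and newlines into single spaces.
    without_newlines = re.sub(r"[\r\n]", " ", cell_html)
    # Step 2: drop anything that looks like a tag, using a non-greedy match.
    return re.sub(r"<.*?>", "", without_newlines)
```

For a cell such as `<a href="/history/app-1">app-1</a>` followed by a newline, this leaves `app-1` plus a trailing space, so a search would match the visible text rather than the markup.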
[GitHub] spark pull request #16103: [SPARK-18374][ML]Incorrect words in StopWords/eng...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16103#discussion_r90754782

    --- Diff: mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/english.txt ---
    @@ -149,5 +149,58 @@ shan
     shouldn
     wasn
     weren
    -won
     wouldn
    --- End diff --

    You would then remove the other stems like "wasn" "weren" etc right?
[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16114#discussion_r90754922

    --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
    @@ -56,6 +56,31 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
         logInfo(s"Initialized workerId $workerId with shardId $shardId")
       }

    +  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    receiver.addRecords(shardId, batch)
    +    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
    +    receiver.setCheckpointer(shardId, checkpointer)
    +  }
    +
    +  /**
    +   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
    +   * control the number of aggregated records to be fetched even if we set `MaxRecords`
    +   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
    +   * in a worker and a producer aggregates two records into one message, the worker possibly
    +   * 20 records every callback function called.
    +   */
    +  private def processRecordsWithLimit(
    +      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    val maxRecords = receiver.getCurrentLimit
    +    if (batch.size() <= maxRecords) {
    +      addRecords(batch, checkpointer)
    --- End diff --

    Aha, I see. I'll fix, thanks!
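The doc comment in the quoted diff describes capping how many records are handed over per callback. The diff is cut off after the first branch, so the following Python sketch only illustrates one plausible reading: batches within the limit are stored as-is, and larger ones are split into slices. The name `process_records_with_limit`, its parameters, and the splitting behaviour are assumptions for illustration, not the code under review.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def process_records_with_limit(
    batch: Sequence[T],
    max_records: int,
    add_records: Callable[[List[T]], None],
) -> None:
    """Hand `batch` to `add_records` in slices of at most `max_records` items."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    if len(batch) <= max_records:
        # Same shape as the visible branch of the diff: store the batch as-is.
        add_records(list(batch))
        return
    # Assumption: oversized batches are split into consecutive slices.
    # The quoted diff does not show what the real patch does here.
    for start in range(0, len(batch), max_records):
        add_records(list(batch[start:start + max_records]))
```

With a limit of 10 and a callback that delivers 20 de-aggregated records, this sketch would call `add_records` twice with 10 records each, which matches the scenario in the doc comment.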
[GitHub] spark pull request #16069: [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16069
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16114 @srowen Do you know of any qualified maintainers for this component?
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16114 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69620/ Test PASSed.
[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16120 Merged build finished. Test FAILed.
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16098 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69614/ Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Merged build finished. Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69611/ Test FAILed.
[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16120 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69610/ Test FAILed.
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16098 Merged build finished. Test FAILed.
[GitHub] spark pull request #16129: [SPARK-18678][ML] Skewed feature subsampling in R...
GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/16129

    [SPARK-18678][ML] Skewed feature subsampling in Random forest

    ## What changes were proposed in this pull request?

    Fix reservoir sampling bias for small k. An off-by-one error meant that the probability of replacement was slightly too high -- k/(l-1) after l elements instead of k/l, which matters for small k.

    ## How was this patch tested?

    Existing test plus new test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-18678

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16129

commit 8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec
Author: Sean Owen
Date:   2016-12-03T09:32:00Z

    Fix reservoir sampling bias for small k
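To make the off-by-one in the description concrete, here is a minimal Python sketch of textbook reservoir sampling (often called Algorithm R). It is not Spark's implementation; the function name `reservoir_sample` and its parameters are hypothetical. It assumes the standard rule that the l-th element seen (counting from 1) replaces a random reservoir slot with probability k/l.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    # l counts how many elements have been seen so far, starting at 1.
    for l, item in enumerate(items, start=1):
        if l <= k:
            # The first k elements fill the reservoir directly.
            reservoir.append(item)
        else:
            # Draw j uniformly from [0, l); j < k happens with probability k/l.
            # Drawing from [0, l - 1) instead would give k/(l-1), the slightly
            # too high replacement probability the description mentions.
            j = rng.randrange(l)
            if j < k:
                reservoir[j] = item
    return reservoir
```

The relative gap between k/(l-1) and k/l is 1/(l-1), which is largest for the first few elements after the reservoir fills. That is one way to see why the bias is most visible when k is small and the stream is short.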
[GitHub] spark issue #16102: [SPARK-18586][BUILD] netty-3.8.0.Final.jar has vulnerabi...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16102 Merged to master, though as I say I don't think the CVE actually impacted Spark to begin with.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Merged build finished. Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69617/ Test FAILed.
[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/12004

    Test failure due to new artifacts

    ```
    +++ b/dev/pr-deps/spark-deps-hadoop-2.7
    @@ -16,8 +16,6 @@ arpack_combined_all-0.1.jar
     avro-1.7.7.jar
     avro-ipc-1.7.7.jar
     avro-mapred-1.7.7-hadoop2.jar
    -aws-java-sdk-1.7.4.jar
    -azure-storage-2.0.0.jar
     base64-2.3.8.jar
     bcprov-jdk15on-1.51.jar
     bonecp-0.8.0.RELEASE.jar
    @@ -63,8 +61,6 @@ guice-3.0.jar
     guice-servlet-3.0.jar
     hadoop-annotations-2.7.3.jar
     hadoop-auth-2.7.3.jar
    -hadoop-aws-2.7.3.jar
    -hadoop-azure-2.7.3.jar
     hadoop-client-2.7.3.jar
     hadoop-common-2.7.3.jar
     hadoop-hdfs-2.7.3.jar
    @@ -73,7 +69,6 @@ hadoop-mapreduce-client-common-2.7.3.jar
     hadoop-mapreduce-client-core-2.7.3.jar
     hadoop-mapreduce-client-jobclient-2.7.3.jar
     hadoop-mapreduce-client-shuffle-2.7.3.jar
    -hadoop-hadoop-openstack-2.7.3.jar
     hadoop-yarn-api-2.7.3.jar
     hadoop-yarn-client-2.7.3.jar
     hadoop-yarn-common-2.7.3.jar
    ```
[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11105#discussion_r90755993 --- Diff: core/src/test/scala/org/apache/spark/DataPropertyAccumulatorSuite.scala --- @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import scala.concurrent.ExecutionContext.Implicits.global +import scala.ref.WeakReference + +import org.scalatest.Matchers + +import org.apache.spark.scheduler._ +import org.apache.spark.util.{AccumulatorContext, AccumulatorMetadata, AccumulatorV2, LongAccumulator} + + +class DataPropertyAccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContext { --- End diff -- That sounds like a good plan, I'll try and give the tests some more descriptive names (or where that isn't enough explain in comments some more about the functionality they are testing). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/11105 I'm down with the idea of having add and merge not be final, with huge warning signs, and we could switch them to be final in 3.X.
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user eyalfa commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r90752975 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala --- @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Cast, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule + +/** +* push down operations into [[CreateNamedStructLike]]. +*/ +object SimplifyCreateStructOps extends Rule[LogicalPlan]{ + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformExpressionsUp{ + // push down field extraction + case GetStructField( createNamedStructLike : CreateNamedStructLike, ordinal, _ ) => +createNamedStructLike.valExprs(ordinal) +} + } +} + +/** +* push down operations into [[CreateArray]]. 
+*/ +object SimplifyCreateArrayOps extends Rule[LogicalPlan]{ + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformExpressionsUp{ + // push down field selection (array of structs) + case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) => +def getStructField( elem : Expression ) = { + GetStructField( elem, ordinal, Some(field.name) ) +} +CreateArray( elems.map(getStructField) ) + // push down item selection. + case ga @ GetArrayItem( CreateArray(elems), IntegerLiteral( idx ) ) => +if ( idx >= 0 && idx < elems.size ) { + elems(idx) +} else { + Cast( Literal( null), ga.dataType ) +} +} + } +} + +/** +* push down operations into [[CreateMap]]. +*/ +object SimplifyCreateMapOps extends Rule[LogicalPlan]{ + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformExpressionsUp{ --- End diff -- @gatorsmile I've run a small regex on the spark source tree: `git grep -En '[a-zA-Z][{]' -- *.scala` this returns 277 places where this space is missing, am I missing anything? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
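To sense-check what the `git grep -En '[a-zA-Z][{]'` expression does and does not match, here is a minimal Python sketch that applies the same pattern to a few sample lines. The helper name `find_missing_space` and the sample lines are illustrative assumptions, not part of the PR. The pattern only flags a letter directly followed by `{`, so `Rule[LogicalPlan]{` (as written in this diff) is not reported, while a brace inside a string literal is.

```python
import re
from typing import List, Tuple

# Same expression as the `git grep -E` pattern: a letter immediately followed by `{`.
MISSING_SPACE = re.compile(r"[a-zA-Z][{]")


def find_missing_space(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs where a letter directly precedes `{`."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if MISSING_SPACE.search(line)
    ]


# Hypothetical sample lines, loosely modelled on the diff under review.
sample = "\n".join([
    "object SimplifyCreateMapOps extends Rule[LogicalPlan]{",   # `]{` is NOT matched
    "    plan.transformExpressionsUp{",                         # matched
    "  override def apply(plan: LogicalPlan): LogicalPlan = {", # correctly spaced
    '    val quantifier = "x{2}"',                              # matched (false positive)
])

hits = find_missing_space(sample)
for number, line in hits:
    print(number, line.strip())
```

If the `]{` and `){` cases should be caught as well, the character class would need widening; whether that is worth the extra false positives is a judgment call.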
[GitHub] spark issue #16116: [SPARK-18685][TESTS] Fix URI and release resources after...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16116 Merged to master/2.1/2.0
[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16120 **[Test build #69615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69615/consoleFull)** for PR 16120 at commit [`a5594f7`](https://github.com/apache/spark/commit/a5594f7ffcbdc9ab2e83008a99d5878fa9fae2b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16114#discussion_r90754731 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala --- @@ -56,6 +56,31 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w logInfo(s"Initialized workerId $workerId with shardId $shardId") } + private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = { +receiver.addRecords(shardId, batch) +logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId") +receiver.setCheckpointer(shardId, checkpointer) + } + + /** + * Limit the number of processed records from Kinesis stream. This is because the KCL cannot + * control the number of aggregated records to be fetched even if we set `MaxRecords` + * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records + * in a worker and a producer aggregates two records into one message, the worker possibly + * 20 records every callback function called. + */ + private def processRecordsWithLimit( + batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = { +val maxRecords = receiver.getCurrentLimit +if (batch.size() <= maxRecords) { + addRecords(batch, checkpointer) --- End diff -- I think the for loop even takes care of this case, but no big deal either way. It seems like a reasonable change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
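The remark that the loop already covers the small-batch case can be illustrated with a minimal Python sketch (this is not the PR's Scala code). It assumes the batch is processed in slices of at most `max_records`; `process_records_with_limit` and `add_records` are hypothetical stand-ins here, and checkpointing is left out.

```python
from typing import Callable, List, Sequence


def process_records_with_limit(
    batch: Sequence[int],
    max_records: int,
    add_records: Callable[[List[int]], None],
) -> None:
    """Hand the batch to `add_records` in slices of at most `max_records`."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    # When len(batch) <= max_records this loop runs exactly once with the
    # whole batch, so no separate fast-path branch is needed for that case.
    for start in range(0, len(batch), max_records):
        add_records(list(batch[start:start + max_records]))


large: List[List[int]] = []
process_records_with_limit(list(range(25)), 10, large.append)
print([len(chunk) for chunk in large])

small: List[List[int]] = []
process_records_with_limit(list(range(7)), 10, small.append)
print([len(chunk) for chunk in small])
```

One difference worth noting in this sketch: an empty batch results in no call at all, whereas an explicit `batch.size() <= maxRecords` branch would still call `addRecords` once with an empty list.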
[GitHub] spark issue #16120: [SPARK-18634][PySpark][SQL][WIP] Corruption and Correctn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16120 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69615/ Test PASSed.
[GitHub] spark issue #16069: [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven plugin...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16069 Merged to master. It's a build change and probably fine for 2.1 but it's non-trivial.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69621/consoleFull)** for PR 16030 at commit [`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16114 **[Test build #69620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69620/consoleFull)** for PR 16114 at commit [`f381ac2`](https://github.com/apache/spark/commit/f381ac26cfd14420dbe21b1d58be54c201542357). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16114 Merged build finished. Test PASSed.
[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16122 **[Test build #69622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69622/consoleFull)** for PR 16122 at commit [`f8955df`](https://github.com/apache/spark/commit/f8955dfc966ae41fbe2086168d62d44d61e15576). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15995 **[Test build #69623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69623/consoleFull)** for PR 15995 at commit [`b5f4394`](https://github.com/apache/spark/commit/b5f43946fd72932f7e23ac1f1b3866b150fe745b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r90756326 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala --- @@ -144,7 +144,7 @@ private[hive] case class HiveGenericUDF( @transient private lazy val isUDFDeterministic = { val udfType = function.getClass.getAnnotation(classOf[HiveUDFType]) -udfType != null && udfType.deterministic() +udfType != null && udfType.deterministic() && !udfType.stateful() --- End diff -- an unrelated question, what's the difference between `udfType.deterministic` and `udfType.stateful`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16068: [SPARK-18637][SQL]Stateful UDF should be consider...
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/16068#discussion_r90763121 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala --- @@ -487,6 +488,29 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils { assert(count4 == 1) sql("DROP TABLE parquet_tmp") } + + test("Hive Stateful UDF") { +sql(s"CREATE TEMPORARY FUNCTION statefulUDF AS '${classOf[StatefulUDF].getName}'") +sql(s"CREATE TEMPORARY FUNCTION statelessUDF AS '${classOf[StatelessUDF].getName}'") +val testData = spark.sparkContext.parallelize( + (0 until 10) map(x => IntegerCaseClass(1)), 2).toDF() +testData.createOrReplaceTempView("inputTable") +val max1 = + sql("SELECT MAX(s) FROM (" + +"SELECT statefulUDF() as s FROM (SELECT i from inputTable DISTRIBUTE by i) a" + +") b").head().getLong(0) --- End diff -- will rewrite it after gathering feedback from others. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16068 @gatorsmile we cannot rely on deterministic = true/false alone, as there are existing UDFs annotated with deterministic = true that are also stateful.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129 **[Test build #3467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3467/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operator...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16046
[GitHub] spark pull request #16130: Update location of Spark YARN shuffle jar
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/16130 Update location of Spark YARN shuffle jar Looking at the distributions provided on spark.apache.org, I see that the Spark YARN shuffle jar is under `yarn/` and not `lib/`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nchammas/spark yarn-doc-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16130.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16130 commit 979a8a1811f471cd333bdde459649974626e612e Author: Nicholas Chammas Date: 2016-12-03T20:11:18Z update location of Spark shuffle jar
[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/16130 cc @vanzin?
[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16119 Since the current tests pass without this change, I'd say we should add a test for the behaviour we are planning to support that isn't currently supported (it would also make the purpose of the change a bit clearer).
[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16130 **[Test build #69628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69628/consoleFull)** for PR 16130 at commit [`979a8a1`](https://github.com/apache/spark/commit/979a8a1811f471cd333bdde459649974626e612e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16068 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L1373-L1378 Copied the code from Hive `FunctionRegistry.java`: ```JAVA /** * Returns whether a GenericUDF is deterministic or not. */ public static boolean isDeterministic(GenericUDF genericUDF) { if (isStateful(genericUDF)) { // stateful implies non-deterministic, regardless of whatever // the deterministic annotation declares return false; } ... } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
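The rule in the quoted Hive snippet (stateful overrides whatever `deterministic` declares) can be sketched in Python as follows. This is only an illustration: the `UDFType` dataclass is a hypothetical stand-in for the Hive annotation, its default values are an assumption, and treating a missing annotation as non-deterministic mirrors the `udfType != null && ...` check in the Spark diff quoted earlier in this thread rather than Hive's own behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UDFType:
    """Hypothetical stand-in for the values of Hive's @UDFType annotation."""
    deterministic: bool = True   # assumed default
    stateful: bool = False       # assumed default


def is_deterministic(udf_type: Optional[UDFType]) -> bool:
    """Sketch of the proposed Spark-side check."""
    if udf_type is None:
        # Mirrors `udfType != null && ...`: no annotation means "not deterministic".
        return False
    if udf_type.stateful:
        # Stateful implies non-deterministic, regardless of the declared flag.
        return False
    return udf_type.deterministic


print(is_deterministic(UDFType(deterministic=True, stateful=False)))
print(is_deterministic(UDFType(deterministic=True, stateful=True)))
```

The second call is the case raised in this thread: a UDF declared `deterministic = true` that is also stateful must still be treated as non-deterministic.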
[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16130 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69628/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16130 **[Test build #69628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69628/consoleFull)** for PR 16130 at commit [`979a8a1`](https://github.com/apache/spark/commit/979a8a1811f471cd333bdde459649974626e612e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16130 Merged build finished. Test PASSed.
[GitHub] spark pull request #16103: [SPARK-18374][ML]Incorrect words in StopWords/eng...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/16103#discussion_r90765451 --- Diff: mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/english.txt --- @@ -149,5 +149,58 @@ shan shouldn wasn weren -won wouldn --- End diff -- I'm fine with both options, leaving them or removing them. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/16068 My understanding is that a non-deterministic UDF does not need to be stateful, but a stateful UDF has to be non-deterministic. Here is the comment in Hive regarding this property: /** If a UDF stores state based on the sequence of records it has processed, it is stateful. A stateful UDF cannot be used in certain expressions such as case statement and certain optimizations such as AND/OR short circuiting don't apply for such UDFs, as they need to be invoked for each record. row_sequence is an example of stateful UDF. A stateful UDF is considered to be non-deterministic, irrespective of what deterministic() returns. * @return true */
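Why short-circuiting matters for a stateful UDF can be seen with a small Python sketch. `RowSequence` below is a hypothetical counter modelled on the `row_sequence` example mentioned in the Hive comment; the two evaluation helpers are illustrative and do not correspond to any Hive or Spark API.

```python
from typing import List, Optional


class RowSequence:
    """Stateful UDF sketch: returns 1, 2, 3, ... on successive calls."""

    def __init__(self) -> None:
        self._count = 0

    def __call__(self) -> int:
        self._count += 1
        return self._count


def eval_short_circuit(flags: List[bool], udf: RowSequence) -> List[Optional[int]]:
    # The UDF is skipped whenever the left-hand condition is false,
    # so its internal counter only advances on some rows.
    return [udf() if flag else None for flag in flags]


def eval_every_row(flags: List[bool], udf: RowSequence) -> List[Optional[int]]:
    # The UDF is invoked for each record, as the Hive comment requires;
    # the condition only decides whether the value is kept.
    results: List[Optional[int]] = []
    for flag in flags:
        value = udf()
        results.append(value if flag else None)
    return results


flags = [True, False, True, False, True]
print(eval_short_circuit(flags, RowSequence()))
print(eval_every_row(flags, RowSequence()))
```

The two strategies produce different values for the same rows, which is why an optimizer that assumes determinism cannot safely reorder or skip calls to such a UDF.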
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16068 Could we directly use `@UDFType(deterministic = true/false)`?
[GitHub] spark issue #16068: [SPARK-18637][SQL]Stateful UDF should be considered as n...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16068 Found the link: [HIVE-1994: Support new annotation @UDFType(stateful = true)](https://issues.apache.org/jira/browse/HIVE-1994 )
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16129 @felixcheung maybe you can advise me on this. I think this is a correct fix, but ends up changing the results of decision forests a little bit. The SparkR test you wrote fails: ``` Failed - 1. Failure: spark.randomForest (@test_mllib.R#937) - predictions$prediction not equal to c(...). 16/16 mismatches (average diff: 0.108) [1] 60.3 - 60.4 == -0.0508 [2] 61.2 - 61.1 == 0.1272 [3] 60.7 - 60.6 == 0.0543 [4] 62.1 - 62.3 == -0.1473 [5] 63.5 - 63.7 == -0.2044 [6] 64.1 - 64.3 == -0.2413 [7] 65.1 - 64.9 == 0.2591 [8] 64.3 - 64.3 == 0.0045 [9] 66.7 - 66.7 == 0.0001 ... ``` Of course I can just paste in the new values, as I expect a small change in the result, but wanted to sense-check it. The new answers are closer to the answers in the nearly-identical case above with 1 tree, which seems a little positive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operators allow...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16046 Merging to master/2.1/2.0. Thanks!
[GitHub] spark pull request #16094: [SPARK-18541][Python]Add metadata parameter to py...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16094#discussion_r90764328

    --- Diff: python/pyspark/sql/column.py ---
    @@ -298,19 +299,34 @@ def isin(self, *cols):
         isNotNull = _unary_op("isNotNull", "True if the current expression is not null.")

         @since(1.3)
    -    def alias(self, *alias):
    +    def alias(self, *alias, **kwargs):
             """
             Returns this column aliased with a new name or names (in the case of expressions that
             return more than one column, such as explode).

    +        Optional ``metadata`` keyword argument can be passed when aliasing a single column.
    --- End diff --

2.2 is probably right, although the current 2.1 RC is more of a strawman, so it is possible (but it is up to @davies / @marmbrus whether this warrants going into 2.1).
[GitHub] spark issue #16121: [SPARK-16589][PYTHON] Chained cartesian produces incorre...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16121 I was hesitant about the previous PR since it seemed like we didn't fully understand why we were changing what we were at the time. I can try to take a closer look at this over the next few days if it is in a good place for that to happen.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Merged build finished. Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69621/ Test PASSed.
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15995 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69623/ Test PASSed.
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16114 Merged build finished. Test FAILed.
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16114 **[Test build #69627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69627/consoleFull)** for PR 16114 at commit [`8cc24ec`](https://github.com/apache/spark/commit/8cc24ec516978931335b0b585a6dd2a7aff99663).
[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16122 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69625/ Test FAILed.
[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16122 Merged build finished. Test FAILed.
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16098 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69619/ Test PASSed.
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16098 **[Test build #69619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69619/consoleFull)** for PR 16098 at commit [`4804862`](https://github.com/apache/spark/commit/48048622067f092ed247bc555e5461c073894a9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16098 Merged build finished. Test PASSed.
[GitHub] spark pull request #16043: [SPARK-18601][SQL] Simplify Create/Get complex ex...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16043#discussion_r90757729

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.expressions.{Cast, CreateArray, CreateMap, CreateNamedStructLike, Expression, GetArrayItem, GetArrayStructFields, GetMapValue, GetStructField, IntegerLiteral, Literal}
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.catalyst.rules.Rule
    +
    +/**
    +* push down operations into [[CreateNamedStructLike]].
    +*/
    +object SimplifyCreateStructOps extends Rule[LogicalPlan]{
    +  override def apply(plan: LogicalPlan): LogicalPlan = {
    +    plan.transformExpressionsUp{
    +      // push down field extraction
    +      case GetStructField( createNamedStructLike : CreateNamedStructLike, ordinal, _ ) =>
    +        createNamedStructLike.valExprs(ordinal)
    +    }
    +  }
    +}
    +
    +/**
    +* push down operations into [[CreateArray]].
    +*/
    +object SimplifyCreateArrayOps extends Rule[LogicalPlan]{
    +  override def apply(plan: LogicalPlan): LogicalPlan = {
    +    plan.transformExpressionsUp{
    +      // push down field selection (array of structs)
    +      case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
    +        def getStructField( elem : Expression ) = {
    +          GetStructField( elem, ordinal, Some(field.name) )
    +        }
    +        CreateArray( elems.map(getStructField) )
    +      // push down item selection.
    +      case ga @ GetArrayItem( CreateArray(elems), IntegerLiteral( idx ) ) =>
    +        if ( idx >= 0 && idx < elems.size ) {
    +          elems(idx)
    +        } else {
    +          Cast( Literal( null), ga.dataType )
    +        }
    +    }
    +  }
    +}
    +
    +/**
    +* push down operations into [[CreateMap]].
    +*/
    +object SimplifyCreateMapOps extends Rule[LogicalPlan]{
    +  override def apply(plan: LogicalPlan): LogicalPlan = {
    +    plan.transformExpressionsUp{
    --- End diff --

Oh @eyalfa, I understand it might be a matter of personal preference if it is not documented, and there are similar instances of this elsewhere, but I believe a space between them is more common. Maybe you could put `[WIP]` in the title to hold off review while you are still working on this.
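To make the intent of the rules in the quoted diff easier to follow, here is a minimal, self-contained sketch of the same idea on a toy expression tree. It does not use Catalyst: `Expr`, `Lit`, `Attr`, `MkStruct`, `MkArray`, `GetField`, `GetItem` and `SimplifySketch` are hypothetical names invented for illustration. It assumes the rewrite is applied bottom-up, as `transformExpressionsUp` does, and it returns an untyped `Lit(null)` for an out-of-range index where the diff builds `Cast( Literal( null), ga.dataType )`.

```scala
// Toy expression tree; all names here are hypothetical stand-ins, not Catalyst classes.
sealed trait Expr
case class Lit(value: Any) extends Expr
case class Attr(name: String) extends Expr
case class MkStruct(fields: Seq[(String, Expr)]) extends Expr
case class MkArray(elems: Seq[Expr]) extends Expr
case class GetField(child: Expr, ordinal: Int) extends Expr
case class GetItem(child: Expr, index: Expr) extends Expr

object SimplifySketch {
  def simplify(e: Expr): Expr = {
    // Rewrite the children first, so the rules below see already-simplified inputs.
    val rewritten = e match {
      case MkStruct(fields) => MkStruct(fields.map { case (n, v) => (n, simplify(v)) })
      case MkArray(elems)   => MkArray(elems.map(simplify))
      case GetField(c, o)   => GetField(simplify(c), o)
      case GetItem(c, i)    => GetItem(simplify(c), simplify(i))
      case leaf             => leaf
    }
    rewritten match {
      // Extracting a field from a freshly built struct: return that field's expression.
      case GetField(MkStruct(fields), ordinal) => fields(ordinal)._2
      // Indexing a freshly built array with a literal: pick the element, or null if out of range.
      case GetItem(MkArray(elems), Lit(idx: Int)) =>
        if (idx >= 0 && idx < elems.size) elems(idx) else Lit(null)
      case other => other
    }
  }
}
```

For example, `SimplifySketch.simplify(GetField(MkStruct(Seq("a" -> Attr("x"), "b" -> Attr("y"))), 1))` reduces to `Attr("y")`, so the struct is never materialized.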
[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16122 **[Test build #69625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69625/consoleFull)** for PR 16122 at commit [`19c7611`](https://github.com/apache/spark/commit/19c7611d07d63abefc221e551874ca630597c5c7).
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16114 **[Test build #69624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69624/consoleFull)** for PR 16114 at commit [`b625b8f`](https://github.com/apache/spark/commit/b625b8f590756311993086ede07d1fb2f3295bf1).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16114 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69624/ Test FAILed.
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16114 Merged build finished. Test FAILed.
[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16114#discussion_r90758322

    --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
    @@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
         logInfo(s"Initialized workerId $workerId with shardId $shardId")
       }

    +  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    receiver.addRecords(shardId, batch)
    +    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
    +    receiver.setCheckpointer(shardId, checkpointer)
    --- End diff --

Yeah, you're right: this code overwrites `checkpointer` every time the callback function is called (maybe every 1 sec.). I'm not sure what the original author intended, but it seems wasteful. However, I'm also not sure it is worth fixing, and that fix is out of scope for this JIRA. If necessary, I'd be happy to fix it in a follow-up.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129 **[Test build #3466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3466/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16114#discussion_r90756693

    --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
    @@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
         logInfo(s"Initialized workerId $workerId with shardId $shardId")
       }

    +  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    receiver.addRecords(shardId, batch)
    +    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
    +    receiver.setCheckpointer(shardId, checkpointer)
    +  }
    +
    +  /**
    +   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
    +   * control the number of aggregated records to be fetched even if we set `MaxRecords`
    +   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
    +   * in a worker and a producer aggregates two records into one message, the worker possibly
    +   * 20 records every callback function called.
    +   */
    +  private def processRecordsWithLimit(
    +      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    val maxRecords = receiver.getCurrentLimit
    +    for (start <- 0 until batch.size by maxRecords) {
    --- End diff --

Hm, it just occurred to me that you would have a problem here if batch.size and maxRecords were both over Int.MaxValue / 2, and maxRecords were a bit smaller than batch.size. The addition below overflows. It seems like a corner case but I note above you already defensively capped the maxRecords at Int.MaxValue so maybe it's less unlikely than it sounds. You can fix it by letting the addition and min comparison take place over longs and then convert back to int.

Alternatively I think this is even simpler in Scala, though I imagine there's some extra overhead here:

```
batch.grouped(maxRecords).foreach(batch => addRecords(batch, checkpointer))
```

I don't know of a good reviewer for this component but I think I'm comfortable merging a straightforward change like this.
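The overflow described in this comment can be reproduced with plain arithmetic. The sketch below is only an illustration: it assumes the loop body (not shown in the quoted diff) computes the end of each slice as `min(start + maxRecords, batch.size)`, and `ChunkingSketch`, `sliceEndInt`, `sliceEndLong` and `chunks` are hypothetical names, not part of the patch.

```scala
object ChunkingSketch {
  // End index computed entirely in Int: `start + step` wraps around silently when the
  // true sum exceeds Int.MaxValue, so the result can be negative.
  def sliceEndInt(start: Int, step: Int, size: Int): Int =
    math.min(start + step, size)

  // Same computation with the addition and the min done over Long, converting back to
  // Int at the end. The result never exceeds `size`, so the conversion is safe.
  def sliceEndLong(start: Int, step: Int, size: Int): Int =
    math.min(start.toLong + step.toLong, size.toLong).toInt

  // The index-free alternative: `grouped` yields consecutive chunks of at most `step`
  // elements, so no end index is ever computed.
  def chunks[A](batch: List[A], step: Int): List[List[A]] =
    batch.grouped(step).toList
}
```

With `size = Int.MaxValue` and `step = Int.MaxValue - 1`, the second iteration starts at `Int.MaxValue - 1`; `sliceEndInt` then returns `-4`, while `sliceEndLong` returns `Int.MaxValue`. `ChunkingSketch.chunks(List(1, 2, 3, 4, 5), 2)` gives `List(List(1, 2), List(3, 4), List(5))`.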
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #69621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69621/consoleFull)** for PR 16030 at commit [`1ab3363`](https://github.com/apache/spark/commit/1ab3363746d9c53fdcdf24564020fe3a784be06a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/16122 This patch fails because hive-0.12 and hive-0.13 don't have the `getMetaConf` method. See [HIVE-7532](https://issues.apache.org/jira/browse/HIVE-7532).
[GitHub] spark issue #16114: [SPARK-18620][Streaming][Kinesis] Flatten input rates in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16114 Jenkins, retest this please.
[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16114#discussion_r90758182

    --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
    @@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
         logInfo(s"Initialized workerId $workerId with shardId $shardId")
       }

    +  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    receiver.addRecords(shardId, batch)
    +    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
    +    receiver.setCheckpointer(shardId, checkpointer)
    +  }
    +
    +  /**
    +   * Limit the number of processed records from Kinesis stream. This is because the KCL cannot
    +   * control the number of aggregated records to be fetched even if we set `MaxRecords`
    +   * in `KinesisClientLibConfiguration`. For example, if we set 10 to the number of max records
    +   * in a worker and a producer aggregates two records into one message, the worker possibly
    +   * 20 records every callback function called.
    +   */
    +  private def processRecordsWithLimit(
    +      batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    val maxRecords = receiver.getCurrentLimit
    +    for (start <- 0 until batch.size by maxRecords) {
    --- End diff --

Actually, since each Kinesis shard has strict read throughput limits (http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html), `batch.size` will hardly ever exceed `Int.MaxValue / 2`. But since I like your idea in terms of code clarity, I fixed it.
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15995 **[Test build #69623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69623/consoleFull)** for PR 15995 at commit [`b5f4394`](https://github.com/apache/spark/commit/b5f43946fd72932f7e23ac1f1b3866b150fe745b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16114: [SPARK-18620][Streaming][Kinesis] Flatten input r...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16114#discussion_r90756702

    --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala ---
    @@ -56,6 +56,27 @@ private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], w
         logInfo(s"Initialized workerId $workerId with shardId $shardId")
       }

    +  private def addRecords(batch: List[Record], checkpointer: IRecordProcessorCheckpointer): Unit = {
    +    receiver.addRecords(shardId, batch)
    +    logDebug(s"Stored: Worker $workerId stored ${batch.size} records for shardId $shardId")
    +    receiver.setCheckpointer(shardId, checkpointer)
    --- End diff --

BTW is this supposed to be called on every batch or once at the end? I don't know how it works.
[GitHub] spark issue #16129: [SPARK-18678][ML] Skewed feature subsampling in Random f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16129 **[Test build #3466 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3466/consoleFull)** for PR 16129 at commit [`8ac5dee`](https://github.com/apache/spark/commit/8ac5dee8f9c0165da7a16d83d79f2f5080edb3ec).
[GitHub] spark issue #16122: [SPARK-18681][SQL] Fix filtering to compatible with part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16122 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69622/ Test FAILed.