[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15695 **[Test build #3380 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3380/consoleFull)** for PR 15695 at commit [`0d4461a`](https://github.com/apache/spark/commit/0d4461a9e444008a35cc04c607447dc3d4677b7f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67832/consoleFull)** for PR 15696 at commit [`6af14b5`](https://github.com/apache/spark/commit/6af14b56590a0882800f62a2a2b939ee3715edbb). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15675 **[Test build #67833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67833/consoleFull)** for PR 15675 at commit [`8d2b3a6`](https://github.com/apache/spark/commit/8d2b3a63f8d14acb8bd33ce15a9b7a593b703f97). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15673 **[Test build #67834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67834/consoleFull)** for PR 15673 at commit [`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user jodersky commented on the issue: https://github.com/apache/spark/pull/15398 @hvanhovell, I just wanted to add that option 2 from my last comment won't work: if we replace `\\` with `` in the unescape function (as is done for `\%` and `\_`), it will be impossible for a user to specify a single backslash. Considering that, another potential solution would be change the ANTLR parser to handle escaping differently in LIKE expressions. This would however introduce a special case which complicates the overall rules. Maybe leaving the current behaviour but documenting that backslashes are escaped on the parser-level is the best option? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15701 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15696: [SPARK-18024][SQL] Introduce an internal commit p...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15696#discussion_r85847969 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala --- @@ -141,15 +139,14 @@ class SimpleTextOutputWriter( } } -class AppendingTextOutputFormat(stagingDir: Path, fileNamePrefix: String) - extends TextOutputFormat[NullWritable, Text] { +class AppendingTextOutputFormat(path: String) extends TextOutputFormat[NullWritable, Text] { val numberFormat = NumberFormat.getInstance() numberFormat.setMinimumIntegerDigits(5) numberFormat.setGroupingUsed(false) override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = { -new Path(stagingDir, fileNamePrefix) +new Path(path) --- End diff -- `+ extension`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15695 **[Test build #3380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3380/consoleFull)** for PR 15695 at commit [`0d4461a`](https://github.com/apache/spark/commit/0d4461a9e444008a35cc04c607447dc3d4677b7f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67840/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15703 **[Test build #67844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67844/consoleFull)** for PR 15703 at commit [`5a23a97`](https://github.com/apache/spark/commit/5a23a979c5e6a61f847b146a1cb656418054d955). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15701: [SPARK-18167] [SQL] Also log all partitions when ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15701 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15302 @dongjoon-hyun I'll take a look tomorrow. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67835/consoleFull)** for PR 15696 at commit [`6166093`](https://github.com/apache/spark/commit/6166093d511e833587d32e398338e2f47ccbcc8a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15675: [SPARK-18144][SQL] logging StreamingQueryListener...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15675#discussion_r85830450 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala --- @@ -41,6 +41,8 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus) def post(event: StreamingQueryListener.Event) { event match { case s: QueryStartedEvent => +sparkListenerBus.post(s) --- End diff -- Since this is hacky, it's better to have some tests to make sure we won't break things in future when removing the hacky codes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #67825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67825/consoleFull)** for PR 11105 at commit [`8c560ca`](https://github.com/apache/spark/commit/8c560ca6dd8c28f86630ae42bb50739a8614bec3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67825/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15700: [SPARK-17964][SparkR] Enable SparkR with Mesos cl...
GitHub user susanxhuynh opened a pull request: https://github.com/apache/spark/pull/15700 [SPARK-17964][SparkR] Enable SparkR with Mesos client mode and cluster mode ## What changes were proposed in this pull request? Enabled SparkR with Mesos client mode and cluster mode. Just a few changes were required to get this working on Mesos: (1) removed the SparkR on Mesos error checks and (2) do not require "--class" to be specified for R apps. The logical to check spark.mesos.executor.home was already in there. @sun-rui ## How was this patch tested? 1. SparkSubmitSuite 2. On local mesos cluster (on laptop): ran SparkR shell, spark-submit client mode, and spark-submit cluster mode, with the "examples/src/main/Rdataframe.R" example application. 3. On multi-node mesos cluster: ran SparkR shell, spark-submit client mode, and spark-submit cluster mode, with the "examples/src/main/Rdataframe.R" example application. I tested with the following --conf values set: spark.mesos.executor.docker.image and spark.mesos.executor.home You can merge this pull request into a Git repository by running: $ git pull https://github.com/mesosphere/spark susan-r-branch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15700.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15700 commit 4a8b07464fcb6e4b082e79ff9dd4bc3e41d43bbc Author: Susan X. HuynhDate: 2016-10-31T18:39:04Z Enabled SparkR on Mesos. RPackages from R source is not yet supported. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67824/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15698 **[Test build #67828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67828/consoleFull)** for PR 15698 at commit [`b777ee5`](https://github.com/apache/spark/commit/b777ee5bb25f38086bfe2126be26de8f1e14a14d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15698 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't w...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15669 If I understand the issue here, the problem is not `--files` but setting `spark.files` through `--conf` (or the config file). The former is translated into `spark.yarn.dist.files` by `SparkSubmit` but it seems the latter isn't, and to me that's the bug. So basically what Tom and Mridul have said. I think the correct fix would be in either `SparkSubmit.scala` or YARN's `Client.scala` - probably the latter to keep the YARN-specific stuff in one place. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Even Time Wate...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r85845880 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeProjection} +import org.apache.spark.sql.catalyst.plans.logical.EventTimeWatermark +import org.apache.spark.sql.execution.SparkPlan +import org.apache.spark.sql.types.MetadataBuilder +import org.apache.spark.unsafe.types.CalendarInterval +import org.apache.spark.util.AccumulatorV2 + +class MaxLong(protected var currentValue: Long = 0) + extends AccumulatorV2[Long, Long] + with Serializable { + + override def isZero: Boolean = value == 0 + override def value: Long = currentValue + override def copy(): AccumulatorV2[Long, Long] = new MaxLong(currentValue) + + override def reset(): Unit = { +currentValue = 0 + } + + override def add(v: Long): Unit = { +if (value < v) { currentValue = v } + } + + override def merge(other: AccumulatorV2[Long, Long]): Unit = { +if (currentValue < other.value) { + currentValue = other.value +} + } +} + +/** + * Used to mark a column as the containing the event time for a given record. In addition to + * adding appropriate metadata to this column, this operator also tracks the maximum observed event + * time. Based on the maximum observed time and a user specified delay, we can calculate the + * `watermark` after which we assume we will no longer see late records for a particular time + * period. + */ +case class EventTimeWatermarkExec( +eventTime: Attribute, +delay: CalendarInterval, +child: SparkPlan) extends SparkPlan { + + // TODO: Use Spark SQL Metrics? + val maxEventTime = new MaxLong --- End diff -- @zsxwing am I doing this right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/15704 [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators ## What changes were proposed in this pull request? This PR aims to support `comparators`, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility. **Spark 1.6.2** ``` scala scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)") res0: org.apache.spark.sql.DataFrame = [result: string] scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')") res1: org.apache.spark.sql.DataFrame = [result: string] ``` **Spark 2.0** ``` scala scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)") res0: org.apache.spark.sql.DataFrame = [] scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')") org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {')', ','}(line 1, pos 42) ``` After this PR, it's supported. ## How was this patch tested? Pass the Jenkins test with a newly added testcase. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-17732-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15704.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15704 commit 84f2315501b2f34d247e8750d1e01fff6ff9fb55 Author: Dongjoon HyunDate: 2016-10-31T00:46:49Z [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 It looks like all the unit tests passed, however one of the forked test java processes exited with nonzero status for some unknown reason. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15699: [SPARK-18030][Tests]Fix flaky FileStreamSourceSui...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15699 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67838/consoleFull)** for PR 15696 at commit [`040bbba`](https://github.com/apache/spark/commit/040bbba0bdbd647f963b7a61e18b69fd62565201). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67838/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't w...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/15669 that's correct, it is due to `spark.files`, jira has been updated. Will update the PR soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67832/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67832/consoleFull)** for PR 15696 at commit [`6af14b5`](https://github.com/apache/spark/commit/6af14b56590a0882800f62a2a2b939ee3715edbb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MapReduceFileCommitterProtocol(path: String, isAppend: Boolean)` * `logInfo(s\"Using user defined output committer class $` * `logInfo(s\"Using output committer class $` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15696: [SPARK-18024][SQL] Introduce an internal commit p...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15696#discussion_r85848357 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala --- @@ -141,15 +139,14 @@ class SimpleTextOutputWriter( } } -class AppendingTextOutputFormat(stagingDir: Path, fileNamePrefix: String) - extends TextOutputFormat[NullWritable, Text] { +class AppendingTextOutputFormat(path: String) extends TextOutputFormat[NullWritable, Text] { val numberFormat = NumberFormat.getInstance() numberFormat.setMinimumIntegerDigits(5) numberFormat.setGroupingUsed(false) override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = { -new Path(stagingDir, fileNamePrefix) +new Path(path) --- End diff -- (This is now specified explicitly in OutputWriterFactory.getFileExtension) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67836/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15701 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15677: [SPARK-17963][SQL][Documentation] Add examples (extend) ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15677 Alright, then will try to get rid of the arguments part. Thank you all very much sincerely and I apologise the noise I caused. @gatorsmile I will keep in mind you comments too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/1 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/15705 [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] TABLE ... PARTITION for Datasource tables ## What changes were proposed in this pull request? There are a couple issues with the current 2.1 behavior when inserting into Datasource tables with partitions managed by Hive. (1) OVERWRITE TABLE ... PARTITION will actually overwrite the entire table instead of just the specified partition. (2) INSERT|OVERWRITE does not work with partitions that have custom locations. This PR fixes both of these issues for Datasource tables managed by Hive. The behavior for legacy tables or when `manageFilesourcePartitions = false` is unchanged. ## How was this patch tested? Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ericl/spark sc-4942 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15705.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15705 commit dee4b9e107cc84f67640523fa277b31947661042 Author: Eric LiangDate: 2016-10-27T21:44:04Z Thu Oct 27 14:44:04 PDT 2016 commit d99afacfc80e731d17af4ae25c548e45d948cd11 Author: Eric Liang Date: 2016-10-28T21:20:28Z Fri Oct 28 14:20:28 PDT 2016 commit 137aeb295bc02e26755d7c18c89bd0117371676e Author: Eric Liang Date: 2016-10-31T19:16:44Z Merge branch 'master' into sc-4942 commit 8da8b0f7f70291cd86bd0a103758c5f88f714a27 Author: Eric Liang Date: 2016-10-31T21:29:58Z wip commit d5a7bd3931dc6f3f1673d9edff0f0aa77ad9d138 Author: Eric Liang Date: 2016-10-31T23:14:56Z implement custom locations commit a833a19d20f6b891cf42cf4a0f544c72abb982cd Author: Eric Liang Date: 2016-10-31T23:32:16Z Mon Oct 31 16:32:15 PDT 2016 commit 587b85e83f569ad8acae3de82c695b0fad4041db Author: Eric Liang Date: 2016-10-31T23:34:47Z Mon Oct 31 16:34:47 PDT 2016 commit 669e6cce48bfe903f473f14f20e1857ababc27be Author: Eric Liang Date: 2016-10-31T23:41:40Z Mon Oct 31 16:41:40 PDT 2016 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15705 cc @cloud-fan @yhuai --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15701 **[Test build #67836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67836/consoleFull)** for PR 15701 at commit [`767fef8`](https://github.com/apache/spark/commit/767fef82f8dadd1962abe1d93b2c1bf5e926697d). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626 **[Test build #67829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67829/consoleFull)** for PR 15626 at commit [`7beedcc`](https://github.com/apache/spark/commit/7beedcc79f0664115bebda49b62d20ed8965b27f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67835/consoleFull)** for PR 15696 at commit [`6166093`](https://github.com/apache/spark/commit/6166093d511e833587d32e398338e2f47ccbcc8a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67838/consoleFull)** for PR 15696 at commit [`040bbba`](https://github.com/apache/spark/commit/040bbba0bdbd647f963b7a61e18b69fd62565201). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15702: [SPARK-18124] Observed-delay based Even Time Watermarks
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15702 **[Test build #67839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67839/consoleFull)** for PR 15702 at commit [`5b92132`](https://github.com/apache/spark/commit/5b921323092c5730f816795193a6e0d985d7e430). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Even Time Wate...
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15702 [SPARK-18124] Observed-delay based Even Time Watermarks This PR adds a new method `withWatermark` to the `Dataset` API, which can be used specify an _event time watermark_. An event time watermark allows the streaming engine to reason about the point in time after which we no longer expect to see late data. This PR also has augmented `StreamExecution` to use this watermark for several purposes: - To know when a given time window aggregation is finalized and thus results can be emitted when using output modes that do not allow updates (e.g. `Append` mode). - To minimize the amount of state that we need to keep for on-going aggregations, by evicting state for groups that are no longer expected to change. Note that we do still maintain all state if required (i.e. when in `Complete` mode). An example that emits windowed counts of records, waiting up to 5 minutes for late data to arrive. ```scala df.withWatermark($"eventTime", "5 mintues") .groupBy(window($"eventTime", "1 minute) as 'window) .count() .writeStream .format("console") .mode("append") // In append mode, we only output complete aggregations. .start() ``` ### Calculating the watermark. The current event time is computed by looking at the `MAX(eventTime)` seen this epoch across all of the partitions in the query minus some user defined _delayThreshold_. Note that since we must coordinate this value across partitions occasionally, the actual watermark used is only guaranteed to be at least `delay` behind the actual event time. In some cases we may still process records that arrive more than delay late. This mechanism was chosen for the initial implementation over processing time for two reasons: - it is robust to downtime that could affect processing delay - it does not require syncing of time or timezones across ### Other notable implementation details - A new trigger metric `eventTimeWatermark` outputs the current value of the watermark. - We mark the event time column in the `Attribute` metadata using the key `spark.watermarkDelay`. This allows downstream operations to know which column holds the event time. Operations like `window` propagate this metadata. - `explain()` marks the watermark with a suffix of `-T${delayMs}` to ease debugging of how this information is propagated. - Currently, we don't filter out late records, but instead rely on the state store to avoid emitting records that are both added and filtered in the same epoch. ### Remaining in this PR - [ ] The test for recovery is currently failing as we don't record the watermark used in the offset log. We will need to do so to ensure determinism, but this is deferred until #15626 is merged. ### Other follow-ups There are some natural additional features that we should consider for future work: - Ability to write records that arrive too late to some external store in case any out-of-band remediation is required. - `Update` mode so you can get partial results before a group is evicted. - Other mechanisms for calculating the watermark. In particular a watermark based on quantiles would be more robust to outliers. You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark watermarks Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15702.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15702 commit e6e3bbe9ca2d2081264b5bff68293572af7778a7 Author: Michael ArmbrustDate: 2016-10-28T04:11:19Z first test passing commit 92320720492f192fe6791d0fea90495ea5db94a7 Author: Michael Armbrust Date: 2016-10-28T07:55:57Z cleanup commit 5b921323092c5730f816795193a6e0d985d7e430 Author: Michael Armbrust Date: 2016-10-31T22:00:32Z Merge remote-tracking branch 'origin/master' into watermarks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15699: [SPARK-18030][Tests]Fix flaky FileStreamSourceSuite by n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15699 **[Test build #67830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67830/consoleFull)** for PR 15699 at commit [`64040ee`](https://github.com/apache/spark/commit/64040ee84097a21bc68b9c06c46974559e224930). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15694: [SPARK-18179][SQL] Throws analysis exception with...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15694#discussion_r85848749 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala --- @@ -31,6 +34,47 @@ class MiscFunctionsSuite extends QueryTest with SharedSQLContext { s"java_method('$className', 'method1', a, b)"), Row("m1one", "m1one")) } + + test("reflect and java_method throw an analysis exception for unsupported types") { +val df = Seq((new Timestamp(1), Decimal(10))).toDF("a", "b") +val className = ReflectClass.getClass.getName.stripSuffix("$") +val messageOne = intercept[AnalysisException] { + df.selectExpr( +s"reflect('$className', 'method1', a, b)").collect() +}.getMessage + +assert(messageOne.contains( + "arguments from the third require boolean, byte, short, " + +"integer, long, float, double or string expressions")) + +val messageTwo = intercept[AnalysisException] { + df.selectExpr( +s"java_method('$className', 'method1', a, b)").collect() +}.getMessage + +assert(messageTwo.contains( + "arguments from the third require boolean, byte, short, " + +"integer, long, float, double or string expressions")) + } + + test("reflect and java_method throw an analysis exception for non-existing method/class") { --- End diff -- Oh, yes, it is. Will clean up including the comment below. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626 **[Test build #67845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67845/consoleFull)** for PR 15626 at commit [`8ee336d`](https://github.com/apache/spark/commit/8ee336d3be8d774af23fa32256f6bc5b63bcfd6f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15675: [SPARK-18144][SQL] logging StreamingQueryListener...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15675#discussion_r85827662 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala --- @@ -41,7 +41,9 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus) def post(event: StreamingQueryListener.Event) { event match { case s: QueryStartedEvent => -postToAll(s) +sparkListenerBus.postToAll(s) --- End diff -- This may break the existing spark listeners because they are not thread-safe. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85829905 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { if (path == null) { logWarning("null specified as parameter to addJar") } else { var key = "" - if (path.contains("\\")) { + + val uri = if (path.contains("\\")) { // For local paths with backslashes on Windows, URI throws an exception -key = env.rpcEnv.fileServer.addJar(new File(path)) --- End diff -- this seems to be changing existing behavior? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 Hey @jkbradley the checkParams method already exists in the Python side. It's defined in the tests.py DefaultValuesTests class and is being called by test_java_params. I'm removing the param testing from the Python Doctests now and will be implementing the Unit test in one of the classes for now. Once approved, I will then implement the Unit test in the remaining classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15538 In fact, if no one else is working on the Parquet upgrade it probably makes more sense for me to contribute that then continue working on this PR. I'll check with the dev mailing list. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15688 If we try to truncate a partition that does not exist, we just do nothing? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15688: [SPARK-18173][SQL] data source tables should supp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15688#discussion_r85834452 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1636,15 +1635,23 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { } withTable("rectangles", "rectangles2") { + val data = (1 to 10).map { i => (i % 3, i % 5, i) }.toDF("width", "length", "height") + data.write.saveAsTable("rectangles") - data.write.partitionBy("length").saveAsTable("rectangles2") + data.write.partitionBy("width", "length").saveAsTable("rectangles2") // not supported since the table is not partitioned assertUnsupported("TRUNCATE TABLE rectangles PARTITION (width=1)") // supported since partitions are stored in the metastore + sql("TRUNCATE TABLE rectangles2 PARTITION (width=1, length=1)") --- End diff -- If we try to truncate a partition that does not exist, we just do nothing? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/1 **[Test build #67837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67837/consoleFull)** for PR 1 at commit [`b397d04`](https://github.com/apache/spark/commit/b397d045a2e0540800d00ab8171d0a9e7594e45d). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...
Github user eyalfa commented on the issue: https://github.com/apache/spark/pull/1 @hvanhovell, can you please relaunch the build? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15696 **[Test build #67841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67841/consoleFull)** for PR 15696 at commit [`2d7d373`](https://github.com/apache/spark/commit/2d7d373fe48d18037653c10424c8b1c978160958). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85829640 --- Diff: R/pkg/R/context.R --- @@ -290,6 +290,33 @@ spark.addFile <- function(path, recursive = FALSE) { invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive)) } + +#' Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. +#' +#' The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported --- End diff -- use `\code{path}` instead of \`path\` in R doc --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15700: [SPARK-17964][SparkR] Enable SparkR with Mesos client mo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15700 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...
Github user eyalfa commented on a diff in the pull request: https://github.com/apache/spark/pull/1#discussion_r85839282 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -137,7 +137,7 @@ object ColumnStatStruct { private def numTrues(e: Expression): Expression = Sum(If(e, one, zero)) private def numFalses(e: Expression): Expression = Sum(If(Not(e), one, zero)) - private def getStruct(exprs: Seq[Expression]): CreateStruct = { + private def getStruct(exprs: Seq[Expression]): CreateNamedStruct = { --- End diff -- @wzhfy, you seem to be the last committer. care to have a look? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15704 **[Test build #67843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67843/consoleFull)** for PR 15704 at commit [`84f2315`](https://github.com/apache/spark/commit/84f2315501b2f34d247e8750d1e01fff6ff9fb55). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15675 **[Test build #67833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67833/consoleFull)** for PR 15675 at commit [`8d2b3a6`](https://github.com/apache/spark/commit/8d2b3a63f8d14acb8bd33ce15a9b7a593b703f97). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15695 Merging in branch-2.0. Can you close this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15697 windows failure seems unrelated --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15688 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15608: [SPARK-17838][SparkR] Check named arguments for options ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15608 LGTM. I ran through a few other cases and I think the omitted names are handled properly with this. This should go to master then (handledCallJMethod is not in branch-2.0) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...
Github user mariusvniekerk commented on a diff in the pull request: https://github.com/apache/spark/pull/15666#discussion_r85833766 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends Logging { * Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. * The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported * filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node. + * If addToCurrentClassLoader is true, attempt to add the new class to the current threads' class + * loader. In general adding to the current threads' class loader will impact all other + * application threads unless they have explicitly changed their class loader. */ def addJar(path: String) { +addJar(path, false) + } + + def addJar(path: String, addToCurrentClassLoader: Boolean) { --- End diff -- Keeping it in the Scala makes it simpler for other spark Scala interpeters (eg toree, zeppelin) to make use of this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #67824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67824/consoleFull)** for PR 15659 at commit [`3bf961e`](https://github.com/apache/spark/commit/3bf961efbffc9b03eba7053348ac6ef1634d0ade). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15698 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67828/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 We have an AppVeyor build now? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15701 @yhuai --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15673 **[Test build #67834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67834/consoleFull)** for PR 15673 at commit [`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/15703 Will add more details in the PR description soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15302 Hi, @hvanhovell . I made another attempt #15704 by using 'Expression' as you commented. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15675 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15626 **[Test build #67831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67831/consoleFull)** for PR 15626 at commit [`d6fec94`](https://github.com/apache/spark/commit/d6fec9464e5a8638f0b9ac5dd1df289c30da132f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15697 retest it please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15673 @ericl I've pushed a commit with the changes you recommended. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15701: [SPARK-18167] [SQL] Also log all partitions when ...
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/15701 [SPARK-18167] [SQL] Also log all partitions when the SQLQuerySuite test flakes ## What changes were proposed in this pull request? One possibility for this test flaking is that we have corrupted the partition schema somehow in the tests, which causes the cast to decimal to fail in the call. This should at least show us the actual partition values. ## How was this patch tested? Run it locally, it prints out something like `ArrayBuffer(test(partcol=0), test(partcol=1), test(partcol=2), test(partcol=3), test(partcol=4))`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ericl/spark print-more-info Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15701.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15701 commit 767fef82f8dadd1962abe1d93b2c1bf5e926697d Author: Eric LiangDate: 2016-10-31T21:31:20Z Mon Oct 31 14:31:20 PDT 2016 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67835/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67829/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15696 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15626 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15659 @nchammas Ah, yes it is. It is a known problem. AppVeyor triggers the build when it has the changes in `R` directory but when it's merged (instead of rebased) for example, https://github.com/apache/spark/pull/15659/commits/3bf961efbffc9b03eba7053348ac6ef1634d0ade, it usually includes the chages in `R`. It seems generally okay though because the build will succeed in most of such cases. I asked this problem to AppVeyoer but I haven't got the response yet. We may consider disable this if AppVeyor is being a problem regularly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15696: [SPARK-18024][SQL] Introduce an internal commit p...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15696#discussion_r85848150 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala --- @@ -141,15 +139,14 @@ class SimpleTextOutputWriter( } } -class AppendingTextOutputFormat(stagingDir: Path, fileNamePrefix: String) - extends TextOutputFormat[NullWritable, Text] { +class AppendingTextOutputFormat(path: String) extends TextOutputFormat[NullWritable, Text] { val numberFormat = NumberFormat.getInstance() numberFormat.setMinimumIntegerDigits(5) numberFormat.setGroupingUsed(false) override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = { -new Path(stagingDir, fileNamePrefix) +new Path(path) --- End diff -- Nope -- no more extension coming from Hadoop. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15703 **[Test build #67842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67842/consoleFull)** for PR 15703 at commit [`c0029f1`](https://github.com/apache/spark/commit/c0029f1a529935c263f9c83691cf84921b343e67). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15677: [SPARK-17963][SQL][Documentation] Add examples (extend) ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15677 Okay, @gatorsmile could you maybe submit a small one first please? Will refer yours and then I can work together (we can divide the files or packages to deal with I guess). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15673 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67834/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15673 Build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15172#discussion_r85848802 --- Diff: common/network-common/src/main/java/org/apache/spark/network/sasl/aes/AesCipher.java --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.sasl.aes; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.WritableByteChannel; +import java.util.Properties; +import javax.crypto.spec.SecretKeySpec; +import javax.crypto.spec.IvParameterSpec; + +import com.google.common.base.Preconditions; +import com.google.common.base.Throwables; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import io.netty.channel.*; +import io.netty.util.AbstractReferenceCounted; +import org.apache.commons.crypto.cipher.CryptoCipherFactory; +import org.apache.commons.crypto.random.CryptoRandom; +import org.apache.commons.crypto.random.CryptoRandomFactory; +import org.apache.commons.crypto.stream.CryptoInputStream; +import org.apache.commons.crypto.stream.CryptoOutputStream; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.util.ByteArrayReadableChannel; +import org.apache.spark.network.util.ByteArrayWritableChannel; +import org.apache.spark.network.util.TransportConf; + +/** + * AES cipher for encryption and decryption. + */ +public class AesCipher { + private static final Logger logger = LoggerFactory.getLogger(AesCipher.class); + public static final String ENCRYPTION_HANDLER_NAME = "AesEncryption"; + public static final String DECRYPTION_HANDLER_NAME = "AesDecryption"; + + private final SecretKeySpec inKeySpec; + private final IvParameterSpec inIvSpec; + private final SecretKeySpec outKeySpec; + private final IvParameterSpec outIvSpec; + private Properties properties; + + public static final int STREAM_BUFFER_SIZE = 1024 * 32; + public static final String TRANSFORM = "AES/CTR/NoPadding"; + + public AesCipher( + Properties properties, + byte[] inKey, + byte[] outKey, + byte[] inIv, + byte[] outIv) throws IOException { +this.properties = properties; +inKeySpec = new SecretKeySpec(inKey, "AES"); +inIvSpec = new IvParameterSpec(inIv); +outKeySpec = new SecretKeySpec(outKey, "AES"); +outIvSpec = new IvParameterSpec(outIv); + } + + public AesCipher(AesConfigMessage configMessage) throws IOException { +this(new Properties(), configMessage.inKey, configMessage.outKey, + configMessage.inIv, configMessage.outIv); + } + + /** + * Create AES crypto output stream + * @param ch The underlying channel to write out. + * @return Return output crypto stream for encryption. + * @throws IOException + */ + public CryptoOutputStream CreateOutputStream(WritableByteChannel ch) throws IOException { +return new CryptoOutputStream(TRANSFORM, properties, ch, outKeySpec, outIvSpec); + } + + /** + * Create AES crypto input stream + * @param ch The underlying channel used to read data. + * @return Return input crypto stream for decryption. + * @throws IOException + */ + public CryptoInputStream CreateInputStream(ReadableByteChannel ch) throws IOException { +return new CryptoInputStream(TRANSFORM, properties, ch, inKeySpec, inIvSpec); + } + + /** + * Add handlers to channel + * @param ch the channel for adding handlers + * @throws IOException + */ + public void addToChannel(Channel ch) throws IOException { +ch.pipeline() + .addFirst(ENCRYPTION_HANDLER_NAME, new AesEncryptHandler(this)) + .addFirst(DECRYPTION_HANDLER_NAME, new AesDecryptHandler(this)); + } + + /** + * Create the configuration
[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15172#discussion_r85848940 --- Diff: common/network-common/src/main/java/org/apache/spark/network/sasl/aes/AesCipher.java --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.sasl.aes; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.WritableByteChannel; +import java.util.Properties; +import javax.crypto.spec.SecretKeySpec; +import javax.crypto.spec.IvParameterSpec; + +import com.google.common.base.Preconditions; +import com.google.common.base.Throwables; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import io.netty.channel.*; +import io.netty.util.AbstractReferenceCounted; +import org.apache.commons.crypto.cipher.CryptoCipherFactory; +import org.apache.commons.crypto.random.CryptoRandom; +import org.apache.commons.crypto.random.CryptoRandomFactory; +import org.apache.commons.crypto.stream.CryptoInputStream; +import org.apache.commons.crypto.stream.CryptoOutputStream; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.util.ByteArrayReadableChannel; +import org.apache.spark.network.util.ByteArrayWritableChannel; +import org.apache.spark.network.util.TransportConf; + +/** + * AES cipher for encryption and decryption. + */ +public class AesCipher { + private static final Logger logger = LoggerFactory.getLogger(AesCipher.class); + public static final String ENCRYPTION_HANDLER_NAME = "AesEncryption"; + public static final String DECRYPTION_HANDLER_NAME = "AesDecryption"; + + private final SecretKeySpec inKeySpec; + private final IvParameterSpec inIvSpec; + private final SecretKeySpec outKeySpec; + private final IvParameterSpec outIvSpec; + private Properties properties; + + public static final int STREAM_BUFFER_SIZE = 1024 * 32; + public static final String TRANSFORM = "AES/CTR/NoPadding"; + + public AesCipher( + Properties properties, + byte[] inKey, + byte[] outKey, + byte[] inIv, + byte[] outIv) throws IOException { +this.properties = properties; +inKeySpec = new SecretKeySpec(inKey, "AES"); +inIvSpec = new IvParameterSpec(inIv); +outKeySpec = new SecretKeySpec(outKey, "AES"); +outIvSpec = new IvParameterSpec(outIv); + } + + public AesCipher(AesConfigMessage configMessage) throws IOException { +this(new Properties(), configMessage.inKey, configMessage.outKey, + configMessage.inIv, configMessage.outIv); + } + + /** + * Create AES crypto output stream + * @param ch The underlying channel to write out. + * @return Return output crypto stream for encryption. + * @throws IOException + */ + public CryptoOutputStream CreateOutputStream(WritableByteChannel ch) throws IOException { +return new CryptoOutputStream(TRANSFORM, properties, ch, outKeySpec, outIvSpec); + } + + /** + * Create AES crypto input stream + * @param ch The underlying channel used to read data. + * @return Return input crypto stream for decryption. + * @throws IOException + */ + public CryptoInputStream CreateInputStream(ReadableByteChannel ch) throws IOException { +return new CryptoInputStream(TRANSFORM, properties, ch, inKeySpec, inIvSpec); + } + + /** + * Add handlers to channel + * @param ch the channel for adding handlers + * @throws IOException + */ + public void addToChannel(Channel ch) throws IOException { +ch.pipeline() + .addFirst(ENCRYPTION_HANDLER_NAME, new AesEncryptHandler(this)) + .addFirst(DECRYPTION_HANDLER_NAME, new AesDecryptHandler(this)); + } + + /** + * Create the configuration
[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15172#discussion_r85847613 --- Diff: common/network-common/src/main/java/org/apache/spark/network/sasl/aes/AesCipher.java --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.sasl.aes; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.ReadableByteChannel; +import java.nio.channels.WritableByteChannel; +import java.util.Properties; +import javax.crypto.spec.SecretKeySpec; +import javax.crypto.spec.IvParameterSpec; + +import com.google.common.base.Preconditions; +import com.google.common.base.Throwables; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import io.netty.channel.*; +import io.netty.util.AbstractReferenceCounted; +import org.apache.commons.crypto.cipher.CryptoCipherFactory; +import org.apache.commons.crypto.random.CryptoRandom; +import org.apache.commons.crypto.random.CryptoRandomFactory; +import org.apache.commons.crypto.stream.CryptoInputStream; +import org.apache.commons.crypto.stream.CryptoOutputStream; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.spark.network.util.ByteArrayReadableChannel; +import org.apache.spark.network.util.ByteArrayWritableChannel; +import org.apache.spark.network.util.TransportConf; + +/** + * AES cipher for encryption and decryption. + */ +public class AesCipher { + private static final Logger logger = LoggerFactory.getLogger(AesCipher.class); + public static final String ENCRYPTION_HANDLER_NAME = "AesEncryption"; + public static final String DECRYPTION_HANDLER_NAME = "AesDecryption"; + + private final SecretKeySpec inKeySpec; + private final IvParameterSpec inIvSpec; + private final SecretKeySpec outKeySpec; + private final IvParameterSpec outIvSpec; + private Properties properties; + + public static final int STREAM_BUFFER_SIZE = 1024 * 32; + public static final String TRANSFORM = "AES/CTR/NoPadding"; + + public AesCipher( + Properties properties, + byte[] inKey, + byte[] outKey, + byte[] inIv, + byte[] outIv) throws IOException { +this.properties = properties; +inKeySpec = new SecretKeySpec(inKey, "AES"); +inIvSpec = new IvParameterSpec(inIv); +outKeySpec = new SecretKeySpec(outKey, "AES"); +outIvSpec = new IvParameterSpec(outIv); + } + + public AesCipher(AesConfigMessage configMessage) throws IOException { +this(new Properties(), configMessage.inKey, configMessage.outKey, + configMessage.inIv, configMessage.outIv); + } + + /** + * Create AES crypto output stream + * @param ch The underlying channel to write out. + * @return Return output crypto stream for encryption. + * @throws IOException + */ + public CryptoOutputStream CreateOutputStream(WritableByteChannel ch) throws IOException { +return new CryptoOutputStream(TRANSFORM, properties, ch, outKeySpec, outIvSpec); + } + + /** + * Create AES crypto input stream + * @param ch The underlying channel used to read data. + * @return Return input crypto stream for decryption. + * @throws IOException + */ + public CryptoInputStream CreateInputStream(ReadableByteChannel ch) throws IOException { --- End diff -- nit: lower case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:
[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15172#discussion_r85847089 --- Diff: common/network-common/src/main/java/org/apache/spark/network/sasl/SaslRpcHandler.java --- @@ -80,46 +84,71 @@ public void receive(TransportClient client, ByteBuffer message, RpcResponseCallb delegate.receive(client, message, callback); return; } +if (saslServer == null || !saslServer.isComplete()) { + ByteBuf nettyBuf = Unpooled.wrappedBuffer(message); + SaslMessage saslMessage; + try { +saslMessage = SaslMessage.decode(nettyBuf); + } finally { +nettyBuf.release(); + } -ByteBuf nettyBuf = Unpooled.wrappedBuffer(message); -SaslMessage saslMessage; -try { - saslMessage = SaslMessage.decode(nettyBuf); -} finally { - nettyBuf.release(); -} - -if (saslServer == null) { - // First message in the handshake, setup the necessary state. - client.setClientId(saslMessage.appId); - saslServer = new SparkSaslServer(saslMessage.appId, secretKeyHolder, -conf.saslServerAlwaysEncrypt()); -} + if (saslServer == null) { +// First message in the handshake, setup the necessary state. +client.setClientId(saslMessage.appId); +saslServer = new SparkSaslServer(saslMessage.appId, secretKeyHolder, + conf.saslServerAlwaysEncrypt()); + } -byte[] response; -try { - response = saslServer.response(JavaUtils.bufferToArray( -saslMessage.body().nioByteBuffer())); -} catch (IOException ioe) { - throw new RuntimeException(ioe); + byte[] response; + try { +response = saslServer.response(JavaUtils.bufferToArray( + saslMessage.body().nioByteBuffer())); + } catch (IOException ioe) { +throw new RuntimeException(ioe); + } + callback.onSuccess(ByteBuffer.wrap(response)); } -callback.onSuccess(ByteBuffer.wrap(response)); // Setup encryption after the SASL response is sent, otherwise the client can't parse the // response. It's ok to change the channel pipeline here since we are processing an incoming // message, so the pipeline is busy and no new incoming messages will be fed to it before this // method returns. This assumes that the code ensures, through other means, that no outbound // messages are being written to the channel while negotiation is still going on. if (saslServer.isComplete()) { - logger.debug("SASL authentication successful for channel {}", client); - isComplete = true; - if (SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP))) { + if (!SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP))) { +logger.debug("SASL authentication successful for channel {}", client); +complete(true); +return ; + } + + if (!conf.AesEncryptionEnabled()) { logger.debug("Enabling encryption for channel {}", client); SaslEncryption.addToChannel(channel, saslServer, conf.maxSaslEncryptedBlockSize()); -saslServer = null; - } else { -saslServer.dispose(); -saslServer = null; +complete(false); +return; + } + + // Extra negotiation should happen after authentication, so return directly while + // processing authenticate. + if (!isAuthenticated) { +logger.debug("SASL authentication successful for channel {}", client); +isAuthenticated = true; +return; + } + + // Create AES cipher when it is authenticated + try { +AesConfigMessage configMessage = AesConfigMessage.decodeMessage(message); +AesCipher cipher = new AesCipher(configMessage); + +// Send response back to client to confirm that server accept config. +callback.onSuccess(JavaUtils.stringToBytes(AesCipher.TRANSFORM)); --- End diff -- This should really be the last statement (after `complete(true)`). It's a little weird to use the name of the transformation as the success message here... I'd say a reply is not even needed, but it's nice for the client to know that this particular message succeeded or not. So it's ok to use this, but I'd prefer if the client ignored the contents of the reply and instead just handled exceptions for the error case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and
[GitHub] spark issue #15699: [SPARK-18030][Tests]Fix flaky FileStreamSourceSuite by n...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15699 Thanks! Merging to master and 2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15172#discussion_r85850906 --- Diff: common/network-common/src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java --- @@ -374,6 +375,69 @@ public void testDelegates() throws Exception { } } + @Test + public void testSaslEncryptionAes() throws Exception { +final AtomicReference response = new AtomicReference<>(); +final File file = File.createTempFile("sasltest", ".txt"); +SaslTestCtx ctx = null; +try { + final TransportConf conf = new TransportConf("rpc", new SystemPropertyConfigProvider()); + final TransportConf spyConf = spy(conf); + doReturn(true).when(spyConf).AesEncryptionEnabled(); + + StreamManager sm = mock(StreamManager.class); + when(sm.getChunk(anyLong(), anyInt())).thenAnswer(new Answer() { +@Override +public ManagedBuffer answer(InvocationOnMock invocation) { + return new FileSegmentManagedBuffer(spyConf, file, 0, file.length()); +} + }); + + RpcHandler rpcHandler = mock(RpcHandler.class); + when(rpcHandler.getStreamManager()).thenReturn(sm); + + byte[] data = new byte[ 256 * 1024 * 1024]; --- End diff -- nit: no space after `[` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15172#discussion_r85846395 --- Diff: common/network-common/src/main/java/org/apache/spark/network/sasl/SaslRpcHandler.java --- @@ -80,46 +84,71 @@ public void receive(TransportClient client, ByteBuffer message, RpcResponseCallb delegate.receive(client, message, callback); return; } +if (saslServer == null || !saslServer.isComplete()) { + ByteBuf nettyBuf = Unpooled.wrappedBuffer(message); + SaslMessage saslMessage; + try { +saslMessage = SaslMessage.decode(nettyBuf); + } finally { +nettyBuf.release(); + } -ByteBuf nettyBuf = Unpooled.wrappedBuffer(message); -SaslMessage saslMessage; -try { - saslMessage = SaslMessage.decode(nettyBuf); -} finally { - nettyBuf.release(); -} - -if (saslServer == null) { - // First message in the handshake, setup the necessary state. - client.setClientId(saslMessage.appId); - saslServer = new SparkSaslServer(saslMessage.appId, secretKeyHolder, -conf.saslServerAlwaysEncrypt()); -} + if (saslServer == null) { +// First message in the handshake, setup the necessary state. +client.setClientId(saslMessage.appId); +saslServer = new SparkSaslServer(saslMessage.appId, secretKeyHolder, + conf.saslServerAlwaysEncrypt()); + } -byte[] response; -try { - response = saslServer.response(JavaUtils.bufferToArray( -saslMessage.body().nioByteBuffer())); -} catch (IOException ioe) { - throw new RuntimeException(ioe); + byte[] response; + try { +response = saslServer.response(JavaUtils.bufferToArray( + saslMessage.body().nioByteBuffer())); + } catch (IOException ioe) { +throw new RuntimeException(ioe); + } + callback.onSuccess(ByteBuffer.wrap(response)); } -callback.onSuccess(ByteBuffer.wrap(response)); // Setup encryption after the SASL response is sent, otherwise the client can't parse the // response. It's ok to change the channel pipeline here since we are processing an incoming // message, so the pipeline is busy and no new incoming messages will be fed to it before this // method returns. This assumes that the code ensures, through other means, that no outbound // messages are being written to the channel while negotiation is still going on. if (saslServer.isComplete()) { - logger.debug("SASL authentication successful for channel {}", client); - isComplete = true; - if (SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP))) { + if (!SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP))) { +logger.debug("SASL authentication successful for channel {}", client); +complete(true); +return ; --- End diff -- nit: no space before `;` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/15172#discussion_r85849868 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/ByteArrayReadableChannel.java --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.network.util; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.ReadableByteChannel; + +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + --- End diff -- nit: too many empty lines --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org