[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15695
  
**[Test build #3380 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3380/consoleFull)** for PR 15695 at commit [`0d4461a`](https://github.com/apache/spark/commit/0d4461a9e444008a35cc04c607447dc3d4677b7f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67832/consoleFull)** for PR 15696 at commit [`6af14b5`](https://github.com/apache/spark/commit/6af14b56590a0882800f62a2a2b939ee3715edbb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15675
  
**[Test build #67833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67833/consoleFull)** for PR 15675 at commit [`8d2b3a6`](https://github.com/apache/spark/commit/8d2b3a63f8d14acb8bd33ce15a9b7a593b703f97).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15673
  
**[Test build #67834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67834/consoleFull)** for PR 15673 at commit [`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...

2016-10-31 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/15398
  
@hvanhovell, I just wanted to add that option 2 from my last comment won't work: if we replace `\\` with the empty string in the unescape function (as is done for `\%` and `\_`), it will be impossible for a user to specify a single backslash.

Considering that, another potential solution would be to change the ANTLR parser to handle escaping differently in LIKE expressions. However, this would introduce a special case that complicates the overall rules.
Maybe leaving the current behaviour but documenting that backslashes are escaped at the parser level is the best option?
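To make the ambiguity concrete, here is a minimal, self-contained sketch (plain Scala, not Spark's actual unescape code) of what would happen if `\\` were rewritten to the empty string the way `\%` and `\_` are rewritten:

```scala
// Hypothetical, simplified unescape step for LIKE patterns.
// Rewriting the escaped wildcards is fine, but the last replace --
// collapsing an escaped backslash to nothing -- is the problematic
// "option 2" discussed above.
def unescape(pattern: String): String =
  pattern
    .replace("\\%", "%")   // "\%" -> literal percent
    .replace("\\_", "_")   // "\_" -> literal underscore
    .replace("\\\\", "")   // "\\" -> nothing: information is lost

// A pattern containing an escaped backslash ("a\\b") now collapses to
// the same string as one that never had a backslash, so no user input
// can denote a single literal backslash anymore.
assert(unescape("a\\\\b") == unescape("ab"))
```

The assertion holds precisely because the third `replace` destroys the only way of spelling a backslash, which is the failure mode described above.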


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...

2016-10-31 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15701
  
lgtm


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15696: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/15696#discussion_r85847969
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala ---
@@ -141,15 +139,14 @@ class SimpleTextOutputWriter(
   }
 }
 
-class AppendingTextOutputFormat(stagingDir: Path, fileNamePrefix: String)
-  extends TextOutputFormat[NullWritable, Text] {
+class AppendingTextOutputFormat(path: String) extends TextOutputFormat[NullWritable, Text] {
 
   val numberFormat = NumberFormat.getInstance()
   numberFormat.setMinimumIntegerDigits(5)
   numberFormat.setGroupingUsed(false)
 
   override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
-    new Path(stagingDir, fileNamePrefix)
+    new Path(path)
--- End diff --

`+ extension`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15695
  
**[Test build #3380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3380/consoleFull)** for PR 15695 at commit [`0d4461a`](https://github.com/apache/spark/commit/0d4461a9e444008a35cc04c607447dc3d4677b7f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67840/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15703
  
**[Test build #67844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67844/consoleFull)** for PR 15703 at commit [`5a23a97`](https://github.com/apache/spark/commit/5a23a979c5e6a61f847b146a1cb656418054d955).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15701: [SPARK-18167] [SQL] Also log all partitions when ...

2016-10-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15701


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-31 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15302
  
@dongjoon-hyun I'll take a look tomorrow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67835/consoleFull)** for PR 15696 at commit [`6166093`](https://github.com/apache/spark/commit/6166093d511e833587d32e398338e2f47ccbcc8a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15675: [SPARK-18144][SQL] logging StreamingQueryListener...

2016-10-31 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/15675#discussion_r85830450
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala ---
@@ -41,6 +41,8 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
   def post(event: StreamingQueryListener.Event) {
     event match {
       case s: QueryStartedEvent =>
+        sparkListenerBus.post(s)
--- End diff --

Since this is hacky, it's better to have some tests to make sure we won't break things in the future when removing the hacky code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11105
  
**[Test build #67825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67825/consoleFull)** for PR 11105 at commit [`8c560ca`](https://github.com/apache/spark/commit/8c560ca6dd8c28f86630ae42bb50739a8614bec3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11105
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67825/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15700: [SPARK-17964][SparkR] Enable SparkR with Mesos cl...

2016-10-31 Thread susanxhuynh
GitHub user susanxhuynh opened a pull request:

https://github.com/apache/spark/pull/15700

[SPARK-17964][SparkR] Enable SparkR with Mesos client mode and cluster mode

## What changes were proposed in this pull request?

Enabled SparkR with Mesos client mode and cluster mode. Just a few changes were required to get this working on Mesos: (1) removed the SparkR-on-Mesos error checks and (2) no longer require "--class" to be specified for R apps. The logic to check spark.mesos.executor.home was already in place.

@sun-rui 

## How was this patch tested?

1. SparkSubmitSuite
2. On local mesos cluster (on laptop): ran SparkR shell, spark-submit 
client mode, and spark-submit cluster mode, with the 
"examples/src/main/Rdataframe.R" example application.
3. On multi-node mesos cluster: ran SparkR shell, spark-submit client mode, 
and spark-submit cluster mode, with the "examples/src/main/Rdataframe.R" 
example application. I tested with the following --conf values set: 
spark.mesos.executor.docker.image and spark.mesos.executor.home



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mesosphere/spark susan-r-branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15700


commit 4a8b07464fcb6e4b082e79ff9dd4bc3e41d43bbc
Author: Susan X. Huynh 
Date:   2016-10-31T18:39:04Z

Enabled SparkR on Mesos. RPackages from R source is not yet supported.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15659
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15659
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67824/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15698
  
**[Test build #67828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67828/consoleFull)** for PR 15698 at commit [`b777ee5`](https://github.com/apache/spark/commit/b777ee5bb25f38086bfe2126be26de8f1e14a14d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15698
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't w...

2016-10-31 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/15669
  
If I understand the issue here, the problem is not `--files` but setting 
`spark.files` through `--conf` (or the config file). The former is translated 
into `spark.yarn.dist.files` by `SparkSubmit` but it seems the latter isn't, 
and to me that's the bug. So basically what Tom and Mridul have said.

I think the correct fix would be in either `SparkSubmit.scala` or YARN's 
`Client.scala` - probably the latter to keep the YARN-specific stuff in one 
place.
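As a sketch of the kind of normalization being suggested (illustrative only: the conf key names are real Spark keys, but this function and its placement are hypothetical, not the actual `SparkSubmit`/`Client` code), copying `spark.files` into `spark.yarn.dist.files` when only the former is set might look like:

```scala
// Hypothetical normalization step: mirror the generic conf key into its
// YARN-specific counterpart, the way --files already is, so that files
// set via --conf spark.files=... are also distributed on YARN.
def distributeFilesConf(conf: Map[String, String]): Map[String, String] =
  conf.get("spark.files") match {
    case Some(files) if !conf.contains("spark.yarn.dist.files") =>
      conf + ("spark.yarn.dist.files" -> files)
    case _ => conf  // already set explicitly, or nothing to distribute
  }

val conf = distributeFilesConf(Map("spark.files" -> "data.txt,lookup.csv"))
assert(conf("spark.yarn.dist.files") == "data.txt,lookup.csv")
```

An explicitly set `spark.yarn.dist.files` wins over the mirrored value, which matches the usual "more specific key takes precedence" convention.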


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Even Time Wate...

2016-10-31 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/15702#discussion_r85845880
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala ---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeProjection}
+import org.apache.spark.sql.catalyst.plans.logical.EventTimeWatermark
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.unsafe.types.CalendarInterval
+import org.apache.spark.util.AccumulatorV2
+
+class MaxLong(protected var currentValue: Long = 0)
+  extends AccumulatorV2[Long, Long]
+  with Serializable {
+
+  override def isZero: Boolean = value == 0
+  override def value: Long = currentValue
+  override def copy(): AccumulatorV2[Long, Long] = new MaxLong(currentValue)
+
+  override def reset(): Unit = {
+    currentValue = 0
+  }
+
+  override def add(v: Long): Unit = {
+    if (value < v) { currentValue = v }
+  }
+
+  override def merge(other: AccumulatorV2[Long, Long]): Unit = {
+    if (currentValue < other.value) {
+      currentValue = other.value
+    }
+  }
+}
+
+/**
+ * Used to mark a column as containing the event time for a given record. In addition to
+ * adding appropriate metadata to this column, this operator also tracks the maximum observed
+ * event time. Based on the maximum observed time and a user-specified delay, we can calculate
+ * the `watermark` after which we assume we will no longer see late records for a particular
+ * time period.
+ */
+case class EventTimeWatermarkExec(
+    eventTime: Attribute,
+    delay: CalendarInterval,
+    child: SparkPlan) extends SparkPlan {
+
+  // TODO: Use Spark SQL Metrics?
+  val maxEventTime = new MaxLong
--- End diff --

@zsxwing am I doing this right?
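For reviewers following along, the max semantics can be sanity-checked in isolation. This is a dependency-free restatement of the `MaxLong` logic from the diff (the Spark `AccumulatorV2` plumbing is omitted, so this is a sketch, not the class under review):

```scala
// Plain-Scala restatement of MaxLong's add/merge semantics.
// Note: like the diff above, it treats 0 as the neutral element, which
// implicitly assumes observed event times are non-negative.
class MaxLong(var currentValue: Long = 0L) {
  def value: Long = currentValue
  def add(v: Long): Unit =
    if (currentValue < v) currentValue = v            // keep running max
  def merge(other: MaxLong): Unit =
    if (currentValue < other.value) currentValue = other.value
  def reset(): Unit = currentValue = 0L
}

val partition1 = new MaxLong
partition1.add(5L)
partition1.add(3L)              // smaller value is ignored
val partition2 = new MaxLong
partition2.add(9L)
partition1.merge(partition2)    // driver-side merge of partial maxima
assert(partition1.value == 9L)
```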


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...

2016-10-31 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/15704

[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators

## What changes were proposed in this pull request?

This PR aims to support `comparators`, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.

**Spark 1.6.2**

``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]
```

**Spark 2.0**

``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)
```

After this PR, it's supported.

## How was this patch tested?

Pass the Jenkins test with a newly added testcase.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-17732-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15704


commit 84f2315501b2f34d247e8750d1e01fff6ff9fb55
Author: Dongjoon Hyun 
Date:   2016-10-31T00:46:49Z

[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/15673
  
It looks like all the unit tests passed; however, one of the forked test JVM processes exited with a nonzero status for an unknown reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15699: [SPARK-18030][Tests]Fix flaky FileStreamSourceSui...

2016-10-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15699


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67838/consoleFull)** for PR 15696 at commit [`040bbba`](https://github.com/apache/spark/commit/040bbba0bdbd647f963b7a61e18b69fd62565201).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67838/
Test FAILed.



[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't w...

2016-10-31 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/15669
  
That's correct, it is due to `spark.files`; the JIRA has been updated. I will 
update the PR soon.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67832/
Test FAILed.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67832 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67832/consoleFull)**
 for PR 15696 at commit 
[`6af14b5`](https://github.com/apache/spark/commit/6af14b56590a0882800f62a2a2b939ee3715edbb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MapReduceFileCommitterProtocol(path: String, isAppend: Boolean)`
  * `logInfo(s"Using user defined output committer class $`
  * `logInfo(s"Using output committer class $`



[GitHub] spark pull request #15696: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15696#discussion_r85848357
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala 
---
@@ -141,15 +139,14 @@ class SimpleTextOutputWriter(
   }
 }
 
-class AppendingTextOutputFormat(stagingDir: Path, fileNamePrefix: String)
-  extends TextOutputFormat[NullWritable, Text] {
+class AppendingTextOutputFormat(path: String) extends 
TextOutputFormat[NullWritable, Text] {
 
   val numberFormat = NumberFormat.getInstance()
   numberFormat.setMinimumIntegerDigits(5)
   numberFormat.setGroupingUsed(false)
 
   override def getDefaultWorkFile(context: TaskAttemptContext, extension: 
String): Path = {
-new Path(stagingDir, fileNamePrefix)
+new Path(path)
--- End diff --

(This is now specified explicitly in OutputWriterFactory.getFileExtension)



[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15701
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67836/
Test PASSed.



[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15701
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15677: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15677
  
Alright, then I will try to get rid of the arguments part. Thank you all very 
much, and I apologise for the noise I caused. @gatorsmile I will keep your 
comments in mind too.



[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/1
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...

2016-10-31 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/15705

[SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] TABLE ... PARTITION 
for Datasource tables

## What changes were proposed in this pull request?

There are a couple of issues with the current 2.1 behavior when inserting into 
Datasource tables with partitions managed by Hive:

(1) `INSERT OVERWRITE TABLE ... PARTITION` will actually overwrite the entire 
table instead of just the specified partition.
(2) `INSERT INTO|OVERWRITE` does not work with partitions that have custom 
locations.

This PR fixes both of these issues for Datasource tables managed by Hive. 
The behavior for legacy tables or when `manageFilesourcePartitions = false` is 
unchanged.
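To make the intended semantics concrete, here is a toy model in plain Python (not Spark code; the table-as-dict representation and the `overwrite_partition` helper are illustrative assumptions). It shows what issue (1) is about: overwriting one partition must leave every other partition of the table untouched, whereas the buggy behavior amounts to clearing the whole table first.

```python
# Toy model of INSERT OVERWRITE ... PARTITION semantics.
# A "table" maps a partition value to its list of rows.

def overwrite_partition(table, partition_key, new_rows):
    """Replace only the rows belonging to `partition_key`.

    The buggy behavior described above would correspond to dropping
    every partition before the write, not just the targeted one.
    """
    table = dict(table)                    # shallow copy; leave input intact
    table[partition_key] = list(new_rows)  # replace just this partition
    return table

table = {"p=1": ["a", "b"], "p=2": ["c"]}
result = overwrite_partition(table, "p=1", ["x"])
print(result)  # {'p=1': ['x'], 'p=2': ['c']}
```

Note that partition `p=2` survives the overwrite of `p=1`, which is the behavior the PR restores for Datasource tables.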

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark sc-4942

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15705


commit dee4b9e107cc84f67640523fa277b31947661042
Author: Eric Liang 
Date:   2016-10-27T21:44:04Z

Thu Oct 27 14:44:04 PDT 2016

commit d99afacfc80e731d17af4ae25c548e45d948cd11
Author: Eric Liang 
Date:   2016-10-28T21:20:28Z

Fri Oct 28 14:20:28 PDT 2016

commit 137aeb295bc02e26755d7c18c89bd0117371676e
Author: Eric Liang 
Date:   2016-10-31T19:16:44Z

Merge branch 'master' into sc-4942

commit 8da8b0f7f70291cd86bd0a103758c5f88f714a27
Author: Eric Liang 
Date:   2016-10-31T21:29:58Z

wip

commit d5a7bd3931dc6f3f1673d9edff0f0aa77ad9d138
Author: Eric Liang 
Date:   2016-10-31T23:14:56Z

implement custom locations

commit a833a19d20f6b891cf42cf4a0f544c72abb982cd
Author: Eric Liang 
Date:   2016-10-31T23:32:16Z

Mon Oct 31 16:32:15 PDT 2016

commit 587b85e83f569ad8acae3de82c695b0fad4041db
Author: Eric Liang 
Date:   2016-10-31T23:34:47Z

Mon Oct 31 16:34:47 PDT 2016

commit 669e6cce48bfe903f473f14f20e1857ababc27be
Author: Eric Liang 
Date:   2016-10-31T23:41:40Z

Mon Oct 31 16:41:40 PDT 2016





[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15705
  
cc @cloud-fan @yhuai 



[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15701
  
**[Test build #67836 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67836/consoleFull)**
 for PR 15701 at commit 
[`767fef8`](https://github.com/apache/spark/commit/767fef82f8dadd1962abe1d93b2c1bf5e926697d).



[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15626
  
**[Test build #67829 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67829/consoleFull)**
 for PR 15626 at commit 
[`7beedcc`](https://github.com/apache/spark/commit/7beedcc79f0664115bebda49b62d20ed8965b27f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67835 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67835/consoleFull)**
 for PR 15696 at commit 
[`6166093`](https://github.com/apache/spark/commit/6166093d511e833587d32e398338e2f47ccbcc8a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67838 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67838/consoleFull)**
 for PR 15696 at commit 
[`040bbba`](https://github.com/apache/spark/commit/040bbba0bdbd647f963b7a61e18b69fd62565201).



[GitHub] spark issue #15702: [SPARK-18124] Observed-delay based Event Time Watermarks

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15702
  
**[Test build #67839 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67839/consoleFull)**
 for PR 15702 at commit 
[`5b92132`](https://github.com/apache/spark/commit/5b921323092c5730f816795193a6e0d985d7e430).



[GitHub] spark pull request #15702: [SPARK-18124] Observed-delay based Event Time Wate...

2016-10-31 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/15702

[SPARK-18124] Observed-delay based Event Time Watermarks

This PR adds a new method `withWatermark` to the `Dataset` API, which can 
be used to specify an _event time watermark_.  An event time watermark allows the 
streaming engine to reason about the point in time after which we no longer 
expect to see late data.  This PR also has augmented `StreamExecution` to use 
this watermark for several purposes:
  - To know when a given time window aggregation is finalized and thus 
results can be emitted when using output modes that do not allow updates (e.g. 
`Append` mode).
  - To minimize the amount of state that we need to keep for on-going 
aggregations, by evicting state for groups that are no longer expected to 
change.  Note that we do still maintain all state if required (i.e. when in 
`Complete` mode).

An example that emits windowed counts of records, waiting up to 5 minutes 
for late data to arrive.
```scala
df.withWatermark($"eventTime", "5 minutes")
  .groupBy(window($"eventTime", "1 minute") as 'window)
  .count()
  .writeStream
  .format("console")
  .outputMode("append") // In append mode, we only output complete aggregations.
  .start()
```

### Calculating the watermark.
The current event time is computed by looking at the `MAX(eventTime)` seen 
this epoch across all of the partitions in the query minus some user defined 
_delayThreshold_.  Note that since we must coordinate this value across 
partitions occasionally, the actual watermark used is only guaranteed to be at 
least `delay` behind the actual event time.  In some cases we may still process 
records that arrive more than delay late.
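The rule above can be sketched in a few lines of plain Python (this is an illustration of the described calculation, not Spark's implementation; the `update_watermark` helper and the numbers are made up). Two properties are worth noting: the watermark is derived from the per-partition maxima seen this epoch, and it never moves backwards even if event times regress.

```python
# Sketch of the watermark rule: MAX(eventTime) across partitions this
# epoch, minus the user-defined delay threshold, monotonically increasing.

def update_watermark(prev_watermark, partition_max_event_times, delay_threshold):
    # Coordinate the per-partition maxima, then subtract the allowed lateness.
    candidate = max(partition_max_event_times) - delay_threshold
    # The watermark only ever advances.
    return max(prev_watermark, candidate)

# Epoch 1: maxima of 100 and 120 across two partitions, 30 units of lateness.
wm = update_watermark(0, [100, 120], 30)
assert wm == 90
# Epoch 2: observed event times regressed; the watermark holds steady.
wm = update_watermark(wm, [110, 115], 30)
assert wm == 90
```

Because the coordination across partitions is only occasional, the real watermark lags this idealized value, which is why records more than `delay` late may still be processed.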

This mechanism was chosen for the initial implementation over processing 
time for two reasons:   
  - it is robust to downtime that could affect processing delay
  - it does not require syncing of time or timezones across machines

### Other notable implementation details
 - A new trigger metric `eventTimeWatermark` outputs the current value of 
the watermark.
 - We mark the event time column in the `Attribute` metadata using the key 
`spark.watermarkDelay`.  This allows downstream operations to know which column 
holds the event time.  Operations like `window` propagate this metadata.
 - `explain()` marks the watermark with a suffix of `-T${delayMs}` to ease 
debugging of how this information is propagated.
 - Currently, we don't filter out late records, but instead rely on the 
state store to avoid emitting records that are both added and filtered in the 
same epoch.
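The state-eviction behavior described in the bullets above can be sketched as follows (a hedged illustration in plain Python, not the actual state store; `evict_finalized` and the window encoding are assumptions). Once the watermark passes a window's end, that group can no longer change, so in `Append` mode its result can be emitted and its state dropped.

```python
# Sketch of watermark-driven state eviction for windowed aggregation state.

def evict_finalized(state, watermark):
    """`state` maps (window_start, window_end) -> aggregate value.
    Returns (emitted, remaining): finalized groups and live state."""
    emitted = {w: v for w, v in state.items() if w[1] <= watermark}
    remaining = {w: v for w, v in state.items() if w[1] > watermark}
    return emitted, remaining

state = {(0, 60): 4, (60, 120): 7}
emitted, remaining = evict_finalized(state, watermark=90)
assert emitted == {(0, 60): 4}      # window ended before the watermark
assert remaining == {(60, 120): 7}  # still open; may receive late data
```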

### Remaining in this PR
 - [ ] The test for recovery is currently failing as we don't record the 
watermark used in the offset log.  We will need to do so to ensure determinism, 
but this is deferred until #15626 is merged.

### Other follow-ups
There are some natural additional features that we should consider for 
future work:
 - Ability to write records that arrive too late to some external store in 
case any out-of-band remediation is required.
 - `Update` mode so you can get partial results before a group is evicted.
 - Other mechanisms for calculating the watermark.  In particular a 
watermark based on quantiles would be more robust to outliers.
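The quantile-based follow-up mentioned in the last bullet can be sketched like this (a hypothetical helper, not anything in the PR): tracking a high quantile of observed event times, instead of the maximum, keeps one machine with a fast clock from dragging the watermark ahead and causing well-behaved records to be treated as late.

```python
# Sketch of a quantile-based watermark, robust to event-time outliers.

def quantile_watermark(event_times, q, delay_threshold):
    # Nearest-rank quantile over the event times seen this epoch.
    xs = sorted(event_times)
    idx = min(len(xs) - 1, int(q * len(xs)))
    return xs[idx] - delay_threshold

# Four well-behaved event times and one outlier at t=1000.
times = [98, 99, 100, 101, 1000]
assert quantile_watermark(times, 1.0, 30) == 970   # max-based: outlier wins
assert quantile_watermark(times, 0.75, 30) == 71   # quantile-based: robust
```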

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark watermarks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15702


commit e6e3bbe9ca2d2081264b5bff68293572af7778a7
Author: Michael Armbrust 
Date:   2016-10-28T04:11:19Z

first test passing

commit 92320720492f192fe6791d0fea90495ea5db94a7
Author: Michael Armbrust 
Date:   2016-10-28T07:55:57Z

cleanup

commit 5b921323092c5730f816795193a6e0d985d7e430
Author: Michael Armbrust 
Date:   2016-10-31T22:00:32Z

Merge remote-tracking branch 'origin/master' into watermarks





[GitHub] spark issue #15699: [SPARK-18030][Tests]Fix flaky FileStreamSourceSuite by n...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15699
  
**[Test build #67830 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67830/consoleFull)**
 for PR 15699 at commit 
[`64040ee`](https://github.com/apache/spark/commit/64040ee84097a21bc68b9c06c46974559e224930).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15694: [SPARK-18179][SQL] Throws analysis exception with...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15694#discussion_r85848749
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala ---
@@ -31,6 +34,47 @@ class MiscFunctionsSuite extends QueryTest with 
SharedSQLContext {
 s"java_method('$className', 'method1', a, b)"),
   Row("m1one", "m1one"))
   }
+
+  test("reflect and java_method throw an analysis exception for 
unsupported types") {
+val df = Seq((new Timestamp(1), Decimal(10))).toDF("a", "b")
+val className = ReflectClass.getClass.getName.stripSuffix("$")
+val messageOne = intercept[AnalysisException] {
+  df.selectExpr(
+s"reflect('$className', 'method1', a, b)").collect()
+}.getMessage
+
+assert(messageOne.contains(
+  "arguments from the third require boolean, byte, short, " +
+"integer, long, float, double or string expressions"))
+
+val messageTwo = intercept[AnalysisException] {
+  df.selectExpr(
+s"java_method('$className', 'method1', a, b)").collect()
+}.getMessage
+
+assert(messageTwo.contains(
+  "arguments from the third require boolean, byte, short, " +
+"integer, long, float, double or string expressions"))
+  }
+
+  test("reflect and java_method throw an analysis exception for 
non-existing method/class") {
--- End diff --

Oh, yes, it is. Will clean up including the comment below.



[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15626
  
**[Test build #67845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67845/consoleFull)**
 for PR 15626 at commit 
[`8ee336d`](https://github.com/apache/spark/commit/8ee336d3be8d774af23fa32256f6bc5b63bcfd6f).



[GitHub] spark pull request #15675: [SPARK-18144][SQL] logging StreamingQueryListener...

2016-10-31 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/15675#discussion_r85827662
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
 ---
@@ -41,7 +41,9 @@ class StreamingQueryListenerBus(sparkListenerBus: 
LiveListenerBus)
   def post(event: StreamingQueryListener.Event) {
 event match {
   case s: QueryStartedEvent =>
-postToAll(s)
+sparkListenerBus.postToAll(s)
--- End diff --

This may break the existing spark listeners because they are not 
thread-safe.



[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85829905
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
 if (path == null) {
   logWarning("null specified as parameter to addJar")
 } else {
   var key = ""
-  if (path.contains("\\")) {
+
+  val uri = if (path.contains("\\")) {
 // For local paths with backslashes on Windows, URI throws an 
exception
-key = env.rpcEnv.fileServer.addJar(new File(path))
--- End diff --

this seems to be changing existing behavior?



[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...

2016-10-31 Thread evanyc15
Github user evanyc15 commented on the issue:

https://github.com/apache/spark/pull/14653
  
Hey @jkbradley the checkParams method already exists in the Python side. 
It's defined in the tests.py DefaultValuesTests class and is being called by 
test_java_params. I'm removing the param testing from the Python Doctests now 
and will be implementing the Unit test in one of the classes for now. Once 
approved, I will then implement the Unit test in the remaining classes.



[GitHub] spark issue #15538: [SPARK-17993][SQL] Fix Parquet log output redirection

2016-10-31 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/15538
  
In fact, if no one else is working on the Parquet upgrade, it probably makes 
more sense for me to contribute that than to continue working on this PR. I'll 
check with the dev mailing list.



[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...

2016-10-31 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15688
  
If we try to truncate a partition that does not exist, we just do nothing?



[GitHub] spark pull request #15688: [SPARK-18173][SQL] data source tables should supp...

2016-10-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15688#discussion_r85834452
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1636,15 +1635,23 @@ class DDLSuite extends QueryTest with 
SharedSQLContext with BeforeAndAfterEach {
 }
 
 withTable("rectangles", "rectangles2") {
+  val data = (1 to 10).map { i => (i % 3, i % 5, i) }.toDF("width", 
"length", "height")
+
   data.write.saveAsTable("rectangles")
-  data.write.partitionBy("length").saveAsTable("rectangles2")
+  data.write.partitionBy("width", "length").saveAsTable("rectangles2")
 
   // not supported since the table is not partitioned
   assertUnsupported("TRUNCATE TABLE rectangles PARTITION (width=1)")
 
   // supported since partitions are stored in the metastore
+  sql("TRUNCATE TABLE rectangles2 PARTITION (width=1, length=1)")
--- End diff --

If we try to truncate a partition that does not exist, we just do nothing?



[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/1
  
**[Test build #67837 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67837/consoleFull)**
 for PR 1 at commit 
[`b397d04`](https://github.com/apache/spark/commit/b397d045a2e0540800d00ab8171d0a9e7594e45d).





[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread eyalfa
Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/1
  
@hvanhovell, can you please relaunch the build?





[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15696
  
**[Test build #67841 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67841/consoleFull)**
 for PR 15696 at commit 
[`2d7d373`](https://github.com/apache/spark/commit/2d7d373fe48d18037653c10424c8b1c978160958).





[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85829640
  
--- Diff: R/pkg/R/context.R ---
@@ -290,6 +290,33 @@ spark.addFile <- function(path, recursive = FALSE) {
   invisible(callJMethod(sc, "addFile", 
suppressWarnings(normalizePath(path)), recursive))
 }
 
+
+#' Adds a JAR dependency for all tasks to be executed on this SparkContext 
in the future.
+#'
+#' The `path` passed can be either a local file, a file in HDFS (or other 
Hadoop-supported
--- End diff --

use `\code{path}` instead of \`path\` in R doc





[GitHub] spark issue #15700: [SPARK-17964][SparkR] Enable SparkR with Mesos client mo...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15700
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14444: [SPARK-16839] [SQL] redundant aliases after clean...

2016-10-31 Thread eyalfa
Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/1#discussion_r85839282
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -137,7 +137,7 @@ object ColumnStatStruct {
   private def numTrues(e: Expression): Expression = Sum(If(e, one, zero))
   private def numFalses(e: Expression): Expression = Sum(If(Not(e), one, 
zero))
 
-  private def getStruct(exprs: Seq[Expression]): CreateStruct = {
+  private def getStruct(exprs: Seq[Expression]): CreateNamedStruct = {
--- End diff --

@wzhfy, you seem to be the last committer. care to have a look?





[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15704
  
**[Test build #67843 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67843/consoleFull)**
 for PR 15704 at commit 
[`84f2315`](https://github.com/apache/spark/commit/84f2315501b2f34d247e8750d1e01fff6ff9fb55).





[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15675
  
**[Test build #67833 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67833/consoleFull)**
 for PR 15675 at commit 
[`8d2b3a6`](https://github.com/apache/spark/commit/8d2b3a63f8d14acb8bd33ce15a9b7a593b703f97).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15695: [SPARK-18143][SQL]Ignore Structured Streaming event logs...

2016-10-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15695
  
Merging in branch-2.0. Can you close this?





[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings

2016-10-31 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15697
  
Windows failure seems unrelated.





[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...

2016-10-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15688
  
retest this please





[GitHub] spark issue #15608: [SPARK-17838][SparkR] Check named arguments for options ...

2016-10-31 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15608
  
LGTM. I ran through a few other cases and I think the omitted names are 
handled properly with this.
This should go to master then (handledCallJMethod is not in branch-2.0)






[GitHub] spark pull request #15666: [SPARK-11421] [Core][Python][R] Added ability for...

2016-10-31 Thread mariusvniekerk
Github user mariusvniekerk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15666#discussion_r85833766
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1700,19 +1700,34 @@ class SparkContext(config: SparkConf) extends 
Logging {
* Adds a JAR dependency for all tasks to be executed on this 
SparkContext in the future.
* The `path` passed can be either a local file, a file in HDFS (or 
other Hadoop-supported
* filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on 
every worker node.
+   * If addToCurrentClassLoader is true, attempt to add the new class to 
the current threads' class
+   * loader. In general adding to the current threads' class loader will 
impact all other
+   * application threads unless they have explicitly changed their class 
loader.
*/
   def addJar(path: String) {
+addJar(path, false)
+  }
+
+  def addJar(path: String, addToCurrentClassLoader: Boolean) {
--- End diff --

Keeping it in Scala makes it simpler for other Spark Scala interpreters 
(e.g. Toree, Zeppelin) to make use of this.
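
The thread-impact caveat discussed above can be sketched on the plain JVM, independent of Spark. This is a hypothetical illustration of what an `addToCurrentClassLoader`-style flag implies (the helper name `addToCurrentThread` is invented for the sketch, not part of the proposed API): the calling thread's context class loader is wrapped in a `URLClassLoader` that delegates to the old loader, so extra JAR URLs become visible on this thread while other threads keep their own loaders.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class AddJarSketch {
    // Hypothetical helper mirroring what addJar(path, true) might do:
    // replace only the calling thread's context class loader.
    static void addToCurrentThread(URL... jars) {
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        // The wrapper delegates to the previous loader, so classes that were
        // loadable before remain loadable through the new loader.
        URLClassLoader wrapped = new URLClassLoader(jars, current);
        Thread.currentThread().setContextClassLoader(wrapped);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        addToCurrentThread(); // no real JAR URLs needed for the sketch
        ClassLoader after = Thread.currentThread().getContextClassLoader();
        // Delegation keeps previously visible classes visible.
        Class<?> c = after.loadClass("java.util.ArrayList");
        System.out.println(c.getName());
        // The loader was swapped, and it parents the old one.
        System.out.println(after != before && after.getParent() == before);
    }
}
```

Any other application thread still sees its original context loader unless it calls `setContextClassLoader` itself, which is why defaulting the flag to `false` keeps the existing `addJar(path)` behavior unchanged.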





[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15659
  
**[Test build #67824 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67824/consoleFull)**
 for PR 15659 at commit 
[`3bf961e`](https://github.com/apache/spark/commit/3bf961efbffc9b03eba7053348ac6ef1634d0ade).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15698
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67828/
Test FAILed.





[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-10-31 Thread nchammas
Github user nchammas commented on the issue:

https://github.com/apache/spark/pull/15659
  
We have an AppVeyor build now?





[GitHub] spark issue #15701: [SPARK-18167] [SQL] Also log all partitions when the SQL...

2016-10-31 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15701
  
@yhuai 





[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15673
  
**[Test build #67834 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67834/consoleFull)**
 for PR 15673 at commit 
[`4c438c8`](https://github.com/apache/spark/commit/4c438c8b2575880379e2a9a872fe07018cb62402).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/15703
  
Will add more details in the PR description soon.





[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15302
  
Hi, @hvanhovell .
I made another attempt, #15704, using `Expression` as you suggested.





[GitHub] spark issue #15675: [SPARK-18144][SQL] logging StreamingQueryListener$QueryS...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15675
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15626
  
**[Test build #67831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67831/consoleFull)**
 for PR 15626 at commit 
[`d6fec94`](https://github.com/apache/spark/commit/d6fec9464e5a8638f0b9ac5dd1df289c30da132f).





[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings

2016-10-31 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15697
  
retest it please





[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/15673
  
@ericl I've pushed a commit with the changes you recommended.





[GitHub] spark pull request #15701: [SPARK-18167] [SQL] Also log all partitions when ...

2016-10-31 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/15701

[SPARK-18167] [SQL] Also log all partitions when the SQLQuerySuite test 
flakes

## What changes were proposed in this pull request?

One possibility for this test flaking is that we have corrupted the 
partition schema somehow in the tests, which causes the cast to decimal to fail 
in the call. This should at least show us the actual partition values.

## How was this patch tested?

Run it locally, it prints out something like `ArrayBuffer(test(partcol=0), 
test(partcol=1), test(partcol=2), test(partcol=3), test(partcol=4))`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark print-more-info

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15701.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15701


commit 767fef82f8dadd1962abe1d93b2c1bf5e926697d
Author: Eric Liang 
Date:   2016-10-31T21:31:20Z

Mon Oct 31 14:31:20 PDT 2016







[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67835/
Test FAILed.





[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15626
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67829/
Test FAILed.





[GitHub] spark issue #15696: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15696
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15626
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15659
  
@nchammas Ah, yes it is. It is a known problem. AppVeyor triggers the build 
when there are changes in the `R` directory, but when a commit is merged 
(instead of rebased), for example 
https://github.com/apache/spark/pull/15659/commits/3bf961efbffc9b03eba7053348ac6ef1634d0ade,
 it usually includes changes in `R`.

It seems generally okay though, because the build will succeed in most such 
cases. I asked AppVeyor about this problem but I haven't got a response yet. 
We may consider disabling this if AppVeyor becomes a problem regularly.





[GitHub] spark pull request #15696: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15696#discussion_r85848150
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala 
---
@@ -141,15 +139,14 @@ class SimpleTextOutputWriter(
   }
 }
 
-class AppendingTextOutputFormat(stagingDir: Path, fileNamePrefix: String)
-  extends TextOutputFormat[NullWritable, Text] {
+class AppendingTextOutputFormat(path: String) extends 
TextOutputFormat[NullWritable, Text] {
 
   val numberFormat = NumberFormat.getInstance()
   numberFormat.setMinimumIntegerDigits(5)
   numberFormat.setGroupingUsed(false)
 
   override def getDefaultWorkFile(context: TaskAttemptContext, extension: 
String): Path = {
-new Path(stagingDir, fileNamePrefix)
+new Path(path)
--- End diff --

Nope -- no more extension coming from Hadoop.






[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15703
  
**[Test build #67842 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67842/consoleFull)**
 for PR 15703 at commit 
[`c0029f1`](https://github.com/apache/spark/commit/c0029f1a529935c263f9c83691cf84921b343e67).





[GitHub] spark issue #15677: [SPARK-17963][SQL][Documentation] Add examples (extend) ...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15677
  
Okay, @gatorsmile, could you maybe submit a small one first? I will refer to 
yours, and then we can work together (we can divide up the files or packages 
to deal with, I guess).





[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15673
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67834/
Test FAILed.





[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15673
  
Build finished. Test FAILed.





[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85848802
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/sasl/aes/AesCipher.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl.aes;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.ReadableByteChannel;
+import java.nio.channels.WritableByteChannel;
+import java.util.Properties;
+import javax.crypto.spec.SecretKeySpec;
+import javax.crypto.spec.IvParameterSpec;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Throwables;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.*;
+import io.netty.util.AbstractReferenceCounted;
+import org.apache.commons.crypto.cipher.CryptoCipherFactory;
+import org.apache.commons.crypto.random.CryptoRandom;
+import org.apache.commons.crypto.random.CryptoRandomFactory;
+import org.apache.commons.crypto.stream.CryptoInputStream;
+import org.apache.commons.crypto.stream.CryptoOutputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.util.ByteArrayReadableChannel;
+import org.apache.spark.network.util.ByteArrayWritableChannel;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * AES cipher for encryption and decryption.
+ */
+public class AesCipher {
+  private static final Logger logger = LoggerFactory.getLogger(AesCipher.class);
+  public static final String ENCRYPTION_HANDLER_NAME = "AesEncryption";
+  public static final String DECRYPTION_HANDLER_NAME = "AesDecryption";
+
+  private final SecretKeySpec inKeySpec;
+  private final IvParameterSpec inIvSpec;
+  private final SecretKeySpec outKeySpec;
+  private final IvParameterSpec outIvSpec;
+  private Properties properties;
+
+  public static final int STREAM_BUFFER_SIZE = 1024 * 32;
+  public static final String TRANSFORM = "AES/CTR/NoPadding";
+
+  public AesCipher(
+      Properties properties,
+      byte[] inKey,
+      byte[] outKey,
+      byte[] inIv,
+      byte[] outIv) throws IOException {
+    this.properties = properties;
+    inKeySpec = new SecretKeySpec(inKey, "AES");
+    inIvSpec = new IvParameterSpec(inIv);
+    outKeySpec = new SecretKeySpec(outKey, "AES");
+    outIvSpec = new IvParameterSpec(outIv);
+  }
+
+  public AesCipher(AesConfigMessage configMessage) throws IOException  {
+    this(new Properties(), configMessage.inKey, configMessage.outKey,
+      configMessage.inIv, configMessage.outIv);
+  }
+
+  /**
+   * Create AES crypto output stream
+   * @param ch The underlying channel to write out.
+   * @return Return output crypto stream for encryption.
+   * @throws IOException
+   */
+  public CryptoOutputStream CreateOutputStream(WritableByteChannel ch) throws IOException {
+    return new CryptoOutputStream(TRANSFORM, properties, ch, outKeySpec, outIvSpec);
+  }
+
+  /**
+   * Create AES crypto input stream
+   * @param ch The underlying channel used to read data.
+   * @return Return input crypto stream for decryption.
+   * @throws IOException
+   */
+  public CryptoInputStream CreateInputStream(ReadableByteChannel ch) throws IOException {
+    return new CryptoInputStream(TRANSFORM, properties, ch, inKeySpec, inIvSpec);
+  }
+
+  /**
+   * Add handlers to channel
+   * @param ch the channel for adding handlers
+   * @throws IOException
+   */
+  public void addToChannel(Channel ch) throws IOException {
+    ch.pipeline()
+      .addFirst(ENCRYPTION_HANDLER_NAME, new AesEncryptHandler(this))
+      .addFirst(DECRYPTION_HANDLER_NAME, new AesDecryptHandler(this));
+  }
+
+  /**
+   * Create the configuration 
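The `AesCipher` class quoted above builds its streams around the `AES/CTR/NoPadding` transformation. As a minimal, self-contained sketch of what that transformation does, here is a round trip using only the JDK's `javax.crypto` API (not the Spark/commons-crypto code above); the all-zero key and IV are illustrative assumptions, whereas the real patch derives separate in/out keys and IVs from the SASL handshake:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class AesCtrRoundTrip {
    public static void main(String[] args) throws Exception {
        // Illustrative 128-bit all-zero key and IV; production code must use
        // securely generated values (one key/IV pair per direction in the PR).
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
        IvParameterSpec ivSpec = new IvParameterSpec(iv);

        // Encrypt with the same transformation string the patch uses.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);
        byte[] ciphertext = enc.doFinal("hello spark".getBytes(StandardCharsets.UTF_8));

        // Decrypt with the same key/IV; CTR is a stream mode, so no padding is needed.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, keySpec, ivSpec);
        byte[] plaintext = dec.doFinal(ciphertext);

        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}
```

Note that CTR mode gives confidentiality only if a key/IV pair is never reused, which is why the config message carries distinct `inIv`/`outIv` values.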

[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85848940
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/sasl/aes/AesCipher.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl.aes;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.ReadableByteChannel;
+import java.nio.channels.WritableByteChannel;
+import java.util.Properties;
+import javax.crypto.spec.SecretKeySpec;
+import javax.crypto.spec.IvParameterSpec;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Throwables;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.*;
+import io.netty.util.AbstractReferenceCounted;
+import org.apache.commons.crypto.cipher.CryptoCipherFactory;
+import org.apache.commons.crypto.random.CryptoRandom;
+import org.apache.commons.crypto.random.CryptoRandomFactory;
+import org.apache.commons.crypto.stream.CryptoInputStream;
+import org.apache.commons.crypto.stream.CryptoOutputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.util.ByteArrayReadableChannel;
+import org.apache.spark.network.util.ByteArrayWritableChannel;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * AES cipher for encryption and decryption.
+ */
+public class AesCipher {
+  private static final Logger logger = 
LoggerFactory.getLogger(AesCipher.class);
+  public static final String ENCRYPTION_HANDLER_NAME = "AesEncryption";
+  public static final String DECRYPTION_HANDLER_NAME = "AesDecryption";
+
+  private final SecretKeySpec inKeySpec;
+  private final IvParameterSpec inIvSpec;
+  private final SecretKeySpec outKeySpec;
+  private final IvParameterSpec outIvSpec;
+  private Properties properties;
+
+  public static final int STREAM_BUFFER_SIZE = 1024 * 32;
+  public static final String TRANSFORM = "AES/CTR/NoPadding";
+
+  public AesCipher(
+  Properties properties,
+  byte[] inKey,
+  byte[] outKey,
+  byte[] inIv,
+  byte[] outIv) throws IOException {
+this.properties = properties;
+inKeySpec = new SecretKeySpec(inKey, "AES");
+inIvSpec = new IvParameterSpec(inIv);
+outKeySpec = new SecretKeySpec(outKey, "AES");
+outIvSpec = new IvParameterSpec(outIv);
+  }
+
+  public AesCipher(AesConfigMessage configMessage) throws IOException  {
+this(new Properties(), configMessage.inKey, configMessage.outKey,
+  configMessage.inIv, configMessage.outIv);
+  }
+
+  /**
+   * Create AES crypto output stream
+   * @param ch The underlying channel to write out.
+   * @return Return output crypto stream for encryption.
+   * @throws IOException
+   */
+  public CryptoOutputStream CreateOutputStream(WritableByteChannel ch) 
throws IOException {
+return new CryptoOutputStream(TRANSFORM, properties, ch, outKeySpec, 
outIvSpec);
+  }
+
+  /**
+   * Create AES crypto input stream
+   * @param ch The underlying channel used to read data.
+   * @return Return input crypto stream for decryption.
+   * @throws IOException
+   */
+  public CryptoInputStream CreateInputStream(ReadableByteChannel ch) 
throws IOException {
+return new CryptoInputStream(TRANSFORM, properties, ch, inKeySpec, 
inIvSpec);
+  }
+
+  /**
+   * Add handlers to channel
+   * @param ch the channel for adding handlers
+   * @throws IOException
+   */
+  public void addToChannel(Channel ch) throws IOException {
+ch.pipeline()
+  .addFirst(ENCRYPTION_HANDLER_NAME, new AesEncryptHandler(this))
+  .addFirst(DECRYPTION_HANDLER_NAME, new AesDecryptHandler(this));
+  }
+
+  /**
+   * Create the configuration 

[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85847613
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/sasl/aes/AesCipher.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl.aes;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.ReadableByteChannel;
+import java.nio.channels.WritableByteChannel;
+import java.util.Properties;
+import javax.crypto.spec.SecretKeySpec;
+import javax.crypto.spec.IvParameterSpec;
+
+import com.google.common.base.Preconditions;
+import com.google.common.base.Throwables;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.*;
+import io.netty.util.AbstractReferenceCounted;
+import org.apache.commons.crypto.cipher.CryptoCipherFactory;
+import org.apache.commons.crypto.random.CryptoRandom;
+import org.apache.commons.crypto.random.CryptoRandomFactory;
+import org.apache.commons.crypto.stream.CryptoInputStream;
+import org.apache.commons.crypto.stream.CryptoOutputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.util.ByteArrayReadableChannel;
+import org.apache.spark.network.util.ByteArrayWritableChannel;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * AES cipher for encryption and decryption.
+ */
+public class AesCipher {
+  private static final Logger logger = 
LoggerFactory.getLogger(AesCipher.class);
+  public static final String ENCRYPTION_HANDLER_NAME = "AesEncryption";
+  public static final String DECRYPTION_HANDLER_NAME = "AesDecryption";
+
+  private final SecretKeySpec inKeySpec;
+  private final IvParameterSpec inIvSpec;
+  private final SecretKeySpec outKeySpec;
+  private final IvParameterSpec outIvSpec;
+  private Properties properties;
+
+  public static final int STREAM_BUFFER_SIZE = 1024 * 32;
+  public static final String TRANSFORM = "AES/CTR/NoPadding";
+
+  public AesCipher(
+  Properties properties,
+  byte[] inKey,
+  byte[] outKey,
+  byte[] inIv,
+  byte[] outIv) throws IOException {
+this.properties = properties;
+inKeySpec = new SecretKeySpec(inKey, "AES");
+inIvSpec = new IvParameterSpec(inIv);
+outKeySpec = new SecretKeySpec(outKey, "AES");
+outIvSpec = new IvParameterSpec(outIv);
+  }
+
+  public AesCipher(AesConfigMessage configMessage) throws IOException  {
+this(new Properties(), configMessage.inKey, configMessage.outKey,
+  configMessage.inIv, configMessage.outIv);
+  }
+
+  /**
+   * Create AES crypto output stream
+   * @param ch The underlying channel to write out.
+   * @return Return output crypto stream for encryption.
+   * @throws IOException
+   */
+  public CryptoOutputStream CreateOutputStream(WritableByteChannel ch) 
throws IOException {
+return new CryptoOutputStream(TRANSFORM, properties, ch, outKeySpec, 
outIvSpec);
+  }
+
+  /**
+   * Create AES crypto input stream
+   * @param ch The underlying channel used to read data.
+   * @return Return input crypto stream for decryption.
+   * @throws IOException
+   */
+  public CryptoInputStream CreateInputStream(ReadableByteChannel ch) 
throws IOException {
--- End diff --

nit: lower case.



[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85847089
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslRpcHandler.java
 ---
@@ -80,46 +84,71 @@ public void receive(TransportClient client, ByteBuffer message, RpcResponseCallb
       delegate.receive(client, message, callback);
       return;
     }
+    if (saslServer == null || !saslServer.isComplete()) {
+      ByteBuf nettyBuf = Unpooled.wrappedBuffer(message);
+      SaslMessage saslMessage;
+      try {
+        saslMessage = SaslMessage.decode(nettyBuf);
+      } finally {
+        nettyBuf.release();
+      }
 
-    ByteBuf nettyBuf = Unpooled.wrappedBuffer(message);
-    SaslMessage saslMessage;
-    try {
-      saslMessage = SaslMessage.decode(nettyBuf);
-    } finally {
-      nettyBuf.release();
-    }
-
-    if (saslServer == null) {
-      // First message in the handshake, setup the necessary state.
-      client.setClientId(saslMessage.appId);
-      saslServer = new SparkSaslServer(saslMessage.appId, secretKeyHolder,
-        conf.saslServerAlwaysEncrypt());
-    }
+      if (saslServer == null) {
+        // First message in the handshake, setup the necessary state.
+        client.setClientId(saslMessage.appId);
+        saslServer = new SparkSaslServer(saslMessage.appId, secretKeyHolder,
+          conf.saslServerAlwaysEncrypt());
+      }
 
-    byte[] response;
-    try {
-      response = saslServer.response(JavaUtils.bufferToArray(
-        saslMessage.body().nioByteBuffer()));
-    } catch (IOException ioe) {
-      throw new RuntimeException(ioe);
+      byte[] response;
+      try {
+        response = saslServer.response(JavaUtils.bufferToArray(
+          saslMessage.body().nioByteBuffer()));
+      } catch (IOException ioe) {
+        throw new RuntimeException(ioe);
+      }
+      callback.onSuccess(ByteBuffer.wrap(response));
     }
-    callback.onSuccess(ByteBuffer.wrap(response));
 
     // Setup encryption after the SASL response is sent, otherwise the client can't parse the
     // response. It's ok to change the channel pipeline here since we are processing an incoming
     // message, so the pipeline is busy and no new incoming messages will be fed to it before this
     // method returns. This assumes that the code ensures, through other means, that no outbound
     // messages are being written to the channel while negotiation is still going on.
     if (saslServer.isComplete()) {
-      logger.debug("SASL authentication successful for channel {}", client);
-      isComplete = true;
-      if (SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP))) {
+      if (!SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP))) {
+        logger.debug("SASL authentication successful for channel {}", client);
+        complete(true);
+        return ;
+      }
+
+      if (!conf.AesEncryptionEnabled()) {
         logger.debug("Enabling encryption for channel {}", client);
         SaslEncryption.addToChannel(channel, saslServer, conf.maxSaslEncryptedBlockSize());
-        saslServer = null;
-      } else {
-        saslServer.dispose();
-        saslServer = null;
+        complete(false);
+        return;
+      }
+
+      // Extra negotiation should happen after authentication, so return directly while
+      // processing authenticate.
+      if (!isAuthenticated) {
+        logger.debug("SASL authentication successful for channel {}", client);
+        isAuthenticated = true;
+        return;
+      }
+
+      // Create AES cipher when it is authenticated
+      try {
+        AesConfigMessage configMessage = AesConfigMessage.decodeMessage(message);
+        AesCipher cipher = new AesCipher(configMessage);
+
+        // Send response back to client to confirm that server accept config.
+        callback.onSuccess(JavaUtils.stringToBytes(AesCipher.TRANSFORM));
--- End diff --

This should really be the last statement (after `complete(true)`).

It's a little weird to use the name of the transformation as the success 
message here... I'd say a reply is not even needed, but it's nice for the 
client to know that this particular message succeeded or not. So it's ok to use 
this, but I'd prefer if the client ignored the contents of the reply and 
instead just handled exceptions for the error case.
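The suggestion above — have the client treat the RPC's success/failure outcome as the only signal and ignore the reply payload — can be sketched as follows. `RpcCallback` here is a hypothetical stand-in for illustration, not Spark's actual `RpcResponseCallback` interface:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class IgnoreReplyContents {

    // Hypothetical callback shape: success carries an opaque reply buffer,
    // failure carries the exception from the transport layer.
    interface RpcCallback {
        void onSuccess(ByteBuffer reply);
        void onFailure(Throwable t);
    }

    static boolean negotiated = false;

    public static void main(String[] args) {
        RpcCallback cb = new RpcCallback() {
            @Override public void onSuccess(ByteBuffer reply) {
                // The reply payload is deliberately not inspected: the fact that
                // the RPC succeeded is the confirmation that the server accepted
                // the AES config.
                negotiated = true;
            }
            @Override public void onFailure(Throwable t) {
                // Errors surface only through this path.
                negotiated = false;
            }
        };

        // Simulate a server reply; its contents (here, the transformation name)
        // are ignored by the client above.
        cb.onSuccess(ByteBuffer.wrap("AES/CTR/NoPadding".getBytes(StandardCharsets.UTF_8)));
        System.out.println(negotiated);
    }
}
```

This keeps the wire protocol free to change the reply body later without breaking clients.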



[GitHub] spark issue #15699: [SPARK-18030][Tests]Fix flaky FileStreamSourceSuite by n...

2016-10-31 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15699
  
Thanks! Merging to master and 2.0.





[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85850906
  
--- Diff: 
common/network-common/src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java
 ---
@@ -374,6 +375,69 @@ public void testDelegates() throws Exception {
     }
   }
 
+  @Test
+  public void testSaslEncryptionAes() throws Exception {
+    final AtomicReference<String> response = new AtomicReference<>();
+    final File file = File.createTempFile("sasltest", ".txt");
+    SaslTestCtx ctx = null;
+    try {
+      final TransportConf conf = new TransportConf("rpc", new SystemPropertyConfigProvider());
+      final TransportConf spyConf = spy(conf);
+      doReturn(true).when(spyConf).AesEncryptionEnabled();
+
+      StreamManager sm = mock(StreamManager.class);
+      when(sm.getChunk(anyLong(), anyInt())).thenAnswer(new Answer<ManagedBuffer>() {
+        @Override
+        public ManagedBuffer answer(InvocationOnMock invocation) {
+          return new FileSegmentManagedBuffer(spyConf, file, 0, file.length());
+        }
+      });
+
+      RpcHandler rpcHandler = mock(RpcHandler.class);
+      when(rpcHandler.getStreamManager()).thenReturn(sm);
+
+      byte[] data = new byte[ 256 * 1024 * 1024];
--- End diff --

nit: no space after `[`





[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85846395
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslRpcHandler.java
 ---
+        logger.debug("SASL authentication successful for channel {}", client);
+        complete(true);
+        return ;
--- End diff --

nit: no space before `;`





[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85849868
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/util/ByteArrayReadableChannel.java
 ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.util;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.ReadableByteChannel;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
--- End diff --

nit: too many empty lines




