[GitHub] spark issue #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-02 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21958
  
Thanks, merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21971: [SPARK-24947] [Core] aggregateAsync and foldAsync

2018-08-02 Thread ceedubs
Github user ceedubs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21971#discussion_r207246356
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -61,6 +62,36 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends Serializable with Loggi
   (index, data) => results(index) = data, results.flatten.toSeq)
   }
 
+
+  /**
+   * Returns a future of an aggregation across the RDD.
+   *
+   * @see [[RDD.aggregate]] which is the synchronous version of this method.
+   */
+  def aggregateAsync[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): FutureAction[U] =
+self.withScope {
--- End diff --

In the synchronous version of `aggregate`, the `zeroValue` is cloned, which 
requires adding an implicit `ClassTag[U]` argument. I didn't really understand 
the motivation for that, so I didn't do it here, but I was hoping that someone 
who understood the cloning could let me know here.
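A possible answer, offered as an assumption rather than anything confirmed in this thread: `RDD.aggregate` appears to clone the zero value through Spark's serializer, and since that serializer's `deserialize` takes a `ClassTag`, cloning would pull in the extra implicit. Cloning matters when `seqOp`/`combOp` mutate `U` in place, because otherwise several partial aggregations can end up sharing one mutable zero object. The Spark-free sketch below simulates that with hypothetical helpers (`cloneViaSerialization`, `simulateAggregate`) and uses plain Java serialization instead of Spark's serializer.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

object ZeroValueCloning {
  // Hypothetical helper: deep-copies a value by round-tripping it through
  // Java serialization. Assumes the value is java.io.Serializable.
  def cloneViaSerialization[U](value: U): U = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[U] finally in.close()
  }

  // Hypothetical simulation of aggregate over in-memory "partitions".
  // When cloneZero is false, every partition and the final merge start from
  // the same zero object, so in-place mutation by seqOp leaks between them.
  def simulateAggregate[T, U](partitions: Seq[Seq[T]], zero: U, cloneZero: Boolean)(
      seqOp: (U, T) => U, combOp: (U, U) => U): U = {
    def fresh(): U = if (cloneZero) cloneViaSerialization(zero) else zero
    val partials = partitions.map(_.foldLeft(fresh())(seqOp))
    partials.foldLeft(fresh())(combOp)
  }
}
```

With `cloneZero = true` the result is `ArrayBuffer(1, 2, 3)` for partitions `Seq(Seq(1, 2), Seq(3))` and the caller's zero stays empty; with `cloneZero = false` the shared buffer accumulates every element and the merge step then concatenates that same buffer repeatedly.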


---




[GitHub] spark issue #21971: [SPARK-24947] [Core] aggregateAsync and foldAsync

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21971
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21971: [SPARK-24947] [Core] aggregateAsync and foldAsync

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21971
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21722: Spark-24742: Fix NullPointerexception in Field Metadata

2018-08-02 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21722
  
Heh, collided yeah. I thought one commit would end up failing but since 
they're identical guess it results in an empty commit. No big deal.


---




[GitHub] spark pull request #21236: [SPARK-23935][SQL] Adding map_entries function

2018-08-02 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21236#discussion_r207247448
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -98,6 +98,9 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
 if (expected.isNaN) result.isNaN else expected == result
   case (result: Float, expected: Float) =>
 if (expected.isNaN) result.isNaN else expected == result
+  case (result: UnsafeRow, expected: GenericInternalRow) =>
--- End diff --

Roger that, looks like Wenchen just did so. Thanks!


---




[GitHub] spark issue #21971: [SPARK-24947] [Core] aggregateAsync and foldAsync

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21971
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21632
  
**[Test build #94015 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94015/testReport)**
 for PR 21632 at commit 
[`981d707`](https://github.com/apache/spark/commit/981d7072c4574184342868616c69bd44bc33ce3b).


---




[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21955
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21955
  
**[Test build #94012 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94012/testReport)**
 for PR 21955 at commit 
[`efea0a8`](https://github.com/apache/spark/commit/efea0a889e0ff9ee226f2bd94c58817d9c96d812).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21955
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94012/
Test FAILed.


---




[GitHub] spark pull request #21958: [minor] remove dead code in ExpressionEvalHelper

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21958


---




[GitHub] spark pull request #21944: [SPARK-24988][SQL]Add a castBySchema method which...

2018-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21944#discussion_r207248446
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1367,6 +1367,22 @@ class Dataset[T] private[sql](
 }: _*)
   }
 
+  /**
+   * Casts all the values of the current Dataset following the types of a specific StructType.
+   * This method works also with nested structTypes.
+   *
+   *  @group typedrel
+   *  @since 2.4.0
+   */
+  def castBySchema(schema: StructType): DataFrame = {
+assert(schema.fields.map(_.name).toList.sameElements(this.schema.fields.map(_.name).toList),
+  "schema should have the same fields as the original schema")
+
+selectExpr(schema.map(
--- End diff --

There are many good one-liner tricks, and I would just leave those tricks in the mailing list or something. I wouldn't add an API only because it _might be_ helpful to some users.

I would consider adding this if it has been requested multiple times, it is not a one-liner change, and there's no easy workaround for it.

Otherwise, every system will have an API to send an email.
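As an illustration of the kind of one-liner meant here, a minimal sketch: build one `CAST` expression per target field and pass them to `selectExpr`, the same method the proposed `castBySchema` calls. The helper `castExprs` and its `(name, SQL type string)` input are hypothetical; with a real `StructType` you would derive the pairs from each field's name and data type.

```scala
object CastBySchemaSketch {
  // Quotes an identifier for Spark SQL: wrap it in backticks and double any
  // embedded backticks so unusual column names survive.
  private def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  // Hypothetical helper: one "CAST(col AS type) AS col" expression per target
  // field, suitable for passing to Dataset.selectExpr(exprs: _*).
  def castExprs(fields: Seq[(String, String)]): Seq[String] =
    fields.map { case (name, sqlType) =>
      s"CAST(${quote(name)} AS $sqlType) AS ${quote(name)}"
    }
}
```

With a DataFrame `df` and a target `schema`, the Column API gives a similar result without string building, e.g. `df.select(schema.fields.map(f => col(f.name).cast(f.dataType)): _*)`, with `col` imported from `org.apache.spark.sql.functions`.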


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1669/
Test PASSed.


---




[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-02 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21927#discussion_r207249569
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -340,6 +340,22 @@ class DAGScheduler(
 }
   }
 
+  /**
+   * Check to make sure we don't launch a barrier stage with unsupported RDD chain pattern. The
+   * following patterns are not supported:
+   * 1. Ancestor RDDs that have different number of partitions from the resulting RDD (eg.
+   * union()/coalesce()/first()/PartitionPruningRDD);
--- End diff --

OK, I see that it'll be a different number of partitions, but conceptually it should be OK, right? The user just wants all tasks launched together, even if it's a different number of tasks than the number of partitions in the original barrier RDD.
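To make pattern 1 from the doc comment concrete, here is a simplified sketch of such a check under stated assumptions: `RddNode` is a hypothetical stand-in that models only a partition count and its parents within one stage, and the check rejects the stage if any ancestor's partition count differs from the final RDD's. It is not the DAGScheduler code.

```scala
object BarrierStageCheck {
  // Hypothetical stand-in for an RDD inside a single stage: only the partition
  // count and its narrow-dependency parents are modelled.
  final case class RddNode(name: String, numPartitions: Int, parents: Seq[RddNode] = Nil)

  // All nodes reachable through parents, including the node itself.
  def ancestors(rdd: RddNode): Seq[RddNode] =
    rdd +: rdd.parents.flatMap(ancestors)

  // True if every ancestor has the same partition count as the final RDD,
  // i.e. the stage does not hit unsupported pattern 1.
  def samePartitionCount(finalRdd: RddNode): Boolean =
    ancestors(finalRdd).forall(_.numPartitions == finalRdd.numPartitions)
}
```

Under this rule a `coalesce` (4 to 2 partitions) or a `union` of two 4-partition RDDs is rejected, even though, as noted above, launching all of the resulting tasks together might conceptually still be fine.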


---




[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-02 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/21927#discussion_r207249848
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -340,6 +340,22 @@ class DAGScheduler(
 }
   }
 
+  /**
+   * Check to make sure we don't launch a barrier stage with unsupported RDD chain pattern. The
+   * following patterns are not supported:
+   * 1. Ancestor RDDs that have different number of partitions from the resulting RDD (eg.
+   * union()/coalesce()/first()/PartitionPruningRDD);
--- End diff --

But anyway, I guess it's also fine not to support this case; I was just trying to understand it myself.


---




[GitHub] spark pull request #21944: [SPARK-24988][SQL]Add a castBySchema method which...

2018-08-02 Thread mahmoudmahdi24
Github user mahmoudmahdi24 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21944#discussion_r207250060
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1367,6 +1367,22 @@ class Dataset[T] private[sql](
 }: _*)
   }
 
+  /**
+   * Casts all the values of the current Dataset following the types of a specific StructType.
+   * This method works also with nested structTypes.
+   *
+   *  @group typedrel
+   *  @since 2.4.0
+   */
+  def castBySchema(schema: StructType): DataFrame = {
+assert(schema.fields.map(_.name).toList.sameElements(this.schema.fields.map(_.name).toList),
+  "schema should have the same fields as the original schema")
+
+selectExpr(schema.map(
--- End diff --

OK, I understand. Thanks!


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21754
  
**[Test build #93993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93993/testReport)**
 for PR 21754 at commit 
[`7f98b88`](https://github.com/apache/spark/commit/7f98b885b3c6b8675790c4ba7bc79eef0958448d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93993/
Test PASSed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21895
  
**[Test build #93991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93991/testReport)**
 for PR 21895 at commit 
[`c620fff`](https://github.com/apache/spark/commit/c620fff90d20ba1b62e1277317754d5f14567f79).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21895
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93991/
Test FAILed.


---




[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21895
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21944: [SPARK-24988][SQL]Add a castBySchema method which...

2018-08-02 Thread mahmoudmahdi24
Github user mahmoudmahdi24 closed the pull request at:

https://github.com/apache/spark/pull/21944


---




[GitHub] spark issue #21944: [SPARK-24988][SQL]Add a castBySchema method which casts ...

2018-08-02 Thread mahmoudmahdi24
Github user mahmoudmahdi24 commented on the issue:

https://github.com/apache/spark/pull/21944
  
Closed the PR. This might be a useful trick, but we want to avoid adding too many methods to the API. We'll reopen this if many users ask for it.


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/21919
  
Minimum and maximum offsets in the sink wouldn't make sense for most sources. There aren't any meaningful values to report when, e.g., writing out Parquet files. It would make sense to put them inside just the Kafka WriterCommitMessage, but I don't think that requires API support.
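A minimal sketch of the alternative described here, where only the Kafka sink reports offsets inside its own commit message and no new sink-wide API is needed. `CommitMessage` is a hypothetical stand-in for the real `WriterCommitMessage` interface, and `KafkaCommitMessage` and `offsetRanges` are likewise hypothetical; offsets are grouped per Kafka partition because offsets from different partitions are not comparable.

```scala
object SinkOffsetSketch {
  // Hypothetical stand-in for a writer commit message sent back to the driver.
  trait CommitMessage extends Serializable

  // Hypothetical Kafka-only message: the range of offsets one writer produced
  // in one Kafka partition. Other sinks (e.g. Parquet) simply don't carry this.
  final case class KafkaCommitMessage(kafkaPartition: Int, minOffset: Long, maxOffset: Long)
    extends CommitMessage

  // Driver-side aggregation: per Kafka partition, the overall (min, max)
  // offsets; messages from other sinks are ignored.
  def offsetRanges(messages: Seq[CommitMessage]): Map[Int, (Long, Long)] =
    messages
      .collect { case m: KafkaCommitMessage => m }
      .groupBy(_.kafkaPartition)
      .map { case (p, ms) => (p, (ms.map(_.minOffset).min, ms.map(_.maxOffset).max)) }
}
```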


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21953
  
We have seen jobs overloading the first disk returned by YARN. Unfortunately the details of those jobs have long expired. In general it's good practice to distribute the load anyway.

I remember one of the jobs was Python. I think it was the case you can see in EvalPythonExec.scala:

  // The queue used to buffer input rows so we can drain it to
  // combine input with output from Python.
  val queue = HybridRowQueue(context.taskMemoryManager(),
new File(Utils.getLocalDir(SparkEnv.get.conf)), child.output.length)

That is always going to hit the disk YARN returns first, for every container on that node.
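The fix this PR proposes can be sketched as shuffling the configured local dirs once per process, so that the "first" dir differs between containers instead of always being the first one YARN lists. `randomizedLocalDirs` is a hypothetical helper that takes an explicit `Random` for testability; it is not the patch's actual code.

```scala
import scala.util.Random

object LocalDirChoice {
  // Hypothetical helper: returns the same dirs in a random order. Calling it
  // once per process and using the head as the default local dir spreads
  // single-dir writers (like the HybridRowQueue above) across disks.
  def randomizedLocalDirs(dirs: Seq[String], rng: Random): Seq[String] =
    rng.shuffle(dirs)
}
```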


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21953
  
Jenkins, test this please


---




[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...

2018-08-02 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21930
  
@srowen thanks! So 2.12 will be optional for Spark 2.4? And the major 
version for Spark 3.0?
What is the plan?


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21953
  
**[Test build #94016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94016/testReport)**
 for PR 21953 at commit 
[`3986e75`](https://github.com/apache/spark/commit/3986e75c3c000e7a7e7674be6837d663499f35f1).


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21954
  
**[Test build #94002 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94002/testReport)**
 for PR 21954 at commit 
[`c3bf6a0`](https://github.com/apache/spark/commit/c3bf6a0059a151ba23cf32c842e31ced3b28726c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ResolveHigherOrderFunctions(catalog: SessionCatalog) 
extends Rule[LogicalPlan] `
  * `  s\"its class is $`
  * `case class ResolveLambdaVariables(conf: SQLConf) extends 
Rule[LogicalPlan] `
  * `case class NamedLambdaVariable(`
  * `case class LambdaFunction(`
  * `trait HigherOrderFunction extends Expression `
  * `trait ArrayBasedHigherOrderFunction extends HigherOrderFunction with 
ExpectsInputTypes `
  * `case class ArrayTransform(`


---




[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...

2018-08-02 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21930
  
Yes we need to also create a 2.12 build of Spark in 2.4. We might still 
have to label it "beta" as I still kind of suspect there's a corner case 
lurking here. I can't speak for 3.0, but would assume it would try to support 
2.13, and not support 2.11.


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94002/
Test FAILed.


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21935
  
**[Test build #94004 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94004/testReport)**
 for PR 21935 at commit 
[`fed8505`](https://github.com/apache/spark/commit/fed850598ff4c52ff3c6cd54f2d3d719b8a745e7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21953: [SPARK-24992][Core] spark should randomize yarn l...

2018-08-02 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21953#discussion_r207258035
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -460,7 +461,14 @@ private[spark] object Utils extends Logging {
 if (useCache && fetchCacheEnabled) {
   val cachedFileName = s"${url.hashCode}${timestamp}_cache"
   val lockFileName = s"${url.hashCode}${timestamp}_lock"
-  val localDir = new File(getLocalDir(conf))
+  var localDir: File = null
+  // Set the cachedLocalDir for the first time and re-use it later
+  this.synchronized {
--- End diff --

If we want to be more efficient and avoid entering the synchronized block each time, we could add one extra check of cachedLocalDir.isEmpty before it. Only if it's empty do we enter the synchronized block, and then recheck whether it's still empty.

This would be very similar to getOrCreateLocalRootDirs.
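A minimal sketch of the double-checked pattern described above, assuming the cached dir is held as an `Option[String]` (the field type in the patch may differ) and that `compute` stands in for whatever picks the dir. The field is `@volatile` so that the unsynchronized first read sees a value published by another thread.

```scala
final class LocalDirCache {
  // @volatile so the lock-free read below sees a value written by another
  // thread inside the synchronized block.
  @volatile private var cachedLocalDir: Option[String] = None

  def getOrInit(compute: () => String): String = cachedLocalDir match {
    case Some(dir) => dir // fast path: no locking once initialized
    case None =>
      this.synchronized {
        // Re-check under the lock: another thread may have initialized it
        // between our first read and acquiring the lock.
        if (cachedLocalDir.isEmpty) cachedLocalDir = Some(compute())
        cachedLocalDir.get
      }
  }
}
```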


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94004/
Test FAILed.


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...

2018-08-02 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20611#discussion_r207258477
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -303,94 +303,44 @@ case class LoadDataCommand(
   s"partitioned, but a partition spec was provided.")
   }
 }
-
-val loadPath =
+val loadPath = {
   if (isLocal) {
-val uri = Utils.resolveURI(path)
-val file = new File(uri.getPath)
-val exists = if (file.getAbsolutePath.contains("*")) {
-  val fileSystem = FileSystems.getDefault
-  val dir = file.getParentFile.getAbsolutePath
-  if (dir.contains("*")) {
-throw new AnalysisException(
-  s"LOAD DATA input path allows only filename wildcard: $path")
-  }
-
-  // Note that special characters such as "*" on Windows are not allowed as a path.
-  // Calling `WindowsFileSystem.getPath` throws an exception if there are in the path.
-  val dirPath = fileSystem.getPath(dir)
-  val pathPattern = new File(dirPath.toAbsolutePath.toString, file.getName).toURI.getPath
-  val safePathPattern = if (Utils.isWindows) {
-// On Windows, the pattern should not start with slashes for absolute file paths.
-pathPattern.stripPrefix("/")
-  } else {
-pathPattern
-  }
-  val files = new File(dir).listFiles()
-  if (files == null) {
-false
-  } else {
-val matcher = fileSystem.getPathMatcher("glob:" + safePathPattern)
-files.exists(f => matcher.matches(fileSystem.getPath(f.getAbsolutePath)))
-  }
-} else {
-  new File(file.getAbsolutePath).exists()
-}
-if (!exists) {
-  throw new AnalysisException(s"LOAD DATA input path does not exist: $path")
-}
-uri
+val localFS = FileContext.getLocalFSFileContext()
+localFS.makeQualified(new Path(path))
   } else {
-val uri = new URI(path)
-val hdfsUri = if (uri.getScheme() != null && uri.getAuthority() != null) {
-  uri
-} else {
-  // Follow Hive's behavior:
-  // If no schema or authority is provided with non-local inpath,
-  // we will use hadoop configuration "fs.defaultFS".
-  val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
-  val defaultFS = if (defaultFSConf == null) {
-new URI("")
-  } else {
-new URI(defaultFSConf)
-  }
-
-  val scheme = if (uri.getScheme() != null) {
-uri.getScheme()
-  } else {
-defaultFS.getScheme()
-  }
-  val authority = if (uri.getAuthority() != null) {
-uri.getAuthority()
-  } else {
-defaultFS.getAuthority()
-  }
-
-  if (scheme == null) {
-throw new AnalysisException(
-  s"LOAD DATA: URI scheme is required for non-local input paths: '$path'")
-  }
-
-  // Follow Hive's behavior:
-  // If LOCAL is not specified, and the path is relative,
-  // then the path is interpreted relative to "/user/"
-  val uriPath = uri.getPath()
-  val absolutePath = if (uriPath != null && uriPath.startsWith("/")) {
-uriPath
-  } else {
-s"/user/${System.getProperty("user.name")}/$uriPath"
-  }
-  new URI(scheme, authority, absolutePath, uri.getQuery(), uri.getFragment())
-}
-val hadoopConf = sparkSession.sessionState.newHadoopConf()
-val srcPath = new Path(hdfsUri)
-val fs = srcPath.getFileSystem(hadoopConf)
-if (!fs.exists(srcPath)) {
-  throw new AnalysisException(s"LOAD DATA input path does not exist: $path")
-}
-hdfsUri
+val loadPath = new Path(path)
+// Follow Hive's behavior:
+// If no schema or authority is provided with non-local inpath,
+// we will use hadoop configuration "fs.defaultFS".
+val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
+val defaultFS = if (defaultFSConf == null) new URI("") else new URI(defaultFSConf)
+// Follow Hive's behavior:
+// If LOCAL is not specified, and the path is relative,
+// then the path is interpreted relative to "/user/"
+val uriPath = new Path(s"/user/${System.getProperty

[GitHub] spark issue #21895: [SPARK-24948][SHS] Delegate check access permissions to ...

2018-08-02 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21895
  
retest this please


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread carsonwang
Github user carsonwang commented on the issue:

https://github.com/apache/spark/pull/21754
  
This LGTM as a fix. However, ideally we should also support reusing an exchange that is used in different joins. There is no need to shuffle-write the same table twice; we just need to read it differently. For example, in one stage a reducer may read partitions 0 to 2, while in another stage a reducer may read partitions 0 to 3. We just need a different partitionStartIndices to form a different ShuffledRowRDD, and then we can reuse the Exchange. I should have addressed this in my new implementation of adaptive execution. @cloud-fan, let's pay attention to it when we review that PR.
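To illustrate the idea, a hypothetical sketch of how `partitionStartIndices` could map reducers to ranges of shuffle partitions, so that two stages read the same shuffle output with different groupings. `ShuffleReadRanges.ranges` is an illustration, not Spark's `ShuffledRowRDD` implementation, and it assumes the start indices begin at 0 and are increasing.

```scala
object ShuffleReadRanges {
  // For each reducer, the half-open range [start, end) of pre-shuffle
  // partitions it reads, given the start index of every reducer and the total
  // number of shuffle partitions.
  def ranges(partitionStartIndices: Seq[Int], numShufflePartitions: Int): Seq[Range] = {
    require(partitionStartIndices.nonEmpty && partitionStartIndices.head == 0,
      "first start index must be 0")
    val ends = partitionStartIndices.tail :+ numShufflePartitions
    partitionStartIndices.zip(ends).map { case (start, end) => start until end }
  }
}
```

For a 5-partition shuffle, start indices `Seq(0, 3)` give reducers reading partitions 0 to 2 and 3 to 4, while `Seq(0, 4)` give 0 to 3 and 4, matching the example above without writing the shuffle twice.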


---




[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

2018-08-02 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/21950
  
retest this please.


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread vackosar
Github user vackosar commented on the issue:

https://github.com/apache/spark/pull/21919
  
@jose-torres why wouldn't it make sense? According to the documentation, all SS sources have offsets, but not all sinks can also be SS sources; e.g., ForEach doesn't have offsets in general. So usually the offsets should be available on the sinks, no?
Your expert feedback on this is very much appreciated!


---




[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-02 Thread vincent-grosbois
Github user vincent-grosbois commented on the issue:

https://github.com/apache/spark/pull/21933
  
Hello, I updated the description and title


---




[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21950
  
**[Test build #94017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94017/testReport)**
 for PR 21950 at commit 
[`aa2a957`](https://github.com/apache/spark/commit/aa2a957751a906fe538822cace019014e763a8c3).


---




[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21403
  
**[Test build #94001 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94001/testReport)**
 for PR 21403 at commit 
[`53e3d96`](https://github.com/apache/spark/commit/53e3d961a0cde6d6ab6b4c8b86b9134b9532f776).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21403
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21403
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94001/
Test FAILed.


---




[GitHub] spark pull request #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTask...

2018-08-02 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/21972

[SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext with BarrierTaskContextImpl

## What changes were proposed in this pull request?

According to 
https://github.com/apache/spark/pull/21758#discussion_r206746905, the current 
declaration of `BarrierTaskContext` does not inherit the methods of `TaskContext`. 
Since `TaskContext` is an abstract class and we don't want to change it to a 
trait, we have to define `BarrierTaskContext` directly as a class.
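
A minimal Scala sketch of the shape this change aims for. The classes below are hypothetical stand-ins (`SimpleTaskContext`, `SimpleBarrierTaskContext`), not Spark's real API: once the barrier context is a concrete class extending the abstract base, a caller holding it can use both the inherited task methods and `barrier()` from a single type.

```scala
// Hypothetical stand-ins, NOT Spark's real TaskContext/BarrierTaskContext.
abstract class SimpleTaskContext {
  def partitionId(): Int
  def attemptNumber(): Int
}

// One concrete class extends the abstract base directly, so callers get the
// inherited task methods and the barrier method from a single type.
class SimpleBarrierTaskContext(partition: Int, attempt: Int)
    extends SimpleTaskContext {
  private var barrierCalls = 0

  override def partitionId(): Int = partition
  override def attemptNumber(): Int = attempt

  // Placeholder: a real barrier would block until all tasks in the stage arrive.
  def barrier(): Unit = barrierCalls += 1

  def barrierCount: Int = barrierCalls
}
```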

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark BarrierTaskContext

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21972.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21972


commit e5987cf281136528ec0d23f82fe1505abd6545b3
Author: Xingbo Jiang 
Date:   2018-08-02T15:09:37Z

combine BarrierTaskContext with BarrierTaskContextImpl.




---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21972
  
**[Test build #94018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94018/testReport)**
 for PR 21972 at commit 
[`e5987cf`](https://github.com/apache/spark/commit/e5987cf281136528ec0d23f82fe1505abd6545b3).


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21972
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21972
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1670/
Test PASSed.


---




[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...

2018-08-02 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21936#discussion_r207269915
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -571,7 +571,12 @@ class SparkContext(config: SparkConf) extends Logging {
     _shutdownHookRef = ShutdownHookManager.addShutdownHook(
       ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
       logInfo("Invoking stop() from shutdown hook")
-      stop()
+      try {
+        stop()
+      } catch {
+        case e: Throwable =>
+          logWarning("Ignoring Exception while stopping SparkContext", e)
--- End diff --

Minor nit: could you add in "while stopping SparkContext from shutdownhook"?
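
A minimal, self-contained sketch of the pattern in this diff, with the reviewer's wording applied to the log message. `SafeShutdown`, its in-memory `warnings` buffer and `stopFromShutdownHook` are hypothetical names standing in for Spark's `Logging` and `SparkContext.stop()`; only `sys.addShutdownHook` is a real standard-library call.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch, not Spark's code: run a stop action from a shutdown
// hook and log, rather than propagate, anything it throws.
object SafeShutdown {
  // In-memory stand-in for a logger so the behaviour can be checked.
  val warnings = ArrayBuffer.empty[String]

  def logWarning(msg: String, e: Throwable): Unit =
    warnings += s"$msg: ${e.getMessage}"

  // Returns true if stop() completed normally, false if it threw.
  def stopFromShutdownHook(stop: () => Unit): Boolean =
    try {
      stop()
      true
    } catch {
      // Catching Throwable mirrors the diff; scala.util.control.NonFatal
      // is a common, narrower alternative.
      case e: Throwable =>
        logWarning("Ignoring Exception while stopping SparkContext from shutdown hook", e)
        false
    }

  // Registers the guarded stop with the JVM via the standard library.
  def register(stop: () => Unit): Unit = {
    sys.addShutdownHook { stopFromShutdownHook(stop); () }
  }
}
```

Returning a `Boolean` is only there to make the outcome observable; the diff itself simply logs and moves on so the remaining shutdown hooks still run.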


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread jose-torres
Github user jose-torres commented on the issue:

https://github.com/apache/spark/pull/21919
  
For file streams, the offsets are just indices into a log the source keeps 
of which files it's seen. So a file sink doesn't have any access to those 
offsets.
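
To illustrate the point above, here is a simplified Scala model (an assumption for illustration, not Spark's `FileStreamSource` or its metadata log): the source numbers each batch of newly seen files, and that number is the offset. A sink that receives only the rows has no way to recover it.

```scala
import scala.collection.mutable

// Hypothetical model: the source keeps its own log of which files it has
// seen; an offset is just an index into that log, so it is meaningless
// outside the source.
final class SeenFilesLog {
  private val batches = mutable.ArrayBuffer.empty[Seq[String]]
  private val seen = mutable.Set.empty[String]

  // Record newly discovered files as one batch; returns the new offset,
  // or None if nothing new was found.
  def addBatch(listed: Seq[String]): Option[Int] = {
    val fresh = listed.filterNot(seen.contains).distinct
    if (fresh.isEmpty) None
    else {
      seen ++= fresh
      batches += fresh
      Some(batches.size - 1)
    }
  }

  // Only the source can translate an offset back into files.
  def filesAt(offset: Int): Seq[String] = batches(offset)
}
```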


---




[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21403
  
**[Test build #93999 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93999/testReport)**
 for PR 21403 at commit 
[`0f00a06`](https://github.com/apache/spark/commit/0f00a06a1853cb13d1d156bafcb85973c92e2b8e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21403
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21403
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93999/
Test FAILed.


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21954
  
retest this please


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1671/
Test PASSed.


---




[GitHub] spark pull request #21959: [SPARK-23698] Define xrange() for Python 3 in dum...

2018-08-02 Thread cclauss
Github user cclauss closed the pull request at:

https://github.com/apache/spark/pull/21959


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21954
  
**[Test build #94019 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94019/testReport)**
 for PR 21954 at commit 
[`c3bf6a0`](https://github.com/apache/spark/commit/c3bf6a0059a151ba23cf32c842e31ced3b28726c).


---




[GitHub] spark pull request #21960: [SPARK-23698] Remove unused definitions of long a...

2018-08-02 Thread cclauss
Github user cclauss closed the pull request at:

https://github.com/apache/spark/pull/21960


---




[GitHub] spark issue #21109: [SPARK-24020][SQL] Sort-merge join inner range optimizat...

2018-08-02 Thread zecevicp
Github user zecevicp commented on the issue:

https://github.com/apache/spark/pull/21109
  
Implementing spilling seems like a lot of work because this is a queue. If 
data is spilled to disk and you need to pop from the queue, it is not clear to 
me what the best way to do that is. Do you spill only one part of the queue (so 
that you can add or pop more efficiently)? Which part (the beginning or the 
end)? Or maybe the middle? What is the threshold for bringing it back into 
memory from disk? And other similar questions...
But I think it can be expected that the queue will consume much less memory 
than the original `ExternalAppendOnlyUnsafeRowArray`, because the queue's 
purpose IS to reduce the number of rows in memory, so spilling would rarely be 
needed (that would depend, of course, on the user's range condition). 
That's why implementing spilling doesn't seem critical to me. I can try to 
implement it if everybody thinks it's really needed, but as I said, it's not 
clear (to me) what the best approach would be.

Regarding the second point, this is not an ordinary range join, but an 
equi-join with a secondary range condition.
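
A simplified, in-memory Scala sketch of the idea described above (an illustration under stated assumptions, not the PR's implementation): an equi-join on `key` with a secondary range condition `|l.v - r.v| <= d`, where both inputs are already sorted by `(key, v)`. `KeyedRow` and `RangeMergeJoin` are hypothetical names. The queue holds only the right-side rows that can still match, and rows are dropped from its front as soon as the lower bound passes them.

```scala
import scala.collection.mutable

// Hypothetical row type: join key plus the value used in the range condition.
final case class KeyedRow(key: Int, v: Int)

object RangeMergeJoin {
  // Both inputs must be sorted by (key, v).
  def join(left: IndexedSeq[KeyedRow], right: IndexedSeq[KeyedRow], d: Int): Seq[(KeyedRow, KeyedRow)] = {
    val out = mutable.ArrayBuffer.empty[(KeyedRow, KeyedRow)]
    val window = mutable.Queue.empty[KeyedRow]
    var r = 0 // index of the next right row not yet enqueued
    for (l <- left) {
      // A new key invalidates the whole window.
      if (window.nonEmpty && window.head.key != l.key) window.clear()
      // Skip right rows with a smaller key, or the same key but already too low.
      while (r < right.length &&
             (right(r).key < l.key || (right(r).key == l.key && right(r).v < l.v - d))) r += 1
      // Enqueue right rows with the same key up to the upper bound.
      while (r < right.length && right(r).key == l.key && right(r).v <= l.v + d) {
        window.enqueue(right(r))
        r += 1
      }
      // Left v is non-decreasing within a key, so rows below the lower bound never match again.
      while (window.nonEmpty && window.head.v < l.v - d) window.dequeue()
      window.foreach(rr => out += ((l, rr)))
    }
    out.toSeq
  }
}
```

Memory here is bounded by the rows inside one key's range window rather than by all rows that share the key, which is the argument for spilling rarely being needed; a wide range condition can still make that window large.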


---




[GitHub] spark issue #16486: [SPARK-13610][ML] Create a Transformer to disassemble ve...

2018-08-02 Thread AlbertPlaPlanas
Github user AlbertPlaPlanas commented on the issue:

https://github.com/apache/spark/pull/16486
  
Was this ever implemented?


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21935
  
retest this please


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1672/
Test PASSed.


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21972
  
**[Test build #94018 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94018/testReport)**
 for PR 21972 at commit 
[`e5987cf`](https://github.com/apache/spark/commit/e5987cf281136528ec0d23f82fe1505abd6545b3).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21972
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94018/
Test FAILed.


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21972
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21935
  
**[Test build #94020 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94020/testReport)**
 for PR 21935 at commit 
[`fed8505`](https://github.com/apache/spark/commit/fed850598ff4c52ff3c6cd54f2d3d719b8a745e7).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
These test failures are in Spark Streaming. Is this just an intermittent 
failure, or is it actually caused by this PR?


---




[GitHub] spark issue #21936: [SPARK-24981][Core] ShutdownHook timeout causes job to f...

2018-08-02 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21936
  
@tgravescs I updated. Thanks.



---




[GitHub] spark issue #21936: [SPARK-24981][Core] ShutdownHook timeout causes job to f...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21936
  
**[Test build #94021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94021/testReport)**
 for PR 21936 at commit 
[`a328163`](https://github.com/apache/spark/commit/a328163c97c9328a85e6415a716c130de9892b16).


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21754
  
**[Test build #94000 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94000/testReport)**
 for PR 21754 at commit 
[`5dfd948`](https://github.com/apache/spark/commit/5dfd94843ff776e75e0c0fb5198f36bfebf94288).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread vackosar
Github user vackosar commented on the issue:

https://github.com/apache/spark/pull/21919
  
Yes, I was hoping to improve that, e.g., by using the file name as the offset 
or some other approach not owned by the consumer, but that would be rather 
long-term. Do you think it is solvable?


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21953
  
@tgravescs I updated it. Thanks.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94000/
Test FAILed.


---




[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...

2018-08-02 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r207281949
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala ---
@@ -107,7 +109,14 @@ private[spark] class Client(
   def run(): Unit = {
     val resolvedDriverSpec = builder.buildFromFeatures(kubernetesConf)
     val configMapName = s"$kubernetesResourceNamePrefix-driver-conf-map"
-    val configMap = buildConfigMap(configMapName, resolvedDriverSpec.systemProperties)
+    val isKerberosEnabled = kubernetesConf.getTokenManager.isSecurityEnabled
+    // HADOOP_SECURITY_AUTHENTICATION is defined as simple for the driver and executors as
+    // they need only the delegation token to access secure HDFS, no need to sign in to Kerberos
+    val maybeSimpleAuthentication =
+      if (isKerberosEnabled) Some((s"-D$HADOOP_SECURITY_AUTHENTICATION", "simple")) else None
--- End diff --

I agree that the use cases presented above require Kerberos login on the 
driver and executors. I will address these concerns in my follow-up commit. 


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21754
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-02 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21923
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21970: [SPARK-24996][SQL] Use DSL in DeclarativeAggregate

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21970
  
**[Test build #94013 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94013/testReport)**
 for PR 21970 at commit 
[`6273831`](https://github.com/apache/spark/commit/6273831d38069731fdd689c03ce078e6158db2a4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21970: [SPARK-24996][SQL] Use DSL in DeclarativeAggregate

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21970
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94013/
Test FAILed.


---




[GitHub] spark issue #21970: [SPARK-24996][SQL] Use DSL in DeclarativeAggregate

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21970
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21972
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21972
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1673/
Test PASSed.


---




[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21923
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1674/
Test PASSed.


---




[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21923
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1675/
Test PASSed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21972: [SPARK-24795][CORE][FOLLOWUP] Combine BarrierTaskContext...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21972
  
**[Test build #94022 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94022/testReport)**
 for PR 21972 at commit 
[`f3ea13d`](https://github.com/apache/spark/commit/f3ea13d68736cf445d2d72f66cbb2d082a7853bc).


---




[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21923
  
**[Test build #94023 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94023/testReport)**
 for PR 21923 at commit 
[`ba6aa6c`](https://github.com/apache/spark/commit/ba6aa6c829bfcca1b4b3d5a33fe3a7460e7db1f0).


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21754
  
**[Test build #94024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94024/testReport)**
 for PR 21754 at commit 
[`5dfd948`](https://github.com/apache/spark/commit/5dfd94843ff776e75e0c0fb5198f36bfebf94288).


---




[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-02 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/21305
  
@cloud-fan, I'll fix the conflicts and re-run tests. Yesterday's tests 
passed after I updated for your feedback. I'd like to try to get this in soon 
because it is taking so much time to resolve conflicts without any real changes.

FYI @gatorsmile, @bersprockets, @jzhuge


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21941
  
**[Test build #94003 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94003/testReport)**
 for PR 21941 at commit 
[`e7d69db`](https://github.com/apache/spark/commit/e7d69db7cd0c23d6ee9012b5f48b17e5aeac8d66).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21930: [SPARK-14540][Core] Fix remaining major issues for Scala...

2018-08-02 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21930
  
Sure, @lrytz can have a second look at this; it also needs to be 
battle-tested.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21911: [SPARK-24940][SQL] Coalesce and Repartition Hint ...

2018-08-02 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21911#discussion_r207285266
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala ---
@@ -102,6 +104,35 @@ object ResolveHints {
       }
     }
 
+  /**
+   * COALESCE Hint accepts name "COALESCE" and "REPARTITION".
+   * Its parameter includes a partition number.
+   */
+  class ResolveCoalesceHints(conf: SQLConf) extends Rule[LogicalPlan] {
+    private val COALESCE_HINT_NAMES = Set("COALESCE", "REPARTITION")
+
+    private def applyCoalesceHint(
+      plan: LogicalPlan,
+      numPartitions: Int,
+      shuffle: Boolean): LogicalPlan = {
+      Repartition(numPartitions, shuffle, plan)
+    }
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
+      case h: UnresolvedHint if COALESCE_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
+        h.parameters match {
+          case Seq(Literal(numPartitions: Int, IntegerType)) =>
+            val shuffle = h.name.toUpperCase(Locale.ROOT) match {
+              case "REPARTITION" => true
+              case "COALESCE" => false
+            }
+            applyCoalesceHint(h.child, numPartitions, shuffle)
+          case _ =>
+            throw new AnalysisException("COALESCE Hint expects a partition number as parameter")
--- End diff --

Can you use `h.name.toUpperCase` in this error message instead? I think 
that would be a better message for users that don't know the relationship 
between COALESCE and REPARTITION.
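
A simplified Scala model of the resolution logic in this diff with the suggested error message applied. `Hint` and `RepartitionPlan` are hypothetical stand-ins for Catalyst's `UnresolvedHint` and `Repartition`, and returning `Left` for unrecognised names is a simplification (the real rule only matches COALESCE/REPARTITION and leaves other hints for other rules).

```scala
import java.util.Locale

// Hypothetical stand-ins, NOT Catalyst's real classes.
final case class Hint(name: String, parameters: Seq[Any])
final case class RepartitionPlan(numPartitions: Int, shuffle: Boolean)

object CoalesceHintSketch {
  private val hintNames = Set("COALESCE", "REPARTITION")

  def resolve(h: Hint): Either[String, RepartitionPlan] = {
    val upper = h.name.toUpperCase(Locale.ROOT)
    if (!hintNames.contains(upper)) Left(s"Not handled by this rule: ${h.name}")
    else h.parameters match {
      // REPARTITION shuffles; COALESCE only merges existing partitions.
      case Seq(n: Int) => Right(RepartitionPlan(n, shuffle = upper == "REPARTITION"))
      // Name the hint the user actually wrote, as suggested in the review.
      case _ => Left(s"$upper Hint expects a partition number as parameter")
    }
  }
}
```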


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Merged build finished. Test FAILed.


---



