[GitHub] spark issue #21321: [SPARK-24268][SQL] Use datatype.simpleString in error me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21321 Merged build finished. Test PASSed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21299 **[Test build #90590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90590/testReport)** for PR 21299 at commit [`01e288a`](https://github.com/apache/spark/commit/01e288a332c25dcb9cd5af8c818ae7018dd6a9bb).
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187946945
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---
@@ -136,6 +136,59 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(ArrayContains(a3, Literal.create(null, StringType)), null)
   }

+  test("ArraysOverlap") {
+    val a0 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType))
+    val a1 = Literal.create(Seq(4, 5, 3), ArrayType(IntegerType))
+    val a2 = Literal.create(Seq(null, 5, 6), ArrayType(IntegerType))
+    val a3 = Literal.create(Seq(7, 8), ArrayType(IntegerType))
+    val a4 = Literal.create(Seq.empty[Int], ArrayType(IntegerType))
+
+    val a5 = Literal.create(Seq[String](null, ""), ArrayType(StringType))
+    val a6 = Literal.create(Seq[String]("", "abc"), ArrayType(StringType))
+    val a7 = Literal.create(Seq[String]("def", "ghi"), ArrayType(StringType))
+
+    checkEvaluation(ArraysOverlap(a0, a1), true)
+    checkEvaluation(ArraysOverlap(a0, a2), null)
+    checkEvaluation(ArraysOverlap(a1, a2), true)
+    checkEvaluation(ArraysOverlap(a1, a3), false)
+    checkEvaluation(ArraysOverlap(a0, a4), false)
+    checkEvaluation(ArraysOverlap(a2, a4), null)
+    checkEvaluation(ArraysOverlap(a4, a2), null)
+
+    checkEvaluation(ArraysOverlap(a5, a6), true)
+    checkEvaluation(ArraysOverlap(a5, a7), null)
+    checkEvaluation(ArraysOverlap(a6, a7), false)
+
+    // null handling
+    checkEvaluation(ArraysOverlap(Literal.create(null, ArrayType(IntegerType)), a0), null)
+    checkEvaluation(ArraysOverlap(a0, Literal.create(null, ArrayType(IntegerType))), null)
+    checkEvaluation(ArraysOverlap(
+      Literal.create(Seq(null), ArrayType(IntegerType)),
+      Literal.create(Seq(null), ArrayType(IntegerType))), null)
--- End diff --
This case is covered by https://github.com/apache/spark/pull/21028/files#diff-d31eca9f1c4c33104dc2cb8950486910R163 for instance. Anyway, I am adding another one which is exactly this one.
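As a usage-level illustration of the semantics these tests pin down (a sketch of a hypothetical spark-shell session; it assumes the PR's `arrays_overlap` function is registered as named in the PR title):

```scala
spark.sql("SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5))").show()  // true: 3 is shared
spark.sql("SELECT arrays_overlap(array(1, 2), array(4, 5))").show()       // false: definite miss
// No common element, but the null could have matched, so the result is null:
spark.sql("SELECT arrays_overlap(array(1, null), array(4, 5))").show()    // null
```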
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20933 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3198/ Test PASSed.
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20933 Merged build finished. Test PASSed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #90589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90589/testReport)** for PR 20894 at commit [`21f8b10`](https://github.com/apache/spark/commit/21f8b10dda4b0ef71ba69cc6147d1cf8614812f1).
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21246 **[Test build #90588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90588/testReport)** for PR 21246 at commit [`06b8b6c`](https://github.com/apache/spark/commit/06b8b6c9f4e7d82608d0507860b0dbe7acb6af54).
[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21291 Merged build finished. Test PASSed.
[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21291 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3197/ Test PASSed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21299 **[Test build #90587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90587/testReport)** for PR 21299 at commit [`a100dea`](https://github.com/apache/spark/commit/a100dea9573e9b43b993516c817e306d80f72d29).
[GitHub] spark pull request #21290: [SPARK-24241][Submit]Do not fail fast when dynami...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21290#discussion_r187943795
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -76,6 +75,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   var proxyUser: String = null
   var principal: String = null
   var keytab: String = null
+  private var dynamicAllocationEnabled: String = null
--- End diff --
Wait, why not simply a boolean here? It's going to be much simpler to write.
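A minimal sketch of the Boolean alternative srowen suggests (a hypothetical snippet, not the actual patch; it assumes the flag is filled in while the merged Spark properties are loaded, and that `sparkProperties` is the arguments class's property map):

```scala
// Hypothetical sketch: keep the flag as a Boolean from the start, so later
// validation logic can branch on it directly instead of comparing a nullable
// String against "true".
private var dynamicAllocationEnabled: Boolean = false

// ...while loading defaults/--conf values into sparkProperties:
dynamicAllocationEnabled = sparkProperties
  .get("spark.dynamicAllocation.enabled")
  .exists(_.trim.equalsIgnoreCase("true"))
```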
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3196/ Test PASSed.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Merged build finished. Test PASSed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21299 retest this please
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21317 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3099/
[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21246#discussion_r187941913
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = {
+    s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, $inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+      ctx: CodegenContext,
+      sb: String,
+      inputString: String,
+      offset: String,
+      numChars: String): String = {
+    val i = ctx.freshName("i")
+    val codePoint = ctx.freshName("codePoint")
+    s"""
+       |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+       |  ${CodeGenerator.JAVA_INT} $codePoint = $inputString.codePointAt($offset);
+       |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+       |    $upperReplacement, $lowerReplacement,
+       |    $digitReplacement, $defaultMaskedOther));
+       |  $offset += Character.charCount($codePoint);
+       |}
+     """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+      ctx: CodegenContext,
+      sb: String,
+      inputString: String,
+      offset: String,
+      numChars: String): String = {
+    val i = ctx.freshName("i")
+    val codePoint = ctx.freshName("codePoint")
+    s"""
+       |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+       |  ${CodeGenerator.JAVA_INT} $codePoint = $inputString.codePointAt($offset);
+       |  $sb.appendCodePoint($codePoint);
+       |  $offset += Character.charCount($codePoint);
+       |}
+     """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+      sb: StringBuffer,
+      inputString: String,
+      startOffset: Int,
+      numChars: Int): Int = {
+    var offset = startOffset
+    (1 to numChars) foreach { _ =>
+      val codePoint = inputString.codePointAt(offset)
+      sb.appendCodePoint(transformChar(
+        codePoint,
+        upperReplacement,
+        lowerReplacement,
+        digitReplacement,
+        defaultMaskedOther))
+      offset += Character.charCount(codePoint)
+    }
+    offset
+  }
+
+  def appendUnchangedToStringBuffer(
+      sb: StringBuffer,
+      inputString: String,
+      startOffset: Int,
+      numChars: Int): Int = {
+    var offset = startOffset
+    (1 to numChars) foreach { _ =>
+      val codePoint = inputString.codePointAt(offset)
+      sb.appendCodePoint(codePoint)
+      offset += Character.charCount(codePoint)
+    }
+    offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask op
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test PASSed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3195/ Test PASSed.
[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21246#discussion_r187940989
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/MaskExpressionsUtils.java ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ * Contains all the Utils methods used in the masking expressions.
+ */
+public class MaskExpressionsUtils {
--- End diff --
Because I am also invoking it in the generated Java code, and I wanted to avoid using a Scala match clause (instead of the Java switch statement) for performance reasons.
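The codegen angle is the key point: a Java static with a plain switch can be called directly from generated code. As a rough Scala rendering of the same transform logic (a sketch only, not the PR's code; the -1 "leave unchanged" sentinel is an assumption):

```scala
// Sketch of the character transform as a Scala match; the PR keeps the real
// logic in Java (MaskExpressionsUtils.transformChar) so that generated Java
// code can call it as a cheap static switch.
def transformCharSketch(
    codePoint: Int,
    maskedUpper: Int,
    maskedLower: Int,
    maskedDigit: Int,
    maskedOther: Int): Int = {
  Character.getType(codePoint).toByte match {
    case Character.UPPERCASE_LETTER     => if (maskedUpper != -1) maskedUpper else codePoint
    case Character.LOWERCASE_LETTER     => if (maskedLower != -1) maskedLower else codePoint
    case Character.DECIMAL_DIGIT_NUMBER => if (maskedDigit != -1) maskedDigit else codePoint
    case _                              => if (maskedOther != -1) maskedOther else codePoint
  }
}
```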
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21317 Merged build finished. Test PASSed.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90576/ Test PASSed.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90578/ Test PASSed.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21317
**[Test build #90576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90576/testReport)** for PR 21317 at commit [`aa0ccc0`](https://github.com/apache/spark/commit/aa0ccc02bc080b067519834ebb2657dd54208c37).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21317 Merged build finished. Test PASSed.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21317
**[Test build #90578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90578/testReport)** for PR 21317 at commit [`aa0ccc0`](https://github.com/apache/spark/commit/aa0ccc02bc080b067519834ebb2657dd54208c37).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21317 Merged build finished. Test PASSed.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21317 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3099/
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3194/ Test PASSed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266
**[Test build #90579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90579/testReport)** for PR 21266 at commit [`fc96adb`](https://github.com/apache/spark/commit/fc96adb099380ffac09d445461435b613e08f9f3).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90579/ Test FAILed.
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21165
**[Test build #90585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90585/testReport)** for PR 21165 at commit [`05d1d9c`](https://github.com/apache/spark/commit/05d1d9cad761bb09e1131162458fecd5e34f02d2).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21165 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90585/ Test FAILed.
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21165 Merged build finished. Test FAILed.
[GitHub] spark issue #21321: [SPARK-24268][SQL] Use datatype.simpleString in error me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21321 **[Test build #90586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90586/testReport)** for PR 21321 at commit [`ada7667`](https://github.com/apache/spark/commit/ada7667a9f8872f373bc789a7eb0c84987642314).
[GitHub] spark pull request #21321: [SPARK-24268][SQL] Use datatype.simpleString in e...
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/21321

[SPARK-24268][SQL] Use datatype.simpleString in error messages

## What changes were proposed in this pull request?

SPARK-22893 tried to unify the error messages about dataTypes. Unfortunately, many places were still missing the `simpleString` call needed to get the same representation everywhere. This PR unifies the messages by always using the `simpleString` representation of the dataTypes.

## How was this patch tested?

existing/modified UTs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-24268

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21321.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21321

commit ada7667a9f8872f373bc789a7eb0c84987642314
Author: Marco Gaido
Date: 2018-05-05T15:19:45Z

    [SPARK-24268][SQL] Use datatype.simpleString in error messages
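For reference, `simpleString` is the compact SQL-style rendering of a `DataType`, which is what makes the messages uniform:

```scala
import org.apache.spark.sql.types._

IntegerType.simpleString                                  // "int"
ArrayType(IntegerType).simpleString                       // "array<int>"
StructType(Seq(StructField("a", LongType))).simpleString  // "struct<a:bigint>"
```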
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Merged build finished. Test FAILed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894
**[Test build #90581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90581/testReport)** for PR 20894 at commit [`2bd2713`](https://github.com/apache/spark/commit/2bd27136ae9095beec429ff15a6a5f1be0464419).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20894 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90581/ Test FAILed.
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20894 **[Test build #90581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90581/testReport)** for PR 20894 at commit [`2bd2713`](https://github.com/apache/spark/commit/2bd27136ae9095beec429ff15a6a5f1be0464419).
[GitHub] spark issue #21114: [SPARK-22371][CORE] Return None instead of throwing an e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21114 **[Test build #90577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90577/testReport)** for PR 21114 at commit [`8b30733`](https://github.com/apache/spark/commit/8b30733dba85d9881d0171414616bd0b0893f419).
[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20933 **[Test build #90584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90584/testReport)** for PR 20933 at commit [`67b1748`](https://github.com/apache/spark/commit/67b1748c8b939a6b484bfc868fd311e381d7f8e0).
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21320 **[Test build #90582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90582/testReport)** for PR 21320 at commit [`9e301b3`](https://github.com/apache/spark/commit/9e301b37bb5863ef5fad8ec15f1d8f3d4b5e6a6f).
[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21311 **[Test build #90575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90575/testReport)** for PR 21311 at commit [`d9d8e62`](https://github.com/apache/spark/commit/d9d8e62c2de7d9d04534396ab3bbf984ab16c7f5).
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21317 **[Test build #90578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90578/testReport)** for PR 21317 at commit [`aa0ccc0`](https://github.com/apache/spark/commit/aa0ccc02bc080b067519834ebb2657dd54208c37).
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21165 **[Test build #90585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90585/testReport)** for PR 21165 at commit [`05d1d9c`](https://github.com/apache/spark/commit/05d1d9cad761bb09e1131162458fecd5e34f02d2).
[GitHub] spark issue #19293: [SPARK-22079][SQL] Serializer in HiveOutputWriter miss l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19293 **[Test build #90580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90580/testReport)** for PR 19293 at commit [`45477fb`](https://github.com/apache/spark/commit/45477fbf00558066e3733a34e1d59ce22c192ee2).
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21317 **[Test build #90576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90576/testReport)** for PR 21317 at commit [`aa0ccc0`](https://github.com/apache/spark/commit/aa0ccc02bc080b067519834ebb2657dd54208c37).
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266 **[Test build #90579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90579/testReport)** for PR 21266 at commit [`fc96adb`](https://github.com/apache/spark/commit/fc96adb099380ffac09d445461435b613e08f9f3).
[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21311 **[Test build #90574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90574/testReport)** for PR 21311 at commit [`22a2767`](https://github.com/apache/spark/commit/22a2767b98185edf32be3c36bb255f5837ad7466).
[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21291 **[Test build #90583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90583/testReport)** for PR 21291 at commit [`3a14bd6`](https://github.com/apache/spark/commit/3a14bd6eeb390dfe0940eb17b5c9988e12aba1bc).
[GitHub] spark pull request #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21257#discussion_r187935923
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -163,6 +169,12 @@ class HadoopMapReduceCommitProtocol(
   }

   override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
+    // first delete the should delete special file
+    val committerFs = jobContext.getWorkingDirectory.getFileSystem(jobContext.getConfiguration)
--- End diff --
can we change other places in this method to use the `fs` created here?
[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21319 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90573/ Test PASSed.
[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21319 Merged build finished. Test PASSed.
[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21319
**[Test build #90573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90573/testReport)** for PR 21319 at commit [`ca6ccb2`](https://github.com/apache/spark/commit/ca6ccb24e8f2910a3ffc07a790f2ba7f57e79056).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21317 Jenkins, retest this please
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187935251
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---
+    // null handling
+    checkEvaluation(ArraysOverlap(Literal.create(null, ArrayType(IntegerType)), a0), null)
+    checkEvaluation(ArraysOverlap(a0, Literal.create(null, ArrayType(IntegerType))), null)
+    checkEvaluation(ArraysOverlap(
+      Literal.create(Seq(null), ArrayType(IntegerType)),
+      Literal.create(Seq(null), ArrayType(IntegerType))), null)
--- End diff --
do we have a test case for it?
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187934895
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -529,6 +567,239 @@ case class ArrayContains(left: Expression, right: Expression)
   override def prettyName: String = "array_contains"
 }

+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+       true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match {
+    case TypeCheckResult.TypeCheckSuccess =>
+      if (RowOrdering.isOrderable(elementType)) {
+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.")
+      }
+    case failure => failure
+  }
+
+  @transient private lazy val ordering: Ordering[Any] =
+    TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient private lazy val elementTypeSupportEquals = elementType match {
+    case BinaryType => false
+    case _: AtomicType => true
+    case _ => false
+  }
+
+  @transient private lazy val doEvaluation = if (elementTypeSupportEquals) {
+    fastEval _
+  } else {
+    bruteForceEval _
+  }
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = {
+    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull ||
+      right.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(a1: Any, a2: Any): Any = {
+    doEvaluation(a1.asInstanceOf[ArrayData], a2.asInstanceOf[ArrayData])
+  }
+
+  /**
+   * A fast implementation which puts all the elements from the smaller array in a set
+   * and then performs a lookup on it for each element of the bigger one.
+   * This eval mode works only for data types which implements properly the equals method.
+   */
+  private def fastEval(arr1: ArrayData, arr2: ArrayData): Any = {
+    var hasNull = false
+    val (bigger, smaller, biggerDt) = if (arr1.numElements() > arr2.numElements()) {
+      (arr1, arr2, left.dataType.asInstanceOf[ArrayType])
+    } else {
+      (arr2, arr1, right.dataType.asInstanceOf[ArrayType])
+    }
+    if (smaller.numElements() > 0) {
+      val smallestSet = new mutable.HashSet[Any]
+      smaller.foreach(elementType, (_, v) =>
+        if (v == null) {
+          hasNull = true
+        } else {
+          smallestSet += v
+        })
+      bigger.foreach(elementType, (_, v1) =>
+        if (v1 == null) {
+          hasNull = true
+        } else if (smallestSet.contains(v1)) {
+          return true
+        }
+      )
+    }
+    if (hasNull) {
+      null
+    } else {
+      false
+    }
+  }
+
+  /**
+   * A slower evaluation which performs a nested loop and supports all the data types.
+   */
+  private def bruteForceEval(arr1: ArrayData, arr2: ArrayData): Any = {
+    var hasNull = false
+    if (arr1.numElements() > 0) {
+      arr1.foreach(elementType, (_, v1) =>
+        if (v1 == null) {
+          hasNull = true
+        } else {
+          arr2.foreach(elementType, (_, v2) =>
+            if (v1 == null) {
+              hasNull = true
+            } else if (ordering.equiv(v1, v2)) {
+              return true
+            }
+          )
+        })
+    }
+    if (hasNull) {
+      null
+    } else {
+      false
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(ctx, ev, (a1, a2) => {
+      val smaller = ctx.freshName("smallerArray")
+      val bigger = ctx.freshName("biggerArray")
+      val comparisonCode = if (elementTypeSupportEquals) {
+        fastCodegen(ctx, ev, smaller, bigger)
+      } else {
+        bruteForceCodegen(ctx, ev, smaller, bigger)
+      }
+      s"""
+         |ArrayData $smaller;
+         |Arr
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187934593
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -529,6 +567,239 @@ case class ArrayContains(left: Expression, right: Expression)
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187932792
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -529,6 +567,239 @@ case class ArrayContains(left: Expression, right: Expression)
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187931865
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
+  private def fastEval(arr1: ArrayData, arr2: ArrayData): Any = {
+    var hasNull = false
+    val (bigger, smaller, biggerDt) = if (arr1.numElements() > arr2.numElements()) {
--- End diff --
the `biggerDt` is not used
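A condensed sketch of the fast path under discussion: build a set from the smaller array, then probe it with the bigger one, tracking whether a null was seen so the result can degrade to SQL NULL. This uses plain Scala collections with `Option` standing in for NULL; the real code works on `ArrayData` and also has a codegen path:

```scala
import scala.collection.mutable

// Some(true) = overlap found; Some(false) = definite miss;
// None = undecidable because a null element could have matched.
def overlapFast[T](a: Seq[Option[T]], b: Seq[Option[T]]): Option[Boolean] = {
  val (bigger, smaller) = if (a.size > b.size) (a, b) else (b, a)
  var hasNull = false
  val seen = mutable.HashSet.empty[T]
  smaller.foreach {
    case None    => hasNull = true
    case Some(v) => seen += v
  }
  bigger.foreach {
    case None                        => hasNull = true
    case Some(v) if seen.contains(v) => return Some(true)
    case _                           => // this element matches nothing
  }
  if (hasNull) None else Some(false)
}
```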
[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r187931208
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
    */
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --
`reduce`
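For context, a hedged sketch of what the wrapper buys: `reduce` is an action, and running it under a new execution id lets the SQL UI and query-execution listeners attribute the resulting job to the Dataset's query. The standalone helper below is hypothetical; the PR does this inside `Dataset.reduce` itself:

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.execution.SQLExecution

// Hypothetical helper mirroring the wrapped form under review.
def reduceTracked[T](ds: Dataset[T])(func: (T, T) => T): T =
  SQLExecution.withNewExecutionId(ds.sparkSession, ds.queryExecution) {
    ds.rdd.reduce(func)
  }
```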
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test PASSed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90571/ Test PASSed.
[GitHub] spark pull request #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwr...
Github user zheh12 commented on a diff in the pull request: https://github.com/apache/spark/pull/21257#discussion_r187930560
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -163,6 +169,12 @@ class HadoopMapReduceCommitProtocol(
   }

   override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
+    // first delete the should delete special file
+    val committerFs = jobContext.getWorkingDirectory.getFileSystem(jobContext.getConfiguration)
--- End diff --
The stagingDir is not always a valid Hadoop path, but the JobContext working dir always is.
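Putting the two comments in this thread together, the pattern is to resolve one FileSystem from the job's working directory (which is always a valid Hadoop path) and reuse it across the method. A hedged sketch; the surrounding delete logic is hypothetical:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.mapreduce.JobContext

// Resolve the FileSystem once from the JobContext's working directory,
// then reuse it for every path operation in commitJob.
def deleteMarkedPaths(jobContext: JobContext, toDelete: Seq[Path]): Unit = {
  val fs: FileSystem =
    jobContext.getWorkingDirectory.getFileSystem(jobContext.getConfiguration)
  toDelete.foreach { p =>
    if (fs.exists(p)) fs.delete(p, true)  // recursive delete
  }
}
```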
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21288
**[Test build #90571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90571/testReport)** for PR 21288 at commit [`4520044`](https://github.com/apache/spark/commit/4520044d3be40ba8bf963a151db2dd9769c0f59a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90570/ Test FAILed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Merged build finished. Test FAILed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21299
**[Test build #90570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90570/testReport)** for PR 21299 at commit [`a100dea`](https://github.com/apache/spark/commit/a100dea9573e9b43b993516c817e306d80f72d29).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwr...
Github user zheh12 commented on a diff in the pull request: https://github.com/apache/spark/pull/21257#discussion_r187929156
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -898,4 +898,12 @@ object DDLUtils {
         "Cannot overwrite a path that is also being read from.")
     }
   }
+
+  def verifyReadPath(query: LogicalPlan, outputPath: Path): Boolean = {
--- End diff --
Would isInReadPath, inReadPath, or isReadPath be better?
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20701 kindly ping @holdenk --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user andrusha commented on the issue: https://github.com/apache/spark/pull/21260 @liyinan926 sounds fair; this is also pending tests. I'll add those to the todo list for this PR and ping you once it's done. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21290: [SPARK-24241][Submit]Do not fail fast when dynamic resou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21290 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90569/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21290: [SPARK-24241][Submit]Do not fail fast when dynamic resou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21290 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21290: [SPARK-24241][Submit]Do not fail fast when dynamic resou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21290 **[Test build #90569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90569/testReport)** for PR 21290 at commit [`17fa3bc`](https://github.com/apache/spark/commit/17fa3bc1011db45c346dac4c5930258810fc28d1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...
Github user cxzl25 commented on the issue: https://github.com/apache/spark/pull/21311

Thanks for your review. @maropu @kiszk @cloud-fan I submitted a modification that includes the following:
1. splitting the append function into two parts: grow/append
2. doubling the size when growing
3. using sys.error instead of UnsupportedOperationException

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...
Github user cxzl25 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21311#discussion_r187907410

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
     }

     // There is 8 bytes for the pointer to next value
-    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
+    val needSize = cursor + 8 + row.getSizeInBytes
+    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
+    if (needSize > nowSize) {
       val used = page.length
       if (used >= (1 << 30)) {
         sys.error("Can not build a HashedRelation that is larger than 8G")
--- End diff --

OK, I'll use sys.error instead of UnsupportedOperationException.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...
Github user cxzl25 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21311#discussion_r187907559

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
     }

     // There is 8 bytes for the pointer to next value
-    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
+    val needSize = cursor + 8 + row.getSizeInBytes
+    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
+    if (needSize > nowSize) {
       val used = page.length
       if (used >= (1 << 30)) {
         sys.error("Can not build a HashedRelation that is larger than 8G")
       }
-      ensureAcquireMemory(used * 8L * 2)
-      val newPage = new Array[Long](used * 2)
+      val multiples = math.max(math.ceil(needSize.toDouble / (used * 8L)).toInt, 2)
+      ensureAcquireMemory(used * 8L * multiples)
--- End diff --

OK, I'll split the append function into two parts: grow and append.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...
Github user cxzl25 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21311#discussion_r187907473

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
     }

     // There is 8 bytes for the pointer to next value
-    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
+    val needSize = cursor + 8 + row.getSizeInBytes
+    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
+    if (needSize > nowSize) {
       val used = page.length
       if (used >= (1 << 30)) {
         sys.error("Can not build a HashedRelation that is larger than 8G")
       }
-      ensureAcquireMemory(used * 8L * 2)
--- End diff --

OK, I'll double the size when growing.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
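Putting the three review comments together, a minimal sketch of the proposed grow/append split follows. It assumes the surrounding `LongToUnsafeRowMap` members from the quoted diff (`cursor`, `page`, `ensureAcquireMemory`); the elided bodies and exact signatures are assumptions, not the merged change:

```scala
// Ensure the page has room for `neededSize` more bytes, doubling on growth.
private def grow(neededSize: Int): Unit = {
  val needSize = cursor + neededSize
  val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
  if (needSize > nowSize) {
    val used = page.length
    if (used >= (1 << 30)) {
      sys.error("Can not build a HashedRelation that is larger than 8G")
    }
    // Double the size when growing, per the review; a single row larger than
    // the doubled page would need further handling not sketched here.
    ensureAcquireMemory(used * 8L * 2)
    val newPage = new Array[Long](used * 2)
    // ... copy the old page into newPage, release the old allocation,
    // and point `page` at newPage ...
  }
}

private def append(key: Long, row: UnsafeRow): Unit = {
  // There are 8 bytes for the pointer to the next value.
  grow(8 + row.getSizeInBytes)
  // ... write the row at `cursor` and link it into the key's chain ...
}
```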
[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21165#discussion_r187905421

--- Diff: docs/rdd-programming-guide.md ---
@@ -1548,6 +1548,9 @@ data.map(g)

+In new version of Spark(> 2.3), the semantic of Accumulator has been changed a bit: it now includes updates from
--- End diff --

it's not needed now

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21165 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21257#discussion_r187905010 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -898,4 +898,12 @@ object DDLUtils { "Cannot overwrite a path that is also being read from.") } } + + def verifyReadPath(query: LogicalPlan, outputPath: Path): Boolean = { --- End diff -- `isReadPath`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwr...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21257#discussion_r187904340

--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -163,6 +169,12 @@ class HadoopMapReduceCommitProtocol(
   }

   override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
+    // first delete the should delete special file
+    val committerFs = jobContext.getWorkingDirectory.getFileSystem(jobContext.getConfiguration)
--- End diff --

will this be different from `stagingDir.getFileSystem(jobContext.getConfiguration)`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21257: [SPARK-24194] [SQL]HadoopFsRelation cannot overwr...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21257#discussion_r187903129

--- Diff: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ---
@@ -120,7 +120,8 @@ abstract class FileCommitProtocol {
    * Specifies that a file should be deleted with the commit of this job. The default
    * implementation deletes the file immediately.
    */
-  def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean = {
+  def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean,
--- End diff --

It seems `recursive` is always passed as true; can we remove it?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
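If `recursive` really is always true at the call sites, the API could shrink to something like the sketch below. This illustrates the reviewer's suggestion, not the merged signature:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

abstract class FileCommitProtocol {
  // Default implementation deletes the file immediately, always recursively,
  // since `recursive` appears to be passed as true at every call site.
  def deleteWithJob(fs: FileSystem, path: Path): Boolean = fs.delete(path, true)
}
```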
[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21291#discussion_r187900810

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -621,6 +621,25 @@ class PlannerSuite extends SharedSQLContext {
       requiredOrdering = Seq(orderingA, orderingB),
       shouldHaveSort = true)
   }
+
+  test("SPARK-24242: RangeExec should have correct output ordering") {
+    val df = spark.range(10).orderBy("id")
--- End diff --

why do we put an `orderBy` in the query?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21291#discussion_r187900467

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
@@ -621,6 +621,25 @@ class PlannerSuite extends SharedSQLContext {
       requiredOrdering = Seq(orderingA, orderingB),
       shouldHaveSort = true)
   }
+
+  test("SPARK-24242: RangeExec should have correct output ordering") {
--- End diff --

ordering and partitioning

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
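For readers following the thread, a hedged sketch of what a test like the one in the diff can assert, written as if inside a suite with `SharedSQLContext` so `spark` is in scope: if `RangeExec` reports a correct `outputOrdering`, the planner should satisfy the `orderBy` without inserting a sort. The exact assertions are illustrative, not the PR's test body:

```scala
import org.apache.spark.sql.execution.{RangeExec, SortExec}

val df = spark.range(10).orderBy("id")
val plan = df.queryExecution.executedPlan

// If RangeExec reports that its output is already ordered by id,
// no SortExec should appear in the executed plan.
assert(plan.collect { case s: SortExec => s }.isEmpty)
assert(plan.collect { case r: RangeExec => r }.nonEmpty)
```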
[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21291 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/21316#discussion_r187899299

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
    */
   @Experimental
   @InterfaceStability.Evolving
-  def reduce(func: (T, T) => T): T = rdd.reduce(func)
+  def reduce(func: (T, T) => T): T = withNewExecutionId {
--- End diff --

@maropu When you asked about this API, did you refer to `reduce` or `withNewExecutionId`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21106 mostly LGTM, though people may have better ideas about naming. cc @hvanhovell --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile I believe this is the PR you requested for review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I'm closing this PR in favor of #21320. That PR deals with simple projection and filter queries only. I will submit subsequent PRs for aggregation and join queries following the acceptance of #21320. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman closed the pull request at: https://github.com/apache/spark/pull/16578 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21106: [SPARK-23711][SQL][WIP] Add fallback logic for Un...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21106#discussion_r187897203

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala ---
@@ -87,12 +87,11 @@ class InterpretedUnsafeProjection(expressions: Array[Expression]) extends Unsafe
 /**
  * Helper functions for creating an [[InterpretedUnsafeProjection]].
  */
-object InterpretedUnsafeProjection extends UnsafeProjectionCreator {
-
+object InterpretedUnsafeProjection {
   /**
    * Returns an [[UnsafeProjection]] for given sequence of bound Expressions.
    */
-  override protected def createProjection(exprs: Seq[Expression]): UnsafeProjection = {
+  protected[sql] def createProjection(exprs: Seq[Expression]): UnsafeProjection = {
--- End diff --

Did you change it?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
GitHub user mallman opened a pull request:

https://github.com/apache/spark/pull/21320

[SPARK-4502][SQL] Parquet nested column pruning - foundation

(Link to Jira: https://issues.apache.org/jira/browse/SPARK-4502)

_N.B. This is a restart of PR #16578, which includes everything in that PR except the aggregation and join schema pruning rules. Relevant review comments from that PR should be considered incorporated by reference. Please avoid duplication in review by reviewing that PR first. The summary below is an edited copy of the summary of the previous PR._

## What changes were proposed in this pull request?

One of the hallmarks of a column-oriented data storage format is the ability to read data from a subset of columns, efficiently skipping reads from other columns. Spark has long had support for pruning unneeded top-level schema fields from the scan of a parquet file. For example, consider a table, `contacts`, backed by parquet with the following Spark SQL schema:

```
root
 |-- name: struct
 |    |-- first: string
 |    |-- last: string
 |-- address: string
```

Parquet stores this table's data in three physical columns: `name.first`, `name.last` and `address`. To answer the query

```SQL
select address from contacts
```

Spark will read only from the `address` column of parquet data. However, to answer the query

```SQL
select name.first from contacts
```

Spark will read `name.first` and `name.last` from parquet. This PR modifies Spark SQL to support a finer grain of schema pruning. With this patch, Spark reads only the `name.first` column to answer the previous query.

### Implementation

There are two main components of this patch. First, there is a `ParquetSchemaPruning` optimizer rule for gathering the required schema fields of a `PhysicalOperation` over a parquet file, constructing a new schema based on those required fields and rewriting the plan in terms of that pruned schema. The pruned schema fields are pushed down to the parquet requested read schema. `ParquetSchemaPruning` uses a new `ProjectionOverSchema` extractor for rewriting a catalyst expression in terms of a pruned schema.

Second, the `ParquetRowConverter` has been patched to ensure the ordinals of the parquet columns read are correct for the pruned schema. `ParquetReadSupport` has been patched to address a compatibility mismatch between Spark's built-in vectorized reader and the parquet-mr library's reader.

### Limitation

Among the complex Spark SQL data types, this patch supports parquet column pruning of nested sequences of struct fields only.

## How was this patch tested?

Care has been taken to ensure correctness and prevent regressions. A more advanced version of this patch incorporating optimizations for rewriting queries involving aggregations and joins has been running on a production Spark cluster at VideoAmp for several years. In that time, one bug was found and fixed early on, and we added a regression test for that bug. We forward-ported this patch to Spark master in June 2016 and have been running this patch against Spark 2.x branches on ad-hoc clusters since then.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/VideoAmp/spark-public spark-4502-parquet_column_pruning-foundation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21320.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21320

commit 9e301b37bb5863ef5fad8ec15f1d8f3d4b5e6a6f
Author: Michael Allman
Date: 2016-06-24T17:21:24Z

    [SPARK-4502][SQL] Parquet nested column pruning

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
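To make the summary above concrete, a small hedged example of the behavior it describes; the path and data are made up, and the claims about which physical columns get read restate the PR's summary rather than verified measurements:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("pruning-demo").getOrCreate()
import spark.implicits._

case class Name(first: String, last: String)
case class Contact(name: Name, address: String)

// Write a tiny `contacts` table with the schema from the PR description.
Seq(Contact(Name("Jane", "Doe"), "123 Main St"))
  .toDF()
  .write.mode("overwrite").parquet("/tmp/contacts")

// Top-level pruning has long worked: this reads only the physical column `address`.
spark.read.parquet("/tmp/contacts").select("address").show()

// With this patch, this reads only `name.first`; before it, Spark also read `name.last`.
spark.read.parquet("/tmp/contacts").select("name.first").show()
```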
[GitHub] spark pull request #21106: [SPARK-23711][SQL][WIP] Add fallback logic for Un...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21106#discussion_r187897086

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallback.scala ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.codehaus.commons.compiler.CompileException
+import org.codehaus.janino.InternalCompilerException
+
+import org.apache.spark.TaskContext
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Utils
+
+/**
+ * Catches compile error during code generation.
+ */
+object CodegenError {
+  def unapply(throwable: Throwable): Option[Exception] = throwable match {
+    case e: InternalCompilerException => Some(e)
+    case e: CompileException => Some(e)
+    case _ => None
+  }
+}
+
+/**
+ * Defines values for `SQLConf` config of fallback mode. Use for test only.
+ */
+object CodegenObjectFactoryMode extends Enumeration {
+  val AUTO, CODEGEN_ONLY, NO_CODEGEN = Value
+
+  def currentMode: CodegenObjectFactoryMode.Value = {
+    // If we weren't on task execution, accesses that config.
+    if (TaskContext.get == null) {
+      val config = SQLConf.get.getConf(SQLConf.CODEGEN_FACTORY_MODE)
+      CodegenObjectFactoryMode.withName(config)
+    } else {
+      CodegenObjectFactoryMode.AUTO
+    }
+  }
+}
+
+/**
+ * A factory which can be used to create objects that have both codegen and interpreted
--- End diff --

the comment also needs update.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
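For context, the factory pattern the quoted file builds toward can be sketched as follows, reusing `CodegenError` and `CodegenObjectFactoryMode` from the diff above. The abstract method names are assumptions based on the thread, not necessarily the merged API:

```scala
abstract class CodeGeneratorWithInterpretedFallback[IN, OUT] {

  protected def createCodeGeneratedObject(in: IN): OUT
  protected def createInterpretedObject(in: IN): OUT

  def createObject(in: IN): OUT = CodegenObjectFactoryMode.currentMode match {
    case CodegenObjectFactoryMode.CODEGEN_ONLY => createCodeGeneratedObject(in)
    case CodegenObjectFactoryMode.NO_CODEGEN => createInterpretedObject(in)
    case _ =>
      // AUTO: try codegen first and fall back to the interpreted
      // implementation when Janino fails to compile the generated code.
      try {
        createCodeGeneratedObject(in)
      } catch {
        case CodegenError(_) => createInterpretedObject(in)
      }
  }
}
```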
[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19789 cc @marmbrus @zsxwing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19293: [SPARK-22079][SQL] Serializer in HiveOutputWriter miss l...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19293 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21114: [SPARK-22371][CORE] Return None instead of throwing an e...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21114 This behavior change LGTM, but the test is overcomplicated and seems to have limited value. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21114: [SPARK-22371][CORE] Return None instead of throwi...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21114#discussion_r187891772

--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -237,6 +236,65 @@ class AccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContex
     acc.merge("kindness")
     assert(acc.value === "kindness")
   }
+
+  test("updating garbage collected accumulators") {
--- End diff --

What does this test do? Prove that an accumulator can be GCed even while it's still valid? The map in `AccumulatorContext` is fixed-size, so this can definitely happen and we don't need to prove it.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
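To illustrate the premise under discussion, a rough sketch of how a registered accumulator can be collected while still valid. `AccumulatorContext` and `register(sc)` are private[spark], so this only compiles inside Spark's own test tree, and `System.gc()` is only a hint, so the final assertion is best-effort:

```scala
import org.apache.spark.util.{AccumulatorContext, LongAccumulator}

// Inside a suite with a live SparkContext `sc`:
var acc: LongAccumulator = new LongAccumulator
acc.register(sc)              // assigns an id; the context holds only a weak reference
val id = acc.id

acc = null                    // drop the only strong reference
System.gc()                   // a hint; collection is not guaranteed

// After SPARK-22371, a cleared weak reference yields None instead of throwing.
assert(AccumulatorContext.get(id).isEmpty)
```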
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21266 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org