[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday

2018-04-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21009
  
Thanks! Merged to master.


---




[GitHub] spark pull request #21009: [SPARK-23905][SQL] Add UDF weekday

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21009


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21004
  
**[Test build #89315 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89315/testReport)**
 for PR 21004 at commit 
[`12ac191`](https://github.com/apache/spark/commit/12ac191cb29f4ba1f817abffc8c7422efe837b38).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89315/
Test FAILed.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89316/
Test FAILed.


---




[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21057#discussion_r181299329
  
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, 
fromOffsets=None,
 :param topics:  list of topic_name to consume.
 :param kafkaParams: Additional params for Kafka.
 :param fromOffsets: Per-topic/partition Kafka offsets defining the 
(inclusive) starting
-point of the stream.
+point of the stream (Dict with keys of type 
TopicAndPartition and int values).
--- End diff --

I would say something like ``a dictionary containing `TopicAndPartition` to integers.``.
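(For readers coming from the Scala API, the analogous argument there is a `Map[TopicAndPartition, Long]`; a minimal sketch with purely illustrative topic names and offsets, assuming the Kafka 0.8 `kafka.common.TopicAndPartition` case class:)

```scala
import kafka.common.TopicAndPartition

// Illustrative only: resume topic "events" from offset 42 on partition 0 and
// offset 100 on partition 1. The PySpark fromOffsets argument is the same idea
// expressed as a dict mapping TopicAndPartition objects to ints.
val fromOffsets: Map[TopicAndPartition, Long] = Map(
  TopicAndPartition("events", 0) -> 42L,
  TopicAndPartition("events", 1) -> 100L
)
```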


---




[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21052
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21037
  
**[Test build #89314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89314/testReport)**
 for PR 21037 at commit 
[`16ae59c`](https://github.com/apache/spark/commit/16ae59cf02da2cf0cd2e9a311b348bd82b452bff).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21024
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89318/
Test FAILed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21053
  
**[Test build #89313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89313/testReport)**
 for PR 21053 at commit 
[`bb0ab45`](https://github.com/apache/spark/commit/bb0ab45b4a9bbf1155dbb9513508bbef3685b3f6).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21052: [SPARK-23799] FilterEstimation.evaluateInSet produces de...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21052
  
**[Test build #89316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89316/testReport)**
 for PR 21052 at commit 
[`74b6ebd`](https://github.com/apache/spark/commit/74b6ebdc2cd8a91944cc6159946f560ba7212a6a).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21024
  
**[Test build #89318 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89318/testReport)**
 for PR 21024 at commit 
[`e739a0a`](https://github.com/apache/spark/commit/e739a0a247bc3782ee4348246eff921c86f83e13).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89312 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89312/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21024
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89313/
Test FAILed.


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21037
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89314/
Test FAILed.


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21037
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89312/
Test FAILed.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-13 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/21004
  
retest this please.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21004
  
**[Test build #89319 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89319/testReport)**
 for PR 21004 at commit 
[`12ac191`](https://github.com/apache/spark/commit/12ac191cb29f4ba1f817abffc8c7422efe837b38).


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2300/
Test PASSed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21053
  
retest this please


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-13 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21037
  
retest this please


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21053
  
**[Test build #89321 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89321/testReport)**
 for PR 21053 at commit 
[`bb0ab45`](https://github.com/apache/spark/commit/bb0ab45b4a9bbf1155dbb9513508bbef3685b3f6).


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21037
  
**[Test build #89322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89322/testReport)**
 for PR 21037 at commit 
[`16ae59c`](https://github.com/apache/spark/commit/16ae59cf02da2cf0cd2e9a311b348bd82b452bff).


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89320/testReport)**
 for PR 21061 at commit 
[`29c9b92`](https://github.com/apache/spark/commit/29c9b92e32766a3a79eabb9040e25c368020fa65).


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2301/
Test PASSed.


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20953: [SPARK-23822][SQL] Improve error message for Parq...

2018-04-13 Thread yuchenhuo
Github user yuchenhuo commented on a diff in the pull request:

https://github.com/apache/spark/pull/20953#discussion_r181306071
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala
 ---
@@ -179,7 +182,23 @@ class FileScanRDD(
 currentIterator = readCurrentFile()
   }
 
-  hasNext
+  try {
+hasNext
+  } catch {
+case e: SchemaColumnConvertNotSupportedException =>
+  val message = "Parquet column cannot be converted in " +
+s"file ${currentFile.filePath}. Column: ${e.getColumn}, " +
+s"Expected: ${e.getLogicalType}, Found: 
${e.getPhysicalType}"
+  throw new QueryExecutionException(message, e)
--- End diff --

Yes, you are right. Sorry, I shouldn't have said "use QueryExecutionException instead of the original 
SparkException". The final exception would still be wrapped in a SparkException; inside the 
SparkException would be the QueryExecutionException. But the reason is still the same: they don't want 
to throw too many different exceptions, which might be hard to capture and display.


![38043674-a2485370-326c-11e8-82a6-c36691f1e523](https://user-images.githubusercontent.com/37087310/38722077-9e115878-3eb1-11e8-989c-525268c96e3f.png)
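As a side note, here is a minimal, Spark-agnostic sketch of how calling code could still surface the Parquet detail from that nesting (`SparkException` wrapping a `QueryExecutionException` wrapping the `SchemaColumnConvertNotSupportedException`), using nothing but standard `Throwable` chaining; the caught-exception name in the usage line is hypothetical:

```scala
// Collect the full cause chain of a caught exception so the innermost message
// (file path, column, expected/found types) can be logged or displayed.
def causeChain(t: Throwable): Seq[Throwable] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[Throwable]
  var cur: Throwable = t
  while (cur != null) {
    out += cur
    cur = cur.getCause
  }
  out.toSeq
}

// Usage sketch:
// causeChain(caughtSparkException).foreach(e => println(s"${e.getClass.getName}: ${e.getMessage}"))
```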



---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2302/
Test PASSed.


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21037
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21037
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2303/
Test PASSed.


---




[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...

2018-04-13 Thread vc60er
Github user vc60er commented on the issue:

https://github.com/apache/spark/pull/20078
  
Setting spark.streaming.dynamicAllocation.minExecutors also hits the same issue: 
https://issues.apache.org/jira/browse/SPARK-14788

@felixcheung 
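For context, the setting above is a plain Spark conf read by the streaming executor-allocation manager; a minimal sketch with illustrative values (it neither reproduces nor fixes the reported issue, and the `enabled` key is listed on the assumption that streaming dynamic allocation is toggled the same way as in the linked JIRA):

```scala
import org.apache.spark.SparkConf

// Illustrative values only.
val conf = new SparkConf()
  .setAppName("streaming-dynamic-allocation-example")
  .set("spark.streaming.dynamicAllocation.enabled", "true")
  .set("spark.streaming.dynamicAllocation.minExecutors", "2")
```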


---




[GitHub] spark pull request #21040: [SPARK-23930][SQL] Add slice function

2018-04-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21040#discussion_r181313290
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,101 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+
+/**
+ * Slices an array according to the requested start index and length
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Subsets array x starting from index start (or 
starting from the end if start is negative) with the specified length.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3, 4), 2, 2);
+   [2,3]
+  > SELECT _FUNC_(array(1, 2, 3, 4), -2, 2);
+   [3,4]
+  """, since = "2.4.0")
+// scalastyle:on line.size.limit
+case class Slice(x: Expression, start: Expression, length: Expression)
+  extends TernaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = x.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, 
IntegerType, IntegerType)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def children: Seq[Expression] = Seq(x, start, length)
+
+  override def nullSafeEval(xVal: Any, startVal: Any, lengthVal: Any): Any 
= {
+val startInt = startVal.asInstanceOf[Int]
+val lengthInt = lengthVal.asInstanceOf[Int]
+val arr = xVal.asInstanceOf[ArrayData]
+val startIndex = if (startInt == 0) {
+  throw new RuntimeException(
+s"Unexpected value for start in function $prettyName:  SQL array 
indices start at 1.")
+} else if (startInt < 0) {
+  startInt + arr.numElements()
+} else {
+  startInt - 1
+}
+if (lengthInt < 0) {
+  throw new RuntimeException(s"Unexpected value for length in function 
$prettyName: " +
+s"length must be greater than or equal to 0.")
+}
+// this can happen if start is negative and its absolute value is 
greater than the
+// number of elements in the array
+if (startIndex < 0) {
+  return new GenericArrayData(Array.empty[AnyRef])
+}
+val elementType = x.dataType.asInstanceOf[ArrayType].elementType
+val data = arr.toArray[AnyRef](elementType)
+new GenericArrayData(data.slice(startIndex, startIndex + lengthInt))
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+val elementType = x.dataType.asInstanceOf[ArrayType].elementType
+nullSafeCodeGen(ctx, ev, (x, start, length) => {
+  val arrayClass = classOf[GenericArrayData].getName
+  val values = ctx.freshName("values")
+  val i = ctx.freshName("i")
+  val startIdx = ctx.freshName("startIdx")
+  val resLength = ctx.freshName("resLength")
+  val defaultIntValue = 
CodeGenerator.defaultValue(CodeGenerator.JAVA_INT, false)
+  s"""
+ |${CodeGenerator.JAVA_INT} $startIdx = $defaultIntValue;
+ |${CodeGenerator.JAVA_INT} $resLength = $defaultIntValue;
+ |if ($start == 0) {
+ |  throw new RuntimeException("Unexpected value for start in 
function $prettyName: "
+ |+ "SQL array indices start at 1.");
+ |} else if ($start < 0) {
+ |  $startIdx = $start + $x.numElements();
+ |} else {
+ |  // arrays in SQL are 1-based instead of 0-based
+ |  $startIdx = $start - 1;
+ |}
+ |if ($length < 0) {
+ |  throw new RuntimeException("Unexpected value for length in 
function $prettyName: "
+ |+ "length must be greater than or equal to 0.");
+ |} else if ($length > $x.numElements() - $startIdx) {
+ |  $resLength = $x.numElements() - $startIdx;
+ |} else {
+ |  $resLength = $length;
+ |}
+ |Object[] $values;
+ |if ($startIdx < 0) {
+ |  $values = new Object[0];
+ |} else {
+ |  $values = new Object[$resLength];
+ |  for (int $i = 0; $i < $resLength; $i ++) {
+ |$values[$i] = ${CodeGenerator.getValue(x, elementType, s"$i 
+ $startIdx")};
--- End diff --

For the future, I agree that this is the right way to generate Java code 
since we can avoid boxing.

On the other hand, you are proposing to postpone specialization: in `eval` and in the generated code, 
the `GenericArrayData` is built by using `Object[]`.
I may misunderstand `for coherency` since I may not find the target of the coherency.
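(A tiny, self-contained illustration of the boxing point, nothing Spark-specific: an `Object[]`-backed result forces every primitive element through a wrapper allocation, which is exactly what specialized code generation would avoid.)

```scala
// Boxed: each Int is wrapped in a java.lang.Integer before it can be stored in
// an Object[], which is what an Object[]-backed GenericArrayData implies.
val boxed: Array[AnyRef] = Array(1, 2, 3, 4).map(i => Int.box(i): AnyRef)

// Specialized: a primitive int[] on the JVM, no per-element allocation.
val primitive: Array[Int] = Array(1, 2, 3, 4)
```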

[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20858
  
**[Test build #89323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89323/testReport)**
 for PR 20858 at commit 
[`7f5124b`](https://github.com/apache/spark/commit/7f5124ba8752387b3e1d6c0922b551a2cba98356).


---




[GitHub] spark issue #20983: [SPARK-23747][Structured Streaming] Add EpochCoordinator...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20983
  
**[Test build #89324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89324/testReport)**
 for PR 20983 at commit 
[`8fa609c`](https://github.com/apache/spark/commit/8fa609cd8ad6130aa16b9bf624fe5b5e0f5ef256).


---




[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20888#discussion_r181326106
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -152,22 +154,28 @@ class DataFrameRangeSuite extends QueryTest with 
SharedSQLContext with Eventuall
   }
 
   test("Cancelling stage in a query with Range.") {
+val slices = 10
+
 val listener = new SparkListener {
-  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
-eventually(timeout(10.seconds), interval(1.millis)) {
-  assert(DataFrameRangeSuite.stageToKill > 0)
+  override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
+eventually(timeout(10.seconds)) {
+  assert(DataFrameRangeSuite.isTaskStarted)
 }
-sparkContext.cancelStage(DataFrameRangeSuite.stageToKill)
+sparkContext.cancelStage(taskStart.stageId)
+DataFrameRangeSuite.semaphore.release(slices)
--- End diff --

I see your point and tried similar things before. How do you think it's possible to wait on anything 
in the task's code without hitting `NotSerializableException`? That's quite a hard limitation.
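One common way around that limitation (a sketch only, not necessarily what this PR does) is to keep the blocking primitive in a singleton object, so the task closure references a static field instead of capturing and serializing the semaphore itself. In a local-mode test the driver and the tasks share one JVM, so a release from the listener unblocks the tasks:

```scala
import java.util.concurrent.Semaphore

// Hypothetical helper: a Scala object is initialized per JVM, so the closure
// below only carries the class reference, never the Semaphore instance.
object TaskGate {
  val semaphore = new Semaphore(0)
}

// Usage sketch inside a local-mode test (sc is a SparkContext):
// sc.parallelize(1 to 10, 10).foreach { _ => TaskGate.semaphore.acquire() }
// ...and from a SparkListener callback running on the driver:
// TaskGate.semaphore.release(10)
```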


---




[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20874
  
**[Test build #89325 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89325/testReport)**
 for PR 20874 at commit 
[`088ac7d`](https://github.com/apache/spark/commit/088ac7dbc8e9bb651be8044a83569de6871a67bf).


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21024
  
Jenkins, retest this please.


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21024
  
**[Test build #89326 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89326/testReport)**
 for PR 21024 at commit 
[`e739a0a`](https://github.com/apache/spark/commit/e739a0a247bc3782ee4348246eff921c86f83e13).


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21024
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21024
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2304/
Test PASSed.


---




[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20874
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20874
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2305/
Test PASSed.


---




[GitHub] spark issue #21059: [SPARK-23974][CORE] fix when numExecutorsTarget equals m...

2018-04-13 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/21059
  
@jiangxb1987 

I re-investigated the logs and found that there must be bugs in the YARN scheduler backend, so this PR 
is not the right way to fix the issue.


---




[GitHub] spark pull request #21059: [SPARK-23974][CORE] fix when numExecutorsTarget e...

2018-04-13 Thread sadhen
Github user sadhen closed the pull request at:

https://github.com/apache/spark/pull/21059


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21060
  
retest this please


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21060
  
**[Test build #89327 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89327/testReport)**
 for PR 21060 at commit 
[`4656724`](https://github.com/apache/spark/commit/4656724d27c208d794f99691cfbf93b4bb118d93).


---




[GitHub] spark pull request #21040: [SPARK-23930][SQL] Add slice function

2018-04-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21040#discussion_r181338128
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,101 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+
+/**
+ * Slices an array according to the requested start index and length
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Subsets array x starting from index start (or 
starting from the end if start is negative) with the specified length.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3, 4), 2, 2);
+   [2,3]
+  > SELECT _FUNC_(array(1, 2, 3, 4), -2, 2);
+   [3,4]
+  """, since = "2.4.0")
+// scalastyle:on line.size.limit
+case class Slice(x: Expression, start: Expression, length: Expression)
+  extends TernaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = x.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, 
IntegerType, IntegerType)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def children: Seq[Expression] = Seq(x, start, length)
+
+  override def nullSafeEval(xVal: Any, startVal: Any, lengthVal: Any): Any 
= {
+val startInt = startVal.asInstanceOf[Int]
+val lengthInt = lengthVal.asInstanceOf[Int]
+val arr = xVal.asInstanceOf[ArrayData]
+val startIndex = if (startInt == 0) {
+  throw new RuntimeException(
+s"Unexpected value for start in function $prettyName:  SQL array 
indices start at 1.")
+} else if (startInt < 0) {
+  startInt + arr.numElements()
+} else {
+  startInt - 1
+}
+if (lengthInt < 0) {
+  throw new RuntimeException(s"Unexpected value for length in function 
$prettyName: " +
+s"length must be greater than or equal to 0.")
+}
+// this can happen if start is negative and its absolute value is 
greater than the
+// number of elements in the array
+if (startIndex < 0) {
+  return new GenericArrayData(Array.empty[AnyRef])
+}
+val elementType = x.dataType.asInstanceOf[ArrayType].elementType
+val data = arr.toArray[AnyRef](elementType)
+new GenericArrayData(data.slice(startIndex, startIndex + lengthInt))
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+val elementType = x.dataType.asInstanceOf[ArrayType].elementType
+nullSafeCodeGen(ctx, ev, (x, start, length) => {
+  val arrayClass = classOf[GenericArrayData].getName
+  val values = ctx.freshName("values")
+  val i = ctx.freshName("i")
+  val startIdx = ctx.freshName("startIdx")
+  val resLength = ctx.freshName("resLength")
+  val defaultIntValue = 
CodeGenerator.defaultValue(CodeGenerator.JAVA_INT, false)
+  s"""
+ |${CodeGenerator.JAVA_INT} $startIdx = $defaultIntValue;
+ |${CodeGenerator.JAVA_INT} $resLength = $defaultIntValue;
+ |if ($start == 0) {
+ |  throw new RuntimeException("Unexpected value for start in 
function $prettyName: "
+ |+ "SQL array indices start at 1.");
+ |} else if ($start < 0) {
+ |  $startIdx = $start + $x.numElements();
+ |} else {
+ |  // arrays in SQL are 1-based instead of 0-based
+ |  $startIdx = $start - 1;
+ |}
+ |if ($length < 0) {
+ |  throw new RuntimeException("Unexpected value for length in 
function $prettyName: "
+ |+ "length must be greater than or equal to 0.");
+ |} else if ($length > $x.numElements() - $startIdx) {
+ |  $resLength = $x.numElements() - $startIdx;
+ |} else {
+ |  $resLength = $length;
+ |}
+ |Object[] $values;
+ |if ($startIdx < 0) {
+ |  $values = new Object[0];
+ |} else {
+ |  $values = new Object[$resLength];
+ |  for (int $i = 0; $i < $resLength; $i ++) {
+ |$values[$i] = ${CodeGenerator.getValue(x, elementType, s"$i 
+ $startIdx")};
--- End diff --

My target of coherency was the `CreateArray` operator and the code 
generated in `GenerateSafeProjection`.


---




[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21025
  
**[Test build #89328 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89328/testReport)**
 for PR 21025 at commit 
[`a7d3a2e`](https://github.com/apache/spark/commit/a7d3a2e28719daf4a49614887d2aa79d090aab69).


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21024
  
**[Test build #89329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89329/testReport)**
 for PR 21024 at commit 
[`1cde795`](https://github.com/apache/spark/commit/1cde795fe96b915f7b322ea1746c436d51391528).


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2306/
Test PASSed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-13 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20981
  
ping @hvanhovell 


---




[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21025
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2307/
Test PASSed.


---




[GitHub] spark issue #21025: [SPARK-23918][SQL] Add array_min function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21025
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20560
  
**[Test build #89330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89330/testReport)**
 for PR 20560 at commit 
[`6c5f04c`](https://github.com/apache/spark/commit/6c5f04cb989736ced5d7c8695a0740e512df36c6).


---




[GitHub] spark pull request #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for Arr...

2018-04-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20984#discussion_r181339660
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -164,3 +167,46 @@ abstract class ArrayData extends SpecializedGetters 
with Serializable {
 }
   }
 }
+
+/**
+ * Implements an `IndexedSeq` interface for `ArrayData`. Notice that if 
the original `ArrayData`
+ * is a primitive array and contains null elements, it is better to ask 
for `IndexedSeq[Any]`,
+ * instead of `IndexedSeq[Int]`, in order to keep the null elements.
+ */
+class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) 
extends IndexedSeq[T] {
+
+  private def getAccessor(dataType: DataType): (Int) => Any = dataType 
match {
+case BooleanType => (idx: Int) => arrayData.getBoolean(idx)
+case ByteType => (idx: Int) => arrayData.getByte(idx)
+case ShortType => (idx: Int) => arrayData.getShort(idx)
+case IntegerType => (idx: Int) => arrayData.getInt(idx)
--- End diff --

I'd like to reuse the access getter in #20981 which covers `DateType` and 
`TimestampType`.
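For reference, a sketch of what covering those two types could look like, assuming Spark's usual internal representation (dates stored as `Int` days since the epoch, timestamps as `Long` microseconds); everything other than the `ArrayData` getters is illustrative:

```scala
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.sql.types._

// Sketch only: DateType and TimestampType can reuse the primitive getters,
// while anything not handled explicitly falls back to the generic getter.
def accessorFor(arrayData: ArrayData, dataType: DataType): Int => Any = dataType match {
  case DateType      => (idx: Int) => arrayData.getInt(idx)
  case TimestampType => (idx: Int) => arrayData.getLong(idx)
  case IntegerType   => (idx: Int) => arrayData.getInt(idx)
  case _             => (idx: Int) => arrayData.get(idx, dataType)
}
```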


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21024
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2308/
Test PASSed.


---




[GitHub] spark issue #21024: [SPARK-23917][SQL] Add array_max function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21024
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20560
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2309/
Test PASSed.


---




[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20560
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21031#discussion_r181340756
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3282,6 +3282,14 @@ object functions {
*/
   def size(e: Column): Column = withExpr { Size(e.expr) }
 
+  /**
+   * Returns length of array or map as BigInt.
--- End diff --

BigInt -> long
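For comparison, a minimal usage sketch of the existing `size` helper shown a few lines above (the DataFrame and column names are hypothetical); the point of the comment is only that the new function's scaladoc should say `long`, the JVM-side return type, rather than `BigInt`:

```scala
import org.apache.spark.sql.functions.{col, size}

// size() yields the element count of an array or map column.
val withCount = df.select(col("id"), size(col("tags")).as("tag_count"))
```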


---




[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...

2018-04-13 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/20923
  
@jerryshao comments? I know without the patched hive or mutant hadoop build 
Spark doesn't work with Hadoop 3, but this sets everything up to build 
consistently, which is a prerequisite to fixing up the semi-official spark hive 
JAR.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20695
  
**[Test build #89331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89331/testReport)**
 for PR 20695 at commit 
[`20968c1`](https://github.com/apache/spark/commit/20968c1101d7c19bd81bf561e47e6b477fe0a19a).


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2310/
Test PASSed.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89331/
Test FAILed.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20695
  
**[Test build #89331 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89331/testReport)**
 for PR 20695 at commit 
[`20968c1`](https://github.com/apache/spark/commit/20968c1101d7c19bd81bf561e47e6b477fe0a19a).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SummaryBuilder(JavaWrapper):`


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21031#discussion_r181344225
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3282,6 +3282,14 @@ object functions {
*/
   def size(e: Column): Column = withExpr { Size(e.expr) }
 
+  /**
+   * Returns length of array or map as BigInt.
--- End diff --

Good catch, thanks


---




[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21031
  
**[Test build #89332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89332/testReport)**
 for PR 21031 at commit 
[`a21f85b`](https://github.com/apache/spark/commit/a21f85ba2bd4c2f3ee33ac0499a4f92fe2e54629).


---




[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21031
  
LGTM


---




[GitHub] spark pull request #21062: Branch 1.2

2018-04-13 Thread androidbestcoder
GitHub user androidbestcoder opened a pull request:

https://github.com/apache/spark/pull/21062

Branch 1.2

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-1.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21062.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21062


commit e7f9dd5cd10d18d0b712916750ac1643df169b4f
Author: Ernest 
Date:   2014-12-18T23:42:26Z

[SPARK-4880] remove spark.locality.wait in Analytics

spark.locality.wait set to 10 in examples/graphx/Analytics.scala.
Should be left to the user.

Author: Ernest 

Closes #3730 from Earne/SPARK-4880 and squashes the following commits:

d79ed04 [Ernest] remove spark.locality.wait in Analytics

(cherry picked from commit a7ed6f3cc537f57de87d28e8466ca88fbfff53b5)
Signed-off-by: Reynold Xin 

commit 61c9b89d84c868e9ecf5cffb9718c46753c9996e
Author: Madhu Siddalingaiah 
Date:   2014-12-19T00:00:53Z

[SPARK-4884]: Improve Partition docs

Rewording was based on this discussion: 
http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html
This is the associated JIRA ticket: 
https://issues.apache.org/jira/browse/SPARK-4884

Author: Madhu Siddalingaiah 

Closes #3722 from msiddalingaiah/master and squashes the following commits:

79e679f [Madhu Siddalingaiah] [DOC]: improve documentation
51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
cbccbfe [Madhu Siddalingaiah] Documentation: replace  with  (again)
332f7a2 [Madhu Siddalingaiah] Documentation: replace  with 
cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
0fc12d7 [Madhu Siddalingaiah] Documentation: add description for 
repartitionAndSortWithinPartitions

(cherry picked from commit d5a596d4188bfa85ff49ee85039f54255c19a4de)
Signed-off-by: Josh Rosen 

commit 075b399c59b508251f4fb259e7b0c13b79ff5883
Author: Aaron Davidson 
Date:   2014-12-19T00:43:16Z

[SPARK-4837] NettyBlockTransferService should use spark.blockManager.port 
config

This is used in NioBlockTransferService here:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66

Author: Aaron Davidson 

Closes #3688 from aarondav/SPARK-4837 and squashes the following commits:

ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use 
spark.blockManager.port config

(cherry picked from commit 105293a7d06b26e7b179a0447eb802074ee9c218)
Signed-off-by: Josh Rosen 

commit ca37639aa1b537d0f9b56bf1362bf293635e235c
Author: Andrew Or 
Date:   2014-12-19T01:37:42Z

[SPARK-4754] Refactor SparkContext into ExecutorAllocationClient

This is such that the `ExecutorAllocationManager` does not take in the 
`SparkContext` with all of its dependencies as an argument. This prevents 
future developers of this class to tie down this class further with the 
`SparkContext`, which has really become quite a monstrous object.

cc'ing pwendell who originally suggested this, and JoshRosen who may have 
thoughts about the trait mix-in style of `SparkContext`.

Author: Andrew Or 

Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the 
following commits:

187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
dynamic-allocation-sc
59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
dynamic-allocation-sc
347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient

(cherry picked from commit 9804a759b68f56eceb8a2f4ea90f76a92b5f9f67)
Signed-off-by: Andrew Or 

Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala

commit fd7bb9d9728fa2b4fc6f26ae6a31cfa60d560ad4
Author: Sandy Ryza 
Date:   2014-12-19T06:40:44Z

SPARK-3428. TaskMetrics for running tasks is missing GC time metrics

Author: Sandy Ryza 

Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits:

cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing 
GC time metrics

(cherry picke

[GitHub] spark issue #21062: Branch 1.2

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21062
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21053
  
**[Test build #89321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89321/testReport)**
 for PR 21053 at commit 
[`bb0ab45`](https://github.com/apache/spark/commit/bb0ab45b4a9bbf1155dbb9513508bbef3685b3f6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21062: Branch 1.2

2018-04-13 Thread androidbestcoder
Github user androidbestcoder closed the pull request at:

https://github.com/apache/spark/pull/21062


---




[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21031
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2311/
Test PASSed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21031
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89321/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20858
  
**[Test build #89323 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89323/testReport)**
 for PR 20858 at commit 
[`7f5124b`](https://github.com/apache/spark/commit/7f5124ba8752387b3e1d6c0922b551a2cba98356).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89323/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21063: [SPARK-23886][Structured Streaming][WIP] Update q...

2018-04-13 Thread efimpoberezkin
GitHub user efimpoberezkin opened a pull request:

https://github.com/apache/spark/pull/21063

[SPARK-23886][Structured Streaming][WIP] Update query status for 
ContinuousExecution

## What changes were proposed in this pull request?

Added query status updates to ContinuousExecution

## How was this patch tested?

Existing tests in ContinuousSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/efimpoberezkin/spark pr/update-query-status

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21063.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21063


commit 8fa7d9f1f1f804e2c75819cb27c67f841c688cdc
Author: Efim Poberezkin 
Date:   2018-04-13T10:20:59Z

Added query status update messages




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20858: [SPARK-23736][SQL] Extending the concat function to supp...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21063: [SPARK-23886][Structured Streaming][WIP] Update query st...

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21063
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20938: [SPARK-23821][SQL] Collection function: flatten

2018-04-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20938#discussion_r181342596
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,160 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Transforms an array of arrays into a single array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(arrayOfArrays) - Transforms an array of arrays into a 
single array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(array(1, 2), array(3, 4));
+   [1,2,3,4]
+  """,
+  since = "2.4.0")
+case class Flatten(child: Expression) extends UnaryExpression {
+
+  override def nullable: Boolean = child.nullable || dataType.containsNull
--- End diff --

`child.nullable || child.dataType.asInstanceOf[ArrayType].containsNull`?
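
A minimal, self-contained sketch of the distinction (plain Scala stand-ins, not Spark's actual classes), assuming the suggested check above — `dataType` on `Flatten` is the inner `ArrayType`, so its `containsNull` is a different flag from the child's:

```scala
// Stand-in for Spark's ArrayType, only to illustrate which containsNull flag matters.
case class ArrayType(elementType: Any, containsNull: Boolean)

object FlattenNullabilitySketch extends App {
  // Child column typed as array<array<int>>:
  val inner = ArrayType("int", containsNull = false)   // inner ints can never be null
  val outer = ArrayType(inner, containsNull = true)    // outer array may hold null inner arrays

  // Flatten's result type is the inner ArrayType, so `dataType.containsNull`
  // reads the inner flag and misses the "null inner array" case:
  val viaResultType = inner.containsNull                // false

  // The suggested check reads the child's (outer) containsNull instead, which is
  // the flag that can actually force the flattened result to be null:
  val viaChildType = outer.containsNull                 // true

  println(s"dataType.containsNull = $viaResultType, child containsNull = $viaChildType")
}
```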


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20938: [SPARK-23821][SQL] Collection function: flatten

2018-04-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20938#discussion_r181345710
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,160 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Transforms an array of arrays into a single array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(arrayOfArrays) - Transforms an array of arrays into a 
single array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(array(1, 2), array(3, 4));
+   [1,2,3,4]
+  """,
+  since = "2.4.0")
+case class Flatten(child: Expression) extends UnaryExpression {
+
+  override def nullable: Boolean = child.nullable || dataType.containsNull
+
+  override def dataType: ArrayType = {
+child
+  .dataType.asInstanceOf[ArrayType]
+  .elementType.asInstanceOf[ArrayType]
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(_: ArrayType, _) =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"The argument should be an array of arrays, " +
+s"but '${child.sql}' is of ${child.dataType.simpleString} type."
+  )
+  }
+
+  override def nullSafeEval(array: Any): Any = {
+val elements = array.asInstanceOf[ArrayData].toObjectArray(dataType)
+
+if (elements.contains(null)) {
+  null
+} else {
+  val flattened = elements.flatMap(
+_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType)
+  )
+  new GenericArrayData(flattened)
+}
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, c => {
+  val code = if (CodeGenerator.isPrimitiveType(dataType.elementType)) {
+  genCodeForConcatOfPrimitiveElements(ctx, c, ev.value)
+} else {
+  genCodeForConcatOfComplexElements(ctx, c, ev.value)
--- End diff --

I'm wondering whether "complex" is the right word for non-primitive types?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20938: [SPARK-23821][SQL] Collection function: flatten

2018-04-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20938#discussion_r181347402
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,160 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Transforms an array of arrays into a single array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(arrayOfArrays) - Transforms an array of arrays into a 
single array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(array(1, 2), array(3, 4));
+   [1,2,3,4]
+  """,
+  since = "2.4.0")
+case class Flatten(child: Expression) extends UnaryExpression {
+
+  override def nullable: Boolean = child.nullable || dataType.containsNull
+
+  override def dataType: ArrayType = {
+child
+  .dataType.asInstanceOf[ArrayType]
+  .elementType.asInstanceOf[ArrayType]
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(_: ArrayType, _) =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"The argument should be an array of arrays, " +
+s"but '${child.sql}' is of ${child.dataType.simpleString} type."
+  )
+  }
+
+  override def nullSafeEval(array: Any): Any = {
+val elements = array.asInstanceOf[ArrayData].toObjectArray(dataType)
+
+if (elements.contains(null)) {
+  null
+} else {
+  val flattened = elements.flatMap(
+_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType)
+  )
+  new GenericArrayData(flattened)
+}
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, c => {
+  val code = if (CodeGenerator.isPrimitiveType(dataType.elementType)) {
+  genCodeForConcatOfPrimitiveElements(ctx, c, ev.value)
--- End diff --

nit: indent.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20938: [SPARK-23821][SQL] Collection function: flatten

2018-04-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20938#discussion_r181333291
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +289,160 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Transforms an array of arrays into a single array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(arrayOfArrays) - Transforms an array of arrays into a 
single array.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(array(1, 2), array(3, 4));
+   [1,2,3,4]
+  """,
+  since = "2.4.0")
+case class Flatten(child: Expression) extends UnaryExpression {
+
+  override def nullable: Boolean = child.nullable || dataType.containsNull
+
+  override def dataType: ArrayType = {
+child
+  .dataType.asInstanceOf[ArrayType]
+  .elementType.asInstanceOf[ArrayType]
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(_: ArrayType, _) =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"The argument should be an array of arrays, " +
+s"but '${child.sql}' is of ${child.dataType.simpleString} type."
+  )
+  }
+
+  override def nullSafeEval(array: Any): Any = {
+val elements = array.asInstanceOf[ArrayData].toObjectArray(dataType)
+
+if (elements.contains(null)) {
+  null
+} else {
+  val flattened = elements.flatMap(
+_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType)
+  )
+  new GenericArrayData(flattened)
+}
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, c => {
+  val code = if (CodeGenerator.isPrimitiveType(dataType.elementType)) {
+  genCodeForConcatOfPrimitiveElements(ctx, c, ev.value)
+} else {
+  genCodeForConcatOfComplexElements(ctx, c, ev.value)
+}
+  nullElementsProtection(ev, c, code)
+})
+  }
+
+  private def nullElementsProtection(
+  ev: ExprCode,
+  childVariableName: String,
+  coreLogic: String): String = {
+s"""
+|for(int z=0; z < $childVariableName.numElements(); z++) {
+|  ${ev.isNull} |= $childVariableName.isNullAt(z);
--- End diff --

How about breaking when `null` is found?
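
For illustration, a hedged sketch of how the template could short-circuit (plain Scala string building; `evIsNull` stands in for `${ev.isNull}`, and the guard around `coreLogic` is an assumption since the rest of the template is not quoted above):

```scala
object NullProtectionSketch extends App {
  // Builds the generated Java snippet: stop scanning as soon as a null element is seen.
  def nullElementsProtection(
      evIsNull: String,
      childVariableName: String,
      coreLogic: String): String = {
    s"""
       |for (int z = 0; z < $childVariableName.numElements(); z++) {
       |  if ($childVariableName.isNullAt(z)) {
       |    $evIsNull = true;
       |    break;
       |  }
       |}
       |if (!$evIsNull) {
       |  $coreLogic
       |}
     """.stripMargin
  }

  // Prints the generated snippet so the control flow is easy to inspect.
  println(nullElementsProtection("isNull_0", "arr_0", "/* flatten the child arrays here */"))
}
```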


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89320 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89320/testReport)**
 for PR 21061 at commit 
[`29c9b92`](https://github.com/apache/spark/commit/29c9b92e32766a3a79eabb9040e25c368020fa65).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19627: [SPARK-21088][ML] CrossValidator, TrainValidationSplit s...

2018-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19627
  
**[Test build #89334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89334/testReport)**
 for PR 19627 at commit 
[`80f07fb`](https://github.com/apache/spark/commit/80f07fb93a00e2cda402d312e5c6e915bb400c12).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89320/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


