[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22014#discussion_r208091824
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -709,6 +709,7 @@ object ScalaReflection extends ScalaReflection {
   def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] match {
 case Schema(s: StructType, _) =>
   s.toAttributes
+case _ => throw new RuntimeException(s"$schemaFor is not supported at 
attributesFor()")
--- End diff --

How about this:

```scala
case other =>
  throw new UnsupportedOperationException(s"Attributes for type $other is not supported")
```
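
A minimal, Spark-free sketch of the suggestion above: give the match an explicit fallback that names the unsupported value, so callers do not hit an opaque `scala.MatchError`. The names `Schema`, `StructType`, `IntegerType` and `attributeNamesFor` are simplified stand-ins invented for this illustration, not the real Catalyst classes.

```scala
object AttributesForSketch {
  // Simplified stand-ins for Catalyst's type hierarchy (hypothetical).
  sealed trait DataType
  case object IntegerType extends DataType
  final case class StructType(fieldNames: Seq[String]) extends DataType

  final case class Schema(dataType: DataType, nullable: Boolean)

  // Only struct schemas expose attributes; anything else is rejected with a
  // descriptive exception instead of falling through the match.
  def attributeNamesFor(schema: Schema): Seq[String] = schema match {
    case Schema(s: StructType, _) => s.fieldNames
    case other =>
      throw new UnsupportedOperationException(s"Attributes for type $other is not supported")
  }
}
```

Binding the fallback to a name (`other`) rather than `_` is what lets the message report the offending value.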


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22014#discussion_r208091445
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala
 ---
@@ -67,6 +67,7 @@ case class ApproxCountDistinctForIntervals(
 (endpointsExpression.dataType, endpointsExpression.eval()) match {
   case (ArrayType(elementType, _), arrayData: ArrayData) =>
 arrayData.toObjectArray(elementType).map(_.toString.toDouble)
+  case _ => throw new RuntimeException("not found at endpoints")
--- End diff --

Can we do this like:

```scala
val endpointsType = endpointsExpression.dataType.asInstanceOf[ArrayType]
val endpoints = endpointsExpression.eval().asInstanceOf[ArrayData]

endpoints.toObjectArray(endpointsType.elementType).map(_.toString.toDouble)
```
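
To illustrate the trade-off in the cast-based form above, here is a small sketch that does not depend on Spark. It assumes an earlier validation step (not shown) has already guaranteed an array type, so the casts only document that invariant. `Endpoints`, `ArrayType`, `DoubleType` and `endpointsAsDoubles` are hypothetical names used for illustration.

```scala
object EndpointsSketch {
  // Hypothetical stand-ins for an expression's static type and evaluated value.
  sealed trait DataType
  case object DoubleType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  final case class Endpoints(dataType: DataType, value: Any)

  // Cast-based version: no artificial catch-all case is needed, and a broken
  // invariant surfaces as a ClassCastException at the cast itself.
  def endpointsAsDoubles(e: Endpoints): (DataType, Array[Double]) = {
    val endpointsType = e.dataType.asInstanceOf[ArrayType]
    val endpoints = e.value.asInstanceOf[Seq[Any]]
    (endpointsType.elementType, endpoints.map(_.toString.toDouble).toArray)
  }
}
```

Whether a cast or a match with a throwing fallback reads better depends on how firmly the surrounding code guarantees the types.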



---




[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22014#discussion_r208090085
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -471,6 +471,7 @@ class CodegenContext {
   case NewFunctionSpec(functionName, None, None) => functionName
   case NewFunctionSpec(functionName, Some(_), 
Some(innerClassInstance)) =>
 innerClassInstance + "." + functionName
+  case _ => null // nothing to do since addNewFunctionInteral() must 
return one of them
--- End diff --

Shall we throw an `IllegalArgumentException`? 
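
A sketch of what throwing could look like here, under the assumption that a spec with only one of the two optional parts set is a caller bug. The `NewFunctionSpec` below is a hypothetical mirror of the shape matched in the diff, and `qualifiedName` is a name made up for this example.

```scala
object FunctionSpecSketch {
  // Hypothetical mirror of the matched shape: a function name plus an optional
  // inner class name and inner class instance.
  final case class NewFunctionSpec(
      functionName: String,
      innerClassName: Option[String],
      innerClassInstance: Option[String])

  def qualifiedName(spec: NewFunctionSpec): String = spec match {
    case NewFunctionSpec(name, None, None) => name
    case NewFunctionSpec(name, Some(_), Some(instance)) => instance + "." + name
    // Mixed Some/None combinations fail fast instead of returning null.
    case other =>
      throw new IllegalArgumentException(s"Inconsistent function spec: $other")
  }
}
```

Compared with `case _ => null`, the failure is reported where the inconsistent spec is seen rather than later as a `NullPointerException`.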


---




[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22014#discussion_r208089613
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala
 ---
@@ -86,6 +87,7 @@ object ValueInterval {
 val newMax = if (n1.max <= n2.max) n1.max else n2.max
 (Some(EstimationUtils.fromDouble(newMin, dt)),
   Some(EstimationUtils.fromDouble(newMax, dt)))
+  case _ => throw new RuntimeException(s"Not supported pair: $r1, $r2 
at intersect()")
--- End diff --

Shall we do `UnsupportedOperationException`?


---




[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-06 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21977#discussion_r208091782
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -60,14 +61,26 @@ private[spark] object PythonEvalType {
  */
 private[spark] abstract class BasePythonRunner[IN, OUT](
 funcs: Seq[ChainedPythonFunctions],
-bufferSize: Int,
-reuseWorker: Boolean,
 evalType: Int,
-argOffsets: Array[Array[Int]])
+argOffsets: Array[Array[Int]],
+conf: SparkConf)
   extends Logging {
 
   require(funcs.length == argOffsets.length, "argOffsets should have the 
same length as funcs")
 
+  private val bufferSize = conf.getInt("spark.buffer.size", 65536)
+  private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", 
true)
+  private val memoryMb = {
+val allocation = conf.get(PYSPARK_EXECUTOR_MEMORY)
+if (reuseWorker) {
--- End diff --

No, I'm not sure where that is. Is it on the Python side? If you can point 
me to it, I'll take a closer look.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22009
  
**[Test build #94336 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94336/testReport)**
 for PR 22009 at commit 
[`cab6d28`](https://github.com/apache/spark/commit/cab6d2828dacaca6e62d3409c684d18a1fc861f2).


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1884/
Test PASSed.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94326/
Test FAILed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21898
  
**[Test build #94326 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94326/testReport)**
 for PR 21898 at commit 
[`1f71e65`](https://github.com/apache/spark/commit/1f71e6583f9f9f270d07323f15c731717e13518d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21991
  
The failed test is `FlatMapGroupsWithStateSuite.flatMapGroupsWithState`. I 
have seen it fail occasionally, so I don't think it is related to this 
change. @HyukjinKwon @dbtsai 


---




[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-06 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r208090280
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeWriteCompatibilitySuite.scala
 ---
@@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+import scala.collection.mutable
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.analysis
+import org.apache.spark.sql.catalyst.expressions.Cast
+
+class DataTypeWriteCompatibilitySuite extends SparkFunSuite {
--- End diff --

I'm planning on adding this, but it would be great to get this in first; I'll 
add the tests next. That way I no longer have to keep rebasing it. Thanks!


---




[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-08-06 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21305#discussion_r208090428
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -336,4 +337,124 @@ object DataType {
   case (fromDataType, toDataType) => fromDataType == toDataType
 }
   }
+
+  private val SparkGeneratedName = """col\d+""".r
+  private def isSparkGeneratedName(name: String): Boolean = name match {
+case SparkGeneratedName(_*) => true
+case _ => false
+  }
+
+  /**
+   * Returns true if the write data type can be read using the read data 
type.
+   *
+   * The write type is compatible with the read type if:
+   * - Both types are arrays, the array element types are compatible, and 
element nullability is
+   *   compatible (read allows nulls or write does not contain nulls).
+   * - Both types are maps and the map key and value types are compatible, 
and value nullability
+   *   is compatible  (read allows nulls or write does not contain nulls).
+   * - Both types are structs and each field in the read struct is present 
in the write struct and
+   *   compatible (including nullability), or is nullable if the write 
struct does not contain the
+   *   field. Write-side structs are not compatible if they contain fields 
that are not present in
+   *   the read-side struct.
+   * - Both types are atomic and the write type can be safely cast to the 
read type.
+   *
+   * Extra fields in write-side structs are not allowed to avoid 
accidentally writing data that
+   * the read schema will not read, and to ensure map key equality is not 
changed when data is read.
+   *
+   * @param write a write-side data type to validate against the read type
+   * @param read a read-side data type
+   * @return true if data written with the write type can be read using 
the read type
+   */
+  def canWrite(
+  write: DataType,
+  read: DataType,
+  resolver: Resolver,
+  context: String,
+  addError: String => Unit = (_: String) => {}): Boolean = {
+(write, read) match {
+  case (wArr: ArrayType, rArr: ArrayType) =>
+// run compatibility check first to produce all error messages
+val typesCompatible =
+  canWrite(wArr.elementType, rArr.elementType, resolver, context + 
".element", addError)
+
+if (wArr.containsNull && !rArr.containsNull) {
+  addError(s"Cannot write nullable elements to array of non-nulls: 
'$context'")
+  false
+} else {
+  typesCompatible
+}
+
+  case (wMap: MapType, rMap: MapType) =>
+// map keys cannot include data fields not in the read schema 
without changing equality when
+// read. map keys can be missing fields as long as they are 
nullable in the read schema.
+
+// run compatibility check first to produce all error messages
+val keyCompatible =
+  canWrite(wMap.keyType, rMap.keyType, resolver, context + ".key", 
addError)
+val valueCompatible =
+  canWrite(wMap.valueType, rMap.valueType, resolver, context + 
".value", addError)
+val typesCompatible = keyCompatible && valueCompatible
+
+if (wMap.valueContainsNull && !rMap.valueContainsNull) {
+  addError(s"Cannot write nullable values to map of non-nulls: 
'$context'")
+  false
+} else {
+  typesCompatible
+}
+
+  case (StructType(writeFields), StructType(readFields)) =>
+var fieldCompatible = true
+readFields.zip(writeFields).foreach {
+  case (rField, wField) =>
+val namesMatch = resolver(wField.name, rField.name) || 
isSparkGeneratedName(wField.name)
+val fieldContext = s"$context.${rField.name}"
+val typesCompatible =
+  canWrite(wField.dataType, rField.dataType, resolver, 
fieldContext, addError)
+
+if (!namesMatch) {
+  addError(s"Struct '$context' field name does not match (may 
be out of order): " +
+  s"expected '${rField.name}', found '${wField.name}'")
+  fieldCompatible = false
+} else if (!rField.nullable && wField.nullable) {
+  addError(s"Cannot write nullable values to non-null field: 
'$fieldContext'")
+  fieldCompatible = false
+} else if (!typesCompatible) {
+  // errors are added in the recursive call to canWrite above
+  fieldCompatible = false
+}
+}
+
+if (readFields.size > writeFields.size) {
+  val missingFieldsStr = readFields.takeRight(rea
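
The compatibility rules in the doc comment above can be sketched on a much-reduced type model. This is an illustration under stated assumptions, not the code from the pull request: `DType`, `IntT`, `LongT`, `ArrayT`, `StructT` and `Field` are hypothetical, names are compared with plain equality instead of a resolver, maps are omitted, and "safely cast" is reduced to a single Int to Long widening.

```scala
object CanWriteSketch {
  // Hypothetical, much-reduced type model; not Spark's DataType hierarchy.
  sealed trait DType
  case object IntT extends DType
  case object LongT extends DType
  final case class ArrayT(element: DType, containsNull: Boolean) extends DType
  final case class Field(name: String, dataType: DType, nullable: Boolean)
  final case class StructT(fields: Seq[Field]) extends DType

  // True if data written as `write` can be read as `read`. Every problem found
  // is reported through `addError`, not just the first one.
  def canWrite(write: DType, read: DType, context: String, addError: String => Unit): Boolean =
    (write, read) match {
      case (w: ArrayT, r: ArrayT) =>
        // Recurse first so element errors are reported even if nullability fails too.
        val elementsOk = canWrite(w.element, r.element, context + ".element", addError)
        if (w.containsNull && !r.containsNull) {
          addError(s"Cannot write nullable elements to array of non-nulls: '$context'")
          false
        } else elementsOk

      case (StructT(wFields), StructT(rFields)) =>
        var ok = true
        rFields.zip(wFields).foreach { case (r, w) =>
          val fieldContext = s"$context.${r.name}"
          val typesOk = canWrite(w.dataType, r.dataType, fieldContext, addError)
          if (w.name != r.name) {
            addError(s"Struct '$context' field name does not match: " +
              s"expected '${r.name}', found '${w.name}'")
            ok = false
          } else if (!r.nullable && w.nullable) {
            addError(s"Cannot write nullable values to non-null field: '$fieldContext'")
            ok = false
          } else if (!typesOk) ok = false
        }
        // Following the doc comment: a read field missing from the write side
        // must be nullable, and extra write-side fields are rejected.
        val missingRequired = rFields.drop(wFields.size).filterNot(_.nullable)
        if (missingRequired.nonEmpty) {
          addError(s"Struct '$context' missing fields: ${missingRequired.map(_.name).mkString(", ")}")
          ok = false
        }
        if (wFields.size > rFields.size) {
          addError(s"Struct '$context' has extra write-side fields")
          ok = false
        }
        ok

      case (IntT, LongT) => true // safe widening
      case (w, r) if w == r => true
      case (w, r) =>
        addError(s"Cannot write '$context': $w cannot be safely cast to $r")
        false
    }
}
```

Running the recursive check before the nullability check is what allows a single call to collect all error messages for a nested type.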

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-08-06 Thread fuqiliang
Github user fuqiliang commented on the issue:

https://github.com/apache/spark/pull/20666
  
To be specific, the JSON file (Sanity4.json) is
`{"a":"a1","int":1,"other":4.4}
{"a":"a2","int":"","other":""}`

Code:

> val config = new SparkConf().setMaster("local[5]").setAppName("test")
> val sc = SparkContext.getOrCreate(config)
> val sql = new SQLContext(sc)
> 
> val file_path = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
> val df = sql.read.schema(null).json(file_path)
> df.show(30)



Then in Spark 1.6, the result is
+---+----+-----+
|  a| int|other|
+---+----+-----+
| a1|   1|  4.4|
| a2|null| null|
+---+----+-----+

root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)

But in Spark 2.2, the result is
+----+----+-----+
|   a| int|other|
+----+----+-----+
|  a1|   1|  4.4|
|null|null| null|
+----+----+-----+

root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)




---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-06 Thread skambha
Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r208089990
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
 ---
@@ -169,25 +181,50 @@ package object expressions  {
 })
   }
 
-  // Find matches for the given name assuming that the 1st part is a 
qualifier (i.e. table name,
-  // alias, or subquery alias) and the 2nd part is the actual name. 
This returns a tuple of
+  // Find matches for the given name assuming that the 1st two parts 
are qualifier
+  // (i.e. database name and table name) and the 3rd part is the 
actual column name.
+  //
+  // For example, consider an example where "db1" is the database 
name, "a" is the table name
+  // and "b" is the column name and "c" is the struct field name.
+  // If the name parts is db1.a.b.c, then Attribute will match
+  // Attribute(b, qualifier("db1,"a")) and List("c") will be the 
second element
+  var matches: (Seq[Attribute], Seq[String]) = nameParts match {
+case dbPart +: tblPart +: name +: nestedFields =>
+  val key = (dbPart.toLowerCase(Locale.ROOT),
+tblPart.toLowerCase(Locale.ROOT), 
name.toLowerCase(Locale.ROOT))
+  val attributes = collectMatches(name, 
qualified3Part.get(key)).filter {
+a => (resolver(dbPart, a.qualifier.head) && resolver(tblPart, 
a.qualifier.last))
+  }
+  (attributes, nestedFields)
+case all =>
+  (Seq.empty, Seq.empty)
+  }
+
+  // If there are no matches, then find matches for the given name 
assuming that
+  // the 1st part is a qualifier (i.e. table name, alias, or subquery 
alias) and the
+  // 2nd part is the actual name. This returns a tuple of
   // matched attributes and a list of parts that are to be resolved.
   //
   // For example, consider an example where "a" is the table name, "b" 
is the column name,
   // and "c" is the struct field name, i.e. "a.b.c". In this case, 
Attribute will be "a.b",
   // and the second element will be List("c").
-  val matches = nameParts match {
-case qualifier +: name +: nestedFields =>
-  val key = (qualifier.toLowerCase(Locale.ROOT), 
name.toLowerCase(Locale.ROOT))
-  val attributes = collectMatches(name, qualified.get(key)).filter 
{ a =>
-resolver(qualifier, a.qualifier.get)
+  matches = matches match {
--- End diff --

done.
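
The two-stage lookup described in the diff comments (try `db.table.column` first, then fall back to `table.column`) can be sketched without Spark as follows. This is only an illustration: `Attr` and `resolve` are hypothetical names, and matching is done by case-insensitive comparison rather than through a pluggable resolver and pre-built lookup maps.

```scala
import java.util.Locale

object ResolveSketch {
  // Hypothetical attribute: a column name plus its (database, table) qualifier.
  final case class Attr(name: String, qualifier: Seq[String])

  private def lower(s: String): String = s.toLowerCase(Locale.ROOT)

  // Returns the matched attributes and the remaining name parts, which would
  // then be resolved as nested struct fields.
  def resolve(nameParts: Seq[String], attrs: Seq[Attr]): (Seq[Attr], Seq[String]) = {
    // First attempt: db.table.column[.nested...]
    val threePart: (Seq[Attr], Seq[String]) = nameParts match {
      case db +: tbl +: name +: nested =>
        val hits = attrs.filter { a =>
          a.qualifier.map(lower) == Seq(lower(db), lower(tbl)) && lower(a.name) == lower(name)
        }
        (hits, nested)
      case _ => (Seq.empty, Seq.empty)
    }
    if (threePart._1.nonEmpty) threePart
    else nameParts match {
      // Fallback: table.column[.nested...]
      case tbl +: name +: nested =>
        val hits = attrs.filter { a =>
          a.qualifier.lastOption.exists(q => lower(q) == lower(tbl)) && lower(a.name) == lower(name)
        }
        (hits, nested)
      case _ => (Seq.empty, Seq.empty)
    }
  }
}
```

With an attribute `b` qualified by `("db1", "a")`, both `db1.a.b.c` and `a.b.c` resolve to that attribute and leave `c` as the nested field to look up.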


---




[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-08-06 Thread skambha
Github user skambha commented on the issue:

https://github.com/apache/spark/pull/17185
  
Thanks for the review.  I have addressed your comments and pushed the 
changes.  
@cloud-fan, Please take a look.  


---




[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17185
  
**[Test build #94335 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94335/testReport)**
 for PR 17185 at commit 
[`5f7e5d7`](https://github.com/apache/spark/commit/5f7e5d7bddca593d72818b07d71f678bd0a1982d).


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21991
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94324/
Test FAILed.


---




[GitHub] spark issue #22018: [SPARK-25038][SQL] Accelerate Spark Plan generation when...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22018
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21991
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21721#discussion_r208089043
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ---
@@ -196,6 +237,18 @@ trait ProgressReporter extends Logging {
 currentStatus = currentStatus.copy(isTriggerActive = false)
   }
 
+  /** Extract writer from the executed query plan. */
+  private def dataSourceWriter: Option[DataSourceWriter] = {
+if (lastExecution == null) return None
+lastExecution.executedPlan.collect {
+  case p if p.isInstanceOf[WriteToDataSourceV2Exec] =>
--- End diff --

This only works for microbatch mode. Do we have a plan to support 
continuous mode?


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21991
  
**[Test build #94324 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94324/testReport)**
 for PR 21991 at commit 
[`272d8fd`](https://github.com/apache/spark/commit/272d8fd4c6a46164069e2e3a892f016e9664cf5f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21990: [SPARK-25003][PYSPARK][MASTER] Use SessionExtensions in ...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21990
  
@RussellSpitzer, let's close the other ones except for this and name it 
`[SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark`. Let me review this 
one within a few days. Also, I don't think we should do it, at least, to 
branch-2.2. The logic here is quite convoluted and I would rather avoid 
backporting it even to branch-2.3, actually.


---




[GitHub] spark issue #21988: [SPARK-25003][PYSPARK][BRANCH-2.2] Use SessionExtensions...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21988
  
Yeah, let's just close all of them except the master one.


---




[GitHub] spark pull request #22018: [SPARK-25038][SQL] Accelerate Spark Plan generati...

2018-08-06 Thread habren
GitHub user habren opened a pull request:

https://github.com/apache/spark/pull/22018

[SPARK-25038][SQL] Accelerate Spark Plan generation when Spark SQL re…

https://issues.apache.org/jira/browse/SPARK-25038

When Spark SQL reads a large amount of data, it takes a long time (more than 10 
minutes) to generate the physical plan and then the ActiveJob.

Example:

There is a table which is partitioned by date and hour. There is more than 
13 TB of data each hour and 185 TB per day. Even when we issue a very simple SQL 
statement, it takes a long time to generate the ActiveJob.

The SQL statement is

select count(device_id) from test_tbl where date=20180731 and hour='21';

Before optimization, it takes 2 minutes and 9 seconds to generate the job.

The SQL is issued at 2018-08-07 09:07:41.

However, the job is submitted at 2018-08-07 09:09:53, which is 2 minutes and 
9 seconds later than the SQL issue time.

After the optimization, it takes only 4 seconds to generate the job.

The SQL is issued at 2018-08-07 09:20:15.

And the job is submitted at 2018-08-07 09:20:19, which is 4 seconds later 
than the SQL issue time.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/habren/spark SPARK-25038

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22018.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22018


commit 2bb5924e04eba5accfe58a4fbae094d46cc36488
Author: Jason Guo 
Date:   2018-08-07T03:13:03Z

[SPARK-25038][SQL] Accelerate Spark Plan generation when Spark SQL read 
large amount of data




---




[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-08-06 Thread fuqiliang
Github user fuqiliang commented on the issue:

https://github.com/apache/spark/pull/20666
  
Hi guys, I am a Spark user.
I have a question about this "JSON doesn't support partial results for 
corrupted records." behavior.
In Spark 1.6, partial results were returned, but after upgrading to 2.2, I 
lose some meaningful data from my JSON file.

Can I get that data back in Spark 2+?  @viirya 
Thanks for your help.


---




[GitHub] spark pull request #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion

2018-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21937


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21860
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21860
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94327/
Test FAILed.


---




[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21860
  
**[Test build #94327 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94327/testReport)**
 for PR 21860 at commit 
[`f290668`](https://github.com/apache/spark/commit/f2906684f49c84183cf1f5e64ab4b887d4a77ca1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion

2018-08-06 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21937
  
Thanks! merging to master.


---




[GitHub] spark issue #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion

2018-08-06 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21937
  
LGTM.


---




[GitHub] spark pull request #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion

2018-08-06 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21937#discussion_r208085448
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3767,230 +3767,160 @@ object ArraySetLike {
   """,
   since = "2.4.0")
 case class ArrayUnion(left: Expression, right: Expression) extends 
ArraySetLike
-with ComplexTypeMergingExpression {
-  var hsInt: OpenHashSet[Int] = _
-  var hsLong: OpenHashSet[Long] = _
-
-  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
-val elem = array.getInt(idx)
-if (!hsInt.contains(elem)) {
-  if (resultArray != null) {
-resultArray.setInt(pos, elem)
-  }
-  hsInt.add(elem)
-  true
-} else {
-  false
-}
-  }
-
-  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
-val elem = array.getLong(idx)
-if (!hsLong.contains(elem)) {
-  if (resultArray != null) {
-resultArray.setLong(pos, elem)
-  }
-  hsLong.add(elem)
-  true
-} else {
-  false
-}
-  }
+  with ComplexTypeMergingExpression {
 
-  def evalIntLongPrimitiveType(
-  array1: ArrayData,
-  array2: ArrayData,
-  resultArray: ArrayData,
-  isLongType: Boolean): Int = {
-// store elements into resultArray
-var nullElementSize = 0
-var pos = 0
-Seq(array1, array2).foreach { array =>
-  var i = 0
-  while (i < array.numElements()) {
-val size = if (!isLongType) hsInt.size else hsLong.size
-if (size + nullElementSize > 
ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
-  ArraySetLike.throwUnionLengthOverflowException(size)
-}
-if (array.isNullAt(i)) {
-  if (nullElementSize == 0) {
-if (resultArray != null) {
-  resultArray.setNullAt(pos)
+  @transient lazy val evalUnion: (ArrayData, ArrayData) => ArrayData = {
+if (elementTypeSupportEquals) {
+  (array1, array2) =>
+val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+val hs = new OpenHashSet[Any]
+var foundNullElement = false
+Seq(array1, array2).foreach { array =>
+  var i = 0
+  while (i < array.numElements()) {
+if (array.isNullAt(i)) {
+  if (!foundNullElement) {
+arrayBuffer += null
+foundNullElement = true
+  }
+} else {
+  val elem = array.get(i, elementType)
+  if (!hs.contains(elem)) {
+if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.size)
+}
+arrayBuffer += elem
+hs.add(elem)
+  }
 }
-pos += 1
-nullElementSize = 1
+i += 1
   }
-} else {
-  val assigned = if (!isLongType) {
-assignInt(array, i, resultArray, pos)
+}
+new GenericArrayData(arrayBuffer)
+} else {
+  (array1, array2) =>
+val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+var alreadyIncludeNull = false
+Seq(array1, array2).foreach(_.foreach(elementType, (_, elem) => {
+  var found = false
+  if (elem == null) {
+if (alreadyIncludeNull) {
+  found = true
+} else {
+  alreadyIncludeNull = true
+}
   } else {
-assignLong(array, i, resultArray, pos)
+// check elem is already stored in arrayBuffer or not?
+var j = 0
+while (!found && j < arrayBuffer.size) {
+  val va = arrayBuffer(j)
+  if (va != null && ordering.equiv(va, elem)) {
+found = true
+  }
+  j = j + 1
+}
   }
-  if (assigned) {
-pos += 1
+  if (!found) {
+if (arrayBuffer.length > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.length)
+}
+arrayBuffer += elem
   }
-}
-i += 1
-  }
+}))
+new GenericArrayData(arrayBuffer)
 }
-pos
   }
 
   override def nullSafeEval(input1: Any, input2: Any):
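
For readers following the hash-set branch of `evalUnion` in the diff above, here is a minimal sketch of the same idea on plain Scala collections. It is an illustration under stated assumptions, not the code from the PR: `ArrayUnionSketch` and `union` are hypothetical names, `None` stands in for a SQL NULL element, and the length-overflow check and the ordering-based fallback branch are left out.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the hash-set branch; not Spark's ArrayUnion.
object ArrayUnionSketch {
  // Order-preserving union: each distinct value is kept once, and at most
  // one null (modelled here as None) is kept, at its first position.
  def union[T](left: Seq[Option[T]], right: Seq[Option[T]]): Seq[Option[T]] = {
    val result = mutable.ArrayBuffer.empty[Option[T]]
    val seen = mutable.HashSet.empty[T]
    var foundNull = false
    for (array <- Seq(left, right); elem <- array) {
      elem match {
        case None =>
          if (!foundNull) {
            result += None
            foundNull = true
          }
        case Some(value) =>
          // HashSet.add returns true only when the value was not present yet.
          if (seen.add(value)) result += elem
      }
    }
    result.toSeq
  }
}
```

For example, `ArrayUnionSketch.union(Seq(Some(1), None, Some(2)), Seq(Some(2), None, Some(3)))` keeps the first occurrence of each value and a single null.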

[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20611
  
**[Test build #94334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94334/testReport)**
 for PR 20611 at commit 
[`5b5bb52`](https://github.com/apache/spark/commit/5b5bb52e1c334eeec49c318e4c437d04c489671b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-08-06 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21222
  
Thanks @zsxwing for merging and thanks all for reviewing!


---




[GitHub] spark pull request #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `S...

2018-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21991


---




[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-08-06 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21622
  
Thanks @HyukjinKwon for merging, and thanks all for reviewing!


---




[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20611
  
retest this please


---




[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21721


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21991
  

https://github.com/apache/spark/commit/51bee7aca13451167fa3e701fcd60f023eae5e61 
looks good :-)


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21991
  
Merged to master.


---




[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21721
  
Merged to master.


---




[GitHub] spark issue #22016: Fix typos

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22016
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94328/
Test PASSed.


---




[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94319/
Test PASSed.


---




[GitHub] spark issue #22016: Fix typos

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22016
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22016: Fix typos

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22016
  
**[Test build #94328 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94328/testReport)**
 for PR 22016 at commit 
[`0d26901`](https://github.com/apache/spark/commit/0d2690185f6f8765accb78d39a3e74c1df5a4536).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21451
  
**[Test build #94319 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94319/testReport)**
 for PR 21451 at commit 
[`6d059f2`](https://github.com/apache/spark/commit/6d059f25f3595243a8dd6195a5ee938a78e40d99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94330/
Test FAILed.


---




[GitHub] spark pull request #21622: [SPARK-24637][SS] Add metrics regarding state and...

2018-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21622


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22009
  
**[Test build #94330 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94330/testReport)**
 for PR 22009 at commit 
[`2f6d1d2`](https://github.com/apache/spark/commit/2f6d1d27a2a5aabc0db87b2e97f7f8e6fd6fe91c).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21898
  
**[Test build #94333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94333/testReport)**
 for PR 21898 at commit 
[`1f71e65`](https://github.com/apache/spark/commit/1f71e6583f9f9f270d07323f15c731717e13518d).


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21622
  
Merged to master.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1883/
Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21898
  
Is there a way to increase the build timeout? cc @shaneknapp 


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-06 Thread skambha
Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r208079529
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -201,7 +204,7 @@ case class Alias(child: Expression, name: String)(
   }
 
   override def sql: String = {
-val qualifierPrefix = qualifier.map(_ + ".").getOrElse("")
+val qualifierPrefix = if (qualifier.nonEmpty) qualifier.mkString(".") + "." else ""
--- End diff --

ok, sounds good.  


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21898
  
retest this please


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r208079335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -201,7 +204,7 @@ case class Alias(child: Expression, name: String)(
   }
 
   override def sql: String = {
-val qualifierPrefix = qualifier.map(_ + ".").getOrElse("")
+val qualifierPrefix = if (qualifier.nonEmpty) qualifier.mkString(".") + "." else ""
--- End diff --

Ah, my bad. I thought it would return an empty string for an empty Seq. Let's 
leave it.


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21991
  
Let me leave it to you, @dbtsai. I thought you lived in a completely 
different timezone from mine.


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21991
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94322/
Test FAILed.


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21991
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21978
  
I'd like to wait for https://github.com/apache/spark/pull/17185

#17185 allows users to do `db1.table1.col1`, and we can later extend it to 
`catalog1.db1.table1.col1`.

We should also update the column resolution logic to consider the catalog name.
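
To make the multi-part names above concrete, here is a rough sketch of how a dotted reference could be matched against a qualified attribute. This is an assumption-laden illustration and not Spark's actual resolution logic: `QualifiedNameSketch`, `Attr`, and `matches` are hypothetical names, and case sensitivity and backtick quoting are ignored.

```scala
// Hypothetical model for illustration only; not Spark's resolver.
object QualifiedNameSketch {
  // An attribute is a column name plus the parts of its qualifier,
  // e.g. Attr(Seq("db1", "table1"), "col1").
  final case class Attr(qualifier: Seq[String], name: String)

  // A reference matches when its last part equals the column name and the
  // remaining parts form a suffix of the attribute's qualifier.
  def matches(reference: String, attr: Attr): Boolean = {
    val parts = reference.split('.').toSeq
    parts.nonEmpty && parts.last == attr.name && attr.qualifier.endsWith(parts.init)
  }
}
```

Under this sketch, `col1`, `table1.col1`, and `db1.table1.col1` all match an attribute qualified by `Seq("db1", "table1")`, and adding a catalog only means a longer qualifier such as `Seq("catalog1", "db1", "table1")`.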


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21991
  
**[Test build #94322 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94322/testReport)**
 for PR 21991 at commit 
[`11887ae`](https://github.com/apache/spark/commit/11887aefdda4f1a21cde9ad7d1099c91b0744264).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-06 Thread skambha
Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r208078928
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -794,19 +795,37 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends OrderPr
 /**
  * Aliased subquery.
  *
- * @param alias the alias name for this subquery.
+ * @param name the alias identifier for this subquery.
  * @param child the logical plan of this subquery.
  */
 case class SubqueryAlias(
-alias: String,
+name: AliasIdentifier,
 child: LogicalPlan)
   extends OrderPreservingUnaryNode {
 
-  override def doCanonicalize(): LogicalPlan = child.canonicalized
+  def alias: String = name.identifier
 
-  override def output: Seq[Attribute] = child.output.map(_.withQualifier(Some(alias)))
+  override def output: Seq[Attribute] = {
+val qualifierList = name.database.map(Seq(_, alias)).getOrElse(Seq(alias))
+child.output.map(_.withQualifier(qualifierList))
+  }
+  override def doCanonicalize(): LogicalPlan = child.canonicalized
 }
 
+object SubqueryAlias {
+  def apply(
+  identifier: String,
+  child: LogicalPlan): SubqueryAlias = {
+SubqueryAlias(AliasIdentifier(identifier), child)
+  }
+
+  def apply(
+  identifier: String,
+  database: Option[String],
--- End diff --

good point! I'll take care of this in the next push.  
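
For context, the `qualifierList` expression in the new `output` override can be tried in isolation. The sketch below assumes a simplified stand-in for `AliasIdentifier` (the class `AliasIdentifierSketch` and the object `QualifierListDemo` are hypothetical, not Spark's classes) and only shows how the optional database name is prepended to the alias.

```scala
// Simplified stand-in for the AliasIdentifier in the diff; not Spark's class.
final case class AliasIdentifierSketch(identifier: String, database: Option[String] = None)

object QualifierListDemo {
  // Same shape as the expression in the diff: Seq(db, alias) when a database
  // is present, otherwise just Seq(alias).
  def qualifierList(name: AliasIdentifierSketch): Seq[String] =
    name.database.map(Seq(_, name.identifier)).getOrElse(Seq(name.identifier))

  def main(args: Array[String]): Unit = {
    println(qualifierList(AliasIdentifierSketch("t1")))               // List(t1)
    println(qualifierList(AliasIdentifierSketch("t1", Some("db1"))))  // List(db1, t1)
  }
}
```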


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21991
  
Yes, I think so. I was about to merge this in that way :-). It seems to me we 
are good to merge now, since the current change is only checked by the Python 
lint, which has already passed.


---




[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21980
  
**[Test build #94332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94332/testReport)**
 for PR 21980 at commit 
[`d4d8d0f`](https://github.com/apache/spark/commit/d4d8d0fd2597d52dd2da5b36da6f05a60d89d25e).


---




[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-06 Thread skambha
Github user skambha commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r208078754
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -201,7 +204,7 @@ case class Alias(child: Expression, name: String)(
   }
 
   override def sql: String = {
-val qualifierPrefix = qualifier.map(_ + ".").getOrElse("")
+val qualifierPrefix = if (qualifier.nonEmpty) qualifier.mkString(".") + "." else ""
--- End diff --

This won't work when we have Seq.empty: the suffix "." gets returned even 
for an empty sequence. 
For a non-empty Seq, the above call will be fine. 

Shall we leave the 'if' as is, or is there an equivalent preferred style 
that would work? 
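
A small sketch of the difference, assuming the call being discussed is the three-argument `mkString(start, sep, end)` (the suggestion itself is not quoted in this message, so that is an assumption; `QualifierPrefixDemo`, `ifForm`, and `mkStringForm` are hypothetical names):

```scala
object QualifierPrefixDemo {
  // The `if` form from the diff: no trailing "." when there is no qualifier.
  def ifForm(qualifier: Seq[String]): String =
    if (qualifier.nonEmpty) qualifier.mkString(".") + "." else ""

  // Three-argument mkString always appends the end string, even when the
  // sequence is empty.
  def mkStringForm(qualifier: Seq[String]): String =
    qualifier.mkString("", ".", ".")

  def main(args: Array[String]): Unit = {
    val empty = Seq.empty[String]
    val twoParts = Seq("db1", "table1")
    println("[" + ifForm(empty) + "]")           // []
    println("[" + mkStringForm(empty) + "]")     // [.]
    println("[" + ifForm(twoParts) + "]")        // [db1.table1.]
    println("[" + mkStringForm(twoParts) + "]")  // [db1.table1.]
  }
}
```

The two forms agree for a non-empty Seq and differ only in the empty case, which is why the explicit `if` is kept.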


---




[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1882/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21980
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94318/
Test FAILed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21898
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21898
  
**[Test build #94318 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94318/testReport)**
 for PR 21898 at commit 
[`1f71e65`](https://github.com/apache/spark/commit/1f71e6583f9f9f270d07323f15c731717e13518d).
 * This patch **fails from timeout after a configured wait of \`300m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21980
  
**[Test build #94331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94331/testReport)**
 for PR 21980 at commit 
[`f60a238`](https://github.com/apache/spark/commit/f60a2384f335b1c95e81a0c232299af9bb426654).


---




[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1881/
Test PASSed.


---




[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21980
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22009
  
**[Test build #94330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94330/testReport)**
 for PR 22009 at commit 
[`2f6d1d2`](https://github.com/apache/spark/commit/2f6d1d27a2a5aabc0db87b2e97f7f8e6fd6fe91c).


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1880/
Test PASSed.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21980#discussion_r208078032
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala
 ---
@@ -854,6 +854,26 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
 assert(uuids.distinct.size == 2)
   }
 
+  test("Rand/Randn in streaming query should not produce results in each execution") {
--- End diff --

oops, fixed typo.


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21991
  
@HyukjinKwon thanks! Is it possible to use this script to merge this PR, 
which has many people involved? It would be a good demonstration of 
collaboration in the community. 


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21991
  
**[Test build #94329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94329/testReport)**
 for PR 21991 at commit 
[`272d8fd`](https://github.com/apache/spark/commit/272d8fd4c6a46164069e2e3a892f016e9664cf5f).


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21991
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1879/
Test PASSed.


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21991
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21199
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21956: [MINOR][DOCS] Fix grammatical error in SortShuffl...

2018-08-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21956


---




[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94321/
Test PASSed.


---




[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21632
  
cc also @jkbradley 


---




[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21199
  
**[Test build #94321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94321/testReport)**
 for PR 21199 at commit 
[`f4a39d9`](https://github.com/apache/spark/commit/f4a39d9ebae2d6f6ae59caf3140310b17e75b602).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TextSocketContinuousReader(options: DataSourceOptions) extends ContinuousReader with Logging`


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21991
  
retest this please


---




[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21991
  
I am merging this since it is not actually tested; the only relevant check is 
the Python linter, which has already passed.


---




[GitHub] spark issue #22016: Fix typos

2018-08-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22016
  
**[Test build #94328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94328/testReport)** for PR 22016 at commit [`0d26901`](https://github.com/apache/spark/commit/0d2690185f6f8765accb78d39a3e74c1df5a4536).


---




[GitHub] spark issue #21956: [MINOR][DOCS] Fix grammatical error in SortShuffleManage...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21956
  
@kiszk, I would appreciate it if you opened a PR fixing them, or suggested 
them in someone else's PR that fixes them.


---




[GitHub] spark issue #21956: [MINOR][DOCS] Fix grammatical error in SortShuffleManage...

2018-08-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21956
  
Merged to master.


---




[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21980
  
LGTM


---




[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21980#discussion_r208075258
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala ---
@@ -854,6 +854,26 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
 assert(uuids.distinct.size == 2)
   }
 
+  test("Rand/Randn in streaming query should not produce results in each execution") {
--- End diff --

`produce results` -> `produce same results`


---



