[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94779/
Test FAILed.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22009
  
**[Test build #94779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94779/testReport)** for PR 22009 at commit [`f4f85a8`](https://github.com/apache/spark/commit/f4f85a833ef319a6860134e12655574aca081ed6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20226
  
**[Test build #94786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94786/testReport)** for PR 20226 at commit [`1facc05`](https://github.com/apache/spark/commit/1facc0554aae0829a19bbb7607b25ff7eda4ef8d).


---




[GitHub] spark pull request #20529: [SPARK-23350][SS]Bug fix for exception handling w...

2018-08-14 Thread yanlin-Lynn
Github user yanlin-Lynn closed the pull request at:

https://github.com/apache/spark/pull/20529


---




[GitHub] spark issue #20529: [SPARK-23350][SS]Bug fix for exception handling when sto...

2018-08-14 Thread yanlin-Lynn
Github user yanlin-Lynn commented on the issue:

https://github.com/apache/spark/pull/20529
  
No need any more; closing this patch.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94785/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-14 Thread ajacques
Github user ajacques commented on a diff in the pull request:

https://github.com/apache/spark/pull/21889#discussion_r210170646
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import java.io.File
+
+import org.scalactic.Equality
+
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.StructType
+
+class ParquetSchemaPruningSuite
+extends QueryTest
+with ParquetTest
+with SharedSQLContext {
+  case class FullName(first: String, middle: String, last: String)
+  case class Contact(
+id: Int,
+name: FullName,
+address: String,
+pets: Int,
+friends: Array[FullName] = Array(),
+relatives: Map[String, FullName] = Map())
+
+  val janeDoe = FullName("Jane", "X.", "Doe")
+  val johnDoe = FullName("John", "Y.", "Doe")
+  val susanSmith = FullName("Susan", "Z.", "Smith")
+
+  private val contacts =
+Contact(0, janeDoe, "123 Main Street", 1, friends = Array(susanSmith),
+  relatives = Map("brother" -> johnDoe)) ::
+Contact(1, johnDoe, "321 Wall Street", 3, relatives = Map("sister" -> janeDoe)) :: Nil
+
+  case class Name(first: String, last: String)
+  case class BriefContact(id: Int, name: Name, address: String)
+
+  private val briefContacts =
+BriefContact(2, Name("Janet", "Jones"), "567 Maple Drive") ::
+BriefContact(3, Name("Jim", "Jones"), "6242 Ash Street") :: Nil
+
+  case class ContactWithDataPartitionColumn(
+id: Int,
+name: FullName,
+address: String,
+pets: Int,
+friends: Array[FullName] = Array(),
--- End diff --

This is a definition in a constructor, so I don't think we can do:
```scala
case class Contact(
  [...]
  friends: Array.empty[FullName],
  relatives: Map.empty[String, FullName]
)
```
Scala wants a colon, so I opted for:
```scala
case class Contact(
  [...]
  friends: Array[FullName] = Array.empty,
  relatives: Map[String, FullName] = Map.empty
)
```
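
To make the distinction concrete, here is a minimal self-contained sketch (hypothetical field types, not the ones from the suite):

```scala
// After the colon Scala expects a type; the default value goes after `=`.
case class Contact(
  id: Int,
  friends: Array[String] = Array.empty,
  relatives: Map[String, String] = Map.empty)

val c = Contact(0) // defaults kick in
assert(c.friends.isEmpty && c.relatives.isEmpty)
```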


---




[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

2018-08-14 Thread rednaxelafx
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/22103#discussion_r210169785
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
   try {
 val cf = new ClassFile(new ByteArrayInputStream(classBytes))
 val stats = cf.methodInfos.asScala.flatMap { method =>
-  method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
+  method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
--- End diff --

> I worried whether the strictness may change behavior.

Right, I can tell. And as I've mentioned above, although my new check is stricter, it doesn't make the behavior any "worse" than before, because we're reflectively accessing the `code` field immediately after via `codeAttrField.get(a)`, and that won't work unless the classes match exactly.

The old code before my change was actually too permissive: in the case of a class loader mismatch, the old check would let execution reach the reflective access site, but it would then fail there because reflection doesn't allow access from the wrong class.

This can be exemplified by the following pseudocode:
```scala
val c1 = new URLClassLoader(somePath).loadClass("Foo") // load a class
val c2 = new URLClassLoader(somePath).loadClass("Foo") // load another class with the same name from the same path, but via a different class loader
val nameEq = c1.getName == c2.getName // true
val refEq = c1 eq c2 // false
val f1 = c1.getField("a") // a Field of the class loaded by the first loader
val o1 = c1.newInstance
val o2 = c2.newInstance
f1.get(o1) // okay
f1.get(o2) // fails with an exception: o2's class belongs to the other loader
```


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).


---




[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with

2018-08-14 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22031
  
@techaddict Thanks! I look forward to the update.


---




[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with

2018-08-14 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/22031
  
Hi @ueshin, I will update the PR tomorrow.


---




[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with

2018-08-14 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22031
  
Hi @techaddict, 
Do you have time to continue working on this?
If you don't have enough time, I can take this over, so please let me know.
Thanks!


---




[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

2018-08-14 Thread deshanxiao
Github user deshanxiao commented on the issue:

https://github.com/apache/spark/pull/22093
  
@xuanyuanking I thought about it more carefully. You are right: the UT only needs to guarantee that the KryoSerializer is right, not everything else. I will add it in `KryoSerializerSuite`. Should I delete the current UT from `FileSuite`?
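
For what it's worth, a minimal sketch of the kind of round-trip check that could go into `KryoSerializerSuite` (the payload here is a placeholder; the real test should target whatever class SPARK-25100 registers):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
import org.apache.spark.serializer.KryoSerializer

val conf = new SparkConf(false)
  .set("spark.serializer", classOf[KryoSerializer].getName)
  .set("spark.kryo.registrationRequired", "true") // fail fast on unregistered classes
val ser = new KryoSerializer(conf).newInstance()

// Round-trip a TaskCommitMessage; with registrationRequired=true this throws
// unless the class has been registered with Kryo.
val msg = new TaskCommitMessage(Seq.empty)
val back = ser.deserialize[TaskCommitMessage](ser.serialize(msg))
assert(back.obj == msg.obj)
```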



---




[GitHub] spark issue #22099: [SPARK-25111][BUILD] increment kinesis client/producer &...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22099
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22099: [SPARK-25111][BUILD] increment kinesis client/producer &...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22099
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94776/
Test PASSed.


---




[GitHub] spark issue #22099: [SPARK-25111][BUILD] increment kinesis client/producer &...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22099
  
**[Test build #94776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94776/testReport)** for PR 22099 at commit [`e79e5b9`](https://github.com/apache/spark/commit/e79e5b9c0bbdf24dcc3cda30dc2c1a70d12b02aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210164955
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+examples = """
+Examples:
+   > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
+map(array(1, 2, 3), array(2, 3, 4))
+   > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
+map(array(1, 2, 3), array(2, 4, 6))
+  """,
+since = "2.4.0")
+case class TransformValues(
+argument: Expression,
+function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+val map = argument.dataType.asInstanceOf[MapType]
+MapType(map.keyType, function.dataType, function.nullable)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
--- End diff --

`lazy val`?
Could you also add a test for when the argument is not a map to the invalid cases of `DataFrameFunctionsSuite`?


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210165194
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -95,6 +95,12 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
 aggregate(expr, zero, merge, identity)
   }
 
+  def transformValues(expr: Expression, f: (Expression, Expression) => Expression): Expression = {
+val valueType = expr.dataType.asInstanceOf[MapType].valueType
+val keyType = expr.dataType.asInstanceOf[MapType].keyType
+TransformValues(expr, createLambda(keyType, false, valueType, true, f))
--- End diff --

We should use `valueContainsNull` instead of `true`?
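
For illustration, the helper with that change applied might look like this (a sketch derived from the quoted diff):

```scala
def transformValues(expr: Expression, f: (Expression, Expression) => Expression): Expression = {
  val MapType(keyType, valueType, valueContainsNull) = expr.dataType
  // propagate the map's own value nullability instead of hard-coding `true`
  TransformValues(expr, createLambda(keyType, false, valueType, valueContainsNull, f))
}
```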


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210165373
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -283,6 +289,61 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
   15)
   }
 
+  test("TransformValues") {
+val ai0 = Literal.create(
+  Map(1 -> 1, 2 -> 2, 3 -> 3),
+  MapType(IntegerType, IntegerType))
+val ai1 = Literal.create(
+  Map(1 -> 1, 2 -> null, 3 -> 3),
+  MapType(IntegerType, IntegerType))
+val ain = Literal.create(
+  Map.empty[Int, Int],
+  MapType(IntegerType, IntegerType))
--- End diff --

Can you add tests for `Literal.create(null, MapType(IntegerType, IntegerType))`?
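
Something along these lines, for example (a sketch; `aiNull` is a hypothetical name, and the expectation assumes `transform_values` is null-preserving like the other higher-order functions):

```scala
val aiNull = Literal.create(null, MapType(IntegerType, IntegerType))
// a null map input should evaluate to null rather than throw
checkEvaluation(transformValues(aiNull, (k, v) => v), null)
```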


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210165543
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql ---
@@ -51,3 +51,17 @@ select exists(ys, y -> y > 30) as v from nested;
 
 -- Check for element existence in a null array
 select exists(cast(null as array<int>), y -> y > 30) as v;
+ 
+create or replace temporary view nested as values
+  (1, map(1,1,2,2,3,3)),
+  (2, map(4,4,5,5,6,6))
--- End diff --

nit:

```
  (1, map(1, 1, 2, 2, 3, 3)),
  (2, map(4, 4, 5, 5, 6, 6))
```


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210164879
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+examples = """
+Examples:
+   > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
--- End diff --

nit: we need one more right parenthesis after the second `array(1, 2, 3)`?
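
With the nit applied, the example would read: `> SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1);`.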


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210164976
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+examples = """
+Examples:
+   > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
+map(array(1, 2, 3), array(2, 3, 4))
+   > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
+map(array(1, 2, 3), array(2, 4, 6))
+  """,
+since = "2.4.0")
+case class TransformValues(
+argument: Expression,
+function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+val map = argument.dataType.asInstanceOf[MapType]
+MapType(map.keyType, function.dataType, function.nullable)
--- End diff --

We can use `keyType` from the following val?
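
Applied, `dataType` could reuse the pattern-matched `keyType` instead of re-casting (sketch):

```scala
override def dataType: DataType =
  MapType(keyType, function.dataType, function.nullable)
```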


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210165225
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -283,6 +289,61 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
   15)
   }
 
+  test("TransformValues") {
+val ai0 = Literal.create(
+  Map(1 -> 1, 2 -> 2, 3 -> 3),
+  MapType(IntegerType, IntegerType))
--- End diff --

Can you add `valueContainsNull` explicitly?
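
For example (sketch):

```scala
val ai0 = Literal.create(
  Map(1 -> 1, 2 -> 2, 3 -> 3),
  MapType(IntegerType, IntegerType, valueContainsNull = false))
```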


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210165102
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+examples = """
+Examples:
+   > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
+map(array(1, 2, 3), array(2, 3, 4))
+   > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
+map(array(1, 2, 3), array(2, 4, 6))
+  """,
+since = "2.4.0")
+case class TransformValues(
+argument: Expression,
+function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+val map = argument.dataType.asInstanceOf[MapType]
+MapType(map.keyType, function.dataType, function.nullable)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction)
+  : TransformValues = {
+copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+val LambdaFunction(
+_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+(keyVar, valueVar)
+  }
--- End diff --

nit: how about:

```scala
@transient lazy val LambdaFunction(_,
  (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
```


---




[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22045#discussion_r210165448
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,210 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform values function - test various primitive data types combinations") {
--- End diff --

We don't need so many cases here. We only need to verify that the API works end to end. Evaluation checks of the function should be in `HigherOrderFunctionsSuite`.


---




[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21661
  
**[Test build #94784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94784/testReport)** for PR 21661 at commit [`34ca4ef`](https://github.com/apache/spark/commit/34ca4ef2b0757b2832ac2dbcc364b42eb4f34e48).


---




[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21661
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2202/
Test PASSed.


---




[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21661
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94781/
Test PASSed.


---




[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21561
  
**[Test build #94781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94781/testReport)** for PR 21561 at commit [`ecab85c`](https://github.com/apache/spark/commit/ecab85c921fbc81865b800be25d533fea8e75fd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21661
  
retest this please


---




[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22045
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22045
  
**[Test build #94783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94783/testReport)** for PR 22045 at commit [`b73106d`](https://github.com/apache/spark/commit/b73106d43000972ab9adae3d3b463a0dada2b9cc).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22045
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94783/
Test FAILed.


---




[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22045
  
**[Test build #94783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94783/testReport)** for PR 22045 at commit [`b73106d`](https://github.com/apache/spark/commit/b73106d43000972ab9adae3d3b463a0dada2b9cc).


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210165079
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+   map(array(2, 3, 4), array(1, 2, 3))
+  > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+   map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+argument: Expression,
+function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+val map = argument.dataType.asInstanceOf[MapType]
+MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): TransformKeys = {
+copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+val LambdaFunction(
+_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+(keyVar, valueVar)
+  }
--- End diff --

nit: how about:

```scala
@transient lazy val LambdaFunction(_,
  (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
```


---




[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

2018-08-14 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22045
  
ok to test.


---




[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94780/
Test PASSed.


---




[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17086
  
**[Test build #94780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94780/testReport)** for PR 17086 at commit [`9e59fd5`](https://github.com/apache/spark/commit/9e59fd592e9cbe43e9fc3d5c317cd3c4e2d6ac43).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22013
  
Btw, we need one more right parenthesis after the second `array(1, 2, 3)` and a space at `(k,v)` in the description?


---




[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...

2018-08-14 Thread deshanxiao
Github user deshanxiao commented on the issue:

https://github.com/apache/spark/pull/22109
  
Please help review. cc @HyukjinKwon


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210161746
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types 
combinations") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[Int, Boolean](25 -> true, 26 -> false)
+).toDF("x")
+
+val dfExample4 = Seq(
+  Map[Array[Int], Boolean](Array(1, 2) -> false)
+).toDF("y")
+
+
+def testMapOfPrimitiveTypesCombination(): Unit = {
+  checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + 
v)"),
+Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+"(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 
'three'))[k])"),
+Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> 
CAST(v * 2 AS BIGINT) + k)"),
+Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + 
v)"),
+Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) ->  k % 
2 = 0 OR v)"),
+Seq(Row(Map(true -> true, true -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> 
array_contains(k, 3) AND v)"),
+Seq(Row(Map(false -> false
+}
+// Test with local relation, the Project will be evaluated without 
codegen
+testMapOfPrimitiveTypesCombination()
+dfExample1.cache()
+dfExample2.cache()
+dfExample3.cache()
+dfExample4.cache()
+// Test with cached relation, the Project will be evaluated with 
codegen
+testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and 
exceptions") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[String, String]("a" -> "b")
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[String, String]("a" -> null)
+).toDF("x")
+
+def testInvalidLambdaFunctions(): Unit = {
+  val ex1 = intercept[AnalysisException] {
+dfExample1.selectExpr("transform_keys(i, k -> k )")
+  }
+  assert(ex1.getMessage.contains("The number of lambda function 
arguments '1' does not match"))
+
+  val ex2 = intercept[AnalysisException] {
+dfExample2.selectExpr("transform_keys(j, (k, v, x) -> k + 1)")
+  }
+  assert(ex2.getMessage.contains(
+  "The number of lambda function arguments '3' does not match"))
--- End diff --

nit: indent


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210161509
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types 
combinations") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[Int, Boolean](25 -> true, 26 -> false)
+).toDF("x")
+
+val dfExample4 = Seq(
+  Map[Array[Int], Boolean](Array(1, 2) -> false)
+).toDF("y")
+
+
+def testMapOfPrimitiveTypesCombination(): Unit = {
+  checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + 
v)"),
+Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+"(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 
'three'))[k])"),
+Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> 
CAST(v * 2 AS BIGINT) + k)"),
+Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + 
v)"),
+Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) ->  k % 
2 = 0 OR v)"),
+Seq(Row(Map(true -> true, true -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> 
array_contains(k, 3) AND v)"),
+Seq(Row(Map(false -> false
+}
+// Test with local relation, the Project will be evaluated without 
codegen
+testMapOfPrimitiveTypesCombination()
+dfExample1.cache()
+dfExample2.cache()
+dfExample3.cache()
+dfExample4.cache()
+// Test with cached relation, the Project will be evaluated with 
codegen
+testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and 
exceptions") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[String, String]("a" -> "b")
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[String, String]("a" -> null)
+).toDF("x")
+
+def testInvalidLambdaFunctions(): Unit = {
+  val ex1 = intercept[AnalysisException] {
+dfExample1.selectExpr("transform_keys(i, k -> k )")
--- End diff --

nit: extra space after `k -> k`.


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210162791
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -283,6 +289,75 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
   15)
   }
 
+  test("TransformKeys") {
+val ai0 = Literal.create(
+  Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4),
+  MapType(IntegerType, IntegerType, valueContainsNull = false))
+val ai1 = Literal.create(
+  Map.empty[Int, Int],
+  MapType(IntegerType, IntegerType, valueContainsNull = true))
+val ai2 = Literal.create(
+  Map(1 -> 1, 2 -> null, 3 -> 3),
+  MapType(IntegerType, IntegerType, valueContainsNull = true))
+val ai3 = Literal.create(null, MapType(IntegerType, IntegerType, 
valueContainsNull = false))
+
+val plusOne: (Expression, Expression) => Expression = (k, v) => k + 1
+val plusValue: (Expression, Expression) => Expression = (k, v) => k + v
+val modKey: (Expression, Expression) => Expression = (k, v) => k % 3
+
+checkEvaluation(transformKeys(ai0, plusOne), Map(2 -> 1, 3 -> 2, 4 -> 
3, 5 -> 4))
+checkEvaluation(transformKeys(ai0, plusValue), Map(2 -> 1, 4 -> 2, 6 
-> 3, 8 -> 4))
+checkEvaluation(
+  transformKeys(transformKeys(ai0, plusOne), plusValue), Map(3 -> 1, 5 
-> 2, 7 -> 3, 9 -> 4))
+checkEvaluation(transformKeys(ai0, modKey),
+  ArrayBasedMapData(Array(1, 2, 0, 1), Array(1, 2, 3, 4)))
+checkEvaluation(transformKeys(ai1, plusOne), Map.empty[Int, Int])
+checkEvaluation(transformKeys(ai1, plusOne), Map.empty[Int, Int])
+checkEvaluation(
+  transformKeys(transformKeys(ai1, plusOne), plusValue), 
Map.empty[Int, Int])
+checkEvaluation(transformKeys(ai2, plusOne), Map(2 -> 1, 3 -> null, 4 
-> 3))
+checkEvaluation(
+  transformKeys(transformKeys(ai2, plusOne), plusOne), Map(3 -> 1, 4 
-> null, 5 -> 3))
+checkEvaluation(transformKeys(ai3, plusOne), null)
+
+val as0 = Literal.create(
+  Map("a" -> "xy", "bb" -> "yz", "ccc" -> "zx"),
+  MapType(StringType, StringType, valueContainsNull = false))
+val as1 = Literal.create(
+  Map("a" -> "xy", "bb" -> "yz", "ccc" -> null),
+  MapType(StringType, StringType, valueContainsNull = true))
+val as2 = Literal.create(null,
+  MapType(StringType, StringType, valueContainsNull = false))
+val asn = Literal.create(Map.empty[StringType, StringType],
--- End diff --

`as3`?


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210161936
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types 
combinations") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
--- End diff --

Do we need `E0`?


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210162501
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql ---
@@ -51,3 +51,17 @@ select exists(ys, y -> y > 30) as v from nested;
 
 -- Check for element existence in a null array
 select exists(cast(null as array<int>), y -> y > 30) as v;
+ 
+create or replace temporary view nested as values
+  (1, map(1,1,2,2,3,3)),
+  (2, map(4,4,5,5,6,6))
--- End diff --

nit:

```
  (1, map(1, 1, 2, 2, 3, 3)),
  (2, map(4, 4, 5, 5, 6, 6))
```


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210161616
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types 
combinations") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[Int, Boolean](25 -> true, 26 -> false)
+).toDF("x")
+
+val dfExample4 = Seq(
+  Map[Array[Int], Boolean](Array(1, 2) -> false)
+).toDF("y")
+
+
+def testMapOfPrimitiveTypesCombination(): Unit = {
+  checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + 
v)"),
+Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+"(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 
'three'))[k])"),
+Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> 
CAST(v * 2 AS BIGINT) + k)"),
+Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + 
v)"),
+Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) ->  k % 
2 = 0 OR v)"),
+Seq(Row(Map(true -> true, true -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> 
array_contains(k, 3) AND v)"),
+Seq(Row(Map(false -> false
+}
+// Test with local relation, the Project will be evaluated without 
codegen
+testMapOfPrimitiveTypesCombination()
+dfExample1.cache()
+dfExample2.cache()
+dfExample3.cache()
+dfExample4.cache()
+// Test with cached relation, the Project will be evaluated with 
codegen
+testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and 
exceptions") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[String, String]("a" -> "b")
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[String, String]("a" -> null)
+).toDF("x")
+
+def testInvalidLambdaFunctions(): Unit = {
+  val ex1 = intercept[AnalysisException] {
+dfExample1.selectExpr("transform_keys(i, k -> k )")
+  }
+  assert(ex1.getMessage.contains("The number of lambda function 
arguments '1' does not match"))
+
+  val ex2 = intercept[AnalysisException] {
+dfExample2.selectExpr("transform_keys(j, (k, v, x) -> k + 1)")
+  }
+  assert(ex2.getMessage.contains(
+  "The number of lambda function arguments '3' does not match"))
+
+  val ex3 = intercept[RuntimeException] {
+dfExample3.selectExpr("transform_keys(x, (k, v) -> v)").show()
+  }
+  assert(ex3.getMessage.contains("Cannot use null as map key!"))
--- End diff --

Seems like we can do those tests only with `dfExample3`?
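
For instance (a sketch; the expected message fragments are copied from the quoted asserts):

```scala
val ex1 = intercept[AnalysisException] {
  dfExample3.selectExpr("transform_keys(x, k -> k)")
}
assert(ex1.getMessage.contains("The number of lambda function arguments '1' does not match"))

val ex2 = intercept[AnalysisException] {
  dfExample3.selectExpr("transform_keys(x, (k, v, w) -> k)")
}
assert(ex2.getMessage.contains("The number of lambda function arguments '3' does not match"))

val ex3 = intercept[RuntimeException] {
  dfExample3.selectExpr("transform_keys(x, (k, v) -> v)").show()
}
assert(ex3.getMessage.contains("Cannot use null as map key!"))
```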


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210163358
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types 
combinations") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[Int, Boolean](25 -> true, 26 -> false)
+).toDF("x")
+
+val dfExample4 = Seq(
+  Map[Array[Int], Boolean](Array(1, 2) -> false)
+).toDF("y")
+
+
+def testMapOfPrimitiveTypesCombination(): Unit = {
+  checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + 
v)"),
+Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+"(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 
'three'))[k])"),
+Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> 
CAST(v * 2 AS BIGINT) + k)"),
+Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7
+
+  checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + 
v)"),
+Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) ->  k % 
2 = 0 OR v)"),
+Seq(Row(Map(true -> true, true -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 
2 * k, 3 * k))"),
+Seq(Row(Map(50 -> true, 78 -> false
+
+  checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> 
array_contains(k, 3) AND v)"),
+Seq(Row(Map(false -> false
+}
+// Test with local relation, the Project will be evaluated without 
codegen
+testMapOfPrimitiveTypesCombination()
+dfExample1.cache()
+dfExample2.cache()
+dfExample3.cache()
+dfExample4.cache()
+// Test with cached relation, the Project will be evaluated with 
codegen
+testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and 
exceptions") {
+val dfExample1 = Seq(
+  Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+).toDF("i")
+
+val dfExample2 = Seq(
+  Map[String, String]("a" -> "b")
+).toDF("j")
+
+val dfExample3 = Seq(
+  Map[String, String]("a" -> null)
+).toDF("x")
+
+def testInvalidLambdaFunctions(): Unit = {
+  val ex1 = intercept[AnalysisException] {
+dfExample1.selectExpr("transform_keys(i, k -> k )")
+  }
+  assert(ex1.getMessage.contains("The number of lambda function 
arguments '1' does not match"))
+
+  val ex2 = intercept[AnalysisException] {
+dfExample2.selectExpr("transform_keys(j, (k, v, x) -> k + 1)")
+  }
+  assert(ex2.getMessage.contains(
+  "The number of lambda function arguments '3' does not match"))
+
+  val ex3 = intercept[RuntimeException] {
+dfExample3.selectExpr("transform_keys(x, (k, v) -> v)").show()
+  }
+  assert(ex3.getMessage.contains("Cannot use null as map key!"))
+}
+
+testInvalidLambdaFunctions()
+dfExample1.cache()
+dfExample2.cache()
+testInvalidLambdaFunctions()
--- End diff --

We need `dfExample3.cache()` as well?
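
I.e. (sketch):

```scala
testInvalidLambdaFunctions()
dfExample1.cache()
dfExample2.cache()
dfExample3.cache() // exercise the cached/codegen path for dfExample3 too
testInvalidLambdaFunctions()
```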


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210160909
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+   map(array(2, 3, 4), array(1, 2, 3))
+  > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+   map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+argument: Expression,
+function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+val map = argument.dataType.asInstanceOf[MapType]
+MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): TransformKeys = {
+copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+val LambdaFunction(
+_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+(keyVar, valueVar)
+  }
+
+  override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
+val map = argumentValue.asInstanceOf[MapData]
+val f = functionForEval
+val resultKeys = new GenericArrayData(new Array[Any](map.numElements))
+var i = 0
+while (i < map.numElements) {
+  keyVar.value.set(map.keyArray().get(i, keyVar.dataType))
+  valueVar.value.set(map.valueArray().get(i, valueVar.dataType))
+  val result = f.eval(inputRow)
+  if (result ==  null) {
+throw new RuntimeException("Cannot use null as map key!")
+  }
+  resultKeys.update(i, result)
+  i += 1
+}
+new ArrayBasedMapData(resultKeys, map.valueArray())
+  }
+
+  override def prettyName: String = "transform_keys"
+  }
--- End diff --

nit: indent


---




[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210159871
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+   map(array(2, 3, 4), array(1, 2, 3))
+  > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+   map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+argument: Expression,
+function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+val map = argument.dataType.asInstanceOf[MapType]
+MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): TransformKeys = {
+copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+val LambdaFunction(
+_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+(keyVar, valueVar)
+  }
+
+  override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
+val map = argumentValue.asInstanceOf[MapData]
+val f = functionForEval
+val resultKeys = new GenericArrayData(new Array[Any](map.numElements))
+var i = 0
+while (i < map.numElements) {
+  keyVar.value.set(map.keyArray().get(i, keyVar.dataType))
+  valueVar.value.set(map.valueArray().get(i, valueVar.dataType))
+  val result = f.eval(inputRow)
+  if (result ==  null) {
--- End diff --

nit: extra space between `==` and `null`.
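I.e., the line should read `if (result == null) {`.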


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210160419
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+       map(array(2, 3, 4), array(1, 2, 3))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+       map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
--- End diff --

`lazy val`?
Also, could you add a test for the case where `argument` is not a map to the invalid cases in `DataFrameFunctionsSuite`?
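
For reference, a minimal sketch of such an invalid-case test (the test name, input column, and expected error message are assumptions, not the final patch):

```scala
test("transform_keys - argument is not a map") {
  // Hypothetical invalid case: applying transform_keys to a non-map column
  // should fail analysis with a data type mismatch.
  val df = Seq(1, 2, 3).toDF("i")
  val ex = intercept[AnalysisException] {
    df.selectExpr("transform_keys(i, (k, v) -> k + 1)")
  }
  assert(ex.getMessage.contains("data type mismatch"))
}
```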


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22013#discussion_r210160577
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+       map(array(2, 3, 4), array(1, 2, 3))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+       map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(function.dataType, map.valueType, map.valueContainsNull)
--- End diff --

We can use `valueType` and `valueContainsNull` from the following val?
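
A sketch of the suggested simplification (assuming the pattern-match val below is made lazy or evaluated before this method is first used):

```scala
// Reuse the components extracted by the MapType pattern match instead of
// casting argument.dataType a second time.
override def dataType: DataType =
  MapType(function.dataType, valueType, valueContainsNull)
```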


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21123
  
@gatorsmile Yea, it shouldn't break compatibility for now, for the reasons described in the PR I guess.

@rdblue, however, strictly speaking, isn't that specific to Datasource V2, or did I misunderstand? I was wondering what blocks this PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

2018-08-14 Thread deshanxiao
Github user deshanxiao commented on the issue:

https://github.com/apache/spark/pull/22093
  
@xuanyuanking Thanks for your suggestion. We could indeed test whether `TaskCommitMessage` can be serialized by `KryoSerializer`, but that alone would not explain why the framework must serialize `TaskCommitMessage`. So we create our own `SparkContext`; it is more convincing to show that the job fails because the framework does not register `TaskCommitMessage`.
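
For context, a minimal repro sketch of the setup described above (the app name and output path are illustrative; `spark.kryo.registrationRequired` is a real Spark conf):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// With registrationRequired=true, Kryo refuses to serialize any class that is
// not registered. A file write sends a TaskCommitMessage back to the driver,
// so it fails if the framework never registers that class.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("kryo-commit-repro") // illustrative
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
val sc = new SparkContext(conf)
sc.parallelize(1 to 10).saveAsTextFile("/tmp/kryo-commit-repro") // illustrative path
```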


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22013
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94775/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22013
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22013
  
**[Test build #94775 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94775/testReport)**
 for PR 22013 at commit 
[`5db526b`](https://github.com/apache/spark/commit/5db526be7bad0fa38dc9743c919014b475cf8aeb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22104
  
> we can implement a dummy data source v1/v2 at scala side

There's an example, https://github.com/apache/spark/pull/21007, that implements something in Scala and uses it in a Python-side test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22109
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21698
  
> ... assume that computation is idempotent - we do not support non 
determinism in computation

Ah, this is a reasonable restriction; we should document it in the RDD classdoc. How about the source (root RDD or shuffle)? The output of a reduce task is non-deterministic because Spark fetches multiple shuffle blocks at the same time, and it's random which shuffle blocks finish fetching first. The external sorter has the same problem: the output order can change if spilling happens.

Generally I think there are 3 directions:
1. assume computing functions are idempotent, and also make Spark internal operations idempotent (reducer, external sorter, maybe more). I think this is hard to do, but it would be the clearest semantic.
2. assume computing functions are idempotent and insensitive to the input data order. Then Spark internal operations can produce different output orders. An example is adding a sort before round-robin, which makes the computing function insensitive to the input data order (see the sketch after this message). But I don't think it's reasonable to apply this restriction to all computing functions.
3. assume computing functions are non-deterministic. This is not friendly to the scheduler, as it needs to be able to revert a finished task. We need to think about whether it's possible to revert a result task.
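
As a rough illustration of direction 2 (the helper below is illustrative, not Spark internals):

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Fix a deterministic order within each input partition before the round-robin
// repartition, so that a re-computed task assigns every record to the same
// output partition even if shuffle blocks arrive in a different order.
def orderInsensitiveRepartition[T: Ordering : ClassTag](rdd: RDD[T], n: Int): RDD[T] =
  rdd.mapPartitions(iter => iter.toArray.sorted.iterator).repartition(n)
```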


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94773/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #94773 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94773/testReport)**
 for PR 21306 at commit 
[`21acda2`](https://github.com/apache/spark/commit/21acda237744d4299e5bb449dce1ec8a1735f6de).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class CaseInsensitiveStringMap implements Map `
  * `public class Catalogs `
  * `  final class AddColumn implements TableChange `
  * `  final class RenameColumn implements TableChange `
  * `  final class UpdateColumn implements TableChange `
  * `  final class DeleteColumn implements TableChange `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22109
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22109
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem...

2018-08-14 Thread deshanxiao
GitHub user deshanxiao opened a pull request:

https://github.com/apache/spark/pull/22109

[SPARK-25120][CORE][HistoryServer]Fix the problem of EventLogListener may miss driver SparkListenerBloc…

## What changes were proposed in this pull request?

Sometimes the "Executors" tab in the Spark history UI cannot find the driver's information, because that page is built by replaying the event log's SparkListenerBlockManagerAdded events. In SparkContext, the driver registers its BlockManager before the event-logging listener is added to the LiveListenerBus. In that case, the listener may miss the driver's SparkListenerBlockManagerAdded event, and the history UI won't show the driver's info in "Executors".

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deshanxiao/spark fix-jira25120

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22109


commit 26ca9c2c08c62961183e6461183c2963b6a00474
Author: xiaodeshan 
Date:   2018-08-15T02:49:25Z

fix the problem of EventLogListener may miss driver 
SparkListenerBlockManagerAdded event




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17461: [SPARK-20082][ml] LDA incremental model learning

2018-08-14 Thread sprintcheng
Github user sprintcheng commented on the issue:

https://github.com/apache/spark/pull/17461
  
May I know when this change will be included in an official release? I downloaded Spark 2.3.1 and still do NOT find that this method (lda.setInitialModel) has been added.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21537
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2201/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2200/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21537
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21561
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21537
  
**[Test build #94782 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94782/testReport)**
 for PR 21537 at commit 
[`508e091`](https://github.com/apache/spark/commit/508e091f53084deefc35001ce8d89455ca549e53).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21561
  
**[Test build #94781 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94781/testReport)**
 for PR 21561 at commit 
[`ecab85c`](https://github.com/apache/spark/commit/ecab85c921fbc81865b800be25d533fea8e75fd5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21537
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22024: [SPARK-25034][CORE] Remove allocations in onBlockFetchSu...

2018-08-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22024
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21537
  
Thanks @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...

2018-08-14 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/21561#discussion_r210158840
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -299,7 +299,7 @@ class KMeans private (
   val bcCenters = sc.broadcast(centers)
 
   // Find the new centers
-  val newCenters = data.mapPartitions { points =>
+  val collected = data.mapPartitions { points =>
--- End diff --

I am neutral on this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...

2018-08-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21860#discussion_r210158345
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -232,6 +232,25 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
 }
   }
 
+  test("SPARK-24901 check merge FastHashMap and RegularHashMap generate code max size") {
--- End diff --

This test looks weird; it's tricky to check the generated code size, as it can change later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...

2018-08-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21860#discussion_r210157862
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 ---
@@ -853,33 +853,42 @@ case class HashAggregateExec(
 
 val updateRowInHashMap: String = {
   if (isFastHashMapEnabled) {
-ctx.INPUT_ROW = fastRowBuffer
-val boundUpdateExpr = 
updateExpr.map(BindReferences.bindReference(_, inputAttr))
-val subExprs = 
ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
-val effectiveCodes = subExprs.codes.mkString("\n")
-val fastRowEvals = 
ctx.withSubExprEliminationExprs(subExprs.states) {
-  boundUpdateExpr.map(_.genCode(ctx))
-}
-val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
-  val dt = updateExpr(i).dataType
-  CodeGenerator.updateColumn(
-fastRowBuffer, dt, i, ev, updateExpr(i).nullable, 
isVectorizedHashMapEnabled)
-}
+if (isVectorizedHashMapEnabled) {
+  ctx.INPUT_ROW = fastRowBuffer
+  val boundUpdateExpr = 
updateExpr.map(BindReferences.bindReference(_, inputAttr))
+  val subExprs = 
ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
+  val effectiveCodes = subExprs.codes.mkString("\n")
+  val fastRowEvals = 
ctx.withSubExprEliminationExprs(subExprs.states) {
+boundUpdateExpr.map(_.genCode(ctx))
+  }
+  val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) 
=>
+val dt = updateExpr(i).dataType
+CodeGenerator.updateColumn(
+  fastRowBuffer, dt, i, ev, updateExpr(i).nullable, 
isVectorizedHashMapEnabled)
+  }
 
-// If fast hash map is on, we first generate code to update row in 
fast hash map, if the
-// previous loop up hit fast hash map. Otherwise, update row in 
regular hash map.
-s"""
-   |if ($fastRowBuffer != null) {
-   |  // common sub-expressions
-   |  $effectiveCodes
-   |  // evaluate aggregate function
-   |  ${evaluateVariables(fastRowEvals)}
-   |  // update fast row
-   |  ${updateFastRow.mkString("\n").trim}
-   |} else {
-   |  $updateRowInRegularHashMap
-   |}
-   """.stripMargin
+  // If fast hash map is on, we first generate code to update row 
in fast hash map, if the
+  // previous loop up hit fast hash map. Otherwise, update row in 
regular hash map.
+  s"""
+ |if ($fastRowBuffer != null) {
+ |  // common sub-expressions
+ |  $effectiveCodes
+ |  // evaluate aggregate function
+ |  ${evaluateVariables(fastRowEvals)}
+ |  // update fast row
+ |  ${updateFastRow.mkString("\n").trim}
+ |} else {
+ |  $updateRowInRegularHashMap
+ |}
+  """.stripMargin
+} else {
+  s"""
+ |if ($fastRowBuffer != null) {
+ |  $unsafeRowBuffer = $fastRowBuffer;
+ |}
+ |$updateRowInRegularHashMap
--- End diff --

can you add some comment to explain it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22107
  
Seems fine.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22104
  
@icexelloss we can implement a dummy data source v1/v2 on the Scala side and scan it in a PySpark test.
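
For example, a minimal dummy v1 source sketch (class and package names are assumptions), which a PySpark test could load by its fully qualified name via `spark.read.format(...).load()`:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// A trivial data source v1 that always returns the integers 0-9 in column "i".
class DummySource extends RelationProvider {
  override def createRelation(
      ctx: SQLContext, parameters: Map[String, String]): BaseRelation =
    new BaseRelation with TableScan {
      override val sqlContext: SQLContext = ctx
      override val schema: StructType = StructType(StructField("i", IntegerType) :: Nil)
      override def buildScan(): RDD[Row] = ctx.sparkContext.parallelize(0 until 10).map(Row(_))
    }
}
```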


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22107#discussion_r210157418
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
   unlink(jsonPath2)
 })
 
+test_that("intersectAll() and exceptAll()", {
+  df1 <- createDataFrame(
+list(list("a", 1),
+  list("a", 1),
+  list("a", 1),
+  list("a", 1),
+  list("b", 3),
+  list("c", 4)),
+  schema = c("a", "b"))
+  df2 <- createDataFrame(
+list(list("a", 1), list("a", 1), list("b", 3)),
+schema = c("a", "b"))
+  intersect_all_expected <- data.frame("a" = c("a", "a", "b"), "b" = c(1, 1, 3),
+   stringsAsFactors = FALSE)
+  except_all_expected <- data.frame("a" = c("a", "a", "c"), "b" = c(1, 1, 4),
+stringsAsFactors = FALSE)
+  intersect_all_df <- arrange(intersectAll(df1, df2), df1$a)
--- End diff --

Strictly, the naming rule is `intersectAllDf` or `intersect.all.df` (see 
https://github.com/apache/spark/pull/17590#issuecomment-293732796)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22107#discussion_r210157193
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
   unlink(jsonPath2)
 })
 
+test_that("intersectAll() and exceptAll()", {
+  df1 <- createDataFrame(
+list(list("a", 1),
+  list("a", 1),
+  list("a", 1),
+  list("a", 1),
+  list("b", 3),
+  list("c", 4)),
+  schema = c("a", "b"))
+  df2 <- createDataFrame(
+list(list("a", 1), list("a", 1), list("b", 3)),
+schema = c("a", "b"))
--- End diff --

nit:

```r
df2 <- createDataFrame(list(list("a", 1), list("a", 1), list("b", 3)), 
schema = c("a", "b"))
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2199/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17086
  
**[Test build #94780 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94780/testReport)**
 for PR 17086 at commit 
[`9e59fd5`](https://github.com/apache/spark/commit/9e59fd592e9cbe43e9fc3d5c317cd3c4e2d6ac43).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22107#discussion_r210156892
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
   unlink(jsonPath2)
 })
 
+test_that("intersectAll() and exceptAll()", {
+  df1 <- createDataFrame(
+list(list("a", 1),
+  list("a", 1),
+  list("a", 1),
+  list("a", 1),
+  list("b", 3),
+  list("c", 4)),
--- End diff --

nit:

```r
list(list("a", 1), list("a", 1), list("a", 1),
 list("a", 1), list("b", 3), list("c", 4)),
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17086
  
@WeichenXu123, is it good to go?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17086
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22098: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22098
  
Ah, @shaneknapp, probably one possible motivation is that we don't touch the build configuration often and control the build time within the codebase, so that committers don't have to bother you every time a change is needed. For instance, although I know how to control the build time too, I don't touch it.

Shall we just keep the current way and discuss again if we happen to need to increase it? I think 400m looks like more than enough as a limit, and I believe we will put in some effort to reduce the build time or keep within the current limit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....

2018-08-14 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/21931
  
@kiszk, @cloud-fan
I was on vacation some time ago; I'm sorry for the delayed reply.
I have updated it. Can you help review it again when you have some time? Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22009
  
**[Test build #94779 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94779/testReport)**
 for PR 22009 at commit 
[`f4f85a8`](https://github.com/apache/spark/commit/f4f85a833ef319a6860134e12655574aca081ed6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2198/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22009
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


