[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1259/
Test PASSed.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21103
  
**[Test build #93481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93481/testReport)** for PR 21103 at commit [`f099cbf`](https://github.com/apache/spark/commit/f099cbff16c8ab3c6975837a38a984eb6a2fe1b6).


---




[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...

2018-07-23 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21850
  
retest this please




---




[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93465/
Test PASSed.


---




[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21546
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21546
  
**[Test build #93465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93465/testReport)** for PR 21546 at commit [`3224625`](https://github.com/apache/spark/commit/322462586e1c8ac301a44e6da47589a599e423d9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21823
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21823
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93467/
Test PASSed.


---




[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21823
  
**[Test build #93467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93467/testReport)** for PR 21823 at commit [`f3a7963`](https://github.com/apache/spark/commit/f3a79636469470a620fc755374e2a90c2c39d3ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21839: [SPARK-24339][SQL] Prunes the unused columns from child ...

2018-07-23 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/21839
  
Thanks for reviewing.


---




[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongToUnsafeRowMap in ex...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21772
  
**[Test build #93480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93480/testReport)** for PR 21772 at commit [`c9ebfd0`](https://github.com/apache/spark/commit/c9ebfd0acdeefa1495b48df84b137ea213b2f7fc).


---




[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21854
  
**[Test build #93479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93479/testReport)** for PR 21854 at commit [`c1ce69c`](https://github.com/apache/spark/commit/c1ce69c9e33b461b83b1f158d300ad26da839e2d).


---




[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21854
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1258/
Test PASSed.


---




[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21854
  
Actually, I think `Rand` and `Randn` should have the same issue. But I want to hear opinions first before dealing with them.


---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-23 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r204624088
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address (host:port) of the executor that a barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

We make this a public API because `BarrierTaskContext.getTaskInfos()` will return a list of `BarrierTaskInfo`s, so users have to access the class.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #93478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93478/testReport)** for PR 16677 at commit [`d05c144`](https://github.com/apache/spark/commit/d05c144aecdd57f4ee3d179a240ccafa6c02bb66).


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1257/
Test PASSed.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21854: [SPARK-24896][SQL] Uuid should produce different ...

2018-07-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21854#discussion_r204622960
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1392,3 +1394,17 @@ object UpdateNullabilityInAttributeReferences extends Rule[LogicalPlan] {
   }
   }
 }
+
+/**
+ * Set the seed for random number generation in Uuid expressions for streaming query.
+ */
+object ResolvedUuidExpressionsForStreaming extends Rule[LogicalPlan] {
+  private lazy val random = new Random()
+
+  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
+    case p => p transformExpressionsUp {
+      case Uuid(_) if p.isStreaming => Uuid(Some(random.nextLong()))
--- End diff --

Yeah, sure.


---




[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21853
  
**[Test build #93477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93477/testReport)** for PR 21853 at commit [`a86cb9f`](https://github.com/apache/spark/commit/a86cb9f8764ac4962905ee1b8772fec5692d4342).


---




[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21853
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1256/
Test PASSed.


---




[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21853
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21854: [SPARK-24896][SQL] Uuid should produce different ...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21854#discussion_r204622410
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1392,3 +1394,17 @@ object UpdateNullabilityInAttributeReferences extends Rule[LogicalPlan] {
   }
   }
 }
+
+/**
+ * Set the seed for random number generation in Uuid expressions for streaming query.
+ */
+object ResolvedUuidExpressionsForStreaming extends Rule[LogicalPlan] {
+  private lazy val random = new Random()
+
+  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
+    case p => p transformExpressionsUp {
+      case Uuid(_) if p.isStreaming => Uuid(Some(random.nextLong()))
--- End diff --

Not a big deal at all, but can we remove `(_)`, i.e. match on `_: Uuid`?


---




[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21851
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93469/
Test FAILed.


---




[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21854
  
**[Test build #93476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93476/testReport)** for PR 21854 at commit [`8ef299f`](https://github.com/apache/spark/commit/8ef299f19a16ed63187e36c333221995f0461731).


---




[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21854
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21851
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1255/
Test PASSed.


---




[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21851
  
**[Test build #93469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93469/testReport)** for PR 21851 at commit [`b499b97`](https://github.com/apache/spark/commit/b499b9727a4cb9cc42149d05a4d54dba2de8bd9e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class KnowNotNull(child: Expression) extends UnaryExpression `


---




[GitHub] spark pull request #21854: [SPARK-24896][SQL] Uuid should produce different ...

2018-07-23 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/21854

[SPARK-24896][SQL] Uuid should produce different values for each execution 
in streaming query

## What changes were proposed in this pull request?

`Uuid`'s results depend on the random seed given during analysis. Thus, under a streaming query, we will get the same UUIDs in each execution.
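To make the failure mode concrete, here is a minimal standalone sketch using plain `java.util.Random` with hypothetical names (this is not Spark's actual `Uuid` expression): when the seed is fixed once at analysis time, every execution that rebuilds its generator from that seed replays the identical pseudo-random sequence.

```scala
import java.util.Random

// Hypothetical stand-in for an expression whose seed is chosen at analysis
// time: each "execution" builds a fresh generator from the same seed, so the
// pseudo-random values repeat across executions.
def valuesForExecution(analysisTimeSeed: Long, count: Int): Seq[Long] = {
  val rng = new Random(analysisTimeSeed)
  Seq.fill(count)(rng.nextLong())
}

val batch1 = valuesForExecution(analysisTimeSeed = 42L, count = 3)
val batch2 = valuesForExecution(analysisTimeSeed = 42L, count = 3)
// batch1 == batch2 -- the repeated-values problem this PR addresses by
// re-seeding per execution for streaming queries.
```
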

## How was this patch tested?

Added test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 uuid_in_streaming

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21854.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21854


commit 8ef299f19a16ed63187e36c333221995f0461731
Author: Liang-Chi Hsieh 
Date:   2018-07-24T04:21:40Z

Uuid should produce different values in streaming query.




---




[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21850
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93468/
Test FAILed.


---




[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21850
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21850
  
**[Test build #93468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93468/testReport)** for PR 21850 at commit [`a9c97ce`](https://github.com/apache/spark/commit/a9c97ceb445d4de9b65dc21b2fad3c4c0d66efca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...

2018-07-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21772
  
cc @cloud-fan 


---




[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...

2018-07-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21772#discussion_r204618927
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
@@ -278,6 +278,39 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
     map.free()
   }
 
+  test("SPARK-24809: Serializing LongHashedRelation in executor may result in data error") {
+    val unsafeProj = UnsafeProjection.create(Array[DataType](LongType))
+    val originalMap = new LongToUnsafeRowMap(mm, 1)
+
+    val key1 = 1L
+    val value1 = new Random().nextLong()
+
+    val key2 = 2L
+    val value2 = new Random().nextLong()
+
+    originalMap.append(key1, unsafeProj(InternalRow(value1)))
+    originalMap.append(key2, unsafeProj(InternalRow(value2)))
+    originalMap.optimize()
+
+    val resultRow = new UnsafeRow(1)
+    assert(originalMap.getValue(key1, resultRow).getLong(0) === value1)
+    assert(originalMap.getValue(key2, resultRow).getLong(0) === value2)
+
+    val ser = new KryoSerializer(
+      (new SparkConf).set("spark.kryo.referenceTracking", "false")).newInstance()
+
+    val mapSerializedInDriver =
+      ser.deserialize[LongToUnsafeRowMap](ser.serialize(originalMap))
--- End diff --

nit:
```scala
// Simulate serialize/deserialize twice on driver and executor
val firstTimeSerialized = ...
val secondTimeSerialized = ...
```


---




[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...

2018-07-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21772#discussion_r204618745
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
@@ -278,6 +278,39 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
     map.free()
   }
 
+  test("SPARK-24809: Serializing LongHashedRelation in executor may result in data error") {
+    val unsafeProj = UnsafeProjection.create(Array[DataType](LongType))
+    val originalMap = new LongToUnsafeRowMap(mm, 1)
+
+    val key1 = 1L
+    val value1 = new Random().nextLong()
+
+    val key2 = 2L
+    val value2 = new Random().nextLong()
+
+    originalMap.append(key1, unsafeProj(InternalRow(value1)))
+    originalMap.append(key2, unsafeProj(InternalRow(value2)))
+    originalMap.optimize()
+
+    val resultRow = new UnsafeRow(1)
+    assert(originalMap.getValue(key1, resultRow).getLong(0) === value1)
+    assert(originalMap.getValue(key2, resultRow).getLong(0) === value2)
--- End diff --

We don't need to test `LongToUnsafeRowMap`'s normal features here. We just need to verify that the map still works normally after two rounds of ser/de.


---




[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-23 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21608#discussion_r204618663
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -148,6 +148,19 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleton
     }
   }
 
+  test("verify table size calculation is accurate") {
--- End diff --

@maropu, does this test look okay?


---




[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21608
  
**[Test build #93475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93475/testReport)** for PR 21608 at commit [`4c405c5`](https://github.com/apache/spark/commit/4c405c52ff9e4893c82f3b7480a85ebc8219f588).


---




[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-23 Thread Achuth17
Github user Achuth17 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21608#discussion_r204618589
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala ---
@@ -55,4 +57,11 @@ private[sql] object DataSourceV2Utils extends Logging {
 
     case _ => Map.empty
   }
+
+  // SPARK-15895: Metadata files (e.g. Parquet summary files) and temporary files should not be
--- End diff --

Moved it. I couldn't see that file before pulling upstream master.


---




[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...

2018-07-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21772#discussion_r204618320
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
@@ -278,6 +278,39 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
     map.free()
   }
 
+  test("SPARK-24809: Serializing LongHashedRelation in executor may result in data error") {
+    val unsafeProj = UnsafeProjection.create(Array[DataType](LongType))
+    val originalMap = new LongToUnsafeRowMap(mm, 1)
+
+    val key1 = 1L
+    val value1 = new Random().nextLong()
+
+    val key2 = 2L
+    val value2 = new Random().nextLong()
--- End diff --

Is it necessary to use `Random` here? Can we use two arbitrary long values?


---




[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...

2018-07-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21772
  
Since you actually modify `LongToUnsafeRowMap`, would it be better to update the PR title and description to reflect that?


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21653
  
**[Test build #93474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93474/testReport)** for PR 21653 at commit [`b6585da`](https://github.com/apache/spark/commit/b6585da0f137d3d3675925368c4668c884de900c).


---




[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...

2018-07-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21772#discussion_r204617884
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -772,6 +772,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
     array = readLongArray(readBuffer, length)
     val pageLength = readLong().toInt
     page = readLongArray(readBuffer, pageLength)
+    // Set cursor because cursor is used in write function.
--- End diff --

maybe: `Restore cursor variable to make this map able to be serialized 
again on executors`?
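To illustrate why restoring the cursor matters, here is a hedged, self-contained sketch. `PageHolder`, `append`, `write`, and `read` are illustrative stand-ins, not Spark's actual `LongToUnsafeRowMap` API: the write path emits only the first `cursor` slots, so a deserialized copy must restore `cursor` before it can be serialized again (driver first, then executor, as in this PR).

```scala
import java.nio.ByteBuffer

// Hedged sketch only, mirroring the review comment: write() depends on
// `cursor`, so read() must restore it, or a second serialization
// (driver -> executor -> again) would emit zero slots.
class PageHolder {
  var page: Array[Long] = Array.empty
  var cursor: Int = 0 // number of used slots; advanced by append()

  def append(v: Long): Unit = { page = page :+ v; cursor += 1 }

  // Serializes only the first `cursor` slots, like a write path would.
  def write(): Array[Byte] = {
    val buf = ByteBuffer.allocate(4 + 8 * cursor)
    buf.putInt(cursor)
    page.take(cursor).foreach(v => buf.putLong(v))
    buf.array()
  }

  // Deserializes and restores `cursor` so this copy can be serialized again.
  def read(bytes: Array[Byte]): Unit = {
    val buf = ByteBuffer.wrap(bytes)
    val n = buf.getInt()
    page = Array.fill(n)(buf.getLong())
    cursor = n // without this line, a later write() would lose all data
  }
}

val original = new PageHolder
original.append(1L)
original.append(2L)

val onDriver = new PageHolder
onDriver.read(original.write()) // first ser/de, e.g. on the driver

val onExecutor = new PageHolder
onExecutor.read(onDriver.write()) // second ser/de, e.g. on an executor
```

The data survives the second round-trip only because `read` restores `cursor`; deleting that single assignment reproduces the class of bug this PR fixes.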


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21653
  
test this please


---




[GitHub] spark issue #21835: [SPARK-24779]Add sequence / map_concat / map_from_entrie...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21835
  
Seems fine except 
https://github.com/apache/spark/pull/21835#discussion_r204617441


---




[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r204617441
  
--- Diff: R/pkg/tests/fulltests/test_context.R ---
@@ -21,10 +21,11 @@ test_that("Check masked functions", {
   # Check that we are not masking any new function from base, stats, testthat unexpectedly
   # NOTE: We should avoid adding entries to *namesOfMaskedCompletely* as masked functions make it
   # hard for users to use base R functions. Please check when in doubt.
-  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not")
+  namesOfMaskedCompletely <- c("cov", "filter", "sample", "not", "sequence")
--- End diff --

@felixcheung, I remember we discouraged excluding this?


---




[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21835#discussion_r204616561
  
--- Diff: R/pkg/R/functions.R ---
@@ -1986,15 +1998,20 @@ setMethod("levenshtein", signature(y = "Column"),
 #' are on the same day of month, or both are the last day of month, time of day will be ignored.
 #' Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits.
 #'
+#' @param roundOff an optional parameter to specify if the result is rounded off to 8 digits
 #' @rdname column_datetime_diff_functions
 #' @aliases months_between months_between,Column-method
 #' @note months_between since 1.5.0
 setMethod("months_between", signature(y = "Column"),
-  function(y, x) {
+  function(y, x, roundOff = NULL) {
--- End diff --

Can we add a simple check if `roundOff` is a logical or not?


---




[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...

2018-07-23 Thread liutang123
Github user liutang123 commented on the issue:

https://github.com/apache/spark/pull/21772
  
@viirya This case occurred in our cluster and it took us a lot of time to 
find this bug.
For some man-made reasons, the small table's max id became abnormally 
large. The LongHashedRelation generated from that table was not optimized to 
`dense` and became abnormally big (approximately 400 MB).
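The dense-vs-sparse distinction described above can be illustrated outside Spark: a dense long-keyed layout allocates one slot per id up to the maximum key, so a single abnormally large id inflates the whole structure, while a sparse (hash) layout grows only with the number of distinct keys. A minimal sketch with made-up slot counts, not Spark's actual memory layout:

```python
def dense_slots(keys):
    # A dense layout allocates one slot per id from 0 to max(keys),
    # so a single huge key inflates the whole structure.
    return max(keys) + 1

def sparse_slots(keys, load_factor=0.5):
    # A sparse (hash) layout grows only with the number of distinct keys.
    return int(len(set(keys)) / load_factor)

normal = list(range(1000))                 # well-behaved small table
skewed = list(range(999)) + [50_000_000]   # one abnormally large id

assert dense_slots(normal) == 1000
assert dense_slots(skewed) == 50_000_001   # blows up with the max id
assert sparse_slots(skewed) < 10_000       # unaffected by the max id
```

This is why a small table with one abnormally large key can produce a LongHashedRelation that is far bigger than the data itself unless the dense optimization is skipped.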


---




[GitHub] spark pull request #21848: [SPARK-24890] [SQL] Short circuiting the `if` con...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21848#discussion_r204615261
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -403,14 +404,14 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
   e.copy(branches = newBranches)
 }
 
-  case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == 
Some(TrueLiteral) =>
+  case CaseWhen(branches, _) if 
branches.headOption.map(_._1).contains(TrueLiteral) =>
--- End diff --

Eh, in any event, wouldn't it be better to revert this back, since it's an 
unrelated style change without any actual advantage?
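Style question aside, the rule being edited fires when the first `CASE WHEN` branch condition is the literal `true`, in which case the whole expression collapses to that branch's value. A rough Python sketch of the idea, using toy tuple-based nodes rather than Catalyst's classes:

```python
TRUE_LITERAL = ("lit", True)

def simplify_case_when(branches, else_value):
    """branches: list of (condition, value) pairs.
    If the first condition is literally true, the remaining branches
    and the else value are unreachable and can be dropped."""
    if branches and branches[0][0] == TRUE_LITERAL:
        return branches[0][1]
    return ("case_when", branches, else_value)

# CASE WHEN true THEN 'a' WHEN x > 0 THEN 'b' ELSE 'c' END  ->  'a'
assert simplify_case_when([(TRUE_LITERAL, "a"), (("gt", "x", 0), "b")], "c") == "a"
# First condition not literally true: the expression is kept as-is.
assert simplify_case_when([(("gt", "x", 0), "b")], "c")[0] == "case_when"
```

The `branches.headOption.map(_._1).contains(TrueLiteral)` guard in the diff is the Scala equivalent of the `branches and branches[0][0] == TRUE_LITERAL` check here.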


---




[GitHub] spark pull request #21848: [SPARK-24890] [SQL] Short circuiting the `if` con...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21848#discussion_r204615040
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -651,6 +652,7 @@ object SimplifyCaseConversionExpressions extends 
Rule[LogicalPlan] {
   }
 }
 
+
--- End diff --

I think it's okay to remove it again, though, judging from

> Use one or two blank line(s) to separate class definitions.


https://github.com/databricks/scala-style-guide#blank-lines-vertical-whitespace

Looks like either way is fine.


---




[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...

2018-07-23 Thread liutang123
Github user liutang123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21772#discussion_r204613880
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala
 ---
@@ -278,6 +278,39 @@ class HashedRelationSuite extends SparkFunSuite with 
SharedSQLContext {
 map.free()
   }
 
+  test("SPARK-24809: Serializing LongHashedRelation in executor may result 
in data error") {
--- End diff --

I think this UT can cover the case I hit.
An end-to-end test is too hard to construct, because this case only occurs when 
the executor's memory is not enough to hold the block and the broadcast cache is 
removed by the garbage collector.


---




[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function

2018-07-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21103#discussion_r204613831
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3805,3 +3799,331 @@ object ArrayUnion {
 new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without 
duplicates
+ */
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array1, array2) - Returns an array of the elements in array1 but 
not in array2,
+without duplicates.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+   array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends 
ArraySetLike {
+  override def dataType: DataType =
+ArrayType(elementType, 
left.dataType.asInstanceOf[ArrayType].containsNull)
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getInt(idx)
+if (!hsInt.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setInt(pos, elem)
+  }
+  hsInt.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getLong(idx)
+if (!hsLong.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setLong(pos, elem)
+  }
+  hsLong.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def evalIntLongPrimitiveType(
+  array1: ArrayData,
+  array2: ArrayData,
+  resultArray: ArrayData,
+  isLongType: Boolean): Int = {
+// store elements into resultArray
+var exceptNullElement = true
--- End diff --

Ah, I mean `nullElement` for `ArrayExcept`. I will rename this to something 
better.
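For reference, the semantics under review — the elements of `array1` that are not in `array2`, with duplicates removed via a hash set, roughly as the `assignInt`/`assignLong` helpers in the diff do — can be sketched in Python. This is a simplified model that ignores Spark's null handling and codegen paths:

```python
def array_except(array1, array2):
    excluded = set(array2)   # plays the role of the OpenHashSet over array2
    seen = set()             # dedupes the output, like the assignX helpers
    result = []
    for elem in array1:
        if elem not in excluded and elem not in seen:
            seen.add(elem)
            result.append(elem)
    return result

assert array_except([1, 2, 3], [1, 3, 5]) == [2]    # example from the docstring
assert array_except([1, 2, 2, 3], [3]) == [1, 2]    # duplicates are dropped
```

The expression description's example (`SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5))` returning `array(2)`) matches the first assertion above.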


---




[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21772
  
**[Test build #93473 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93473/testReport)**
 for PR 21772 at commit 
[`06a9547`](https://github.com/apache/spark/commit/06a95472f1e889cdd21e7051fc906526f333f6d3).


---




[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...

2018-07-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21853
  
Also, could you add `Closes #21049` in the description?


---




[GitHub] spark issue #21788: [SPARK-24609][ML][DOC] PySpark/SparkR doc doesn't explai...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21788
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21788: [SPARK-24609][ML][DOC] PySpark/SparkR doc doesn't explai...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21788
  
**[Test build #93472 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93472/testReport)**
 for PR 21788 at commit 
[`22396b0`](https://github.com/apache/spark/commit/22396b06066bb4befe83fcc60668d0380856d4e0).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...

2018-07-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21853
  
LGTM except for minor comments


---




[GitHub] spark issue #21788: [SPARK-24609][ML][DOC] PySpark/SparkR doc doesn't explai...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21788
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93472/
Test FAILed.


---




[GitHub] spark pull request #21853: [SPARK-23957][SQL] Sorts in subqueries are redund...

2018-07-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21853#discussion_r204612114
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala 
---
@@ -970,4 +973,300 @@ class SubquerySuite extends QueryTest with 
SharedSQLContext {
 Row("3", "b") :: Row("4", "b") :: Nil)
 }
   }
+
+  private def getNumSortsInQuery(query: String): Int = {
+val plan = sql(query).queryExecution.optimizedPlan
+getNumSorts(plan) + getSubqueryExpressions(plan).map{s => 
getNumSorts(s.plan)}.sum
+  }
+
+  private def getSubqueryExpressions(plan: LogicalPlan): 
Seq[SubqueryExpression] = {
+val subqueryExpressions = ArrayBuffer.empty[SubqueryExpression]
+plan transformAllExpressions {
+  case s: SubqueryExpression =>
+subqueryExpressions ++= (getSubqueryExpressions(s.plan) :+ s)
+s
+}
+subqueryExpressions
+  }
+
+  private def getNumSorts(plan: LogicalPlan): Int = {
+plan.collect { case s: Sort => s }.size
+  }
+
+  test("SPARK-23957 Remove redundant sort from subquery plan(in 
subquery)") {
+withTempView("t1", "t2", "t3") {
+  Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
+  Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
+  Seq((1, 1, 1), (2, 2, 2)).toDF("c1", "c2", 
"c3").createOrReplaceTempView("t3")
+
+  // Simple order by
+  val query1 =
+"""
+   |SELECT c1 FROM t1
+   |WHERE
+   |c1 IN (SELECT c1 FROM t2 ORDER BY c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query1) == 0)
+
+  // Nested order bys
+  val query2 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  c1 IN (SELECT c1
+   |  FROM   (SELECT *
+   |  FROM   t2
+   |  ORDER  BY c2)
+   |  ORDER  BY c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query2) == 0)
+
+
+  // nested IN
+  val query3 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  c1 IN (SELECT c1
+   |  FROM   t2
+   |  WHERE  c1 IN (SELECT c1
+   |FROM   t3
+   |WHERE  c1 = 1
+   |ORDER  BY c3)
+   |  ORDER  BY c2)
+""".stripMargin
+  assert(getNumSortsInQuery(query3) == 0)
+
+  // Complex subplan and multiple sorts
+  val query4 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  c1 IN (SELECT c1
+   |  FROM   (SELECT c1, c2, count(*)
+   |  FROM   t2
+   |  GROUP BY c1, c2
+   |  HAVING count(*) > 0
+   |  ORDER BY c2)
+   |  ORDER  BY c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query4) == 0)
+
+  // Join in subplan
+  val query5 =
+"""
+   |SELECT c1 FROM t1
+   |WHERE
+   |c1 IN (SELECT t2.c1 FROM t2, t3
+   |   WHERE t2.c1 = t3.c1
+   |   ORDER BY t2.c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query5) == 0)
+
+  val query6 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  (c1, c2) IN (SELECT c1, max(c2)
+   |FROM   (SELECT c1, c2, count(*)
+   |FROM   t2
+   |GROUP BY c1, c2
+   |HAVING count(*) > 0
+   |ORDER BY c2)
+   |GROUP BY c1
+   |HAVING max(c2) > 0
+   |ORDER  BY c1)
+""".stripMargin
+  // The rule to remove redundant sorts is not able to remove the 
inner sort under
+  // an Aggregate operator. We only remove the top level sort.
+  assert(getNumSortsInQuery(query6) == 1)
+
+  // Cases when sort is not removed from the plan
+  // Limit on top of sort
+  val query7 =
+"""
+   |SELECT c1 FROM t1
+   |WHERE
+   |c1 IN (SELECT c1 FROM t2 ORDER BY c1 limit 1)
+""".stripMargin
+  assert(getNumSortsInQuery(query7) == 1)
+
+  // Sort below a set operations (intersect, 

[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21848
  
Hmm, it seems we have a limitation on where non-deterministic expressions can 
appear.


---




[GitHub] spark pull request #21853: [SPARK-23957][SQL] Sorts in subqueries are redund...

2018-07-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21853#discussion_r204609653
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -164,10 +164,20 @@ abstract class Optimizer(sessionCatalog: 
SessionCatalog)
* Optimize all the subqueries inside expression.
*/
   object OptimizeSubqueries extends Rule[LogicalPlan] {
+private def removeTopLevelSorts(plan: LogicalPlan): LogicalPlan = {
+  plan match {
+case Sort(_, _, child) => child
+case Project(fields, child) => Project(fields, 
removeTopLevelSorts(child))
+case other => other
+  }
+}
 def apply(plan: LogicalPlan): LogicalPlan = plan 
transformAllExpressions {
   case s: SubqueryExpression =>
 val Subquery(newPlan) = Optimizer.this.execute(Subquery(s.plan))
-s.withNewPlan(newPlan)
+// At this point we have an optimized subquery plan that we are 
going to attach
+// to this subquery expression. Here we can safely remove any top 
level sorts
--- End diff --

super nit: `any top level sort`?
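The helper under discussion strips a sort from the top of the optimized subquery plan, descending through projections. Roughly, in Python, with toy nested-tuple plan nodes standing in for Catalyst's:

```python
def remove_top_level_sort(plan):
    """plan: nested tuples like ("Sort", child) or ("Project", fields, child).
    Removes a sort at the top of the plan, looking through projections,
    mirroring the removeTopLevelSorts helper in the diff above."""
    kind = plan[0]
    if kind == "Sort":
        return plan[1]                      # drop the sort, keep its child
    if kind == "Project":
        return ("Project", plan[1], remove_top_level_sort(plan[2]))
    return plan                             # anything else is left untouched

scan = ("Scan", "t2")
plan = ("Project", ("c1",), ("Sort", scan))
assert remove_top_level_sort(plan) == ("Project", ("c1",), scan)

# A sort under any other node (e.g. an aggregate) is not touched,
# which is why query6 in the test suite still keeps one sort.
agg = ("Aggregate", ("Sort", scan))
assert remove_top_level_sort(agg) == agg
```

As written it keeps recursing through stacked projections, so it can remove a sort that sits below several `Project` nodes but stops at anything else.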


---




[GitHub] spark pull request #21853: [SPARK-23957][SQL] Sorts in subqueries are redund...

2018-07-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21853#discussion_r204609622
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -164,10 +164,20 @@ abstract class Optimizer(sessionCatalog: 
SessionCatalog)
* Optimize all the subqueries inside expression.
*/
   object OptimizeSubqueries extends Rule[LogicalPlan] {
+private def removeTopLevelSorts(plan: LogicalPlan): LogicalPlan = {
--- End diff --

nit: `removeTopLevelSort`? (I think this func removes a single sort on the 
top?)


---




[GitHub] spark pull request #21853: [SPARK-23957][SQL] Sorts in subqueries are redund...

2018-07-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21853#discussion_r204609532
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala 
---
@@ -970,4 +973,300 @@ class SubquerySuite extends QueryTest with 
SharedSQLContext {
 Row("3", "b") :: Row("4", "b") :: Nil)
 }
   }
+
+  private def getNumSortsInQuery(query: String): Int = {
+val plan = sql(query).queryExecution.optimizedPlan
+getNumSorts(plan) + getSubqueryExpressions(plan).map{s => 
getNumSorts(s.plan)}.sum
+  }
+
+  private def getSubqueryExpressions(plan: LogicalPlan): 
Seq[SubqueryExpression] = {
+val subqueryExpressions = ArrayBuffer.empty[SubqueryExpression]
+plan transformAllExpressions {
+  case s: SubqueryExpression =>
+subqueryExpressions ++= (getSubqueryExpressions(s.plan) :+ s)
+s
+}
+subqueryExpressions
+  }
+
+  private def getNumSorts(plan: LogicalPlan): Int = {
+plan.collect { case s: Sort => s }.size
+  }
+
+  test("SPARK-23957 Remove redundant sort from subquery plan(in 
subquery)") {
+withTempView("t1", "t2", "t3") {
+  Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
+  Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
+  Seq((1, 1, 1), (2, 2, 2)).toDF("c1", "c2", 
"c3").createOrReplaceTempView("t3")
+
+  // Simple order by
+  val query1 =
+"""
+   |SELECT c1 FROM t1
+   |WHERE
+   |c1 IN (SELECT c1 FROM t2 ORDER BY c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query1) == 0)
+
+  // Nested order bys
+  val query2 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  c1 IN (SELECT c1
+   |  FROM   (SELECT *
+   |  FROM   t2
+   |  ORDER  BY c2)
+   |  ORDER  BY c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query2) == 0)
+
+
+  // nested IN
+  val query3 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  c1 IN (SELECT c1
+   |  FROM   t2
+   |  WHERE  c1 IN (SELECT c1
+   |FROM   t3
+   |WHERE  c1 = 1
+   |ORDER  BY c3)
+   |  ORDER  BY c2)
+""".stripMargin
+  assert(getNumSortsInQuery(query3) == 0)
+
+  // Complex subplan and multiple sorts
+  val query4 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  c1 IN (SELECT c1
+   |  FROM   (SELECT c1, c2, count(*)
+   |  FROM   t2
+   |  GROUP BY c1, c2
+   |  HAVING count(*) > 0
+   |  ORDER BY c2)
+   |  ORDER  BY c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query4) == 0)
+
+  // Join in subplan
+  val query5 =
+"""
+   |SELECT c1 FROM t1
+   |WHERE
+   |c1 IN (SELECT t2.c1 FROM t2, t3
+   |   WHERE t2.c1 = t3.c1
+   |   ORDER BY t2.c1)
+""".stripMargin
+  assert(getNumSortsInQuery(query5) == 0)
+
+  val query6 =
+"""
+   |SELECT c1
+   |FROM   t1
+   |WHERE  (c1, c2) IN (SELECT c1, max(c2)
+   |FROM   (SELECT c1, c2, count(*)
+   |FROM   t2
+   |GROUP BY c1, c2
+   |HAVING count(*) > 0
+   |ORDER BY c2)
+   |GROUP BY c1
+   |HAVING max(c2) > 0
+   |ORDER  BY c1)
+""".stripMargin
+  // The rule to remove redundant sorts is not able to remove the 
inner sort under
+  // an Aggregate operator. We only remove the top level sort.
+  assert(getNumSortsInQuery(query6) == 1)
+
+  // Cases when sort is not removed from the plan
+  // Limit on top of sort
+  val query7 =
+"""
+   |SELECT c1 FROM t1
+   |WHERE
+   |c1 IN (SELECT c1 FROM t2 ORDER BY c1 limit 1)
+""".stripMargin
+  assert(getNumSortsInQuery(query7) == 1)
+
+  // Sort below a set operations (intersect, 

[GitHub] spark pull request #21837: [SPARK-24881][SQL] New Avro options - compression...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21837#discussion_r204609385
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1430,6 +1431,18 @@ object SQLConf {
   "This only takes effect when spark.sql.repl.eagerEval.enabled is set 
to true.")
 .intConf
 .createWithDefault(20)
+
+  val AVRO_COMPRESSION_CODEC = 
buildConf("spark.sql.avro.compression.codec")
+.doc("Compression codec used in writing of AVRO files.")
+.stringConf
+.createWithDefault("snappy")
+
+  val AVRO_DEFLATE_LEVEL = buildConf("spark.sql.avro.deflate.level")
+.doc("Compression level for the deflate codec used in writing of AVRO 
files. " +
+  "Valid value must be in the range of from 1 to 9 inclusive. " +
--- End diff --

This can be -1 right (https://www.zlib.net/manual.html)?


---




[GitHub] spark pull request #21837: [SPARK-24881][SQL] New Avro options - compression...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21837#discussion_r204609328
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1430,6 +1431,18 @@ object SQLConf {
   "This only takes effect when spark.sql.repl.eagerEval.enabled is set 
to true.")
 .intConf
 .createWithDefault(20)
+
+  val AVRO_COMPRESSION_CODEC = 
buildConf("spark.sql.avro.compression.codec")
+.doc("Compression codec used in writing of AVRO files.")
+.stringConf
+.createWithDefault("snappy")
+
+  val AVRO_DEFLATE_LEVEL = buildConf("spark.sql.avro.deflate.level")
+.doc("Compression level for the deflate codec used in writing of AVRO 
files. " +
+  "Valid value must be in the range of from 1 to 9 inclusive. " +
+  "The default value is -1 which corresponds to 6 level in the current 
implementation.")
--- End diff --

can we do the check like `checkValue(_ => -1, ...)` here?


---




[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...

2018-07-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21608#discussion_r204609271
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
 ---
@@ -55,4 +57,11 @@ private[sql] object DataSourceV2Utils extends Logging {
 
 case _ => Map.empty
   }
+
+  // SPARK-15895: Metadata files (e.g. Parquet summary files) and 
temporary files should not be
--- End diff --

Why do you use `DataSourceV2Utils` instead of `DataSourceUtils`?


---




[GitHub] spark pull request #21837: [SPARK-24881][SQL] New Avro options - compression...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21837#discussion_r204608704
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1430,6 +1431,18 @@ object SQLConf {
   "This only takes effect when spark.sql.repl.eagerEval.enabled is set 
to true.")
 .intConf
 .createWithDefault(20)
+
+  val AVRO_COMPRESSION_CODEC = 
buildConf("spark.sql.avro.compression.codec")
+.doc("Compression codec used in writing of AVRO files.")
+.stringConf
+.createWithDefault("snappy")
+
+  val AVRO_DEFLATE_LEVEL = buildConf("spark.sql.avro.deflate.level")
+.doc("Compression level for the deflate codec used in writing of AVRO 
files. " +
+  "Valid value must be in the range of from 1 to 9 inclusive. " +
+  "The default value is -1 which corresponds to 6 level in the current 
implementation.")
--- End diff --

Per 
https://github.com/apache/spark/pull/21837/files/f8b580ba33736a19fb14a6d7fa9fc929b4cf20ba#r204300978,
 I guess the default compression level still looks like 6 (from reading 
https://www.zlib.net/manual.html). I think we'd better describe what -1 means 
here as well.

Also, can we do the check like `checkValue(_ => -1, ...)` here?
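For what it's worth, zlib itself treats -1 (`Z_DEFAULT_COMPRESSION`) as a request for the default level, which the manual describes as equivalent to level 6 in the current implementation. This is easy to check with Python's `zlib` binding, assuming a stock zlib build:

```python
import zlib

data = b"spark avro deflate level " * 100

# -1 asks for Z_DEFAULT_COMPRESSION, currently mapped to level 6,
# so the compressed output is byte-identical.
default_out = zlib.compress(data, -1)
level6_out = zlib.compress(data, 6)
assert default_out == level6_out

# zlib's valid explicit levels run from 0 (store) through 9 (best
# compression); anything outside that range is rejected.
try:
    zlib.compress(data, 10)
    raise AssertionError("level 10 unexpectedly accepted")
except zlib.error:
    pass
```

So a `checkValue` admitting -1 plus the documented 1..9 range would match what the underlying codec actually accepts.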


---




[GitHub] spark issue #21752: [SPARK-24788][SQL] fixed UnresolvedException when toStri...

2018-07-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21752
  
cc: @gatorsmile 


---




[GitHub] spark pull request #21837: [SPARK-24881][SQL] New Avro options - compression...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21837#discussion_r204608456
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1430,6 +1431,18 @@ object SQLConf {
   "This only takes effect when spark.sql.repl.eagerEval.enabled is set 
to true.")
 .intConf
 .createWithDefault(20)
+
+  val AVRO_COMPRESSION_CODEC = 
buildConf("spark.sql.avro.compression.codec")
+.doc("Compression codec used in writing of AVRO files.")
+.stringConf
+.createWithDefault("snappy")
--- End diff --

can we `.checkValues(Set(` too?


---




[GitHub] spark pull request #21837: [SPARK-24881][SQL] New Avro options - compression...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21837#discussion_r204608411
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1430,6 +1431,18 @@ object SQLConf {
   "This only takes effect when spark.sql.repl.eagerEval.enabled is set 
to true.")
 .intConf
 .createWithDefault(20)
+
+  val AVRO_COMPRESSION_CODEC = 
buildConf("spark.sql.avro.compression.codec")
+.doc("Compression codec used in writing of AVRO files.")
+.stringConf
+.createWithDefault("snappy")
+
+  val AVRO_DEFLATE_LEVEL = buildConf("spark.sql.avro.deflate.level")
+.doc("Compression level for the deflate codec used in writing of AVRO 
files. " +
+  "Valid value must be in the range of from 1 to 9 inclusive. " +
+  "The default value is -1 which corresponds to 6 level in the current 
implementation.")
--- End diff --

Per 
https://github.com/apache/spark/pull/21837/files/f8b580ba33736a19fb14a6d7fa9fc929b4cf20ba#r204300978,
 I guess the default compression level is not 6? I think we'd better find out 
what -1 means and describe it here.

Also, can we do the check like `checkValue(_ => -1, ...)` here?


---




[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21848
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93471/
Test FAILed.


---




[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21848
  
**[Test build #93471 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93471/testReport)**
 for PR 21848 at commit 
[`bf0b2d9`](https://github.com/apache/spark/commit/bf0b2d91a89ec7913db2f86b632e6950ac490f70).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21848
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r204607379
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
 ---
@@ -0,0 +1,388 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.scalatest.BeforeAndAfterAll
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.NamedExpression
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types._
+
+// scalastyle:off line.size.limit
+class SelectedFieldSuite extends SparkFunSuite with BeforeAndAfterAll {
+  // The test schema as a tree string, i.e. `schema.treeString`
+  // root
+  //  |-- col1: string (nullable = false)
+  //  |-- col2: struct (nullable = true)
+  //  ||-- field1: integer (nullable = true)
+  //  ||-- field2: array (nullable = true)
+  //  |||-- element: integer (containsNull = false)
+  //  ||-- field3: array (nullable = false)
+  //  |||-- element: struct (containsNull = true)
+  //  ||||-- subfield1: integer (nullable = true)
+  //  ||||-- subfield2: integer (nullable = true)
+  //  ||||-- subfield3: array (nullable = true)
+  //  |||||-- element: integer (containsNull = true)
+  //  ||-- field4: map (nullable = true)
+  //  |||-- key: string
+  //  |||-- value: struct (valueContainsNull = false)
+  //  ||||-- subfield1: integer (nullable = true)
+  //  ||||-- subfield2: array (nullable = true)
+  //  |||||-- element: integer (containsNull = false)
+  //  ||-- field5: array (nullable = false)
+  //  |||-- element: struct (containsNull = true)
+  //  ||||-- subfield1: struct (nullable = false)
+  //  |||||-- subsubfield1: integer (nullable = true)
+  //  |||||-- subsubfield2: integer (nullable = true)
+  //  ||||-- subfield2: struct (nullable = true)
+  //  |||||-- subsubfield1: struct (nullable = true)
+  //  ||||||-- subsubsubfield1: string (nullable = 
true)
+  //  |||||-- subsubfield2: integer (nullable = true)
+  //  ||-- field6: struct (nullable = true)
+  //  |||-- subfield1: string (nullable = false)
+  //  |||-- subfield2: string (nullable = true)
+  //  ||-- field7: struct (nullable = true)
+  //  |||-- subfield1: struct (nullable = true)
+  //  ||||-- subsubfield1: integer (nullable = true)
+  //  ||||-- subsubfield2: integer (nullable = true)
+  //  ||-- field8: map (nullable = true)
+  //  |||-- key: string
+  //  |||-- value: array (valueContainsNull = false)
+  //  ||||-- element: struct (containsNull = true)
+  //  |||||-- subfield1: integer (nullable = true)
+  //  |||||-- subfield2: array (nullable = true)
+  //  ||||||-- element: integer (containsNull = false)
+  //  ||-- field9: map (nullable = true)
+  //  |||-- key: string
+  //  |||-- value: integer (valueContainsNull = false)
+  //  |-- col3: array (nullable = false)
+  //  ||-- element: struct (containsNull = false)
+  //  |||-- field1: struct (nullable = true)
+  //  ||||-- subfield1: integer (nullable = false)
+  //  ||||-- subfield2: integer (nullable = true)
+  //  |||-- field2: map (nullable = true)
+  //  ||||-- key: string
+  //  ||||-- value: integer 

[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21546
  
SGTM except for the discussion going on in 
https://github.com/apache/spark/pull/21546#discussion_r204324646


---




[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21546#discussion_r204606545
  
--- Diff: python/pyspark/serializers.py ---
@@ -184,27 +184,67 @@ def loads(self, obj):
 raise NotImplementedError
 
 
-class ArrowSerializer(FramedSerializer):
+class BatchOrderSerializer(Serializer):
--- End diff --

Thanks for elaborating on this, @BryanCutler. Would you mind if I ask to add 
this in a separate PR? I am actually not super sure about this ..


---




[GitHub] spark issue #21788: [SPARK-24609][ML][DOC] PySpark/SparkR doc doesn't explai...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21788
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1254/
Test PASSed.


---




[GitHub] spark issue #21788: [SPARK-24609][ML][DOC] PySpark/SparkR doc doesn't explai...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21788
  
**[Test build #93472 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93472/testReport)**
 for PR 21788 at commit 
[`22396b0`](https://github.com/apache/spark/commit/22396b06066bb4befe83fcc60668d0380856d4e0).


---




[GitHub] spark issue #21788: [SPARK-24609][ML][DOC] PySpark/SparkR doc doesn't explai...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21788
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21834
  
ping


---




[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21848
  
**[Test build #93471 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93471/testReport)**
 for PR 21848 at commit 
[`bf0b2d9`](https://github.com/apache/spark/commit/bf0b2d91a89ec7913db2f86b632e6950ac490f70).


---




[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21848
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21848
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1253/
Test PASSed.


---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21822
  
**[Test build #93470 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93470/testReport)**
 for PR 21822 at commit 
[`38980ad`](https://github.com/apache/spark/commit/38980ad066d26327387673910e0dfd981102cab9).


---




[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21851
  
**[Test build #93469 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93469/testReport)**
 for PR 21851 at commit 
[`b499b97`](https://github.com/apache/spark/commit/b499b9727a4cb9cc42149d05a4d54dba2de8bd9e).


---




[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21789
  
I think you'd better take a look at it, because it looks related to the 
current change. 


---




[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21851
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1251/
Test PASSed.


---




[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-23 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21848
  
This will simplify the scope of this PR a lot by just having both 
`AssertTrue` and `AssertNotNull` as `non-deterministic` expressions. My concern 
is that the more `non-deterministic` expressions we have, the less optimization we 
can do. Luckily, neither of them is used in general expressions.
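
To illustrate the optimization concern, here is a toy constant-folding pass (a sketch, not Spark's Catalyst; `Expr` and its ops are made up for this example). An optimizer must leave any expression flagged non-deterministic exactly as written, so every expression marked that way removes rewrite opportunities:

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Expr:
    op: str                                   # "lit" or "add" (hypothetical ops)
    children: List["Expr"] = field(default_factory=list)
    value: Any = None
    deterministic: bool = True

def const_fold(e: Expr) -> Expr:
    kids = [const_fold(c) for c in e.children]
    # A non-deterministic expression may not be folded (nor reordered or
    # deduplicated): it has to be evaluated exactly where and how it appears.
    if e.deterministic and e.op == "add" and all(k.op == "lit" for k in kids):
        return Expr("lit", value=kids[0].value + kids[1].value)
    return Expr(e.op, kids, e.value, e.deterministic)

lit = lambda v: Expr("lit", value=v)
folded = const_fold(Expr("add", [lit(1), lit(2)]))                      # folds to lit(3)
kept = const_fold(Expr("add", [lit(1), lit(2)], deterministic=False))   # left as "add"
print(folded.op, kept.op)  # -> lit add
```

The same tree is simplified or not depending only on the `deterministic` flag, which is why marking expressions like `AssertTrue`/`AssertNotNull` non-deterministic trades optimization opportunities for evaluation-order safety.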


---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21822
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21822
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1252/
Test PASSed.


---




[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule

2018-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21851
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21822
  
retest this please


---




[GitHub] spark pull request #21837: [SPARK-24881][SQL] New Avro options - compression...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21837#discussion_r204601582
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala ---
@@ -68,4 +70,25 @@ class AvroOptions(
   .map(_.toBoolean)
   .getOrElse(!ignoreFilesWithoutExtension)
   }
+
+  /**
+   * The `compression` option allows to specify a compression codec used in write.
+   * Currently supported codecs are `uncompressed`, `snappy` and `deflate`.
+   * If the option is not set, the `snappy` compression is used by default.
+   */
+  val compression: String = parameters.get("compression").getOrElse(sqlConf.avroCompressionCodec)
+
+
+  /**
+   * Level of compression in the range of 1..9 inclusive. 1 - for fast, 9 - for best compression.
+   * If the compression level is not set for `deflate` compression, the current value of SQL
+   * config `spark.sql.avro.deflate.level` is used by default. For other compressions, the default
+   * value is `6`.
+   */
+  val compressionLevel: Int = {
--- End diff --

Can we not expose this as an option for now? IIUC, this compression 
level only applies to deflate, right? Also, this option doesn't look needed for 
backward compatibility either.
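
For context, the 1..9 range follows the usual deflate convention; a quick stdlib sketch (plain `zlib`, independent of Avro and Spark) shows the speed/size tradeoff the level controls:

```python
import zlib

payload = b"spark avro deflate level demo " * 4000   # ~120 KB of repetitive data

fast = zlib.compress(payload, 1)   # level 1: fastest, weakest compression
best = zlib.compress(payload, 9)   # level 9: slowest, best compression
assert len(fast) < len(payload) and len(best) < len(payload)
# Level 9 typically yields the same size or smaller output than level 1,
# at the cost of more CPU time.
print(len(payload), len(fast), len(best))
```

Both codecs round-trip the data; only the encoding effort differs, which is why the level matters for `deflate` but is meaningless for `snappy` and `uncompressed`.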


---




[GitHub] spark pull request #15071: [SPARK-17517][SQL]Improve generated Code for Broa...

2018-07-23 Thread yaooqinn
Github user yaooqinn closed the pull request at:

https://github.com/apache/spark/pull/15071


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21635


---




[GitHub] spark pull request #21837: [SPARK-24881][SQL] New Avro options - compression...

2018-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21837#discussion_r204601282
  
--- Diff: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -896,4 +896,33 @@ class AvroSuite extends QueryTest with 
SharedSQLContext with SQLTestUtils {
   assert(count == 8)
 }
   }
+
+  test("SPARK-24881: write with compression - avro options") {
+withTempPath { dir =>
+  val uncompressDir = s"$dir/uncompress"
+  val deflateDir = s"$dir/deflate"
+  val snappyDir = s"$dir/snappy"
+
+  val df = spark.read.format("avro").load(testAvro)
+  df.write
+.option("compression", "uncompressed")
+.format("avro")
+.save(uncompressDir)
+  df.write
+.options(Map("compression" -> "deflate", "compressionLevel" -> "9"))
+.format("avro")
+.save(deflateDir)
+  df.write
+.option("compression", "snappy")
+.format("avro")
+.save(snappyDir)
+
+  val uncompressSize = FileUtils.sizeOfDirectory(new File(uncompressDir))
+  val deflateSize = FileUtils.sizeOfDirectory(new File(deflateDir))
--- End diff --

Thank you, @MaxGekk. Can we then at least check the type of compression, 
i.e. that `avro.codec` is `deflate`?


---




[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21845
  
Are more pull requests failing due to timeouts right now?

On Mon, Jul 23, 2018 at 6:30 PM Hyukjin Kwon wrote:

> @rxin , btw you want me close this one or get
> this in? Will take a look for the build and tests thing again during this
> week for sure anyway.



---




[GitHub] spark issue #21474: [SPARK-24297][CORE] Fetch-to-disk by default for > 2gb

2018-07-23 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21474
  
Hi @squito, would you please also reflect these changes in the doc, thanks!


---



