[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23139#discussion_r236150260
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{And, ArrayExists, ArrayFilter, CaseWhen, Expression, If}
+import org.apache.spark.sql.catalyst.expressions.{LambdaFunction, Literal, MapFilter, Or}
+import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types.BooleanType
+
+
+/**
+ * A rule that replaces `Literal(null, BooleanType)` with `FalseLiteral`, if possible, in the search
+ * condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator
+ * "(search condition) = TRUE". The replacement is only valid when `Literal(null, BooleanType)` is
+ * semantically equivalent to `FalseLiteral` when evaluating the whole search condition.
+ *
+ * Please note that FALSE and NULL are not exchangeable in most cases, when the search condition
+ * contains NOT and NULL-tolerant expressions. Thus, the rule is very conservative and applicable
+ * in very limited cases.
+ *
+ * For example, `Filter(Literal(null, BooleanType))` is equal to `Filter(FalseLiteral)`.
+ *
+ * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
+ * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
+ * `Filter(FalseLiteral)`.
+ *
+ * Moreover, this rule also transforms predicates in all [[If]] expressions as well as branch
+ * conditions in all [[CaseWhen]] expressions, even if they are not part of the search conditions.
+ *
+ * For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))` can be simplified
+ * into `Project(Literal(2))`.
+ */
+object ReplaceNullWithFalseInPredicate extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
+    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
+    case p: LogicalPlan => p transformExpressions {
+      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
+      case cw @ CaseWhen(branches, _) =>
+        val newBranches = branches.map { case (cond, value) =>
+          replaceNullWithFalse(cond) -> value
+        }
+        cw.copy(branches = newBranches)
+      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        af.copy(function = newLambda)
+      case ae @ ArrayExists(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        ae.copy(function = newLambda)
+      case mf @ MapFilter(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        mf.copy(function = newLambda)
+    }
+  }
+
+  /**
+   * Recursively traverse the Boolean-type expression to replace
+   * `Literal(null, BooleanType)` with `FalseLiteral`, if possible.
+   *
+   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or
+   * `Literal(null, BooleanType)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = {
+    if (e.dataType != BooleanType) {
--- End diff --

We don't handle `LambdaFunction` inside this method; it's handled on the caller side.
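
For context, a minimal sketch (not taken from the PR) of the rewrite described in the scaladoc above, written against the Catalyst expression API; the column name and relation below are made up for illustration:

    import org.apache.spark.sql.catalyst.expressions.{AttributeReference, If, Literal}
    import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation}
    import org.apache.spark.sql.types.BooleanType

    val cond = AttributeReference("cond", BooleanType)()   // hypothetical boolean column
    val relation = LocalRelation(cond)

    // Filter(If(cond, FalseLiteral, Literal(null, BooleanType)))
    val before = Filter(If(cond, FalseLiteral, Literal(null, BooleanType)), relation)

    // The rule rewrites the null branch to FalseLiteral; later constant folding and
    // condition simplification can then collapse the whole filter condition.
    val after = ReplaceNullWithFalseInPredicate(before)
    // expected shape: Filter(If(cond, FalseLiteral, FalseLiteral), relation)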

  

[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...

2018-11-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23135


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5351/
Test PASSed.


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23083
  
**[Test build #99267 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99267/testReport)**
 for PR 23083 at commit 
[`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23130#discussion_r236149203
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -388,7 +388,7 @@ case class FileSourceScanExec(
 logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
 val filesGroupedToBuckets =
   selectedPartitions.flatMap { p =>
-p.files.map { f =>
+p.files.filter(_.getLen > 0).map { f =>
--- End diff --

do you mean changing `filter...map...` to `flatMap`? I don't have a strong 
preference about it.

The updated test cases and the new test case are for this change.
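
For readers following the thread, a self-contained sketch of the two shapes being compared; FileLike and Partition below are stand-ins for the Hadoop FileStatus and PartitionDirectory types used in the real code:

    case class FileLike(path: String, getLen: Long)
    case class Partition(files: Seq[FileLike])

    val selectedPartitions = Seq(Partition(Seq(FileLike("a", 10), FileLike("b", 0))))

    // Shape used in the PR: drop empty files with filter, then map.
    val viaFilterMap = selectedPartitions.flatMap { p =>
      p.files.filter(_.getLen > 0).map(f => f.path)
    }

    // Alternative raised in review: fold the emptiness check into a single flatMap.
    val viaFlatMap = selectedPartitions.flatMap { p =>
      p.files.flatMap(f => if (f.getLen > 0) Some(f.path) else None)
    }

    // Both yield Seq("a"); the empty file "b" is skipped either way.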


---




[GitHub] spark pull request #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23137


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23135
  
thanks, merging to master!


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23083
  
retest this please


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23135
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99257/
Test PASSed.


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23135
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23137
  
thanks, merging to master!


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23135
  
**[Test build #99257 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99257/testReport)**
 for PR 23135 at commit 
[`cd682ff`](https://github.com/apache/spark/commit/cd682ff4377856b969f4745f782b7f49f2fc85c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23142
  
**[Test build #99266 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99266/testReport)**
 for PR 23142 at commit 
[`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f).


---




[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23142: [SPARK-26170][SS] Add missing metrics in FlatMapG...

2018-11-25 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/spark/pull/23142

[SPARK-26170][SS] Add missing metrics in FlatMapGroupsWithState

## What changes were proposed in this pull request?

This patch adds the metrics available from StateStoreWriter to FlatMapGroupsWithStateExec. Please note that some metrics, such as the time taken to remove elements, are not addressed because they are coupled with the state function.

## How was this patch tested?

Manually tested with 
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala.

Snapshots below:

![screen shot 2018-11-26 at 4 13 40 
pm](https://user-images.githubusercontent.com/1317309/48999346-b5f7b400-f199-11e8-89c7-8795f13470d6.png)
![screen shot 2018-11-26 at 4 13 54 
pm](https://user-images.githubusercontent.com/1317309/48999347-b5f7b400-f199-11e8-91ef-ef0b2f816b2e.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/spark SPARK-26170

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23142


commit 56f39cc5838c3f609c8657639ac3a45991fde99f
Author: Jungtaek Lim (HeartSaVioR) 
Date:   2018-11-26T07:33:08Z

SPARK-26170 Add missing metrics in FlatMapGroupsWithState




---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23137
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23137
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99256/
Test PASSed.


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23137
  
**[Test build #99256 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99256/testReport)**
 for PR 23137 at commit 
[`5cfe08d`](https://github.com/apache/spark/commit/5cfe08d75383069d0ac62f9603685ea1860b74e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23141: [SPARK-26021][SQL][followup] add test for special...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23141#discussion_r236143942
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -17,14 +17,16 @@ displayTitle: Spark SQL Upgrading Guide
 
   - Since Spark 3.0, the `from_json` functions supports two modes - 
`PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The 
default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` 
did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing 
of malformed JSON records. For example, the JSON string `{"a" 1}` with the 
schema `a INT` is converted to `null` by previous versions but Spark 3.0 
converts it to `Row(null)`.
 
-  - In Spark version 2.4 and earlier, the `from_json` function produces 
`null`s for JSON strings and JSON datasource skips the same independetly of its 
mode if there is no valid root JSON token in its input (` ` for example). Since 
Spark 3.0, such input is treated as a bad record and handled according to 
specified mode. For example, in the `PERMISSIVE` mode the ` ` input is 
converted to `Row(null, null)` if specified schema is `key STRING, value INT`. 
+  - In Spark version 2.4 and earlier, the `from_json` function produces 
`null`s for JSON strings and JSON datasource skips the same independetly of its 
mode if there is no valid root JSON token in its input (` ` for example). Since 
Spark 3.0, such input is treated as a bad record and handled according to 
specified mode. For example, in the `PERMISSIVE` mode the ` ` input is 
converted to `Row(null, null)` if specified schema is `key STRING, value INT`.
 
   - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
 
   - In Spark version 2.4 and earlier, users can create map values with map 
type key via built-in function like `CreateMap`, `MapFromArrays`, etc. Since 
Spark 3.0, it's not allowed to create map values with map type key with these 
built-in functions. Users can still read map values with map type key from data 
source or Java/Scala collections, though they are not very useful.
-  
+
   - In Spark version 2.4 and earlier, `Dataset.groupByKey` results to a 
grouped dataset with key attribute wrongly named as "value", if the key is 
non-struct type, e.g. int, string, array, etc. This is counterintuitive and 
makes the schema of aggregation queries weird. For example, the schema of 
`ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the 
grouping attribute to "key". The old behaviour is preserved under a newly added 
configuration `spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue` with a 
default value of `false`.
 
+  - In Spark version 2.4 and earlier, float/double -0.0 is semantically 
equal to 0.0, but users can still distinguish them via `Dataset.show`, 
`Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 
internally, and users can't distinguish them any more.
--- End diff --

I checked presto and postgres, the behaviors are same. Hive distinguishes 
-0.0 and 0.0, but it has the group by bug.
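
As an illustration of the user-visible change the migration note describes (assuming an active SparkSession named `spark`; the expected output is what the note implies, not verified here):

    import spark.implicits._

    val df = Seq(0.0d, -0.0d).toDF("d")

    // Spark 2.4 and earlier could surface -0.0 as its own value, e.g. as a separate
    // group-by key; Spark 3.0 normalizes -0.0 to 0.0 internally, so both rows
    // should land in a single group.
    df.groupBy("d").count().show()
    // expected on 3.0: one row with d = 0.0 and count = 2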


---




[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...

2018-11-25 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/23104#discussion_r236143436
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
   val newJoin = joinType match {
 case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
 case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --

I think that when spark.sql.crossJoin.enabled=true is set, the limit should be pushed down on both sides for an Inner join without a condition, a LeftOuter join without a condition, a RightOuter join without a condition, and a FullOuter join without a condition, just like the cross join limit in this PR.
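
For readers unfamiliar with the rule, a small runnable sketch of the effect being proposed (assuming an active SparkSession named `spark`; table shapes are illustrative):

    val t1 = spark.range(1000).toDF("a")
    val t2 = spark.range(1000).toDF("b")

    val q = t1.crossJoin(t2).limit(10)
    q.explain(true)
    // With the proposed `case Cross` branch, the optimized plan should show a
    // LocalLimit 10 above each join child, so neither side needs to be fully
    // scanned to produce the first 10 joined rows.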


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5350/
Test PASSed.


---




[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...

2018-11-25 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/23104
  
> The title has a typo.

Sorry, it has been fixed.


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23138
  
**[Test build #99265 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99265/testReport)**
 for PR 23138 at commit 
[`471d114`](https://github.com/apache/spark/commit/471d1144d41f767b3227d78b663eaa79efef738c).


---




[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22163
  
**[Test build #99264 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99264/testReport)**
 for PR 22163 at commit 
[`90726db`](https://github.com/apache/spark/commit/90726dbcbde2c5f165a870a8038488f09a3c92d2).


---




[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22163
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5349/
Test PASSed.


---




[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22163
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23104#discussion_r236137768
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
   val newJoin = joinType match {
 case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
 case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --

@guoxiaolongzte nope. 


---




[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23104
  
The title has a typo. 


---




[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23104#discussion_r236137426
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
   val newJoin = joinType match {
 case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
 case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --

+1


---




[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

2018-11-25 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/23104#discussion_r236137253
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
   val newJoin = joinType match {
 case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
 case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --

When spark.sql.crossJoin.enabled=true is set, are an inner join without a condition, a LeftOuter without a condition, a RightOuter without a condition, and a FullOuter without a condition all literally cross joins?




---




[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23141
  
**[Test build #99263 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99263/testReport)**
 for PR 23141 at commit 
[`8a9103c`](https://github.com/apache/spark/commit/8a9103c47931eb61cb329ece046d5efc50e855c2).


---




[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23141
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23141
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5348/
Test PASSed.


---




[GitHub] spark issue #23141: [SPARK-26021][SQL][followup] add test for special floati...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23141
  
cc @adoron @kiszk @viirya


---




[GitHub] spark pull request #23141: [SPARK-26021][SQL][followup] add test for special...

2018-11-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/23141

[SPARK-26021][SQL][followup] add test for special floating point values

## What changes were proposed in this pull request?

A follow-up of https://github.com/apache/spark/pull/23124. Add a test to show the minor behavior change introduced by #23124, and add a migration guide entry.

## How was this patch tested?

a new test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark follow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23141


commit 8a9103c47931eb61cb329ece046d5efc50e855c2
Author: Wenchen Fan 
Date:   2018-11-26T06:11:09Z

add test for special floating point values




---




[GitHub] spark pull request #23138: [SPARK-23356][SQL][TEST] add new test cases for a...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23138#discussion_r236136273
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationSuite.scala
 ---
@@ -196,4 +196,31 @@ class SetOperationSuite extends PlanTest {
   ))
 comparePlans(expectedPlan, rewrittenPlan)
   }
+
+  test("SPARK-23356 union: expressions in project list are addition to each side") {
+val unionQuery = testUnion.select(('a + 1).as("aa"))
+val unionOptimized = Optimize.execute(unionQuery.analyze)
+val unionCorrectAnswer =
+  Union(testRelation.select(('a + 1).as("aa")) ::
+testRelation2.select(('d + 1).as("aa")) ::
+testRelation3.select(('g + 1).as("aa")) :: Nil).analyze
+comparePlans(unionOptimized, unionCorrectAnswer)
+  }
+
+  test("SPARK-23356 union: expressions in project list are attribute addition to each side") {
+val unionQuery = testUnion.select(('a + 'b).as("ab"))
+val unionOptimized = Optimize.execute(unionQuery.analyze)
+val unionCorrectAnswer =
+  Union(testRelation.select(('a + 'b).as("ab")) ::
+testRelation2.select(('d + 'e).as("ab")) ::
+testRelation3.select(('g + 'h).as("ab")) :: Nil).analyze
+comparePlans(unionOptimized, unionCorrectAnswer)
+  }
+
+  test("SPARK-23356 union: project don't each side with non-deterministic expression") {
--- End diff --

no pushdown for non-deterministic expression
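
To illustrate the distinction behind the suggested names, a sketch using the Catalyst test DSL (relation and expression names are made up, not copied from the suite):

    import org.apache.spark.sql.catalyst.dsl.expressions._
    import org.apache.spark.sql.catalyst.dsl.plans._
    import org.apache.spark.sql.catalyst.expressions.{Literal, Rand}
    import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Union}

    val r1 = LocalRelation('a.int)
    val r2 = LocalRelation('d.int)
    val u = Union(r1 :: r2 :: Nil)

    // Deterministic projection: PushProjectionThroughUnion can rewrite this so the
    // select is applied separately to each child of the Union.
    val pushed = u.select(('a + 1).as("aa"))

    // Non-deterministic projection: the rule leaves the Project above the Union,
    // since it only fires when every project expression is deterministic.
    val notPushed = u.select(Rand(Literal(10L)).as("r"))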


---




[GitHub] spark pull request #23138: [SPARK-23356][SQL][TEST] add new test cases for a...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23138#discussion_r236136228
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationSuite.scala
 ---
@@ -196,4 +196,31 @@ class SetOperationSuite extends PlanTest {
   ))
 comparePlans(expectedPlan, rewrittenPlan)
   }
+
+  test("SPARK-23356 union: expressions in project list are addition to each side") {
+val unionQuery = testUnion.select(('a + 1).as("aa"))
+val unionOptimized = Optimize.execute(unionQuery.analyze)
+val unionCorrectAnswer =
+  Union(testRelation.select(('a + 1).as("aa")) ::
+testRelation2.select(('d + 1).as("aa")) ::
+testRelation3.select(('g + 1).as("aa")) :: Nil).analyze
+comparePlans(unionOptimized, unionCorrectAnswer)
+  }
+
+  test("SPARK-23356 union: expressions in project list are attribute addition to each side") {
--- End diff --

the same here


---




[GitHub] spark pull request #23138: [SPARK-23356][SQL][TEST] add new test cases for a...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23138#discussion_r236136178
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationSuite.scala
 ---
@@ -196,4 +196,31 @@ class SetOperationSuite extends PlanTest {
   ))
 comparePlans(expectedPlan, rewrittenPlan)
   }
+
+  test("SPARK-23356 union: expressions in project list are addition to each side") {
--- End diff --

`are addition to each side` -> `are pushed down`


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23083
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99254/
Test FAILed.


---




[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23130#discussion_r236135787
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -388,7 +388,7 @@ case class FileSourceScanExec(
 logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
 val filesGroupedToBuckets =
   selectedPartitions.flatMap { p =>
-p.files.map { f =>
+p.files.filter(_.getLen > 0).map { f =>
--- End diff --

Do we have a test case for this line?


---




[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23130#discussion_r236135647
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -388,7 +388,7 @@ case class FileSourceScanExec(
 logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
 val filesGroupedToBuckets =
   selectedPartitions.flatMap { p =>
-p.files.map { f =>
+p.files.filter(_.getLen > 0).map { f =>
--- End diff --

do the filtering inside the map?


---




[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23083
  
**[Test build #99254 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99254/testReport)**
 for PR 23083 at commit 
[`1723819`](https://github.com/apache/spark/commit/17238196719de1e68cbcb1eb930cb3176308e437).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23139
  
**[Test build #99262 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99262/testReport)**
 for PR 23139 at commit 
[`e416810`](https://github.com/apache/spark/commit/e41681096867cbc6d2556da83ce733092d6df841).


---




[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23139
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5347/
Test PASSed.


---




[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23139
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99251/
Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for resolution of...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99251 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99251/testReport)**
 for PR 23108 at commit 
[`d6e582b`](https://github.com/apache/spark/commit/d6e582b3ff33f767d41c9c7cf1710107d7901e0f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23138
  
**[Test build #99261 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99261/testReport)**
 for PR 23138 at commit 
[`ebe10e1`](https://github.com/apache/spark/commit/ebe10e171a8fd6fd8afa4f22eb47ee643562db5a).


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5346/
Test PASSed.


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23138
  
**[Test build #99260 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99260/testReport)**
 for PR 23138 at commit 
[`ebe10e1`](https://github.com/apache/spark/commit/ebe10e171a8fd6fd8afa4f22eb47ee643562db5a).


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/23138
  
retest this please


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23138
  
**[Test build #99253 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99253/testReport)**
 for PR 23138 at commit 
[`ebe10e1`](https://github.com/apache/spark/commit/ebe10e171a8fd6fd8afa4f22eb47ee643562db5a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23138: [SPARK-23356][SQL][TEST] add new test cases for a + 1,a ...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23138
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99253/
Test FAILed.


---




[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23139
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99255/
Test FAILed.


---




[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23139
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23139
  
**[Test build #99255 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99255/testReport)**
 for PR 23139 at commit 
[`6b6997d`](https://github.com/apache/spark/commit/6b6997d6c5eedb9a75af61345ae808c9d98e6f4d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99259/
Test FAILed.


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23127
  
**[Test build #99259 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99259/testReport)**
 for PR 23127 at commit 
[`23c2d91`](https://github.com/apache/spark/commit/23c2d9111f1cff9059746bb7b48bb8ef7ad7027b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait InputRDDCodegen extends CodegenSupport `
  * `case class InputAdapter(child: SparkPlan) extends UnaryExecNode with InputRDDCodegen `


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99258/
Test FAILed.


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23127
  
**[Test build #99258 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99258/testReport)**
 for PR 23127 at commit 
[`23c2d91`](https://github.com/apache/spark/commit/23c2d9111f1cff9059746bb7b48bb8ef7ad7027b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait InputRDDCodegen extends CodegenSupport `
  * `case class InputAdapter(child: SparkPlan) extends UnaryExecNode with InputRDDCodegen `


---




[GitHub] spark issue #22575: [SPARK-24630][SS] Support SQLStreaming in Spark

2018-11-25 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/22575
  
@stczwd Can you provide a detailed design document for this PR, mentioning the scenarios being handled and any constraints? That will give a complete picture of this PR. Thanks


---




[GitHub] spark issue #23131: [SPARK-25908][SQL][FOLLOW-UP] Add back unionAll

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23131
  
Thanks! Merged to master. 

Yes. Adding Distinct over Union is super expensive especially when the 
underlying data set is huge. 


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23137
  
LGTM, pending Jenkins


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23135
  
LGTM


---




[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23139#discussion_r236120731
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{And, ArrayExists, ArrayFilter, CaseWhen, Expression, If}
+import org.apache.spark.sql.catalyst.expressions.{LambdaFunction, Literal, MapFilter, Or}
+import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types.BooleanType
+
+
+/**
+ * A rule that replaces `Literal(null, BooleanType)` with `FalseLiteral`, if possible, in the search
+ * condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator
+ * "(search condition) = TRUE". The replacement is only valid when `Literal(null, BooleanType)` is
+ * semantically equivalent to `FalseLiteral` when evaluating the whole search condition.
+ *
+ * Please note that FALSE and NULL are not exchangeable in most cases, when the search condition
+ * contains NOT and NULL-tolerant expressions. Thus, the rule is very conservative and applicable
+ * in very limited cases.
+ *
+ * For example, `Filter(Literal(null, BooleanType))` is equal to `Filter(FalseLiteral)`.
+ *
+ * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
+ * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
+ * `Filter(FalseLiteral)`.
+ *
+ * Moreover, this rule also transforms predicates in all [[If]] expressions as well as branch
+ * conditions in all [[CaseWhen]] expressions, even if they are not part of the search conditions.
+ *
+ * For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))` can be simplified
+ * into `Project(Literal(2))`.
+ */
+object ReplaceNullWithFalseInPredicate extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
+    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
+    case p: LogicalPlan => p transformExpressions {
+      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
+      case cw @ CaseWhen(branches, _) =>
+        val newBranches = branches.map { case (cond, value) =>
+          replaceNullWithFalse(cond) -> value
+        }
+        cw.copy(branches = newBranches)
+      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        af.copy(function = newLambda)
+      case ae @ ArrayExists(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        ae.copy(function = newLambda)
+      case mf @ MapFilter(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        mf.copy(function = newLambda)
+    }
+  }
+
+  /**
+   * Recursively traverse the Boolean-type expression to replace
+   * `Literal(null, BooleanType)` with `FalseLiteral`, if possible.
+   *
+   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or
+   * `Literal(null, BooleanType)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = {
+    if (e.dataType != BooleanType) {
--- End diff --

How about the LambdaFunction? My major concern is the future changes might 

[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236120634
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Yes. If we do `if (status == "SUCCESS")` for every iterator value, we can't use the skip function.
Because earlier we knew the exact index we needed to take, i.e. we could skip directly to the 25th percentile, 50th percentile and so on. Now we don't know which index holds the 25th percentile of the "SUCCESS" values unless we iterate over each one.

Otherwise, we have to filter the "SUCCESS" tasks up front, like I have done in the PR.
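
A simplified sketch of the trade-off being described, using a plain sorted sequence instead of the KVStore iterator (data and field names are made up):

    case class Task(durationMs: Long, status: String)

    // Stand-in for tasks sorted by the indexed column.
    val sorted = IndexedSeq(
      Task(10, "SUCCESS"), Task(20, "FAILED"), Task(30, "SUCCESS"),
      Task(40, "SUCCESS"), Task(50, "SUCCESS"))

    val quantiles = Array(0.25, 0.5, 0.75)

    // Old approach: the total count is known up front, so each quantile maps to an
    // exact index and the iterator can skip straight to it, but failed tasks are
    // counted too.
    val count = sorted.size
    val byIndex = quantiles.map { q =>
      sorted(math.min((q * count).toLong, count - 1).toInt).durationMs.toDouble
    }

    // Approach in the PR: restrict to successful tasks first, then index into the
    // filtered sequence; the exact positions are only known after filtering.
    val succeeded = sorted.filter(_.status == "SUCCESS")
    val byIndexSucceeded = quantiles.map { q =>
      succeeded(math.min((q * succeeded.size).toLong, succeeded.size - 1).toInt).durationMs.toDouble
    }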


---




[GitHub] spark pull request #23027: [SPARK-26049][SQL][TEST] FilterPushdownBenchmark ...

2018-11-25 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/23027


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23127
  
**[Test build #99259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99259/testReport)** for PR 23127 at commit [`23c2d91`](https://github.com/apache/spark/commit/23c2d9111f1cff9059746bb7b48bb8ef7ad7027b).


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23127
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5345/
Test PASSed.


---




[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23104#discussion_r236118983
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
       val newJoin = joinType match {
         case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
         case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --

An inner join without a condition is literally a cross join.
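
For illustration, a small self-contained sketch of the equivalence this relies on (a local SparkSession and made-up column names are assumed): an inner join with no join condition produces the same cartesian product as an explicit CROSS JOIN, so pushing a LocalLimit into both sides is equally safe in both cases.

    import org.apache.spark.sql.SparkSession

    object CrossVsUnconditionedInnerJoin {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("cross-join-limit")
          .config("spark.sql.crossJoin.enabled", "true") // allow the implicit cartesian product
          .getOrCreate()
        import spark.implicits._

        val left = Seq(1, 2, 3).toDF("a")
        val right = Seq("x", "y").toDF("b")

        val explicitCross = left.crossJoin(right).limit(2)
        val innerNoCondition = left.join(right).limit(2) // inner join without a condition

        // Both plans describe the same cartesian product; compare the optimized
        // plans to see where LocalLimit ends up in each case.
        explicitCross.explain(true)
        innerNoCondition.explain(true)

        spark.stop()
      }
    }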


---




[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23139#discussion_r236118914
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{And, ArrayExists, ArrayFilter, CaseWhen, Expression, If}
+import org.apache.spark.sql.catalyst.expressions.{LambdaFunction, Literal, MapFilter, Or}
+import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types.BooleanType
+
+
+/**
+ * A rule that replaces `Literal(null, BooleanType)` with `FalseLiteral`, if possible, in the search
+ * condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator
+ * "(search condition) = TRUE". The replacement is only valid when `Literal(null, BooleanType)` is
+ * semantically equivalent to `FalseLiteral` when evaluating the whole search condition.
+ *
+ * Please note that FALSE and NULL are not exchangeable in most cases, when the search condition
+ * contains NOT and NULL-tolerant expressions. Thus, the rule is very conservative and applicable
+ * in very limited cases.
+ *
+ * For example, `Filter(Literal(null, BooleanType))` is equal to `Filter(FalseLiteral)`.
+ *
+ * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
+ * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
+ * `Filter(FalseLiteral)`.
+ *
+ * Moreover, this rule also transforms predicates in all [[If]] expressions as well as branch
+ * conditions in all [[CaseWhen]] expressions, even if they are not part of the search conditions.
+ *
+ * For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))` can be simplified
+ * into `Project(Literal(2))`.
+ */
+object ReplaceNullWithFalseInPredicate extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
+    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
+    case p: LogicalPlan => p transformExpressions {
+      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
+      case cw @ CaseWhen(branches, _) =>
+        val newBranches = branches.map { case (cond, value) =>
+          replaceNullWithFalse(cond) -> value
+        }
+        cw.copy(branches = newBranches)
+      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        af.copy(function = newLambda)
+      case ae @ ArrayExists(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        ae.copy(function = newLambda)
+      case mf @ MapFilter(_, lf @ LambdaFunction(func, _, _)) =>
+        val newLambda = lf.copy(function = replaceNullWithFalse(func))
+        mf.copy(function = newLambda)
+    }
+  }
+
+  /**
+   * Recursively traverse the Boolean-type expression to replace
+   * `Literal(null, BooleanType)` with `FalseLiteral`, if possible.
+   *
+   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or
+   * `Literal(null, BooleanType)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = {
+    if (e.dataType != BooleanType) {
--- End diff --

Do we need this? `And`, `Or`, `If` all return Boolean, and we already 
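
To make the question concrete, here is a rough sketch of the shape such a recursion can take; this is only an approximation for discussion, not the exact code in the PR, and it assumes the spark-catalyst classes are on the classpath. The only rewrite target is a null Boolean literal, and recursion descends only through And/Or and Boolean-typed If/CaseWhen, so every recursive call already sees a Boolean expression.

    import org.apache.spark.sql.catalyst.expressions.{And, CaseWhen, Expression, If, Literal, Or}
    import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
    import org.apache.spark.sql.types.BooleanType

    object ReplaceNullWithFalseSketch {
      def replaceNullWithFalse(e: Expression): Expression = e match {
        case Literal(null, BooleanType) => FalseLiteral
        case And(left, right) =>
          And(replaceNullWithFalse(left), replaceNullWithFalse(right))
        case Or(left, right) =>
          Or(replaceNullWithFalse(left), replaceNullWithFalse(right))
        case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
          If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
        case cw @ CaseWhen(branches, elseValue) if cw.dataType == BooleanType =>
          val newBranches = branches.map { case (cond, value) =>
            replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
          }
          CaseWhen(newBranches, elseValue.map(replaceNullWithFalse))
        case _ => e // anything else is left untouched
      }
    }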

[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23135
  
LGTM


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23127
  
retest this please


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23127
  
**[Test build #99258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99258/testReport)** for PR 23127 at commit [`23c2d91`](https://github.com/apache/spark/commit/23c2d9111f1cff9059746bb7b48bb8ef7ad7027b).


---




[GitHub] spark issue #23127: [SPARK-26159] Codegen for LocalTableScanExec and RDDScan...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23127
  
LGTM


---




[GitHub] spark pull request #23127: [SPARK-26159] Codegen for LocalTableScanExec and ...

2018-11-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23127#discussion_r236118569
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -350,6 +350,15 @@ trait CodegenSupport extends SparkPlan {
    */
   def needStopCheck: Boolean = parent.needStopCheck
 
+  /**
+   * Helper default should stop check code.
+   */
+  def shouldStopCheckCode: String = if (needStopCheck) {
--- End diff --

We can use it in more places. This can be done in a follow-up.
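
For context, a minimal sketch of what such a helper can look like (not necessarily the exact code in this PR): the generated consume loops can interpolate it unconditionally, and it expands to the early return only when the parent actually needs the stop check.

    trait StopCheckHelperSketch {
      // Mirrors CodegenSupport.needStopCheck: whether the parent operator requires
      // a shouldStop() check in the generated consume loop.
      def needStopCheck: Boolean

      // Returns the Java snippet to splice into the generated code.
      def shouldStopCheckCode: String = if (needStopCheck) {
        "if (shouldStop()) return;"
      } else {
        "// shouldStop check is eliminated"
      }
    }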


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23135
  
**[Test build #99257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99257/testReport)** for PR 23135 at commit [`cd682ff`](https://github.com/apache/spark/commit/cd682ff4377856b969f4745f782b7f49f2fc85c8).


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23135
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5344/
Test PASSed.


---




[GitHub] spark issue #23135: [SPARK-26168][SQL] Update the code comments in Expressio...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23135
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23140: SPARK-25774 truncate table with partition and path

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23140
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23140: SPARK-25774 truncate table with partition and path

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23140
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23135: [SPARK-26168][SQL] Update the code comments in Ex...

2018-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/23135#discussion_r236117773
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -43,9 +43,24 @@ import org.apache.spark.sql.types._
  * There are a few important traits:
  *
  * - [[Nondeterministic]]: an expression that is not deterministic.
+ * - [[Stateful]]: an expression that contains mutable state. For example, MonotonicallyIncreasingID
+ *                 and Rand. A stateful expression is always non-deterministic.
  * - [[Unevaluable]]: an expression that is not supposed to be evaluated.
  * - [[CodegenFallback]]: an expression that does not have code gen implemented and falls back to
  *                        interpreted mode.
+ * - [[NullIntolerant]]: an expression that is null intolerant (i.e. any null input will result in
+ *                       null output).
+ * - [[NonSQLExpression]]: a common base trait for the expressions that doesn't have SQL
+ *                         expressions like representation. For example, `ScalaUDF`, `ScalaUDAF`,
+ *                         and object `MapObjects` and `Invoke`.
+ * - [[UserDefinedExpression]]: a common base trait for user-defined functions, including
+ *                              UDF/UDAF/UDTF.
+ * - [[HigherOrderFunction]]: a common base trait for higher order functions that take one or more
+ *                            (lambda) functions and applies these to some objects. The function
+ *                            produces a number of variables which can be consumed by some lambda
+ *                            function.
+ * - [[NamedExpression]]: An [[Expression]] that is named.
+ * - [[TimeZoneAwareExpression]]: A common base trait for time zone aware expressions.
--- End diff --

Added.


---




[GitHub] spark issue #23140: SPARK-25774 truncate table with partition and path

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23140
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23140: SPARK-25774 truncate table with partition and pat...

2018-11-25 Thread lcqzte10192193
GitHub user lcqzte10192193 opened a pull request:

https://github.com/apache/spark/pull/23140

SPARK-25774 truncate table with partition and path

## What changes were proposed in this pull request?
When we run the Spark SQL TRUNCATE TABLE command on a managed table in Hive, 
it deletes the files in HDFS but leaves the partitions and the partition folder 
structure; see SPARK-25774 for more details. This PR resolves that problem.
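
For reference, a hypothetical reproduction of the behaviour described above (the table name, columns, and local-mode setup are made up; Hive support is assumed):

    import org.apache.spark.sql.SparkSession

    object TruncatePartitionedTableRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("truncate-partitioned-table")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("CREATE TABLE t (id INT) PARTITIONED BY (p INT) STORED AS PARQUET")
        spark.sql("INSERT INTO t PARTITION (p = 1) VALUES (1), (2)")
        spark.sql("INSERT INTO t PARTITION (p = 2) VALUES (3)")

        spark.sql("TRUNCATE TABLE t")

        // Per the description above, before this fix the data files are deleted
        // but the partition folder structure is left behind on disk.
        spark.sql("SHOW PARTITIONS t").show()

        spark.stop()
      }
    }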

## How was this patch tested?

DDLSuite

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lcqzte10192193/spark wid-lcq-1126

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23140


commit a902f4c233d8199d2461dbad9492d34d5179a1cc
Author: lichaoqun 
Date:   2018-11-26T03:20:24Z

SPARK-25774 truncate table with partition and path




---




[GitHub] spark pull request #21732: [SPARK-24762][SQL] Enable Option of Product encod...

2018-11-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21732#discussion_r236117313
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
@@ -253,10 +247,24 @@ case class ExpressionEncoder[T](
   })
 
   /**
-   * Returns true if the type `T` is serialized as a struct.
+   * Returns true if the type `T` is serialized as a struct by `objSerializer`.
    */
   def isSerializedAsStruct: Boolean = objSerializer.dataType.isInstanceOf[StructType]
 
+  /**
+   * Returns true if the type `T` is an `Option` type.
+   */
+  def isOptionType: Boolean = classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)
+
+  /**
+   * If the type `T` is serialized as a struct, when it is encoded to a Spark SQL row, fields in
+   * the struct are naturally mapped to top-level columns in a row. In other words, the serialized
+   * struct is flattened to row. But in case of the `T` is also an `Option` type, it can't be
+   * flattened to top-level row, because in Spark SQL top-level row can't be null. This method
+   * returns true if `T` is serialized as struct and is not `Option` type.
+   */
+  def isSerializedAsStructForTopLevel: Boolean = isSerializedAsStruct && !isOptionType
--- End diff --

ok.
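
For illustration, a self-contained sketch of the top-level behaviour the new method distinguishes (a local SparkSession is assumed, the Point case class is made up, and the example presumes this PR's change, which enables encoders for a top-level Option of Product):

    import org.apache.spark.sql.SparkSession

    object OptionOfProductEncoding {
      final case class Point(x: Int, y: Int)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("option-of-product")
          .getOrCreate()
        import spark.implicits._

        // A Dataset[Point] is flattened to top-level columns x and y, but a
        // Dataset[Option[Point]] cannot be: the whole value may be None, and a
        // top-level row cannot be null, so it stays a single struct column.
        val flattened = Seq(Point(1, 2)).toDS()
        val wrapped = Seq(Some(Point(1, 2)), None: Option[Point]).toDS()

        flattened.printSchema()
        wrapped.printSchema()

        spark.stop()
      }
    }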


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23137
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22991
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23137
  
**[Test build #99256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99256/testReport)** for PR 23137 at commit [`5cfe08d`](https://github.com/apache/spark/commit/5cfe08d75383069d0ac62f9603685ea1860b74e1).


---




[GitHub] spark issue #23137: [SPARK-26169] Create DataFrameSetOperationsSuite

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23137
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5343/
Test PASSed.


---




[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

2018-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22991
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99252/
Test PASSed.


---



