[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19916
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84575/
Test PASSed.





[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19916
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19893
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84574/
Test PASSed.





[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19912#discussion_r155412840
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] {
  * }}}
  *
  * Approach used:
- * - Start from AND operator as the root
- * - Get all the children conjunctive predicates which are EqualTo / 
EqualNullSafe such that they
- *   don't have a `NOT` or `OR` operator in them
  * - Populate a mapping of attribute => constant value by looking at all 
the equals predicates
  * - Using this mapping, replace occurrence of the attributes with the 
corresponding constant values
  *   in the AND node.
  */
 object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
-  private def containsNonConjunctionPredicates(expression: Expression): 
Boolean = expression.find {
-case _: Not | _: Or => true
-case _ => false
-  }.isDefined
-
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-case f: Filter => f transformExpressionsUp {
-  case and: And =>
-val conjunctivePredicates =
-  splitConjunctivePredicates(and)
-.filter(expr => expr.isInstanceOf[EqualTo] || 
expr.isInstanceOf[EqualNullSafe])
-.filterNot(expr => containsNonConjunctionPredicates(expr))
-
-val equalityPredicates = conjunctivePredicates.collect {
-  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
-  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
-  case e @ EqualNullSafe(left: AttributeReference, right: Literal) 
=> ((left, right), e)
-  case e @ EqualNullSafe(left: Literal, right: AttributeReference) 
=> ((right, left), e)
-}
-
-val constantsMap = AttributeMap(equalityPredicates.map(_._1))
-val predicates = equalityPredicates.map(_._2).toSet
+case f: Filter =>
+  val (newCondition, _) = traverse(f.condition, replaceChildren = true)
+  if (newCondition.isDefined) {
+f.copy(condition = newCondition.get)
+  } else {
+f
+  }
+  }
 
-def replaceConstants(expression: Expression) = expression 
transform {
-  case a: AttributeReference =>
-constantsMap.get(a) match {
-  case Some(literal) => literal
-  case None => a
-}
+  /**
+   * Traverse a condition as a tree and replace attributes with constant 
values.
+   * - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], 
propagate the mapping
+   *   of attribute => constant.
+   * - If the current [[And]] node is not child of another [[And]], 
replace occurrence of the
+   *   attributes with the corresponding constant values in both children 
with propagated mapping.
--- End diff --

Also add the comments to explain `Or` and `Not` processing below.
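A standalone sketch of the basic approach described in the quoted doc — collect attribute => constant bindings from equality conjuncts, then substitute them elsewhere in the condition. It uses a toy expression AST rather than Catalyst classes, and it deliberately omits the `Or`/`Not` handling this review asks to document:

```scala
// Illustrative only: a toy constant-propagation pass over a small expression AST
// (Attr/Lit/Eq/Gt/And are made-up names, not Spark's Catalyst expressions).
object ConstantPropagationSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Eq(left: Expr, right: Expr) extends Expr
  final case class Gt(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Collect attribute -> constant bindings from top-level conjuncts of the
  // form `attr = literal` (or `literal = attr`).
  private def bindings(e: Expr): Map[String, Int] = e match {
    case And(l, r)           => bindings(l) ++ bindings(r)
    case Eq(Attr(a), Lit(v)) => Map(a -> v)
    case Eq(Lit(v), Attr(a)) => Map(a -> v)
    case _                   => Map.empty
  }

  // Replace bound attributes everywhere except inside the defining equality
  // itself, so `a = 1 AND a > 0` becomes `a = 1 AND 1 > 0`.
  private def substitute(e: Expr, env: Map[String, Int]): Expr = e match {
    case And(l, r)                => And(substitute(l, env), substitute(r, env))
    case eq @ Eq(Attr(_), Lit(_)) => eq
    case eq @ Eq(Lit(_), Attr(_)) => eq
    case Eq(l, r)                 => Eq(substitute(l, env), substitute(r, env))
    case Gt(l, r)                 => Gt(substitute(l, env), substitute(r, env))
    case Attr(a)                  => env.get(a).map(Lit(_)).getOrElse(Attr(a))
    case other                    => other
  }

  def propagate(condition: Expr): Expr = substitute(condition, bindings(condition))

  def main(args: Array[String]): Unit = {
    val cond = And(Eq(Attr("a"), Lit(1)), Gt(Attr("a"), Lit(0)))
    // Prints And(Eq(Attr(a),Lit(1)),Gt(Lit(1),Lit(0)))
    println(propagate(cond))
  }
}
```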





[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19893
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19902
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19902
  
**[Test build #84582 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84582/testReport)**
 for PR 19902 at commit 
[`7d87edc`](https://github.com/apache/spark/commit/7d87ed0573554b6b43e0cac17994652397bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84582/
Test PASSed.





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #84586 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84586/testReport)**
 for PR 19222 at commit 
[`2ca374e`](https://github.com/apache/spark/commit/2ca374ee233f5edb9bb206b06d7383c0e22dd25e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84586/
Test PASSed.





[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19916
  

https://github.com/apache/spark/blob/e98f9647f44d1071a6b070db070841b8cda6bd7a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1123









[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
I have added this test case for the URI comparison based on Steve's comments. I have tested it in my local VM and it passes.

Meanwhile, for `hdfs://namenode1/path1` and `hdfs://namenode1:8020/path2`, the default HDFS port can be resolved, so they also match.

Below is the test case:

test("compare URI for filesystem") {

//case 1
var srcUri = new URI("file:///file1")
var dstUri = new URI("file:///file2")
assert(Client.compareUri(srcUri, dstUri) == true)

//case 2
srcUri = new URI("file:///c:file1")
dstUri = new URI("file://c:file2")
assert(Client.compareUri(srcUri, dstUri) == true)

//case 3
srcUri = new URI("file://host/file1")
dstUri = new URI("file://host/file2")
assert(Client.compareUri(srcUri, dstUri) == true)

//case 4
srcUri = new URI("wasb://bucket1@user")
dstUri = new URI("wasb://bucket1@user/")
assert(Client.compareUri(srcUri, dstUri) == true)

//case 5
srcUri = new URI("hdfs:/path1")
dstUri = new URI("hdfs:/path2")
assert(Client.compareUri(srcUri, dstUri) == true)

//case 6
srcUri = new URI("file:///file1")
dstUri = new URI("file://host/file2")
assert(Client.compareUri(srcUri, dstUri) == false)

//case 7
srcUri = new URI("file://host/file1")
dstUri = new URI("file:///file2")
assert(Client.compareUri(srcUri, dstUri) == false)

//case 8
srcUri = new URI("file://host/file1")
dstUri = new URI("file://host2/file2")
assert(Client.compareUri(srcUri, dstUri) == false)

//case 9
srcUri = new URI("wasb://bucket1@user")
dstUri = new URI("wasb://bucket2@user/")
assert(Client.compareUri(srcUri, dstUri) == false)

//case 10
srcUri = new URI("wasb://bucket1@user")
dstUri = new URI("wasb://bucket1@user2/")
assert(Client.compareUri(srcUri, dstUri) == false)

//case 11
srcUri = new URI("s3a://user@pass:bucket1/")
dstUri = new URI("s3a://user2@pass2:bucket1/")
assert(Client.compareUri(srcUri, dstUri) == false)

//case 12
srcUri = new URI("hdfs://namenode1/path1")
dstUri = new URI("hdfs://namenode1:8080/path2")
assert(Client.compareUri(srcUri, dstUri) == false)

//case 13
srcUri = new URI("hdfs://namenode1:8020/path1")
dstUri = new URI("hdfs://namenode1:8080/path2")
assert(Client.compareUri(srcUri, dstUri) == false)
  }
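To illustrate the default-port point above, here is a minimal standalone sketch of a URI comparison that resolves a missing HDFS port to the default 8020. This is only an illustration of the idea under discussion, not Spark's actual `Client.compareUri` implementation:

```scala
import java.net.URI

// Illustrative only: treats two URIs as the same filesystem when scheme, host,
// user info and (default-resolved) port all match.
object UriCompareSketch {
  private val defaultPorts = Map("hdfs" -> 8020)

  private def resolvedPort(uri: URI): Int =
    if (uri.getPort != -1) uri.getPort
    else defaultPorts.getOrElse(Option(uri.getScheme).map(_.toLowerCase).getOrElse(""), -1)

  def sameFileSystem(src: URI, dst: URI): Boolean =
    Option(src.getScheme) == Option(dst.getScheme) &&
      Option(src.getHost).map(_.toLowerCase) == Option(dst.getHost).map(_.toLowerCase) &&
      Option(src.getUserInfo) == Option(dst.getUserInfo) &&
      resolvedPort(src) == resolvedPort(dst)

  def main(args: Array[String]): Unit = {
    // Missing port resolves to 8020, so these compare as the same filesystem here.
    println(sameFileSystem(new URI("hdfs://namenode1/path1"), new URI("hdfs://namenode1:8020/path2")))     // true
    // Explicitly different ports do not.
    println(sameFileSystem(new URI("hdfs://namenode1:8020/path1"), new URI("hdfs://namenode1:8080/path2"))) // false
  }
}
```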







[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19912
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19917
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19917
  
**[Test build #84590 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84590/testReport)**
 for PR 19917 at commit 
[`2535b21`](https://github.com/apache/spark/commit/2535b214ea5349209aecaa4b93868df548077c31).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19917
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84590/
Test FAILed.





[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19916
  
**[Test build #84575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84575/testReport)**
 for PR 19916 at commit 
[`3004a69`](https://github.com/apache/spark/commit/3004a69058e5fce61fb2ea7b9bd7c883c5fa89ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19854: [SPARK-22660][BUILD] Use position() and limit() to fix a...

2017-12-06 Thread kellyzly
Github user kellyzly commented on the issue:

https://github.com/apache/spark/pull/19854
  
Thanks @HyukjinKwon, @srowen, and @viirya for the review.





[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19893
  
**[Test build #84574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84574/testReport)**
 for PR 19893 at commit 
[`1a64209`](https://github.com/apache/spark/commit/1a64209e0208c1f6d1e7a54db2e8cae958f409cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19792: [SPARK-22566][PYTHON] Better error message for `_...

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19792#discussion_r155424467
  
--- Diff: python/pyspark/sql/session.py ---
@@ -405,14 +401,9 @@ def _createFromLocal(self, data, schema):
 data = list(data)
 
 if schema is None or isinstance(schema, (list, tuple)):
-struct = self._inferSchemaFromList(data)
-converter = _create_converter(struct)
+schema = self._inferSchemaFromList(data, names=schema)
+converter = _create_converter(schema)
 data = map(converter, data)
-if isinstance(schema, (list, tuple)):
--- End diff --

```python
>>> spark.createDataFrame([{'a': 1}], ["b"])
DataFrame[a: bigint]
```

Hm.. sorry, actually I think we should not remove this one and L385, because we should primarily respect the user's input, and it should be `DataFrame[b: bigint]`.





[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19882#discussion_r155429024
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala
 ---
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.nio.charset.StandardCharsets
+import java.sql.{Date, Timestamp}
+
+import scala.collection.JavaConverters._
+
+import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument}
+
+import org.apache.spark.sql.{Column, DataFrame}
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, 
HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+/**
+ * A test suite that tests Apache ORC filter API based filter pushdown 
optimization.
+ */
+class OrcFilterSuite extends OrcTest with SharedSQLContext {
+
+  private def checkFilterPredicate(
+  df: DataFrame,
+  predicate: Predicate,
+  checker: (SearchArgument) => Unit): Unit = {
+val output = predicate.collect { case a: Attribute => a }.distinct
+val query = df
+  .select(output.map(e => Column(e)): _*)
+  .where(Column(predicate))
+
+var maybeRelation: Option[HadoopFsRelation] = None
+val maybeAnalyzedPredicate = 
query.queryExecution.optimizedPlan.collect {
+  case PhysicalOperation(_, filters, LogicalRelation(orcRelation: 
HadoopFsRelation, _, _, _)) =>
+maybeRelation = Some(orcRelation)
+filters
+}.flatten.reduceLeftOption(_ && _)
+assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from 
the given query")
+
+val (_, selectedFilters, _) =
+  DataSourceStrategy.selectFilters(maybeRelation.get, 
maybeAnalyzedPredicate.toSeq)
+assert(selectedFilters.nonEmpty, "No filter is pushed down")
+
+val maybeFilter = OrcFilters.createFilter(query.schema, 
selectedFilters)
+assert(maybeFilter.isDefined, s"Couldn't generate filter predicate for 
$selectedFilters")
+checker(maybeFilter.get)
+  }
+
+  private def checkFilterPredicate
+  (predicate: Predicate, filterOperator: PredicateLeaf.Operator)
+  (implicit df: DataFrame): Unit = {
+def checkComparisonOperator(filter: SearchArgument) = {
+  val operator = filter.getLeaves.asScala
+  assert(operator.map(_.getOperator).contains(filterOperator))
+}
+checkFilterPredicate(df, predicate, checkComparisonOperator)
+  }
+
+  private def checkFilterPredicate
+  (predicate: Predicate, stringExpr: String)
+  (implicit df: DataFrame): Unit = {
+def checkLogicalOperator(filter: SearchArgument) = {
+  assert(filter.toString == stringExpr)
+}
+checkFilterPredicate(df, predicate, checkLogicalOperator)
+  }
+
+  private def checkNoFilterPredicate
+  (predicate: Predicate)
+  (implicit df: DataFrame): Unit = {
+val output = predicate.collect { case a: Attribute => a }.distinct
+val query = df
+  .select(output.map(e => Column(e)): _*)
+  .where(Column(predicate))
+
+var maybeRelation: Option[HadoopFsRelation] = None
+val maybeAnalyzedPredicate = 
query.queryExecution.optimizedPlan.collect {
+  case PhysicalOperation(_, filters, LogicalRelation(orcRelation: 
HadoopFsRelation, _, _, _)) =>
+maybeRelation = Some(orcRelation)
+filters
+}.flatten.reduceLeftOption(_ && _)
+assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from 
the given query")
+
+val (_, selectedFilters, _) =
+  

[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19882
  
I actually suggested something similar before:

```
org.apache.spark.sql.execution.datasources.csv.InferSchema
org.apache.spark.sql.execution.datasources.json.InferSchema
```

but I actually received advice at that time, and it became as below:

```
org.apache.spark.sql.execution.datasources.csv.CSVInferSchema
org.apache.spark.sql.execution.datasources.json.JsonInferSchema
```

I actually liked the `Hive` prefix; it was easier to distinguish.





[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19683
  
**[Test build #84595 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84595/testReport)**
 for PR 19683 at commit 
[`04b5814`](https://github.com/apache/spark/commit/04b5814ee3c545096c4a0c50148be3975fdead45).





[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19912
  
**[Test build #84593 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84593/testReport)**
 for PR 19912 at commit 
[`88cd025`](https://github.com/apache/spark/commit/88cd0253465a1c2f8d697b7eaaecd3d756a06e29).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19912
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84593/
Test FAILed.





[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19902
  
**[Test build #84578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84578/testReport)**
 for PR 19902 at commit 
[`15e8ce5`](https://github.com/apache/spark/commit/15e8ce584180d014f8b1ffeb8f0e251299cb8f31).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19499: [SPARK-22279][SQL] Turn on spark.sql.hive.convertMetasto...

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19499
  
Good. The failures are expected ones and are handled by #19882.
- org.apache.spark.sql.hive.orc.OrcQuerySuite.SPARK-8501: Avoids discovery schema from empty ORC files
  - The test case expects a failure, but it's fixed in the new OrcFileFormat.
- org.apache.spark.sql.hive.orc.OrcSourceSuite.SPARK-19459/SPARK-18220: read char/varchar column written by Hive
  - This is a VARCHAR issue.





[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19912
  
**[Test build #84596 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84596/testReport)**
 for PR 19912 at commit 
[`88cd025`](https://github.com/apache/spark/commit/88cd0253465a1c2f8d697b7eaaecd3d756a06e29).





[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19912#discussion_r155415232
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] {
  * }}}
  *
  * Approach used:
- * - Start from AND operator as the root
- * - Get all the children conjunctive predicates which are EqualTo / 
EqualNullSafe such that they
- *   don't have a `NOT` or `OR` operator in them
  * - Populate a mapping of attribute => constant value by looking at all 
the equals predicates
  * - Using this mapping, replace occurrence of the attributes with the 
corresponding constant values
  *   in the AND node.
  */
 object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
-  private def containsNonConjunctionPredicates(expression: Expression): 
Boolean = expression.find {
-case _: Not | _: Or => true
-case _ => false
-  }.isDefined
-
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-case f: Filter => f transformExpressionsUp {
-  case and: And =>
-val conjunctivePredicates =
-  splitConjunctivePredicates(and)
-.filter(expr => expr.isInstanceOf[EqualTo] || 
expr.isInstanceOf[EqualNullSafe])
-.filterNot(expr => containsNonConjunctionPredicates(expr))
-
-val equalityPredicates = conjunctivePredicates.collect {
-  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
-  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
-  case e @ EqualNullSafe(left: AttributeReference, right: Literal) 
=> ((left, right), e)
-  case e @ EqualNullSafe(left: Literal, right: AttributeReference) 
=> ((right, left), e)
-}
-
-val constantsMap = AttributeMap(equalityPredicates.map(_._1))
-val predicates = equalityPredicates.map(_._2).toSet
+case f: Filter =>
+  val (newCondition, _) = traverse(f.condition, replaceChildren = true)
+  if (newCondition.isDefined) {
+f.copy(condition = newCondition.get)
+  } else {
+f
+  }
+  }
 
-def replaceConstants(expression: Expression) = expression 
transform {
-  case a: AttributeReference =>
-constantsMap.get(a) match {
-  case Some(literal) => literal
-  case None => a
-}
+  /**
+   * Traverse a condition as a tree and replace attributes with constant 
values.
+   * - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], 
propagate the mapping
+   *   of attribute => constant.
+   * - If the current [[And]] node is not child of another [[And]], 
replace occurrence of the
+   *   attributes with the corresponding constant values in both children 
with propagated mapping.
+   * @param condition condition to be traversed
+   * @param replaceChildren whether to replace attributes with the 
corresponding constant values
+   */
+  private def traverse(condition: Expression, replaceChildren: Boolean)
+: (Option[Expression], Seq[((AttributeReference, Literal), 
BinaryComparison)]) =
--- End diff --

Maybe define a type alias for `Seq[((AttributeReference, Literal), 
BinaryComparison)]`.
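For reference, a self-contained sketch of what such an alias could look like; the case classes below are simplified stand-ins, not the actual Catalyst types:

```scala
// Illustration of the suggested type alias, with mocked-up stand-ins for
// AttributeReference, Literal and BinaryComparison.
object TypeAliasSketch {
  final case class AttributeReference(name: String)
  final case class Literal(value: Any)
  final case class BinaryComparison(symbol: String)

  // The alias keeps signatures like `traverse` readable.
  type EqualityPredicates = Seq[((AttributeReference, Literal), BinaryComparison)]

  def describe(predicates: EqualityPredicates): Seq[String] =
    predicates.map { case ((attr, lit), cmp) => s"${attr.name} ${cmp.symbol} ${lit.value}" }

  def main(args: Array[String]): Unit = {
    val predicates: EqualityPredicates =
      Seq(((AttributeReference("a"), Literal(1)), BinaryComparison("=")))
    describe(predicates).foreach(println) // prints: a = 1
  }
}
```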





[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19893
  
**[Test build #84580 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84580/testReport)**
 for PR 19893 at commit 
[`fe6cd0c`](https://github.com/apache/spark/commit/fe6cd0cc727a9d4516f6af674e616f0b1f8cef93).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19882
  
I meant that in https://github.com/apache/spark/pull/19882#pullrequestreview-81708676 I liked this `Hive` prefix here, and wondered if we could do the same thing for the main code too. Since this PR only refactors the tests, it's only loosely related though.





[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19912
  
**[Test build #84593 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84593/testReport)**
 for PR 19912 at commit 
[`88cd025`](https://github.com/apache/spark/commit/88cd0253465a1c2f8d697b7eaaecd3d756a06e29).





[GitHub] spark issue #19917: Add failing test for select with a splatted stream

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19917
  
**[Test build #84592 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84592/testReport)**
 for PR 19917 at commit 
[`c728535`](https://github.com/apache/spark/commit/c7285358a06f58ca69c75f7cec194ed27c3cf8e8).





[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #84594 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84594/testReport)**
 for PR 19885 at commit 
[`dfec0ac`](https://github.com/apache/spark/commit/dfec0ac812b690b79f416806409d96a65f4d9fe7).





[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...

2017-12-06 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/19717
  
@vanzin @jerryshao @felixcheung @viirya @mridulm @ueshin @jiangxb1987 
Really appreciate your time reviewing this! Do you have any more comments? Given that we have at least one more PR (most likely for integration tests) that we want to get in before the 2.3 code freeze, it would be ideal if this PR could be merged this week. Thank you all!





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19882
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19882
  
**[Test build #84581 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84581/testReport)**
 for PR 19882 at commit 
[`fcb2ccb`](https://github.com/apache/spark/commit/fcb2ccb64813f0fc37e3321c006937fb9cb870e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19882
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84581/
Test PASSed.





[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19893
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19893
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84580/
Test PASSed.





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19882
  
@HyukjinKwon, the PR code and description are updated.
- Update comments
- Rename back to the originals

The main reason I used the `Hive` prefix was naming consistency, but now it's restored.

As you see in the PR description, we still need to use `HiveOrcHadoopFsRelationSuite` (inconsistently) because `OrcHadoopFsRelationSuite` already exists in the same package.





[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2017-12-06 Thread liutang123
Github user liutang123 commented on the issue:

https://github.com/apache/spark/pull/19364
  
@cloud-fan Would you please look at this when you have time?





[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19917
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19917
  
**[Test build #84592 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84592/testReport)**
 for PR 19917 at commit 
[`c728535`](https://github.com/apache/spark/commit/c7285358a06f58ca69c75f7cec194ed27c3cf8e8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/19912
  
retest this please





[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19917
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84592/
Test FAILed.





[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19901
  
**[Test build #84588 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84588/testReport)**
 for PR 19901 at commit 
[`c4691b6`](https://github.com/apache/spark/commit/c4691b6b4cd8b2ac26dd9c89243372dfec5bb913).





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19882
  
LGTM BTW.





[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19882#discussion_r155410507
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala
 ---
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.nio.charset.StandardCharsets
+import java.sql.{Date, Timestamp}
+
+import scala.collection.JavaConverters._
+
+import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument}
+
+import org.apache.spark.sql.{Column, DataFrame}
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, 
HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+/**
+ * A test suite that tests Apache ORC filter API based filter pushdown 
optimization.
+ */
+class OrcFilterSuite extends OrcTest with SharedSQLContext {
+
+  private def checkFilterPredicate(
+  df: DataFrame,
+  predicate: Predicate,
+  checker: (SearchArgument) => Unit): Unit = {
+val output = predicate.collect { case a: Attribute => a }.distinct
+val query = df
+  .select(output.map(e => Column(e)): _*)
+  .where(Column(predicate))
+
+var maybeRelation: Option[HadoopFsRelation] = None
+val maybeAnalyzedPredicate = 
query.queryExecution.optimizedPlan.collect {
+  case PhysicalOperation(_, filters, LogicalRelation(orcRelation: 
HadoopFsRelation, _, _, _)) =>
+maybeRelation = Some(orcRelation)
+filters
+}.flatten.reduceLeftOption(_ && _)
+assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from 
the given query")
+
+val (_, selectedFilters, _) =
+  DataSourceStrategy.selectFilters(maybeRelation.get, 
maybeAnalyzedPredicate.toSeq)
+assert(selectedFilters.nonEmpty, "No filter is pushed down")
+
+val maybeFilter = OrcFilters.createFilter(query.schema, 
selectedFilters)
+assert(maybeFilter.isDefined, s"Couldn't generate filter predicate for 
$selectedFilters")
+checker(maybeFilter.get)
+  }
+
+  private def checkFilterPredicate
+  (predicate: Predicate, filterOperator: PredicateLeaf.Operator)
+  (implicit df: DataFrame): Unit = {
+def checkComparisonOperator(filter: SearchArgument) = {
+  val operator = filter.getLeaves.asScala
+  assert(operator.map(_.getOperator).contains(filterOperator))
+}
+checkFilterPredicate(df, predicate, checkComparisonOperator)
+  }
+
+  private def checkFilterPredicate
+  (predicate: Predicate, stringExpr: String)
+  (implicit df: DataFrame): Unit = {
+def checkLogicalOperator(filter: SearchArgument) = {
+  assert(filter.toString == stringExpr)
+}
+checkFilterPredicate(df, predicate, checkLogicalOperator)
+  }
+
+  private def checkNoFilterPredicate
+  (predicate: Predicate)
+  (implicit df: DataFrame): Unit = {
+val output = predicate.collect { case a: Attribute => a }.distinct
+val query = df
+  .select(output.map(e => Column(e)): _*)
+  .where(Column(predicate))
+
+var maybeRelation: Option[HadoopFsRelation] = None
+val maybeAnalyzedPredicate = 
query.queryExecution.optimizedPlan.collect {
+  case PhysicalOperation(_, filters, LogicalRelation(orcRelation: 
HadoopFsRelation, _, _, _)) =>
+maybeRelation = Some(orcRelation)
+filters
+}.flatten.reduceLeftOption(_ && _)
+assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from 
the given query")
+
+val (_, selectedFilters, _) =
+  

[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...

2017-12-06 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/19676
  
It's good to have this. Sorry for the late response; I will make a pass tomorrow. Thanks.





[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19882#discussion_r155428391
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala
 ---
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.nio.charset.StandardCharsets
+import java.sql.{Date, Timestamp}
+
+import scala.collection.JavaConverters._
+
+import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument}
+
+import org.apache.spark.sql.{Column, DataFrame}
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, 
HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+/**
+ * A test suite that tests Apache ORC filter API based filter pushdown 
optimization.
+ */
+class OrcFilterSuite extends OrcTest with SharedSQLContext {
+
+  private def checkFilterPredicate(
+  df: DataFrame,
+  predicate: Predicate,
+  checker: (SearchArgument) => Unit): Unit = {
+val output = predicate.collect { case a: Attribute => a }.distinct
+val query = df
+  .select(output.map(e => Column(e)): _*)
+  .where(Column(predicate))
+
+var maybeRelation: Option[HadoopFsRelation] = None
+val maybeAnalyzedPredicate = 
query.queryExecution.optimizedPlan.collect {
+  case PhysicalOperation(_, filters, LogicalRelation(orcRelation: 
HadoopFsRelation, _, _, _)) =>
+maybeRelation = Some(orcRelation)
+filters
+}.flatten.reduceLeftOption(_ && _)
+assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from 
the given query")
+
+val (_, selectedFilters, _) =
+  DataSourceStrategy.selectFilters(maybeRelation.get, 
maybeAnalyzedPredicate.toSeq)
+assert(selectedFilters.nonEmpty, "No filter is pushed down")
+
+val maybeFilter = OrcFilters.createFilter(query.schema, 
selectedFilters)
+assert(maybeFilter.isDefined, s"Couldn't generate filter predicate for 
$selectedFilters")
+checker(maybeFilter.get)
+  }
+
+  private def checkFilterPredicate
+  (predicate: Predicate, filterOperator: PredicateLeaf.Operator)
+  (implicit df: DataFrame): Unit = {
+def checkComparisonOperator(filter: SearchArgument) = {
+  val operator = filter.getLeaves.asScala
+  assert(operator.map(_.getOperator).contains(filterOperator))
+}
+checkFilterPredicate(df, predicate, checkComparisonOperator)
+  }
+
+  private def checkFilterPredicate
+  (predicate: Predicate, stringExpr: String)
+  (implicit df: DataFrame): Unit = {
+def checkLogicalOperator(filter: SearchArgument) = {
+  assert(filter.toString == stringExpr)
+}
+checkFilterPredicate(df, predicate, checkLogicalOperator)
+  }
+
+  private def checkNoFilterPredicate
+  (predicate: Predicate)
+  (implicit df: DataFrame): Unit = {
+val output = predicate.collect { case a: Attribute => a }.distinct
+val query = df
+  .select(output.map(e => Column(e)): _*)
+  .where(Column(predicate))
+
+var maybeRelation: Option[HadoopFsRelation] = None
+val maybeAnalyzedPredicate = 
query.queryExecution.optimizedPlan.collect {
+  case PhysicalOperation(_, filters, LogicalRelation(orcRelation: 
HadoopFsRelation, _, _, _)) =>
+maybeRelation = Some(orcRelation)
+filters
+}.flatten.reduceLeftOption(_ && _)
+assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from 
the given query")
+
+val (_, selectedFilters, _) =
+  

[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19882
  
Thank you so much, @HyukjinKwon !





[GitHub] spark pull request #19917: Add failing test for select with a splatted strea...

2017-12-06 Thread ash211
GitHub user ash211 opened a pull request:

https://github.com/apache/spark/pull/19917

Add failing test for select with a splatted stream

## What changes were proposed in this pull request?

Add additional test.

## How was this patch tested?

Additional test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ash211/spark aash/PDS-59284

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19917


commit 2535b214ea5349209aecaa4b93868df548077c31
Author: Andrew Ash 
Date:   2017-12-07T00:50:27Z

Add test for PDS-59284







[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19882
  
Yup, but I actually received advice before. I suggested:

```
org.apache.spark.sql.execution.datasources.csv.InferSchema
org.apache.spark.sql.execution.datasources.json.InferSchema
```

and it became as below:

```
org.apache.spark.sql.execution.datasources.csv.CSVInferSchema
org.apache.spark.sql.execution.datasources.json.JsonInferSchema
```





[GitHub] spark issue #19917: Add failing test for select with a splatted stream

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19917
  
**[Test build #84590 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84590/testReport)**
 for PR 19917 at commit 
[`2535b21`](https://github.com/apache/spark/commit/2535b214ea5349209aecaa4b93868df548077c31).





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19882
  
Definitely, my bad.

For the main code, we can do that later in a separate PR if needed. I'd like this PR to contain tests only.





[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19882
  
I am sorry, I should have clarified this ahead of time.





[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres

2017-12-06 Thread danielvdende
Github user danielvdende commented on the issue:

https://github.com/apache/spark/pull/19911
  
Sure, I'll make the changes :). I'll use this PR; that seems to make sense, as it would fix truncate for Postgres.





[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19911
  
Great! I'm looking forward to your new PR, @danielvdende. I can help you a little bit during review. :)





[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #84594 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84594/testReport)**
 for PR 19885 at commit 
[`dfec0ac`](https://github.com/apache/spark/commit/dfec0ac812b690b79f416806409d96a65f4d9fe7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84594/
Test PASSed.





[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84578/
Test PASSed.





[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19912#discussion_r155411680
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] {
  * }}}
  *
  * Approach used:
- * - Start from AND operator as the root
- * - Get all the children conjunctive predicates which are EqualTo / 
EqualNullSafe such that they
- *   don't have a `NOT` or `OR` operator in them
  * - Populate a mapping of attribute => constant value by looking at all 
the equals predicates
  * - Using this mapping, replace occurrence of the attributes with the 
corresponding constant values
  *   in the AND node.
  */
 object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
-  private def containsNonConjunctionPredicates(expression: Expression): 
Boolean = expression.find {
-case _: Not | _: Or => true
-case _ => false
-  }.isDefined
-
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-case f: Filter => f transformExpressionsUp {
-  case and: And =>
-val conjunctivePredicates =
-  splitConjunctivePredicates(and)
-.filter(expr => expr.isInstanceOf[EqualTo] || 
expr.isInstanceOf[EqualNullSafe])
-.filterNot(expr => containsNonConjunctionPredicates(expr))
-
-val equalityPredicates = conjunctivePredicates.collect {
-  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
-  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
-  case e @ EqualNullSafe(left: AttributeReference, right: Literal) 
=> ((left, right), e)
-  case e @ EqualNullSafe(left: Literal, right: AttributeReference) 
=> ((right, left), e)
-}
-
-val constantsMap = AttributeMap(equalityPredicates.map(_._1))
-val predicates = equalityPredicates.map(_._2).toSet
+case f: Filter =>
+  val (newCondition, _) = traverse(f.condition, replaceChildren = true)
+  if (newCondition.isDefined) {
+f.copy(condition = newCondition.get)
+  } else {
+f
+  }
+  }
 
-def replaceConstants(expression: Expression) = expression 
transform {
-  case a: AttributeReference =>
-constantsMap.get(a) match {
-  case Some(literal) => literal
-  case None => a
-}
+  /**
+   * Traverse a condition as a tree and replace attributes with constant 
values.
+   * - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], 
propagate the mapping
+   *   of attribute => constant.
+   * - If the current [[And]] node is not child of another [[And]], 
replace occurrence of the
+   *   attributes with the corresponding constant values in both children 
with propagated mapping.
+   * @param condition condition to be traversed
+   * @param replaceChildren whether to replace attributes with the 
corresponding constant values
+   */
--- End diff --

Could we add a doc comment to explain the return value?
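
For reference, a minimal sketch of what such a doc comment could look like; the second tuple element's type (`AttributeMap[Literal]`) and the stubbed body are assumptions, since only the first element of the return value is visible in the quoted diff:

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeMap, Expression, Literal}

object TraverseReturnDocSketch {
  /**
   * @param condition       condition to be traversed
   * @param replaceChildren whether to replace attributes with the corresponding constant values
   * @return a pair of
   *         (1) `Some(rewrittenCondition)` if any attribute was replaced, or `None` if the
   *             condition is unchanged, and
   *         (2) the attribute => constant mapping collected from the equality predicates,
   *             to be propagated up to the enclosing `And`.
   */
  def traverse(condition: Expression, replaceChildren: Boolean)
    : (Option[Expression], AttributeMap[Literal]) = ??? // body elided in this sketch
}
```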


---




[GitHub] spark issue #19499: [SPARK-22279][SQL] Turn on spark.sql.hive.convertMetasto...

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19499
  
I'll retrigger after merging #19882.


---




[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19902
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84579/
Test PASSed.


---




[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-12-06 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19525#discussion_r155414954
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -2769,6 +2769,20 @@ class LogisticRegressionSuite
   LogisticRegressionSuite.allParamSettings, checkModelData)
   }
 
+  test("read/write with BoundsOnCoefficients") {
+def checkModelData(model: LogisticRegressionModel, model2: 
LogisticRegressionModel): Unit = {
+  assert(model.getLowerBoundsOnCoefficients === 
model2.getLowerBoundsOnCoefficients)
+  assert(model.getUpperBoundsOnCoefficients === 
model2.getUpperBoundsOnCoefficients)
--- End diff --

Sure, you can update the existing read/write test or add your own, as long as it checks the model itself rather than just the parameters.


---




[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-12-06 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19525#discussion_r155413347
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -122,17 +124,33 @@ private[ml] object Param {
 
   /** Decodes a param value from JSON. */
   def jsonDecode[T](json: String): T = {
-parse(json) match {
+val jValue = parse(json)
+jValue match {
   case JString(x) =>
 x.asInstanceOf[T]
   case JObject(v) =>
 val keys = v.map(_._1)
-assert(keys.contains("type") && keys.contains("values"),
-  s"Expect a JSON serialized vector but cannot find fields 'type' 
and 'values' in $json.")
-JsonVectorConverter.fromJson(json).asInstanceOf[T]
+if (keys.contains("class")) {
+  implicit val formats = DefaultFormats
+  val className = (jValue \ "class").extract[String]
+  className match {
+case JsonMatrixConverter.className =>
+  val checkFields = Array("numRows", "numCols", "values", 
"isTransposed")
+  require(checkFields.forall(keys.contains), s"Expect a JSON 
serialized Matrix" +
+s" but cannot find fields ${checkFields.mkString(", ")} in 
$json.")
+  JsonMatrixConverter.fromJson(json).asInstanceOf[T]
+
+case s => throw new SparkException(s"unrecognized class $s in 
$json")
+  }
+} else { // Vector does not have class info in json
--- End diff --

I'd suggest adding more comments here to clarify why a vector doesn't have _class_ info in JSON; it should facilitate code maintenance.


---




[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-12-06 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19525#discussion_r155411609
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.linalg
+
+import org.json4s.DefaultFormats
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render}
+
+private[ml] object JsonMatrixConverter {
+
+  /** Unique class name for identifying JSON object encoded by this class. 
*/
+  val className = "org.apache.spark.ml.linalg.Matrix"
--- End diff --

@hhbyyh You've got a point there; I agree.


---




[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-12-06 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19525#discussion_r155412287
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -122,17 +124,33 @@ private[ml] object Param {
 
   /** Decodes a param value from JSON. */
   def jsonDecode[T](json: String): T = {
-parse(json) match {
+val jValue = parse(json)
+jValue match {
   case JString(x) =>
 x.asInstanceOf[T]
   case JObject(v) =>
 val keys = v.map(_._1)
-assert(keys.contains("type") && keys.contains("values"),
-  s"Expect a JSON serialized vector but cannot find fields 'type' 
and 'values' in $json.")
-JsonVectorConverter.fromJson(json).asInstanceOf[T]
+if (keys.contains("class")) {
+  implicit val formats = DefaultFormats
+  val className = (jValue \ "class").extract[String]
+  className match {
+case JsonMatrixConverter.className =>
+  val checkFields = Array("numRows", "numCols", "values", 
"isTransposed")
--- End diff --

Should we check _type_ as well?


---




[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19912#discussion_r155414906
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] {
  * }}}
  *
  * Approach used:
- * - Start from AND operator as the root
- * - Get all the children conjunctive predicates which are EqualTo / 
EqualNullSafe such that they
- *   don't have a `NOT` or `OR` operator in them
  * - Populate a mapping of attribute => constant value by looking at all 
the equals predicates
  * - Using this mapping, replace occurrence of the attributes with the 
corresponding constant values
  *   in the AND node.
  */
 object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
-  private def containsNonConjunctionPredicates(expression: Expression): 
Boolean = expression.find {
-case _: Not | _: Or => true
-case _ => false
-  }.isDefined
-
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-case f: Filter => f transformExpressionsUp {
--- End diff --

Oh, this doesn't work because it still replaces the replaced `And`.


---




[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19912
  
The new algorithm looks correct and good to me.


---




[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...

2017-12-06 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19525#discussion_r155414284
  
--- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") (
 @Since("2.0.0")
 object DenseMatrix {
 
+  @Since("2.3.0")
+  private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, 
Array[Double], Boolean)] =
--- End diff --

I'm neutral on this issue. It's OK to keep it private, but we should remove ```Since```. We can make it public later when there is a clear requirement.


---




[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19901
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84588/
Test PASSed.


---




[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19901
  
**[Test build #84588 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84588/testReport)**
 for PR 19901 at commit 
[`c4691b6`](https://github.com/apache/spark/commit/c4691b6b4cd8b2ac26dd9c89243372dfec5bb913).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19901
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19882
  
**[Test build #84589 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84589/testReport)**
 for PR 19882 at commit 
[`1828571`](https://github.com/apache/spark/commit/18285717edcbefe9a79d51a6fdcb112c3d0bc5c4).


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19882
  
Oh, I'll bring it back.


---




[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...

2017-12-06 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19813
  
LGTM, good job!


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19882
  
**[Test build #84589 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84589/testReport)**
 for PR 19882 at commit 
[`1828571`](https://github.com/apache/spark/commit/18285717edcbefe9a79d51a6fdcb112c3d0bc5c4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OrcFilterSuite extends OrcTest with TestHiveSingleton `
  * `implicit class IntToBinary(int: Int) `
  * `class OrcPartitionDiscoverySuite extends OrcPartitionDiscoveryTest 
with TestHiveSingleton  `
  * `class OrcQuerySuite extends OrcQueryTest with TestHiveSingleton `
  * `class OrcSourceSuite extends OrcSuite with TestHiveSingleton `


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19882
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19882
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84589/
Test PASSed.


---




[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19902
  
**[Test build #84579 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84579/testReport)**
 for PR 19902 at commit 
[`5de7962`](https://github.com/apache/spark/commit/5de7962755f2175d6d11dec262644e8b9afae08b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19902
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19882#discussion_r155398123
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala
 ---
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.nio.charset.StandardCharsets
+import java.sql.{Date, Timestamp}
+
+import scala.collection.JavaConverters._
+
+import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument}
+
+import org.apache.spark.sql.{Column, DataFrame}
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, 
HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+/**
+ * A test suite that tests Apache ORC filter API based filter pushdown 
optimization.
+ */
+class OrcFilterSuite extends OrcTest with SharedSQLContext {
--- End diff --

Could we leave some comments to explain the reason? There seem to be many duplications, and I would wonder why.
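
One possible wording for such a comment, with the reasoning inferred from the imports in the quoted file (an assumption to be confirmed by the author):

```scala
// Assumption, please verify: this suite duplicates much of the Hive-based OrcFilterSuite
// because the new `datasources.orc` reader builds SearchArguments from
// `org.apache.orc.storage.ql.io.sarg`, while the Hive version uses
// `org.apache.hadoop.hive.ql.io.sarg`; the incompatible SearchArgument types make it hard
// to share the test code directly.
```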



---




[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19912#discussion_r155413508
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] {
  * }}}
  *
  * Approach used:
- * - Start from AND operator as the root
- * - Get all the children conjunctive predicates which are EqualTo / 
EqualNullSafe such that they
- *   don't have a `NOT` or `OR` operator in them
  * - Populate a mapping of attribute => constant value by looking at all 
the equals predicates
  * - Using this mapping, replace occurrence of the attributes with the 
corresponding constant values
  *   in the AND node.
  */
 object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
-  private def containsNonConjunctionPredicates(expression: Expression): 
Boolean = expression.find {
-case _: Not | _: Or => true
-case _ => false
-  }.isDefined
-
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-case f: Filter => f transformExpressionsUp {
--- End diff --

Would it have the same effect if we changed `transformExpressionsUp` to `transformExpressionsDown`?


---




[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19901
  
**[Test build #84584 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84584/testReport)**
 for PR 19901 at commit 
[`31ab853`](https://github.com/apache/spark/commit/31ab853c6d496ac5d6a0a4c4e6b9cf03e7cbf466).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19901
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19901
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84584/
Test PASSed.


---




[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19041
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84585/
Test FAILed.


---




[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19041
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19041
  
**[Test build #84585 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84585/testReport)**
 for PR 19041 at commit 
[`6a54b86`](https://github.com/apache/spark/commit/6a54b8652888d5e214b7941773644868091fe5f0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19882
  
**[Test build #84591 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84591/testReport)**
 for PR 19882 at commit 
[`f56423d`](https://github.com/apache/spark/commit/f56423dd9168e781fad6e02e2aa319d8593d98cc).


---




[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation

2017-12-06 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/19912
  
@gatorsmile @viirya thanks for reviewing! I have addressed your comments.


---




[GitHub] spark issue #19905: [SPARK-22710] ConfigBuilder.fallbackConf should trigger ...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19905
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84551/
Test PASSed.


---




[GitHub] spark issue #19905: [SPARK-22710] ConfigBuilder.fallbackConf should trigger ...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19905
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19893
  
**[Test build #4006 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4006/testReport)**
 for PR 19893 at commit 
[`0d45a5b`](https://github.com/apache/spark/commit/0d45a5b7a225e33eedf13ee1f379ea64d6f53c18).


---




[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19911
  
**[Test build #4005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4005/testReport)**
 for PR 19911 at commit 
[`40bd8ac`](https://github.com/apache/spark/commit/40bd8acbe6571a9dea39a6ae9083959e00110979).


---



