date:20180725

[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21863
  
Thanks! Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21822
  
**[Test build #93533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93533/testReport)**
 for PR 21822 at commit 
[`f2f1a97`](https://github.com/apache/spark/commit/f2f1a97e447a41e8b9b6c094376d32b32af00991).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21822
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21822
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93533/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mi...

2018-07-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21863


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21869
  
**[Test build #93534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93534/testReport)**
 for PR 21869 at commit 
[`a45bf36`](https://github.com/apache/spark/commit/a45bf360d923e8b187f6579b4a73d24a9222198a).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21834
  
**[Test build #93532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93532/testReport)**
 for PR 21834 at commit 
[`1041a38`](https://github.com/apache/spark/commit/1041a38571eb4daf66a23d37d5bf51a1abb8d74c).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21834
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93532/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21869
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21834
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21869
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93534/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21869
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21869
  
cc @maryannxue @gengliangwang 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21834
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21866
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21866
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21869
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21834
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1303/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21834
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21869
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1302/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21866
  
**[Test build #93536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93536/testReport)**
 for PR 21866 at commit 
[`cff6f2a`](https://github.com/apache/spark/commit/cff6f2a0459e8cc4e48f28bde8103ea44ce5a1ab).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21834
  
**[Test build #93537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93537/testReport)**
 for PR 21834 at commit 
[`1041a38`](https://github.com/apache/spark/commit/1041a38571eb4daf66a23d37d5bf51a1abb8d74c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21869
  
**[Test build #93535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93535/testReport)**
 for PR 21869 at commit 
[`a45bf36`](https://github.com/apache/spark/commit/a45bf360d923e8b187f6579b4a73d24a9222198a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21803
  
@MaxGekk Please include the test case for SHOW CREATE TABLE. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1304/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21850
  
Personally, I do not think we need this extra case. 

> If primitive has more opportunities for further optimization.

Could you explain more?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21870: Branch 2.3

2018-07-25 Thread lovezeropython

GitHub user lovezeropython opened a pull request:

https://github.com/apache/spark/pull/21870

Branch 2.3

EOFError
# ConnectionResetError: [Errno 54] Connection reset by peer
(Please fill in changes proposed in this fix)

```
/pyspark.zip/pyspark/worker.py", line 255, in main
if read_int(infile) == SpecialLengths.END_OF_STREAM:
  File 
"/Users/songhao/apps/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py",
 line 683, in read_int
length = stream.read(4)
ConnectionResetError: [Errno 54] Connection reset by peerenter code here
```


![markdown20180725153615](https://user-images.githubusercontent.com/35518020/43186593-ba74b8ea-9021-11e8-8c02-c013d987f204.png)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21870


commit 3737c3d32bb92e73cadaf3b1b9759d9be00b288d
Author: gatorsmile 
Date:   2018-02-13T06:05:13Z

[SPARK-20090][FOLLOW-UP] Revert the deprecation of `names` in PySpark

## What changes were proposed in this pull request?
Deprecating the field `name` in PySpark is not expected. This PR is to 
revert the change.

## How was this patch tested?
N/A

Author: gatorsmile 

Closes #20595 from gatorsmile/removeDeprecate.

(cherry picked from commit 407f67249639709c40c46917700ed6dd736daa7d)
Signed-off-by: hyukjinkwon 

commit 1c81c0c626f115fbfe121ad6f6367b695e9f3b5f
Author: guoxiaolong 
Date:   2018-02-13T12:23:10Z

[SPARK-23384][WEB-UI] When it has no incomplete(completed) applications 
found, the last updated time is not formatted and client local time zone is not 
show in history server web ui.

## What changes were proposed in this pull request?

When it has no incomplete(completed) applications found, the last updated 
time is not formatted and client local time zone is not show in history server 
web ui. It is a bug.

fix before:

![1](https://user-images.githubusercontent.com/26266482/36070635-264d7cf0-0f3a-11e8-8426-14135ffedb16.png)

fix after:

![2](https://user-images.githubusercontent.com/26266482/36070651-8ec3800e-0f3a-11e8-991c-6122cc9539fe.png)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

Author: guoxiaolong 

Closes #20573 from guoxiaolongzte/SPARK-23384.

(cherry picked from commit 300c40f50ab4258d697f06a814d1491dc875c847)
Signed-off-by: Sean Owen 

commit dbb1b399b6cf8372a3659c472f380142146b1248
Author: huangtengfei 
Date:   2018-02-13T15:59:21Z

[SPARK-23053][CORE] taskBinarySerialization and task partitions calculate 
in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status

## What changes were proposed in this pull request?

When we run concurrent jobs using the same rdd which is marked to do 
checkpoint. If one job has finished running the job, and start the process of 
RDD.doCheckpoint, while another job is submitted, then submitStage and 
submitMissingTasks will be called. In 
[submitMissingTasks](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L961),
 will serialize taskBinaryBytes and calculate task partitions which are both 
affected by the status of checkpoint, if the former is calculated before 
doCheckpoint finished, while the latter is calculated after doCheckpoint 
finished, when run task, rdd.compute will be called, for some rdds with 
particular partition type such as 
[UnionRDD](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala)
 who will do partition type cast, will get a ClassCastException because the 
part params is actually a CheckpointRDDPartition.
This error occurs  because rdd.doCheckpoint occurs in the same thread that 
called sc.runJob, while the task serialization occurs in the DAGSchedulers 
event loop.

## How was this patch tested?

the exist uts and also add a test case in DAGScheduerSuite to show the 
exception case.

Author: huangtengfei 

Closes #20244 from ivoson/branch-taskpart-mistype.

(cherry picked from commit 091a000d27f324de8c5c527880854ecfcf5de9a4)
Signed-off-by: Imran Rashid 

commit ab01ba718c7752b564e801a1ea546aedc2055dc0
Author: Bogdan Raducanu 
Date:   2018-02-13T17:49:52Z

[SPARK-23316][SQL]

[GitHub] spark issue #21870: Branch 2.3

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21870
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-07-25 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/21860
  
@kiszk, I have add a ignore test case to verifies the newly added code 
generation. can you help me to review it if you have some time. thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21870: Branch 2.3

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21870
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21870: Branch 2.3

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21870
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21857#discussion_r204977789
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1275,6 +1276,64 @@ object ReplaceExceptWithAntiJoin extends 
Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces logical [[ExceptAll]] operator using a combination of Union, 
Aggregate
+ * and Generate operator.
+ *
+ * Input Query :
+ * {{{
+ *SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2
+ * }}}
+ *
+ * Rewritten Query:
+ * {{{
+ *   SELECT c1
+ *   FROM (
+ * SELECT replicate_rows(sum_val, c1) AS (sum_val, c1)
+ *   FROM (
+ * SELECT c1, cnt, sum_val
--- End diff --

nit: there is no `cnt`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP P...

2018-07-25 Thread DazhuangSu

Github user DazhuangSu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19691#discussion_r205013855
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -510,40 +511,86 @@ case class AlterTableRenamePartitionCommand(
  *
  * The syntax of this command is:
  * {{{
- *   ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, 
...] [PURGE];
+ *   ALTER TABLE table DROP [IF EXISTS] PARTITION (spec1, expr1)
+ * [, PARTITION (spec2, expr2), ...] [PURGE];
  * }}}
  */
 case class AlterTableDropPartitionCommand(
 tableName: TableIdentifier,
-specs: Seq[TablePartitionSpec],
+partitions: Seq[(TablePartitionSpec, Seq[Expression])],
 ifExists: Boolean,
 purge: Boolean,
 retainData: Boolean)
-  extends RunnableCommand {
+  extends RunnableCommand with PredicateHelper {
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
 val catalog = sparkSession.sessionState.catalog
 val table = catalog.getTableMetadata(tableName)
+val resolver = sparkSession.sessionState.conf.resolver
 DDLUtils.verifyAlterTableType(catalog, table, isView = false)
 DDLUtils.verifyPartitionProviderIsHive(sparkSession, table, "ALTER 
TABLE DROP PARTITION")
 
-val normalizedSpecs = specs.map { spec =>
-  PartitioningUtils.normalizePartitionSpec(
-spec,
-table.partitionColumnNames,
-table.identifier.quotedString,
-sparkSession.sessionState.conf.resolver)
+val toDrop = partitions.flatMap { partition =>
+  if (partition._1.isEmpty && !partition._2.isEmpty) {
+// There are only expressions in this drop condition.
+extractFromPartitionFilter(partition._2, catalog, table, resolver)
+  } else if (!partition._1.isEmpty && partition._2.isEmpty) {
+// There are only partitionSpecs in this drop condition.
+extractFromPartitionSpec(partition._1, table, resolver)
+  } else if (!partition._1.isEmpty && !partition._2.isEmpty) {
+// This drop condition has both partitionSpecs and expressions.
+extractFromPartitionFilter(partition._2, catalog, table, 
resolver).intersect(
--- End diff --

@mgaido91 I understand your point, yes it would be inefficient. I will work 
on this soon


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21869
  
LGTM, pending Jenkins.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205020970
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
 ---
@@ -71,9 +80,22 @@ private[parquet] class ParquetReadSupport(val convertTz: 
Option[TimeZone])
   StructType.fromString(schemaString)
 }
 
-val parquetRequestedSchema =
+val clippedParquetSchema =
   ParquetReadSupport.clipParquetSchema(context.getFileSchema, 
catalystRequestedSchema)
 
+val parquetRequestedSchema = if (parquetMrCompatibility) {
+  // Parquet-mr will throw an exception if we try to read a superset 
of the file's schema.
--- End diff --

This change is not part of this PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205021140
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
 ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.execution.FileSchemaPruningTest
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSQLContext
+
+class ParquetSchemaPruningSuite
+extends QueryTest
+with ParquetTest
+with FileSchemaPruningTest
+with SharedSQLContext {
--- End diff --

No comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205021282
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that projects an expression over a given schema. Data 
types,
+ * field indexes and field counts of complex type extractors and attributes
+ * are adjusted to fit the schema. All other expressions are left as-is. 
This
+ * class is motivated by columnar nested schema pruning.
+ */
+case class ProjectionOverSchema(schema: StructType) {
--- End diff --

Okay. So...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205021469
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
 ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, 
Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.planning.{PhysicalOperation, 
ProjectionOverSchema, SelectedField}
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, 
LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{ArrayType, DataType, MapType, 
StructField, StructType}
+
+/**
+ * Prunes unnecessary Parquet columns given a [[PhysicalOperation]] over a
+ * [[ParquetRelation]]. By "Parquet column", we mean a column as defined 
in the
+ * Parquet format. In Spark SQL, a root-level Parquet column corresponds 
to a
+ * SQL column, and a nested Parquet column corresponds to a 
[[StructField]].
+ */
+private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan =
+if (SQLConf.get.nestedSchemaPruningEnabled) {
+  apply0(plan)
+} else {
+  plan
+}
+
+  private def apply0(plan: LogicalPlan): LogicalPlan =
+plan transformDown {
+  case op @ PhysicalOperation(projects, filters,
+  l @ LogicalRelation(hadoopFsRelation @ HadoopFsRelation(_, _,
+dataSchema, _, parquetFormat: ParquetFileFormat, _), _, _, _)) 
=>
+val projectionRootFields = projects.flatMap(getRootFields)
+val filterRootFields = filters.flatMap(getRootFields)
+val requestedRootFields = (projectionRootFields ++ 
filterRootFields).distinct
+
+// If [[requestedRootFields]] includes a nested field, continue. 
Otherwise,
+// return [[op]]
+if (requestedRootFields.exists { case RootField(_, derivedFromAtt) 
=> !derivedFromAtt }) {
+  val prunedSchema = requestedRootFields
+.map { case RootField(field, _) => StructType(Array(field)) }
+.reduceLeft(_ merge _)
+  val dataSchemaFieldNames = dataSchema.fieldNames.toSet
+  val prunedDataSchema =
+StructType(prunedSchema.filter(f => 
dataSchemaFieldNames.contains(f.name)))
+
+  // If the data schema is different from the pruned data schema, 
continue. Otherwise,
+  // return [[op]]. We effect this comparison by counting the 
number of "leaf" fields in
+  // each schemata, assuming the fields in [[prunedDataSchema]] 
are a subset of the fields
+  // in [[dataSchema]].
+  if (countLeaves(dataSchema) > countLeaves(prunedDataSchema)) {
+val prunedParquetRelation =
+  hadoopFsRelation.copy(dataSchema = 
prunedDataSchema)(hadoopFsRelation.sparkSession)
+
+// We need to replace the expression ids of the pruned 
relation output attributes
+// with the expression ids of the original relation output 
attributes so that
+// references to the original relation's output are not broken
+val outputIdMap = l.output.map(att => (att.name, 
att.exprId)).toMap
+val prunedRelationOutput =
+  prunedParquetRelation
+.schema
+.toAttributes
+.map {
+  case att if outputIdMap.contains(att.name) =>
+att.withExprId(outputIdMap(att.name))
+  case att => att
+}
+val prunedRelation =
+  l.copy(relation = prunedParquetRe

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205021712
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
 ---
@@ -0,0 +1,388 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.scalatest.BeforeAndAfterAll
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.NamedExpression
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types._
+
+// scalastyle:off line.size.limit
--- End diff --

No comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205022799
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala
 ---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.execution.FileSchemaPruningTest
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSQLContext
+
+class ParquetSchemaPruningSuite
+extends QueryTest
+with ParquetTest
+with FileSchemaPruningTest
+with SharedSQLContext {
+  case class FullName(first: String, middle: String, last: String)
+  case class Contact(name: FullName, address: String, pets: Int, friends: 
Array[FullName] = Array(),
+relatives: Map[String, FullName] = Map())
+
+  val contacts =
+Contact(FullName("Jane", "X.", "Doe"), "123 Main Street", 1) ::
+Contact(FullName("John", "Y.", "Doe"), "321 Wall Street", 3) :: Nil
+
+  case class Name(first: String, last: String)
+  case class BriefContact(name: Name, address: String)
+
+  val briefContacts =
+BriefContact(Name("Janet", "Jones"), "567 Maple Drive") ::
+BriefContact(Name("Jim", "Jones"), "6242 Ash Street") :: Nil
+
+  case class ContactWithDataPartitionColumn(name: FullName, address: 
String, pets: Int,
+friends: Array[FullName] = Array(), relatives: Map[String, FullName] = 
Map(), p: Int)
+
+  case class BriefContactWithDataPartitionColumn(name: Name, address: 
String, p: Int)
+
+  val contactsWithDataPartitionColumn =
+contacts.map { case Contact(name, address, pets, friends, relatives) =>
+  ContactWithDataPartitionColumn(name, address, pets, friends, 
relatives, 1) }
+  val briefContactsWithDataPartitionColumn =
+briefContacts.map { case BriefContact(name: Name, address: String) =>
+  BriefContactWithDataPartitionColumn(name, address, 2) }
+
+  testSchemaPruning("select a single complex field") {
+val query = sql("select name.middle from contacts")
+checkScanSchemata(query, "struct>")
+checkAnswer(query, Row("X.") :: Row("Y.") :: Row(null) :: Row(null) :: 
Nil)
+  }
+
+  testSchemaPruning("select a single complex field and the partition 
column") {
+val query = sql("select name.middle, p from contacts")
+checkScanSchemata(query, "struct>")
+checkAnswer(query, Row("X.", 1) :: Row("Y.", 1) :: Row(null, 2) :: 
Row(null, 2) :: Nil)
+  }
+
+  ignore("partial schema intersection - select missing subfield") {
+val query = sql("select name.middle, address from contacts where p=2")
+checkScanSchemata(query, 
"struct,address:string>")
+checkAnswer(query,
+  Row(null, "567 Maple Drive") ::
+  Row(null, "6242 Ash Street") :: Nil)
+  }
+
+  testSchemaPruning("no unnecessary schema pruning") {
+val query =
+  sql("select name.last, name.middle, name.first, relatives[''].last, 
" +
+"relatives[''].middle, relatives[''].first, friends[0].last, 
friends[0].middle, " +
+"friends[0].first, pets, address from contacts where p=2")
+// We've selected every field in the schema. Therefore, no schema 
pruning should be performed.
+// We check this by asserting that the scanned schema of the query is 
identical to the schema
+// of the contacts relation, even though the fields are selected in 
different orders.
+checkScanSchemata(query,
+  
"struct,address:string,pets:int,"
 +
+  "friends:array>," +
+  
"relatives:map>>")
+checkAnswer(query,
+  Row("Jones", null, "Janet", null, null, null, null, null, null, 
null, "567 Maple Drive") ::
+  Row("Jones", null, "Jim", null, null, null, null, null, null, null, 
"6242 Ash

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205022895
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala
 ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, 
Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.planning.{PhysicalOperation, 
ProjectionOverSchema, SelectedField}
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, 
LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{ArrayType, DataType, MapType, 
StructField, StructType}
+
+/**
+ * Prunes unnecessary Parquet columns given a [[PhysicalOperation]] over a
+ * [[ParquetRelation]]. By "Parquet column", we mean a column as defined 
in the
+ * Parquet format. In Spark SQL, a root-level Parquet column corresponds 
to a
+ * SQL column, and a nested Parquet column corresponds to a 
[[StructField]].
+ */
+private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan =
+if (SQLConf.get.nestedSchemaPruningEnabled) {
+  apply0(plan)
+} else {
+  plan
+}
+
+  private def apply0(plan: LogicalPlan): LogicalPlan =
+plan transformDown {
+  case op @ PhysicalOperation(projects, filters,
+  l @ LogicalRelation(hadoopFsRelation @ HadoopFsRelation(_, _,
+dataSchema, _, parquetFormat: ParquetFileFormat, _), _, _, _)) 
=>
+val projectionRootFields = projects.flatMap(getRootFields)
+val filterRootFields = filters.flatMap(getRootFields)
+val requestedRootFields = (projectionRootFields ++ 
filterRootFields).distinct
+
+// If [[requestedRootFields]] includes a nested field, continue. 
Otherwise,
+// return [[op]]
+if (requestedRootFields.exists { case RootField(_, derivedFromAtt) 
=> !derivedFromAtt }) {
+  val prunedSchema = requestedRootFields
+.map { case RootField(field, _) => StructType(Array(field)) }
+.reduceLeft(_ merge _)
+  val dataSchemaFieldNames = dataSchema.fieldNames.toSet
+  val prunedDataSchema =
+StructType(prunedSchema.filter(f => 
dataSchemaFieldNames.contains(f.name)))
+
+  // If the data schema is different from the pruned data schema, 
continue. Otherwise,
+  // return [[op]]. We effect this comparison by counting the 
number of "leaf" fields in
+  // each schemata, assuming the fields in [[prunedDataSchema]] 
are a subset of the fields
+  // in [[dataSchema]].
+  if (countLeaves(dataSchema) > countLeaves(prunedDataSchema)) {
--- End diff --

No comment.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...

2018-07-25 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21850#discussion_r205022884
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -414,6 +414,16 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case CaseWhen(branches, elseValue) if branches.length == 1 =>
+// Using pattern matching like `CaseWhen((cond, branchValue) :: 
Nil, elseValue)` will not
+// work since the implementation of `branches` can be 
`ArrayBuffer`. A full test is in
--- End diff --

How about:

```scala
case CaseWhen(Seq((cond, trueValue)), elseValue) =>
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21139: [SPARK-24066][SQL]Add a window exchange rule to eliminat...

2018-07-25 Thread heary-cao

Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/21139
  
@hvanhovell, can you help review it again if you have some time, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman

Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205022974
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala
 ---
@@ -0,0 +1,388 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.scalatest.BeforeAndAfterAll
+import org.scalatest.exceptions.TestFailedException
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.NamedExpression
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.types._
+
+// scalastyle:off line.size.limit
+class SelectedFieldSuite extends SparkFunSuite with BeforeAndAfterAll {
+  // The test schema as a tree string, i.e. `schema.treeString`
+  // root
+  //  |-- col1: string (nullable = false)
+  //  |-- col2: struct (nullable = true)
+  //  ||-- field1: integer (nullable = true)
+  //  ||-- field2: array (nullable = true)
+  //  |||-- element: integer (containsNull = false)
+  //  ||-- field3: array (nullable = false)
+  //  |||-- element: struct (containsNull = true)
+  //  ||||-- subfield1: integer (nullable = true)
+  //  ||||-- subfield2: integer (nullable = true)
+  //  ||||-- subfield3: array (nullable = true)
+  //  |||||-- element: integer (containsNull = true)
+  //  ||-- field4: map (nullable = true)
+  //  |||-- key: string
+  //  |||-- value: struct (valueContainsNull = false)
+  //  ||||-- subfield1: integer (nullable = true)
+  //  ||||-- subfield2: array (nullable = true)
+  //  |||||-- element: integer (containsNull = false)
+  //  ||-- field5: array (nullable = false)
+  //  |||-- element: struct (containsNull = true)
+  //  ||||-- subfield1: struct (nullable = false)
+  //  |||||-- subsubfield1: integer (nullable = true)
+  //  |||||-- subsubfield2: integer (nullable = true)
+  //  ||||-- subfield2: struct (nullable = true)
+  //  |||||-- subsubfield1: struct (nullable = true)
+  //  ||||||-- subsubsubfield1: string (nullable = 
true)
+  //  |||||-- subsubfield2: integer (nullable = true)
+  //  ||-- field6: struct (nullable = true)
+  //  |||-- subfield1: string (nullable = false)
+  //  |||-- subfield2: string (nullable = true)
+  //  ||-- field7: struct (nullable = true)
+  //  |||-- subfield1: struct (nullable = true)
+  //  ||||-- subsubfield1: integer (nullable = true)
+  //  ||||-- subsubfield2: integer (nullable = true)
+  //  ||-- field8: map (nullable = true)
+  //  |||-- key: string
+  //  |||-- value: array (valueContainsNull = false)
+  //  ||||-- element: struct (containsNull = true)
+  //  |||||-- subfield1: integer (nullable = true)
+  //  |||||-- subfield2: array (nullable = true)
+  //  ||||||-- element: integer (containsNull = false)
+  //  ||-- field9: map (nullable = true)
+  //  |||-- key: string
+  //  |||-- value: integer (valueContainsNull = false)
+  //  |-- col3: array (nullable = false)
+  //  ||-- element: struct (containsNull = false)
+  //  |||-- field1: struct (nullable = true)
+  //  ||||-- subfield1: integer (nullable = false)
+  //  ||||-- subfield2: integer (nullable = true)
+  //  |||-- field2: map (nullable = true)
+  //  ||||-- key: string
+  //  ||||-- value: integer (valueContainsN

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21320
  
> Regarding #21320 (comment), can you at least set this enable by default 
and see if some existing tests are broken or not?

I have no intention to at this point, no.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205025311
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5060,6 +5049,147 @@ def test_type_annotation(self):
 df = self.spark.range(1).select(pandas_udf(f=_locals['noop'], 
returnType='bigint')('id'))
 self.assertEqual(df.first()[0], 0)
 
+def test_mixed_udf(self):
+import pandas as pd
+from pyspark.sql.functions import col, udf, pandas_udf
+
+df = self.spark.range(0, 1).toDF('v')
+
+# Test mixture of multiple UDFs and Pandas UDFs
+
+@udf('int')
+def f1(x):
+assert type(x) == int
+return x + 1
+
+@pandas_udf('int')
+def f2(x):
+assert type(x) == pd.Series
+return x + 10
+
+@udf('int')
+def f3(x):
+assert type(x) == int
+return x + 100
+
+@pandas_udf('int')
+def f4(x):
+assert type(x) == pd.Series
+return x + 1000
+
+# Test mixed udfs in a single projection
+df1 = df \
+.withColumn('f1', f1(col('v'))) \
+.withColumn('f2', f2(col('v'))) \
+.withColumn('f3', f3(col('v'))) \
+.withColumn('f4', f4(col('v'))) \
+.withColumn('f2_f1', f2(col('f1'))) \
+.withColumn('f3_f1', f3(col('f1'))) \
+.withColumn('f4_f1', f4(col('f1'))) \
+.withColumn('f3_f2', f3(col('f2'))) \
+.withColumn('f4_f2', f4(col('f2'))) \
+.withColumn('f4_f3', f4(col('f3'))) \
+.withColumn('f3_f2_f1', f3(col('f2_f1'))) \
+.withColumn('f4_f2_f1', f4(col('f2_f1'))) \
+.withColumn('f4_f3_f1', f4(col('f3_f1'))) \
+.withColumn('f4_f3_f2', f4(col('f3_f2'))) \
+.withColumn('f4_f3_f2_f1', f4(col('f3_f2_f1')))
+
+# Test mixed udfs in a single expression
+df2 = df \
+.withColumn('f1', f1(col('v'))) \
+.withColumn('f2', f2(col('v'))) \
+.withColumn('f3', f3(col('v'))) \
+.withColumn('f4', f4(col('v'))) \
+.withColumn('f2_f1', f2(f1(col('v' \
+.withColumn('f3_f1', f3(f1(col('v' \
+.withColumn('f4_f1', f4(f1(col('v' \
+.withColumn('f3_f2', f3(f2(col('v' \
+.withColumn('f4_f2', f4(f2(col('v' \
+.withColumn('f4_f3', f4(f3(col('v' \
+.withColumn('f3_f2_f1', f3(f2(f1(col('v') \
+.withColumn('f4_f2_f1', f4(f2(f1(col('v') \
+.withColumn('f4_f3_f1', f4(f3(f1(col('v') \
+.withColumn('f4_f3_f2', f4(f3(f2(col('v') \
+.withColumn('f4_f3_f2_f1', f4(f3(f2(f1(col('v'))
+
+# expected result
+df3 = df \
+.withColumn('f1', df['v'] + 1) \
+.withColumn('f2', df['v'] + 10) \
+.withColumn('f3', df['v'] + 100) \
+.withColumn('f4', df['v'] + 1000) \
+.withColumn('f2_f1', df['v'] + 11) \
+.withColumn('f3_f1', df['v'] + 101) \
+.withColumn('f4_f1', df['v'] + 1001) \
+.withColumn('f3_f2', df['v'] + 110) \
+.withColumn('f4_f2', df['v'] + 1010) \
+.withColumn('f4_f3', df['v'] + 1100) \
+.withColumn('f3_f2_f1', df['v'] + 111) \
+.withColumn('f4_f2_f1', df['v'] + 1011) \
+.withColumn('f4_f3_f1', df['v'] + 1101) \
+.withColumn('f4_f3_f2', df['v'] + 1110) \
+.withColumn('f4_f3_f2_f1', df['v'] + )
+
+self.assertEquals(df3.collect(), df1.collect())
+self.assertEquals(df3.collect(), df2.collect())
+
+def test_mixed_udf_and_sql(self):
+import pandas as pd
+from pyspark.sql.functions import udf, pandas_udf
+
+df = self.spark.range(0, 1).toDF('v')
+
+# Test mixture of UDFs, Pandas UDFs and SQL expression.
+
+@udf('int')
+def f1(x):
+assert type(x) == int
+return x + 1
+
+def f2(x):
+return x + 10
+
+@pandas_udf('int')
+def f3(x):
+assert type(x) == pd.Series
+return x + 100
+
+df1 = df.withColumn('f1', f1(df['v'])) \
+.withColumn('f2', f2(df['v'])) \
+.withColumn('f3', f3(df['v'])) \
+.withColumn('f1_f2', f1(f2(df['v']))) \
+.withColumn('f1_f3', f1(f3(df['v']))) \
+.w

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205024958
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5060,6 +5049,147 @@ def test_type_annotation(self):
 df = self.spark.range(1).select(pandas_udf(f=_locals['noop'], 
returnType='bigint')('id'))
 self.assertEqual(df.first()[0], 0)
 
+def test_mixed_udf(self):
+import pandas as pd
+from pyspark.sql.functions import col, udf, pandas_udf
+
+df = self.spark.range(0, 1).toDF('v')
+
+# Test mixture of multiple UDFs and Pandas UDFs
+
+@udf('int')
+def f1(x):
+assert type(x) == int
+return x + 1
+
+@pandas_udf('int')
+def f2(x):
+assert type(x) == pd.Series
+return x + 10
+
+@udf('int')
+def f3(x):
+assert type(x) == int
+return x + 100
+
+@pandas_udf('int')
+def f4(x):
+assert type(x) == pd.Series
+return x + 1000
+
+# Test mixed udfs in a single projection
+df1 = df \
+.withColumn('f1', f1(col('v'))) \
+.withColumn('f2', f2(col('v'))) \
+.withColumn('f3', f3(col('v'))) \
+.withColumn('f4', f4(col('v'))) \
+.withColumn('f2_f1', f2(col('f1'))) \
+.withColumn('f3_f1', f3(col('f1'))) \
--- End diff --

This looks testing udf + udf


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205025755
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5487,6 +5617,22 @@ def dummy_pandas_udf(df):
  F.col('temp0.key') == 
F.col('temp1.key'))
 self.assertEquals(res.count(), 5)
 
+def test_mixed_scalar_udfs_followed_by_grouby_apply(self):
+# Test Pandas UDF and scalar Python UDF followed by groupby apply
+from pyspark.sql.functions import udf, pandas_udf, PandasUDFType
+import pandas as pd
--- End diff --

not a big deal at all really .. but I would swap the import order 
(thridparty, pyspark)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205030035
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -5060,6 +5049,147 @@ def test_type_annotation(self):
 df = self.spark.range(1).select(pandas_udf(f=_locals['noop'], 
returnType='bigint')('id'))
 self.assertEqual(df.first()[0], 0)
 
+def test_mixed_udf(self):
+import pandas as pd
+from pyspark.sql.functions import col, udf, pandas_udf
+
+df = self.spark.range(0, 1).toDF('v')
+
+# Test mixture of multiple UDFs and Pandas UDFs
+
+@udf('int')
+def f1(x):
+assert type(x) == int
+return x + 1
+
+@pandas_udf('int')
+def f2(x):
+assert type(x) == pd.Series
+return x + 10
+
+@udf('int')
+def f3(x):
+assert type(x) == int
+return x + 100
+
+@pandas_udf('int')
+def f4(x):
+assert type(x) == pd.Series
+return x + 1000
+
+# Test mixed udfs in a single projection
+df1 = df \
+.withColumn('f1', f1(col('v'))) \
+.withColumn('f2', f2(col('v'))) \
+.withColumn('f3', f3(col('v'))) \
+.withColumn('f4', f4(col('v'))) \
+.withColumn('f2_f1', f2(col('f1'))) \
+.withColumn('f3_f1', f3(col('f1'))) \
+.withColumn('f4_f1', f4(col('f1'))) \
+.withColumn('f3_f2', f3(col('f2'))) \
+.withColumn('f4_f2', f4(col('f2'))) \
+.withColumn('f4_f3', f4(col('f3'))) \
+.withColumn('f3_f2_f1', f3(col('f2_f1'))) \
+.withColumn('f4_f2_f1', f4(col('f2_f1'))) \
+.withColumn('f4_f3_f1', f4(col('f3_f1'))) \
+.withColumn('f4_f3_f2', f4(col('f3_f2'))) \
+.withColumn('f4_f3_f2_f1', f4(col('f3_f2_f1')))
+
+# Test mixed udfs in a single expression
+df2 = df \
+.withColumn('f1', f1(col('v'))) \
+.withColumn('f2', f2(col('v'))) \
+.withColumn('f3', f3(col('v'))) \
+.withColumn('f4', f4(col('v'))) \
+.withColumn('f2_f1', f2(f1(col('v' \
+.withColumn('f3_f1', f3(f1(col('v' \
+.withColumn('f4_f1', f4(f1(col('v' \
+.withColumn('f3_f2', f3(f2(col('v' \
+.withColumn('f4_f2', f4(f2(col('v' \
+.withColumn('f4_f3', f4(f3(col('v' \
+.withColumn('f3_f2_f1', f3(f2(f1(col('v') \
+.withColumn('f4_f2_f1', f4(f2(f1(col('v') \
+.withColumn('f4_f3_f1', f4(f3(f1(col('v') \
+.withColumn('f4_f3_f2', f4(f3(f2(col('v') \
+.withColumn('f4_f3_f2_f1', f4(f3(f2(f1(col('v'))
+
+# expected result
+df3 = df \
+.withColumn('f1', df['v'] + 1) \
+.withColumn('f2', df['v'] + 10) \
+.withColumn('f3', df['v'] + 100) \
+.withColumn('f4', df['v'] + 1000) \
+.withColumn('f2_f1', df['v'] + 11) \
+.withColumn('f3_f1', df['v'] + 101) \
+.withColumn('f4_f1', df['v'] + 1001) \
+.withColumn('f3_f2', df['v'] + 110) \
+.withColumn('f4_f2', df['v'] + 1010) \
+.withColumn('f4_f3', df['v'] + 1100) \
+.withColumn('f3_f2_f1', df['v'] + 111) \
+.withColumn('f4_f2_f1', df['v'] + 1011) \
+.withColumn('f4_f3_f1', df['v'] + 1101) \
+.withColumn('f4_f3_f2', df['v'] + 1110) \
+.withColumn('f4_f3_f2_f1', df['v'] + )
+
+self.assertEquals(df3.collect(), df1.collect())
+self.assertEquals(df3.collect(), df2.collect())
+
+def test_mixed_udf_and_sql(self):
+import pandas as pd
+from pyspark.sql.functions import udf, pandas_udf
+
+df = self.spark.range(0, 1).toDF('v')
+
+# Test mixture of UDFs, Pandas UDFs and SQL expression.
+
+@udf('int')
+def f1(x):
+assert type(x) == int
+return x + 1
+
+def f2(x):
--- End diff --

Ah, I see why it looks confusing. Can we add an assert here too (check if 
it's a column)?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21320
  
> gentle ping @mallman since the code freeze is close

Outside of my primary occupation, my top priority on this PR right now is 
investigating 
https://github.com/apache/spark/pull/21320#issuecomment-396498487. I don't 
think I'm going to get a test file from the OP, so I'm going to try to 
reproduce the issue on my own.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...

2018-07-25 Thread crafty-coder

Github user crafty-coder commented on the issue:

https://github.com/apache/spark/pull/20949
  
@HyukjinKwon and @MaxGekk thanks for your help in this PR!

My JIRA Id is also **crafty-coder**



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...

2018-07-25 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21863
  
a late LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21871: [SPARK-24916][SQL] Fix type coercion for IN expre...

2018-07-25 Thread wangyum

GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/21871

[SPARK-24916][SQL] Fix type coercion for IN expression with subquery

## What changes were proposed in this pull request?

The below SQL will throw `AnalysisException`. but it can success on Spark 
2.1.x. This pr fix this issue.
```sql
CREATE TEMPORARY VIEW t4 AS SELECT * FROM VALUES
  (CAST(1 AS DOUBLE), CAST(2 AS STRING), CAST(3 AS STRING))
AS t1(t4a, t4b, t4c);

CREATE TEMPORARY VIEW t5 AS SELECT * FROM VALUES
  (CAST(1 AS DECIMAL(18, 0)), CAST(2 AS STRING), CAST(3 AS BIGINT))
AS t1(t5a, t5b, t5c);

SELECT * FROM t4
WHERE
(t4a, t4b, t4c) IN (SELECT t5a, t5b, t5c FROM t5);
```

## How was this patch tested?

unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-24916

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21871


commit d60185b159ffc1a0d74a8c5bfba0c11ceac4241b
Author: Yuming Wang 
Date:   2018-07-25T07:33:45Z

default: findCommonTypeForBinaryComparison(l.dataType, r.dataType, 
conf).orElse(findTightestCommonType(l.dataType, r.dataType))

commit 8a118b5bdf63a7f4b0f0033c0783aa220c9c1eb1
Author: Yuming Wang 
Date:   2018-07-25T07:37:03Z

findCommonTypeForBinaryComparison(l.dataType, r.dataType, 
conf).orElse(findWiderTypeForTwo(l.dataType, r.dataType))

commit c306810f0a0e701e6b46434db75bbd9813c7337c
Author: Yuming Wang 
Date:   2018-07-25T07:39:40Z

findWiderTypeForTwo(l.dataType, r.dataType)

commit daa120e15153c77c17a7966df7b727fcea4bb02b
Author: Yuming Wang 
Date:   2018-07-25T07:42:46Z

findCommonTypeForBinaryComparison(l.dataType, r.dataType, 
conf).orElse(findWiderTypeWithoutStringPromotionForTwo(l.dataType, r.dataType))

commit c84ba4d9823a50953f560e110638a9d4e094b17a
Author: Yuming Wang 
Date:   2018-07-25T07:45:43Z

findWiderTypeWithoutStringPromotionForTwo(l.dataType, r.dataType)

commit bc41b99b7548a22db1ed278fda1c741fd08b78ef
Author: Yuming Wang 
Date:   2018-07-25T08:03:26Z

findCommonTypeForBinaryComparison(l.dataType, r.dataType, 
conf).orElse(findTightestCommonType(l.dataType, 
r.dataType)).orElse(findWiderTypeForDecimal(l.dataType, r.dataType))

commit 8ef142f78c22b980fe60d836c56d7d18d221a958
Author: Yuming Wang 
Date:   2018-07-25T09:27:51Z

Fix type coercion for IN expression with subquery




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21871
  
**[Test build #93538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93538/testReport)**
 for PR 21871 at commit 
[`8ef142f`](https://github.com/apache/spark/commit/8ef142f78c22b980fe60d836c56d7d18d221a958).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21871
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1305/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21871
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-07-25 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21403
  
@maryannxue that is feasible too and indeed it was the original 
implementation I did, I switched to this approach according to [this 
discussion](https://github.com/apache/spark/pull/21403#discussion_r198826199): 
the goal was to avoid to change the `In` signature and the parsing logic in the 
many places.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP P...

2018-07-25 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19691#discussion_r205047473
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -510,40 +511,86 @@ case class AlterTableRenamePartitionCommand(
  *
  * The syntax of this command is:
  * {{{
- *   ALTER TABLE table DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, 
...] [PURGE];
+ *   ALTER TABLE table DROP [IF EXISTS] PARTITION (spec1, expr1)
+ * [, PARTITION (spec2, expr2), ...] [PURGE];
  * }}}
  */
 case class AlterTableDropPartitionCommand(
 tableName: TableIdentifier,
-specs: Seq[TablePartitionSpec],
+partitions: Seq[(TablePartitionSpec, Seq[Expression])],
 ifExists: Boolean,
 purge: Boolean,
 retainData: Boolean)
-  extends RunnableCommand {
+  extends RunnableCommand with PredicateHelper {
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
 val catalog = sparkSession.sessionState.catalog
 val table = catalog.getTableMetadata(tableName)
+val resolver = sparkSession.sessionState.conf.resolver
 DDLUtils.verifyAlterTableType(catalog, table, isView = false)
 DDLUtils.verifyPartitionProviderIsHive(sparkSession, table, "ALTER 
TABLE DROP PARTITION")
 
-val normalizedSpecs = specs.map { spec =>
-  PartitioningUtils.normalizePartitionSpec(
-spec,
-table.partitionColumnNames,
-table.identifier.quotedString,
-sparkSession.sessionState.conf.resolver)
+val toDrop = partitions.flatMap { partition =>
+  if (partition._1.isEmpty && !partition._2.isEmpty) {
+// There are only expressions in this drop condition.
+extractFromPartitionFilter(partition._2, catalog, table, resolver)
+  } else if (!partition._1.isEmpty && partition._2.isEmpty) {
+// There are only partitionSpecs in this drop condition.
+extractFromPartitionSpec(partition._1, table, resolver)
+  } else if (!partition._1.isEmpty && !partition._2.isEmpty) {
+// This drop condition has both partitionSpecs and expressions.
+extractFromPartitionFilter(partition._2, catalog, table, 
resolver).intersect(
--- End diff --

thank you @DazhuangSu 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21403
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1306/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21403
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21403
  
**[Test build #93539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93539/testReport)**
 for PR 21403 at commit 
[`0412829`](https://github.com/apache/spark/commit/04128292e6d145ec608166b532c960cac72a500c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21871
  
I think this is basically the same of what I proposed in 
https://github.com/apache/spark/pull/19635. Unfortunately, that PR got a bit 
stuck...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21858: [SPARK-24899][SQL][DOC] Add example of monotonica...

2018-07-25 Thread jaceklaskowski

Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/21858#discussion_r205058875
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1150,16 +1150,48 @@ object functions {
   /**
* A column expression that generates monotonically increasing 64-bit 
integers.
*
-   * The generated ID is guaranteed to be monotonically increasing and 
unique, but not consecutive.
+   * The generated IDs are guaranteed to be monotonically increasing and 
unique, but not
+   * consecutive (unless all rows are in the same single partition which 
you rarely want due to
+   * the volume of the data).
* The current implementation puts the partition ID in the upper 31 
bits, and the record number
* within each partition in the lower 33 bits. The assumption is that 
the data frame has
* less than 1 billion partitions, and each partition has less than 8 
billion records.
*
-   * As an example, consider a `DataFrame` with two partitions, each with 
3 records.
-   * This expression would return the following IDs:
-   *
* {{{
-   * 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
+   * // Create a dataset with four partitions, each with two rows.
+   * val q = spark.range(start = 0, end = 8, step = 1, numPartitions = 4)
+   *
+   * // Make sure that every partition has the same number of rows
+   * q.mapPartitions(rows => Iterator(rows.size)).foreachPartition(rows => 
assert(rows.next == 2))
+   * q.select(monotonically_increasing_id).show
--- End diff --

I thought about explaining the "internals" of the operator through a more 
involved example and actually thought about removing the line 1166 (but 
forgot). I think the following lines make for a very in-depth explanation and 
use other operators in use.

In other words, I'm in favour of removing the line 1166 and leaving the 
others with no changes. Possible?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21803
  
**[Test build #93540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93540/testReport)**
 for PR 21803 at commit 
[`60f663d`](https://github.com/apache/spark/commit/60f663d7b12fcb3141eff774a9120f049d837112).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21869
  
**[Test build #93535 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93535/testReport)**
 for PR 21869 at commit 
[`a45bf36`](https://github.com/apache/spark/commit/a45bf360d923e8b187f6579b4a73d24a9222198a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21869
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21869
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93535/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205061160
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -94,36 +95,59 @@ object ExtractPythonUDFFromAggregate extends 
Rule[LogicalPlan] {
  */
 object ExtractPythonUDFs extends Rule[SparkPlan] with PredicateHelper {
 
-  private def hasPythonUDF(e: Expression): Boolean = {
+  private def hasScalarPythonUDF(e: Expression): Boolean = {
 e.find(PythonUDF.isScalarPythonUDF).isDefined
   }
 
-  private def canEvaluateInPython(e: PythonUDF): Boolean = {
-e.children match {
-  // single PythonUDF child could be chained and evaluated in Python
-  case Seq(u: PythonUDF) => canEvaluateInPython(u)
-  // Python UDF can't be evaluated directly in JVM
-  case children => !children.exists(hasPythonUDF)
+  private def canEvaluateInPython(e: PythonUDF, evalType: Int): Boolean = {
+if (e.evalType != evalType) {
--- End diff --

Can we rename this function or write a comment since Scalar both Vectorized 
UDF and normal UDF can be evaluated in Python each but it returns `false` in 
this case?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21650
  
I'm okay with 
https://github.com/apache/spark/pull/21650#issuecomment-407506043's way too but 
should be really simplified. Either way LGTM.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21858: [SPARK-24899][SQL][DOC] Add example of monotonica...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21858#discussion_r205062686
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1150,16 +1150,48 @@ object functions {
   /**
* A column expression that generates monotonically increasing 64-bit 
integers.
*
-   * The generated ID is guaranteed to be monotonically increasing and 
unique, but not consecutive.
+   * The generated IDs are guaranteed to be monotonically increasing and 
unique, but not
+   * consecutive (unless all rows are in the same single partition which 
you rarely want due to
+   * the volume of the data).
* The current implementation puts the partition ID in the upper 31 
bits, and the record number
* within each partition in the lower 33 bits. The assumption is that 
the data frame has
* less than 1 billion partitions, and each partition has less than 8 
billion records.
*
-   * As an example, consider a `DataFrame` with two partitions, each with 
3 records.
-   * This expression would return the following IDs:
-   *
* {{{
-   * 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
+   * // Create a dataset with four partitions, each with two rows.
+   * val q = spark.range(start = 0, end = 8, step = 1, numPartitions = 4)
+   *
+   * // Make sure that every partition has the same number of rows
+   * q.mapPartitions(rows => Iterator(rows.size)).foreachPartition(rows => 
assert(rows.next == 2))
+   * q.select(monotonically_increasing_id).show
--- End diff --

I know you're exploring the internals but .. to be honest I was wondering 
if users are usually interested in such in-deep explanation since I guess most 
of them wouldn't care about the details.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21320
  
> I have no intention to at this point, no.

Yup, but I guess we should do when we are about to be complete to avoid 
breaking things by switching this feature on.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205063684
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that projects an expression over a given schema. Data 
types,
+ * field indexes and field counts of complex type extractors and attributes
+ * are adjusted to fit the schema. All other expressions are left as-is. 
This
+ * class is motivated by columnar nested schema pruning.
+ */
+case class ProjectionOverSchema(schema: StructType) {
--- End diff --

can we move it to `catalyst`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21320
  
Few comments like 
https://github.com/apache/spark/pull/21320#discussion_r203933307 are not minor 
or nits. I leave hard -1 if they are not addressed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...

2018-07-25 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21789
  
LGTM, I checked and the same hack is done also in Hive.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21866
  
**[Test build #93536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93536/testReport)**
 for PR 21866 at commit 
[`cff6f2a`](https://github.com/apache/spark/commit/cff6f2a0459e8cc4e48f28bde8103ea44ce5a1ab).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93536/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21871: [SPARK-24916][SQL] Fix type coercion for IN expre...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21871#discussion_r205068186
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/inConversion.sql
 ---
@@ -328,3 +328,159 @@ SELECT cast('2017-12-12 09:30:00' as date) in 
(cast('2017-12-12 09:30:00' as dat
 SELECT cast('2017-12-12 09:30:00' as date) in (cast('2017-12-12 09:30:00' 
as date), cast(1 as boolean)) FROM t;
 SELECT cast('2017-12-12 09:30:00' as date) in (cast('2017-12-12 09:30:00' 
as date), cast('2017-12-11 09:30:00.0' as timestamp)) FROM t;
 SELECT cast('2017-12-12 09:30:00' as date) in (cast('2017-12-12 09:30:00' 
as date), cast('2017-12-11 09:30:00' as date)) FROM t;
+
+SELECT * FROM t WHERE (cast(1 as tinyint)) IN (SELECT cast(1 as tinyint) 
FROM t);
--- End diff --

Do we really need to test all the combinations? We need most of such logics 
should be tested in `findWiderTypeWithoutStringPromotionForTwo` and we could 
have just few end to end tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21870: Branch 2.3

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21870
  
@lovezeropython, we usually file an issue in JIRA (please see 
https://spark.apache.org/contributing.html) or ask a question to mailing list 
(please see https://spark.apache.org/community.html).

Mind closing this PR please?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21869
  
Merged to branch-2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1307/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21871
  
Oh. It turns out that @dilipbiswal is talking about that PR.  I didn't find 
it in your recent PR. Letâs wait if the test can pass.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21866
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21866
  
**[Test build #93541 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93541/testReport)**
 for PR 21866 at commit 
[`cff6f2a`](https://github.com/apache/spark/commit/cff6f2a0459e8cc4e48f28bde8103ea44ce5a1ab).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1308/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21834
  
**[Test build #93537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93537/testReport)**
 for PR 21834 at commit 
[`1041a38`](https://github.com/apache/spark/commit/1041a38571eb4daf66a23d37d5bf51a1abb8d74c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21834
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21834
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93537/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...

2018-07-25 Thread edwinalu

Github user edwinalu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21221#discussion_r205095575
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala 
---
@@ -251,6 +261,215 @@ class EventLoggingListenerSuite extends SparkFunSuite 
with LocalSparkContext wit
 }
   }
 
+  /**
+   * Test stage executor metrics logging functionality. This checks that 
peak
+   * values from SparkListenerExecutorMetricsUpdate events during a stage 
are
+   * logged in a StageExecutorMetrics event for each executor at stage 
completion.
+   */
+  private def testStageExecutorMetricsEventLogging() {
+val conf = getLoggingConf(testDirPath, None)
+val logName = "stageExecutorMetrics-test"
+val eventLogger = new EventLoggingListener(logName, None, 
testDirPath.toUri(), conf)
+val listenerBus = new LiveListenerBus(conf)
+
+// expected StageExecutorMetrics, for the given stage id and executor 
id
+val expectedMetricsEvents: Map[(Int, String), 
SparkListenerStageExecutorMetrics] =
+  Map(
+((0, "1"),
+  new SparkListenerStageExecutorMetrics("1", 0, 0,
+  Array(5000L, 50L, 50L, 20L, 50L, 10L, 100L, 30L, 70L, 20L))),
+((0, "2"),
+  new SparkListenerStageExecutorMetrics("2", 0, 0,
+  Array(7000L, 70L, 50L, 20L, 10L, 10L, 50L, 30L, 80L, 40L))),
+((1, "1"),
+  new SparkListenerStageExecutorMetrics("1", 1, 0,
+  Array(7000L, 70L, 50L, 30L, 60L, 30L, 80L, 55L, 50L, 0L))),
+((1, "2"),
+  new SparkListenerStageExecutorMetrics("2", 1, 0,
+  Array(7000L, 70L, 50L, 40L, 10L, 30L, 50L, 60L, 40L, 40L
+
+// Events to post.
+val events = Array(
+  SparkListenerApplicationStart("executionMetrics", None,
+1L, "update", None),
+  createExecutorAddedEvent(1),
+  createExecutorAddedEvent(2),
+  createStageSubmittedEvent(0),
+  // receive 3 metric updates from each executor with just stage 0 
running,
+  // with different peak updates for each executor
+  createExecutorMetricsUpdateEvent(1,
+  Array(4000L, 50L, 20L, 0L, 40L, 0L, 60L, 0L, 70L, 20L)),
+  createExecutorMetricsUpdateEvent(2,
+  Array(1500L, 50L, 20L, 0L, 0L, 0L, 20L, 0L, 70L, 0L)),
+  // exec 1: new stage 0 peaks for metrics at indexes: 2, 4, 6
+  createExecutorMetricsUpdateEvent(1,
+  Array(4000L, 50L, 50L, 0L, 50L, 0L, 100L, 0L, 70L, 20L)),
+  // exec 2: new stage 0 peaks for metrics at indexes: 0, 4, 6
+  createExecutorMetricsUpdateEvent(2,
+  Array(2000L, 50L, 10L, 0L, 10L, 0L, 30L, 0L, 70L, 0L)),
+  // exec 1: new stage 0 peaks for metrics at indexes: 5, 7
+  createExecutorMetricsUpdateEvent(1,
+  Array(2000L, 40L, 50L, 0L, 40L, 10L, 90L, 10L, 50L, 0L)),
+  // exec 2: new stage 0 peaks for metrics at indexes: 0, 5, 6, 7, 8
+  createExecutorMetricsUpdateEvent(2,
+  Array(3500L, 50L, 15L, 0L, 10L, 10L, 35L, 10L, 80L, 0L)),
+  // now start stage 1, one more metric update for each executor, and 
new
+  // peaks for some stage 1 metrics (as listed), initialize stage 1 
peaks
+  createStageSubmittedEvent(1),
+  // exec 1: new stage 0 peaks for metrics at indexes: 0, 3, 7
--- End diff --

Stage 0 is still running, and these are new peaks for that stage. It is 
also initializing all the stage 1 metric values, since these are the first 
executor metrics seen for stage 1 (I'll add this to the comments).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205096607
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDDBarrier.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rdd
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.BarrierTaskContext
+import org.apache.spark.TaskContext
+import org.apache.spark.annotation.{Experimental, Since}
+
+/** Represents an RDD barrier, which forces Spark to launch tasks of this 
stage together. */
+class RDDBarrier[T: ClassTag](rdd: RDD[T]) {
+
+  /**
+   * :: Experimental ::
+   * Maps partitions together with a provided BarrierTaskContext.
+   *
+   * `preservesPartitioning` indicates whether the input function 
preserves the partitioner, which
+   * should be `false` unless `rdd` is a pair RDD and the input function 
doesn't modify the keys.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def mapPartitions[S: ClassTag](
--- End diff --

`RDDBarrier` is actually expected to be used like a builder, we shall 
provide more options for the barrier stage in the future, eg. config a timeout 
of a barrier stage.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21872: [WIP] merge upstream

2018-07-25 Thread onursatici

GitHub user onursatici opened a pull request:

https://github.com/apache/spark/pull/21872

[WIP] merge upstream

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/palantir/spark os/merge-upstream

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21872.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21872


commit a4697f7d4d5cbe31934f04c788c58c3fd598e0d5
Author: Robert Kruszewski 
Date:   2018-03-12T15:08:13Z

v2

commit 48ecf0b688d00cbf57495dec9c338d479d8f55f3
Author: Robert Kruszewski 
Date:   2018-03-12T15:11:26Z

more changes

commit e7429d45e086fa82f5c214a74255232aedb727da
Author: Dan Sanduleac 
Date:   2018-03-12T15:18:00Z

Propagate hadoop exclusions to additional hadoop deps

commit adeb886d94c0a018382f49340962500c1d3498c2
Author: Robert Kruszewski 
Date:   2018-03-12T16:21:17Z

R tests

commit 50e7e92cb29fd914d32b002dc96303360427aff4
Author: Robert Kruszewski 
Date:   2018-03-12T16:25:56Z

nesting

commit 2d3ef4a5199b401697d415520cf8aa9cd4a65862
Author: Dan Sanduleac 
Date:   2018-03-12T16:28:05Z

Undo mistake

commit 62deec487f2c48680eb86148abc5d818da696a26
Author: Robert Kruszewski 
Date:   2018-03-12T17:04:47Z

maybe

commit 9a4cf19c0070ce7d9bf7dfbde7f6646bb0fe2f1a
Author: Dan Sanduleac 
Date:   2018-03-12T17:38:49Z

Try to add better logging for test_conda

commit 1b7a75835d3abcd7ba55504b56e269a3f061969a
Author: Dan Sanduleac 
Date:   2018-03-12T17:39:39Z

Comment out saving the build-sbt-* cache

commit a10bdff90cae0fc593d9275dca6193b22e3a2c51
Author: Robert Kruszewski 
Date:   2018-03-12T18:06:59Z

move r tests reporting

commit 0ece2d6dfa4a21c2bc05de5b545ba52d5193234a
Author: Dan Sanduleac 
Date:   2018-03-12T18:19:32Z

Remove old cache restore

commit 570e0e6536fd34a22059e4758ef38144f12d3413
Author: Dan Sanduleac 
Date:   2018-03-12T19:31:40Z

Limit python parallelism in an attempt to reduce flakes

commit 020adda18d642d27b3dd43b25260fb37ff6d4692
Author: Dan Sanduleac 
Date:   2018-03-12T20:24:04Z

Make run_python_tests verbose so circle doesn't time it out

commit e970d355f66bbe6df3da45213ddf06b797364d30
Author: Dan Sanduleac 
Date:   2018-03-12T21:25:38Z

Use circleci machinery to split test classes by timings

commit 87d3173a13c57d0f99bb1710c97e78db224e2ca2
Author: Dan Sanduleac 
Date:   2018-03-12T22:00:53Z

Install unishark into python image

commit 7eaafedb755740d9019eda20f6796c203b84e6c7
Author: Dan Sanduleac 
Date:   2018-03-12T22:01:06Z

Run python tests using unishark

commit ae3f14e11d7de4095174f09401e222659cf9e0ef
Author: Dan Sanduleac 
Date:   2018-03-12T22:08:45Z

don't expect every project to have an entry in circleTestsForProject

commit 6e5b03c79a710159e883bcfcff42bad4cdd125b2
Author: Dan Sanduleac 
Date:   2018-03-13T12:03:45Z

Also resolve the oldDeps project before saving ivy-dependency-cache

commit 45cec9d9e3f97848acea84631824a22d5cd59d0f
Author: Dan Sanduleac 
Date:   2018-03-13T12:32:47Z

Revert "Also resolve the oldDeps project before saving ivy-dependency-cache"

This reverts commit 6e5b03c79a710159e883bcfcff42bad4cdd125b2.

commit 6c7809cec248bb778ad54117db85868222c38a3f
Author: Dan Sanduleac 
Date:   2018-03-13T13:43:32Z

Pipe test output for python/run-tests.py

commit 487796d07e24cfa4534a243cbb7451e3397b7137
Author: Dan Sanduleac 
Date:   2018-03-13T16:00:09Z

Try to set respectSessionTimeZone to fix python tests

commit bb2076d1b80820fc9280af27c9a63c8ca2bbefb4
Author: Dan Sanduleac 
Date:   2018-03-13T18:46:46Z

Revert "Try to set respectSessionTimeZone to fix python tests"

This reverts commit 487796d

commit 52cbda2e67cf37346f0ca7f90a2f5a8659447d1d
Author: Dan Sanduleac 
Date:   2018-03-13T18:50:28Z

Skip bad arrow tests for now - https://github.com/palantir/spark/issues/328

commit 51858bf187c53aefb123b4f56b7e63b63c544771
Author: Dan Sanduleac 
Date:   2018-03-13T19:04:27Z

Make python tests output xml results in PYSPARK_PYTHON subdir

commit 2620ab3fbd3d013a1070a5bd6b1c4250d5e4
Author: Dan Sanduleac 
Date:   2018-03-13T19:05:15Z

Try to use unishark instead of xmlrunner in pyspark.streaming.tests too

commit 77eaa80b4f8de046b20f14456bb7c9b5cb223016
Author: Dan Sanduleac 
Date:   2018-03-13T19:20:25Z

Only the basename of PYSPARK_PYTHON please...

commit 19abd29d2806842ccb1

[GitHub] spark pull request #21872: [WIP] merge upstream

2018-07-25 Thread onursatici

Github user onursatici closed the pull request at:

https://github.com/apache/spark/pull/21872


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205100534
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag](
 }
   }
 
+  /**
+   * :: Experimental ::
+   * Indicates that Spark must launch the tasks together for the current 
stage.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): RDDBarrier[T] = withScope(new RDDBarrier[T](this))
--- End diff --

Em, thanks for raising this question. IMO we indeed require users be aware 
of how many tasks they may launch for a barrier stage, and tasks may exchange 
internal data between each other in the middle, so users really care about the 
task numbers. I agree it shall be very useful to enable specify the number of 
tasks in a barrier stage, maybe we can have `RDDBarrier.coalesce(numPartitions: 
Int)` to enforce the number of tasks to be launched together in a barrier stage.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205102656
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
 // of locality levels so that it gets a chance to launch local tasks 
on all of them.
 // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, 
NO_PREF, RACK_LOCAL, ANY
 for (taskSet <- sortedTaskSets) {
-  var launchedAnyTask = false
-  var launchedTaskAtCurrentMaxLocality = false
-  for (currentMaxLocality <- taskSet.myLocalityLevels) {
-do {
-  launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-taskSet, currentMaxLocality, shuffledOffers, availableCpus, 
tasks)
-  launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-} while (launchedTaskAtCurrentMaxLocality)
-  }
-  if (!launchedAnyTask) {
-taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  // Skip the barrier taskSet if the available slots are less than the 
number of pending tasks.
+  if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
--- End diff --

As listed in the design doc, support running barrier stage with dynamic 
resource allocation is Non-Goal of this task. However, we do plan to better 
integrate this feature with dynamic resource allocation, hopefully we can get 
to work on this before Spark 3.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21871
  
**[Test build #93538 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93538/testReport)**
 for PR 21871 at commit 
[`8ef142f`](https://github.com/apache/spark/commit/8ef142f78c22b980fe60d836c56d7d18d221a958).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21871
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21871: [SPARK-24916][SQL] Fix type coercion for IN expression w...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21871
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93538/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 >

1 - 100 of 506 matches

Mail list logo