GitHub user jaceklaskowski opened a pull request:
https://github.com/apache/spark/pull/20855
[SPARK-23731][SQL] FileSourceScanExec throws NullPointerException in
subexpression elimination
## What changes were proposed in this pull request?
Avoids (not necessarily fixes) a NullPointerException in subexpression
elimination for subqueries with FileSourceScanExec.
## How was this patch tested?
Local build. No new tests as I could not reproduce it other than using the
query and data under NDA. Waiting for Jenkins.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jaceklaskowski/spark SPARK-23731-FileSourceScanExec-throws-NPE
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20855.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20855
commit 8ef323c572cee181e3bdbddeeb7119eda03d78f4
Author: Dongjoon Hyun
Date: 2018-01-17T06:32:18Z
[SPARK-23072][SQL][TEST] Add a Unicode schema test for file-based data
sources
## What changes were proposed in this pull request?
After [SPARK-20682](https://github.com/apache/spark/pull/19651), Apache
Spark 2.3 is able to read ORC files with Unicode schema. Previously, it raised
`org.apache.spark.sql.catalyst.parser.ParseException`.
This PR adds a Unicode schema test for CSV/JSON/ORC/Parquet file-based data
sources. Note that TEXT data source only has [a single column with a fixed name
'value'](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala#L71).
## How was this patch tested?
Pass the newly added test case.
Author: Dongjoon Hyun
Closes #20266 from dongjoon-hyun/SPARK-23072.
(cherry picked from commit a0aedb0ded4183cc33b27e369df1cbf862779e26)
Signed-off-by: Wenchen Fan
commit bfbc2d41b8a9278b347b6df2d516fe4679b41076
Author: Henry Robinson
Date: 2018-01-17T08:01:41Z
[SPARK-23062][SQL] Improve EXCEPT documentation
## What changes were proposed in this pull request?
Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.
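
As an illustration of the semantics being documented, here is a minimal sketch that models EXCEPT DISTINCT with plain Scala collections, next to standard SQL EXCEPT ALL for contrast. It is not Spark code and not part of this patch; the object and method names (`ExceptSemantics`, `exceptDistinct`, `exceptAll`) are hypothetical.

```scala
// Plain-Scala model of EXCEPT semantics; an assumption-laden sketch, not Spark's implementation.
object ExceptSemantics {
  // EXCEPT DISTINCT: the distinct rows of `left` that do not appear in `right`.
  def exceptDistinct[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val excluded = right.toSet
    left.distinct.filterNot(excluded)
  }

  // EXCEPT ALL (standard SQL): each row in `right` cancels one matching row in `left`.
  def exceptAll[A](left: Seq[A], right: Seq[A]): Seq[A] = left.diff(right)

  def main(args: Array[String]): Unit = {
    val left = Seq(1, 1, 2, 3)
    val right = Seq(1, 3)
    println(exceptDistinct(left, right)) // List(2)
    println(exceptAll(left, right))      // List(1, 2)
  }
}
```

The duplicate `1` on the left side is what separates the two results: the DISTINCT form drops it entirely, while the ALL form keeps one copy.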
Author: Henry Robinson
Closes #20254 from henryr/spark-23062.
(cherry picked from commit 1f3d933e0bd2b1e934a233ed699ad39295376e71)
Signed-off-by: gatorsmile
commit cbb6bda437b0d2832496b5c45f8264e5527f1cce
Author: Dongjoon Hyun
Date: 2018-01-17T13:53:36Z
[SPARK-21783][SQL] Turn on ORC filter push-down by default
## What changes were proposed in this pull request?
ORC filter push-down has been disabled by default since the beginning,
[SPARK-2883](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-41ef65b9ef5b518f77e2a03559893f4dR149).
Apache Spark now depends on Apache ORC 1.4.1. For Apache Spark
2.3, this PR turns on ORC filter push-down by default like Parquet
([SPARK-9207](https://issues.apache.org/jira/browse/SPARK-9207)) as a part of
[SPARK-20901](https://issues.apache.org/jira/browse/SPARK-20901), "Feature
parity for ORC with Parquet".
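
For anyone who wants to check or override the new default, a minimal sketch follows. It assumes a `spark-shell` session, where `spark` is the predefined `SparkSession`; the configuration key is `spark.sql.orc.filterPushdown`.

```scala
// Run in spark-shell, where `spark` is the predefined SparkSession (an assumption of this sketch).
// Inspect the current value of the ORC filter push-down setting.
spark.conf.get("spark.sql.orc.filterPushdown")

// Opt out for the current session if push-down needs to be disabled.
spark.conf.set("spark.sql.orc.filterPushdown", "false")
```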
## How was this patch tested?
Pass the existing tests.
Author: Dongjoon Hyun
Closes #20265 from dongjoon-hyun/SPARK-21783.
(cherry picked from commit 0f8a28617a0742d5a99debfbae91222c2e3b5cec)
Signed-off-by: Wenchen Fan
commit aae73a21a42fa366a09c2be1a4b91308ef211beb
Author: Wang Gengliang
Date: 2018-01-17T16:05:26Z
[SPARK-23079][SQL] Fix query constraints propagation with aliases
## What changes were proposed in this pull request?
Previously, PR #19201 fixed the problem of non-converging constraints.
After that, PR #19149 improved the loop so that constraints are inferred only once.
So the problem of non-converging constraints is gone.
However, the case below will fail.
```
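// Assumes a QueryTest-style suite (for checkAnswer) with the SQL implicits (for $"id") in scope.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.lit

spark.range(5).write.saveAsTable("t")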
val t = spark.read.table("t")
val left = t.withColumn("xid", $"id" + lit(1)).as("x")
val right = t.withColumnRenamed("id", "xid").as("y")
val df = left.join(right, "xid").filter("id = 3").toDF()
checkAnswer(df, Row(4, 3))
```
This is because `aliasMap` replaces all the aliased children. See the test case in the PR
for details.
This PR fixes the bug by removing the now-unnecessary code for preventing
non-converging constraints.
It can also be fixed with #20270, but this approach is much simpler and cleans up the
code.
## How was this patch tested?
Unit test
Author: Wang Gengliang
Closes #20278 from gengliangwang/FixConstraintSimple.
(cherry picked from commit 8598a982b4147abe5f1aae005fea0