GitHub user laixiaohang opened a pull request:
https://github.com/apache/spark/pull/16348
Branch 2.0.4399
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
Please review http://spark.apache.org/contributing.html before opening a
pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/laixiaohang/spark branch-2.0.4399
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16348.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16348
commit c9c36fa0c7bccefde808bdbc32b04e8555356001
Author: Davies Liu
Date: 2016-09-02T22:10:12Z
[SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in
DataFrameWriter
Some analyzer rules make assumptions about logical plans, and the optimizer
may break those assumptions, so we should not pass an optimized query plan
into QueryExecution (where it will be analyzed again); otherwise we may hit
weird bugs. For example, we have a rule for decimal calculation that promotes
the precision before binary operations and uses PromotePrecision as a
placeholder to indicate that the rule should not apply twice. But an Optimizer
rule removes this placeholder, which breaks the assumption; the rule is then
applied twice and produces a wrong result.
Ideally, we should make all the analyzer rules idempotent, but that would
require a lot of effort to double-check them one by one (and may not be easy).
An easier approach is to never feed an optimized plan into the Analyzer. This
PR fixes the case for RunnableCommand: such commands are optimized, and during
execution the passed `query` would be fed into QueryExecution again. This PR
makes these `query` plans not part of the children, so they will not be
analyzed and optimized again.
Right now we have no way to know whether a logical plan has already been
optimized; we could introduce a flag for that and make sure an optimized
logical plan is never analyzed again.
Added regression tests.
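The core idea of the fix can be sketched in miniature (hypothetical class names `Plan`, `Relation`, `LeakyCommand`, `SealedCommand` for illustration only, not the actual Catalyst internals): tree transforms such as analysis and optimization only descend into a node's `children`, so a command that stores its already-analyzed `query` outside of `children` shields it from being rewritten again.

```scala
// Simplified mirror of the fix: transforms visit only `children`,
// so keeping `query` out of `children` stops re-analysis of it.
sealed trait Plan {
  def children: Seq[Plan]
  // Apply a rule over this node and (recursively) its children.
  def transform(rule: Plan => Plan): Plan
}

case class Relation(name: String) extends Plan {
  def children: Seq[Plan] = Nil
  def transform(rule: Plan => Plan): Plan = rule(this)
}

// Before the fix: `query` is a child, so every rule rewrites it again.
case class LeakyCommand(query: Plan) extends Plan {
  def children: Seq[Plan] = Seq(query)
  def transform(rule: Plan => Plan): Plan =
    rule(LeakyCommand(query.transform(rule)))
}

// After the fix: `query` is stored but is not a child; transforms skip it.
case class SealedCommand(query: Plan) extends Plan {
  def children: Seq[Plan] = Nil
  def transform(rule: Plan => Plan): Plan = rule(this)
}
```

With a counting rule, `LeakyCommand(Relation("t")).transform(rule)` visits the inner `Relation`, while `SealedCommand(Relation("t")).transform(rule)` leaves it untouched, which is exactly the behavior the PR wants for `RunnableCommand`.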
Author: Davies Liu
Closes #14797 from davies/fix_writer.
(cherry picked from commit ed9c884dcf925500ceb388b06b33bd2c95cd2ada)
Signed-off-by: Davies Liu
commit a3930c3b9afa9f7eba2a5c8b8f279ca38e348e9b
Author: Sameer Agarwal
Date: 2016-09-02T22:16:16Z
[SPARK-16334] Reusing same dictionary column for decoding consecutive row
groups shouldn't throw an error
This patch fixes a bug in the vectorized parquet reader that's caused by
re-using the same dictionary column vector while reading consecutive row
groups. Specifically, this issue manifests for a certain distribution of
dictionary/plain encoded data while we read/populate the underlying bit packed
dictionary data into a column-vector based data structure.
Manually tested on datasets provided by the community. Thanks to Chris
Perluss and Keith Kraus for their invaluable help in tracking down this issue!
Author: Sameer Agarwal
Closes #14941 from sameeragarwal/parquet-exception-2.
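A hedged sketch of the failure mode being fixed (hypothetical names `DictColumn` and `readRowGroups`, greatly simplified from the actual vectorized Parquet reader): when one dictionary column vector is reused across consecutive row groups, its dictionary must be re-loaded for each row group, since every row group carries its own dictionary page.

```scala
// Toy model of a reusable dictionary-encoded column vector.
class DictColumn {
  private var dict: Array[String] = Array.empty
  def loadDictionary(d: Array[String]): Unit = { dict = d }
  def decode(id: Int): String = dict(id)
}

object Reader {
  // Each row group is (its dictionary, its encoded ids). Reusing the same
  // column vector is fine only if the dictionary is refreshed per group.
  def readRowGroups(groups: Seq[(Array[String], Array[Int])]): Seq[String] = {
    val col = new DictColumn              // reused across row groups
    groups.flatMap { case (dict, ids) =>
      col.loadDictionary(dict)            // refresh for this row group
      ids.map(col.decode)
    }
  }
}
```

Skipping the per-group refresh would decode the second row group's ids against the first group's dictionary, yielding wrong values or out-of-bounds errors for certain distributions of dictionary/plain encoded data, matching the symptom described above.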
(cherry picked from commit a2c9acb0e54b2e38cb8ee6431f1ea0e0b4cd959a)
Signed-off-by: Davies Liu
commit b8f65dad7be22231e982aaec3bbd69dbeacc20da
Author: Davies Liu
Date: 2016-09-02T22:40:02Z
Fix build
commit c0ea7707127c92ecb51794b96ea40d7cdb28b168
Author: Davies Liu
Date: 2016-09-02T23:05:37Z
Revert "[SPARK-16334] Reusing same dictionary column for decoding
consecutive row groups shouldn't throw an error"
This reverts commit a3930c3b9afa9f7eba2a5c8b8f279ca38e348e9b.
commit 12a2e2a5ab5db12f39a7b591e914d52058e1581b
Author: Junyang Qian
Date: 2016-09-03T04:11:57Z
[SPARKR][MINOR] Fix docs for sparkR.session and count
## What changes were proposed in this pull request?
This PR tries to add some more explanation to `sparkR.session`. It also
modifies doc for `count` so when grouped in one doc, the description doesn't
confuse users.
## How was this patch tested?
Manual test.
![screen shot 2016-09-02 at 1 21 36
pm](https://cloud.githubusercontent.com/assets/15318264/18217198/409613ac-7110-11e6-8dae-cb0c8df557bf.png)
Author: Junyang Qian
Closes #14942 from junyangq/fixSparkRSessionDoc.
(cherry picked from commit d2fde6b72c4aede2e7edb4a7e6653fb1e7b19924)
Signed-off-by: Shivaram Venkataraman
commit 949544d017ab25b43b683cd5c1e6783d87bfce45
Author: CodingCat
Date: 2016-09-03T09:03:40Z
[SPARK-17347][SQL][EXAMPLES] Encoder in Dataset example has incorrect type