[GitHub] spark pull request #16348: Branch 2.0.4399

2016-12-19 Thread laixiaohang
Github user laixiaohang closed the pull request at:

https://github.com/apache/spark/pull/16348



[GitHub] spark pull request #16348: Branch 2.0.4399

2016-12-19 Thread laixiaohang
GitHub user laixiaohang opened a pull request:

https://github.com/apache/spark/pull/16348

Branch 2.0.4399

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/laixiaohang/spark branch-2.0.4399

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16348.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16348
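
For example, one illustrative way to produce such a commit (an assumed workflow; any commit whose message contains that line, merged to the default branch, works):

$ git commit --allow-empty -m "This closes #16348"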


commit c9c36fa0c7bccefde808bdbc32b04e8555356001
Author: Davies Liu 
Date:   2016-09-02T22:10:12Z

[SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in 
DataFrameWriter

Some analyzer rules make assumptions about logical plans, and the optimizer may break those assumptions. We should not pass an optimized query plan into 
QueryExecution (where it will be analyzed again), otherwise we may hit some weird bugs.

For example, we have a rule for decimal calculation that promotes the precision before binary operations, using PromotePrecision as a placeholder to 
indicate that the rule should not be applied twice. But an optimizer rule removes this placeholder, which breaks that assumption; the rule is then applied 
twice and produces a wrong result.

Ideally, we should make all the analyzer rules idempotent, but that would require a lot of effort to double-check them one by one (and may not be easy).

An easier approach is to never feed an optimized plan into the Analyzer. This PR fixes the case for RunnableCommand: such commands are optimized, and during execution 
the passed `query` is fed into QueryExecution again. This PR makes these `query` plans not part of the children, so they will not be optimized and 
analyzed again.

Right now we cannot tell whether a logical plan has already been optimized; we could 
introduce a flag for that and make sure an optimized logical plan is never 
analyzed again.
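
The failure mode is easy to reproduce outside Spark. Below is a minimal, self-contained Scala sketch (hypothetical names, not Spark source): an "analyzer" rule marks rewritten nodes with a placeholder so it fires only once, an "optimizer" rule strips the placeholder, and re-analyzing the optimized plan applies the rewrite a second time.

```scala
object PromotionDemo extends App {
  sealed trait Expr
  case class Lit(v: Long) extends Expr
  case class Promote(child: Expr) extends Expr // placeholder, like PromotePrecision

  // "Analyzer" rule: rewrite the literal once, then wrap it in the
  // placeholder so a later pass knows the promotion already happened.
  def analyze(e: Expr): Expr = e match {
    case Lit(v)     => Promote(Lit(v * 100))
    case p: Promote => p // placeholder present: assumed already promoted
  }

  // "Optimizer" rule: removes the "redundant" placeholder -- this is what
  // breaks the analyzer's assumption.
  def optimize(e: Expr): Expr = e match {
    case Promote(c) => c
    case other      => other
  }

  val analyzed  = analyze(Lit(3))    // Promote(Lit(300)): promoted once, correct
  val optimized = optimize(analyzed) // Lit(300): placeholder stripped
  println(analyze(optimized))        // Promote(Lit(30000)): promoted twice, wrong
}
```

In Spark the same interaction happens with PromotePrecision; keeping the already-optimized `query` out of a command's children avoids the second analysis pass entirely.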

Added regression tests.

Author: Davies Liu 

Closes #14797 from davies/fix_writer.

(cherry picked from commit ed9c884dcf925500ceb388b06b33bd2c95cd2ada)
Signed-off-by: Davies Liu 

commit a3930c3b9afa9f7eba2a5c8b8f279ca38e348e9b
Author: Sameer Agarwal 
Date:   2016-09-02T22:16:16Z

[SPARK-16334] Reusing same dictionary column for decoding consecutive row 
groups shouldn't throw an error

This patch fixes a bug in the vectorized parquet reader that's caused by 
re-using the same dictionary column vector while reading consecutive row 
groups. Specifically, the issue manifests for certain distributions of 
dictionary/plain-encoded data while we read/populate the underlying bit-packed 
dictionary data into a column-vector-based data structure.
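
As an illustration of the hazard (a hypothetical sketch, not the actual vectorized reader code), consider a column vector whose dictionary reference survives across row groups:

```scala
object DictionaryReuseDemo extends App {
  class ColumnVector {
    var dictionary: Array[String] = null // reused across row groups

    def decode(ids: Array[Int]): Seq[String] =
      if (dictionary == null) ids.map(_.toString).toSeq // plain-encoded path
      else ids.map(dictionary(_)).toSeq                 // dictionary-encoded path
  }

  val vec = new ColumnVector
  vec.dictionary = Array("a", "b") // row group 1: dictionary-encoded
  println(vec.decode(Array(0, 1))) // a, b

  // Row group 2 is plain-encoded, but the vector was never reset, so ids are
  // wrongly looked up in the stale dictionary; id 5 is out of range and throws.
  try println(vec.decode(Array(5)))
  catch { case _: ArrayIndexOutOfBoundsException =>
    println("stale dictionary consulted; state must be reset per row group")
  }
}
```

In this toy version the fix is simply to clear `dictionary` before decoding a row group that is not dictionary-encoded; the real reader has to manage the analogous per-row-group state.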

Manually tested on datasets provided by the community. Thanks to Chris 
Perluss and Keith Kraus for their invaluable help in tracking down this issue!

Author: Sameer Agarwal 

Closes #14941 from sameeragarwal/parquet-exception-2.

(cherry picked from commit a2c9acb0e54b2e38cb8ee6431f1ea0e0b4cd959a)
Signed-off-by: Davies Liu 

commit b8f65dad7be22231e982aaec3bbd69dbeacc20da
Author: Davies Liu 
Date:   2016-09-02T22:40:02Z

Fix build

commit c0ea7707127c92ecb51794b96ea40d7cdb28b168
Author: Davies Liu 
Date:   2016-09-02T23:05:37Z

Revert "[SPARK-16334] Reusing same dictionary column for decoding 
consecutive row groups shouldn't throw an error"

This reverts commit a3930c3b9afa9f7eba2a5c8b8f279ca38e348e9b.

commit 12a2e2a5ab5db12f39a7b591e914d52058e1581b
Author: Junyang Qian 
Date:   2016-09-03T04:11:57Z

[SPARKR][MINOR] Fix docs for sparkR.session and count

## What changes were proposed in this pull request?

This PR adds some more explanation to `sparkR.session`. It also 
modifies the doc for `count` so that, when the methods are grouped into one doc page, the 
description doesn't confuse users.

## How was this patch tested?

Manual test.

![screen shot 2016-09-02 at 1 21 36 pm](https://cloud.githubusercontent.com/assets/15318264/18217198/409613ac-7110-11e6-8dae-cb0c8df557bf.png)

Author: Junyang Qian 

Closes #14942 from junyangq/fixSparkRSessionDoc.

(cherry picked from commit d2fde6b72c4aede2e7edb4a7e6653fb1e7b19924)
Signed-off-by: Shivaram Venkataraman 

commit 949544d017ab25b43b683cd5c1e6783d87bfce45
Author: CodingCat 
Date:   2016-09-03T09:03:40Z

[SPARK-17347][SQL][EXAMPLES] Encoder in Dataset example has incorrect type