[GitHub] spark issue #18725: [SPARK-21520][SQL]Hivetable scan for all the columns the...

2017-08-02 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725 @viirya @cloud-fan @gatorsmile @baibaichen thank you for reviewing the code. I've changed the code again by splitting it into two projections. Please review it again.

2017-07-28 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725
```
Project [a]
  Filter [rand() > 1]
    TableScan [a, b, c]
```
Here the parent is a Project, the child is a LeafNode, and the node itself is a Filter (or another operator), so this change can't fix that case.
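
A toy Scala model can make the point above concrete. It is a sketch under stated assumptions, not Spark code: `Expr`, `Plan`, `Scan`, `Project`, `Filter`, `collect` and `scannedColumns` are hypothetical names, and `collect` is assumed to behave like the `PhysicalOperation` extractor discussed in this thread, passing through Projects and Filters only while they are deterministic. With `rand() > 1` in the Filter, the walk stops at the Filter, so the scan underneath is planned without the pruned column list.

```scala
// Hypothetical toy model: these are NOT Spark's classes.
final case class Expr(sql: String, refs: Set[String], deterministic: Boolean)

sealed trait Plan
final case class Scan(columns: Seq[String]) extends Plan
final case class Project(fields: Seq[Expr], child: Plan) extends Plan
final case class Filter(cond: Expr, child: Plan) extends Plan

// Assumed PhysicalOperation-like walk: collect Projects and Filters only
// while they are deterministic; the first non-deterministic node stops it.
def collect(plan: Plan): (Option[Seq[Expr]], Seq[Expr], Plan) = plan match {
  case Project(fields, child) if fields.forall(_.deterministic) =>
    val (_, filters, base) = collect(child)
    (Some(fields), filters, base)
  case Filter(cond, child) if cond.deterministic =>
    val (fields, filters, base) = collect(child)
    (fields, filters :+ cond, base)
  case other => (None, Nil, other)
}

// Columns read by the scan. If the walk stops above the scan, the stopping
// node is planned on its own and its child is re-planned from scratch,
// without the projection and filters collected above it.
def scannedColumns(plan: Plan): Seq[String] = {
  val (fields, filters, base) = collect(plan)
  base match {
    case Scan(all) =>
      fields match {
        case Some(fs) =>
          val needed = (fs ++ filters).flatMap(_.refs).toSet
          all.filter(needed)
        case None => all // no projection collected: read every column
      }
    case Project(_, child) => scannedColumns(child)
    case Filter(_, child)  => scannedColumns(child)
  }
}

val scan    = Scan(Seq("a", "b", "c"))
val colA    = Expr("a", Set("a"), deterministic = true)
val bGt1    = Expr("b > 1", Set("b"), deterministic = true)
val randGt1 = Expr("rand() > 1", Set.empty, deterministic = false)

// Project [a] / Filter [b > 1] / TableScan [a, b, c]
val withDetFilter = scannedColumns(Project(Seq(colA), Filter(bGt1, scan)))
// Project [a] / Filter [rand() > 1] / TableScan [a, b, c]
val withRandFilter = scannedColumns(Project(Seq(colA), Filter(randGt1, scan)))
```

In this toy run the deterministic filter `b > 1` limits the scan to `a` and `b`, while `rand() > 1` makes it read `a`, `b` and `c`. The Project on top is already deterministic, so splitting it would not change the result, which matches the comment above.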

2017-07-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18725 Does the current fix work for a case like the following?
```
Project [a]
  Filter [rand() > 1]
    TableScan [a, b, c]
```
`PhysicalOperation` still fails for

2017-07-27 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725 @gatorsmile @cloud-fan I've split it into two Projects, and the change has been validated in my environment. But I'm not sure whether the code changes are well thought out. Could you review the code again?

2017-07-26 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725 @baibaichen Okay, I'll try to handle this particular scenario by splitting it into two Projects. Thanks.
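
The split-into-two-Projects approach can be sketched with a similar toy model. All names here (`Expr`, `Plan`, `Scan`, `Project`, `collect`, `scannedColumns`, `splitProject`) are hypothetical; `collect` is an assumed stand-in for a `PhysicalOperation`-style walk that only passes through all-deterministic Projects, and `splitProject` illustrates the idea rather than the patch in this PR. It keeps the original, non-deterministic Project on top and inserts a deterministic Project that selects only the referenced columns directly above the child.

```scala
// Hypothetical toy model: these are NOT Spark's classes.
final case class Expr(sql: String, refs: Set[String], deterministic: Boolean)

sealed trait Plan
final case class Scan(columns: Seq[String]) extends Plan
final case class Project(fields: Seq[Expr], child: Plan) extends Plan

// Assumed PhysicalOperation-style walk: it only passes through Projects
// whose fields are all deterministic.
def collect(plan: Plan): (Option[Seq[Expr]], Plan) = plan match {
  case Project(fields, child) if fields.forall(_.deterministic) =>
    (Some(fields), collect(child)._2)
  case other => (None, other)
}

// Columns the scan reads. A Project the walk cannot pass is planned on its
// own, and its child is re-planned without any pruned column list.
def scannedColumns(plan: Plan): Seq[String] = collect(plan) match {
  case (Some(fields), Scan(all)) =>
    val needed = fields.flatMap(_.refs).toSet
    all.filter(needed)
  case (None, Scan(all))      => all // nothing collected: read every column
  case (_, Project(_, child)) => scannedColumns(child)
}

// Hypothetical rewrite: keep the non-deterministic Project on top and put a
// deterministic Project with only the referenced columns underneath it.
def splitProject(p: Project): Project = {
  val refs = p.fields.flatMap(_.refs).distinct
  Project(p.fields, Project(refs.map(r => Expr(r, Set(r), deterministic = true)), p.child))
}

val scan     = Scan(Seq("a", "b", "c"))
val colA     = Expr("a", Set("a"), deterministic = true)
val rand     = Expr("rand()", Set.empty, deterministic = false)
val original = Project(Seq(colA, rand), scan)

val before = scannedColumns(original)               // unsplit plan
val after  = scannedColumns(splitProject(original)) // split plan
```

Here the unsplit plan reads `a`, `b` and `c`, while the split plan reads only `a`, because the inner Project is deterministic and the walk can merge it into the scan.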

2017-07-26 Thread baibaichen
Github user baibaichen commented on the issue: https://github.com/apache/spark/pull/18725 @heary-cao your fix is wrong.

2017-07-26 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725 @baibaichen Yes. In my test environment: `Time taken: 557.276 seconds, Fetched 1 row(s)` vs `Time taken: 5997.238 seconds, Fetched 1 row(s)`. But I'm not sure about the

2017-07-26 Thread baibaichen
Github user baibaichen commented on the issue: https://github.com/apache/spark/pull/18725 @heary-cao, do you get the better performance with your fix, e.g. by changing RDG's `deterministic` property from false to true?
```
override def deterministic: Boolean = true
```
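
The effect of that flag can be sketched with a toy model. This is an assumption-laden illustration, not Spark code: `Expr`, `Plan`, `Scan`, `Project`, `collect` and `scannedColumns` are hypothetical, `collect` is assumed to mirror a `PhysicalOperation`-style guard that merges a Project into the scan only when all of its expressions are deterministic, and the `FLOOR(rand() * 1.0) AS k` label simply echoes the expression quoted elsewhere in this thread.

```scala
// Hypothetical toy model: these are NOT Spark's classes.
final case class Expr(sql: String, refs: Set[String], deterministic: Boolean)

sealed trait Plan
final case class Scan(columns: Seq[String]) extends Plan
final case class Project(fields: Seq[Expr], child: Plan) extends Plan

// Assumed PhysicalOperation-style guard: a Project is merged into the scan
// only if every field is deterministic; otherwise the walk stops there.
def collect(plan: Plan): (Option[Seq[Expr]], Plan) = plan match {
  case Project(fields, child) if fields.forall(_.deterministic) =>
    (Some(fields), collect(child)._2)
  case other => (None, other)
}

// Columns the scan reads. A Project the walk cannot pass is planned on its
// own, and its child is re-planned without any pruned column list.
def scannedColumns(plan: Plan): Seq[String] = collect(plan) match {
  case (Some(fields), Scan(all)) =>
    val needed = fields.flatMap(_.refs).toSet
    all.filter(needed)
  case (None, Scan(all))      => all // nothing collected: read every column
  case (_, Project(_, child)) => scannedColumns(child)
}

val scan     = Scan(Seq("a", "b", "c"))
val colA     = Expr("a", Set("a"), deterministic = true)
val kRand    = Expr("FLOOR(rand() * 1.0) AS k", Set.empty, deterministic = false)
val kFlipped = kRand.copy(deterministic = true) // like overriding deterministic = true

val asIs    = scannedColumns(Project(Seq(colA, kRand), scan))
val flipped = scannedColumns(Project(Seq(colA, kFlipped), scan))
```

In the toy model the same expression yields a full scan (`a`, `b`, `c`) when marked non-deterministic and a pruned scan (`a`) when the flag is flipped, so the flag alone decides pruning here; this says nothing about whether marking `rand` deterministic is semantically safe.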

2017-07-26 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725 @viirya @baibaichen thank you for reviewing it. I ran a comparison test: ``` select k,k,sum(id) from (select d004 as id, floor(c010 * 1) as k, ceil(c010) as cceila from

2017-07-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18725 @baibaichen I agree. Looks correct.

2017-07-26 Thread baibaichen
Github user baibaichen commented on the issue: https://github.com/apache/spark/pull/18725 The `HiveTableScans` strategy needs a `CatalogRelation`, but in my case it's a `LogicalRelation`. The Hive table in my test is actually an external table; I guess that's the reason. I believe

2017-07-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18725 I think it's a `HiveTableScan` rather than a `FileSourceScanExec`?

2017-07-26 Thread baibaichen
Github user baibaichen commented on the issue: https://github.com/apache/spark/pull/18725 This is another issue related to non-determinism. When generating the SparkPlan in `FileSourceStrategy`, `PhysicalOperation` is used to extract the projects and filters on top of the relation. But with

2017-07-25 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18725 Are you sure it is caused by rand?

2017-07-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18725 This seems to be a corner case where column pruning doesn't work well. We should investigate why this happens, e.g. check the query plan and how the `ColumnPruning` rule changes the plan.

2017-07-25 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725 @gatorsmile thank you for reviewing it. Do you mean we can split it into two Projects? Something similar to ``` +- Project [FLOOR((rand(8828525941469309371) * 1.0)) AS k#403L]

2017-07-24 Thread heary-cao
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18725 cc, @gatorsmile, @cloud-fan