Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
@viirya @cloud-fan @gatorsmile @baibaichen
Thank you for reviewing the code.
I've changed the code again by splitting it into two projections.
Please review it again.
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
```
Project [a]
Filter [rand() > 1]
TableScan [a, b, c]
```
The parent is a Project, the child is a LeafNode, and the current node is a Filter (or another operator); this change can't fix that case.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18725
Can the current fix handle a case like the following?
```
Project [a]
Filter [rand() > 1]
TableScan [a, b, c]
```
`PhysicalOperation` still fails for …
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
@gatorsmile @cloud-fan
I've split it into two Projects, and the change has been validated in my environment.
But I'm not sure whether the code changes are well thought out.
Could you review the code again?
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
@baibaichen
Okay, I'll try to handle this particular scenario by splitting it into two Projects.
Thanks.
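The "split it into two Projects" idea can be sketched on a toy plan model. This is a hypothetical illustration, not the PR's code: `Expr`, `Plan`, `Scan`, `Project`, `Alias`, `Fn`, and `SplitProject` are made-up names, and the sketch assumes the goal is to leave a purely deterministic Project, holding only the referenced columns, directly above the scan so that column pruning can see it.

```scala
// Hypothetical, simplified model for illustration only; NOT Spark's classes.
sealed trait Expr {
  def deterministic: Boolean
  def references: Set[String] // column names this expression reads
}
final case class Col(name: String) extends Expr {
  def deterministic: Boolean = true
  def references: Set[String] = Set(name)
}
final case class Lit(value: Double) extends Expr {
  def deterministic: Boolean = true
  def references: Set[String] = Set.empty
}
case object Rand extends Expr { // stand-in for rand()
  def deterministic: Boolean = false
  def references: Set[String] = Set.empty
}
final case class Fn(name: String, args: Seq[Expr]) extends Expr { // e.g. "floor", "*"
  def deterministic: Boolean = args.forall(_.deterministic)
  def references: Set[String] = args.flatMap(_.references).toSet
}
final case class Alias(child: Expr, name: String) extends Expr {
  def deterministic: Boolean = child.deterministic
  def references: Set[String] = child.references
}

sealed trait Plan
final case class Scan(columns: Seq[String]) extends Plan
final case class Project(fields: Seq[Expr], child: Plan) extends Plan

object SplitProject {
  // Rewrites Project(fields, scan) that contains a non-deterministic field into
  //   Project(fields, Project(<only the referenced scan columns>, scan))
  // so that a purely deterministic Project sits directly above the scan.
  def apply(plan: Plan): Plan = plan match {
    case Project(fields, scan: Scan) if !fields.forall(_.deterministic) =>
      val referenced = fields.flatMap(_.references).toSet
      val kept: Seq[Expr] = scan.columns.filter(referenced.contains).map(c => Col(c))
      Project(fields, Project(kept, scan))
    case other => other // deterministic Projects (and anything else) are left alone
  }
}

object Demo {
  // d004 comes from the thread; c010 appears in the thread's SQL; x and y are made up.
  val scan = Scan(Seq("d004", "c010", "x", "y"))
  // Roughly: SELECT d004 AS id, floor(rand() * 1.0) AS k
  val original = Project(
    Seq(
      Alias(Col("d004"), "id"),
      Alias(Fn("floor", Seq(Fn("*", Seq(Rand, Lit(1.0))))), "k")),
    scan)
  val split: Plan = SplitProject(original)
}
```

With this toy query, the inner Project keeps only `d004`, so a scan-planning step that accepts deterministic Projects could skip `c010`, `x`, and `y`. The sketch does not address whether later optimizer rules would keep the two Projects separate.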
Github user baibaichen commented on the issue:
https://github.com/apache/spark/pull/18725
@heary-cao your fix is wrong.
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
@baibaichen
Yes. In my test environment:
`Time taken: 557.276 seconds, Fetched 1 row(s)`
vs.
`Time taken: 5997.238 seconds, Fetched 1 row(s)`
But I'm not sure about the …
Github user baibaichen commented on the issue:
https://github.com/apache/spark/pull/18725
@heary-cao, does the better performance come from your fix, e.g. changing RDG's
`deterministic` property from false to true?
```
override def deterministic: Boolean = true
```
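As background for this question (an illustration, not something stated in the thread): the `deterministic` flag tells the optimizer whether an expression may safely be re-evaluated or inlined. If an aliased random value is referenced twice, as in `select k, k, ...`, evaluating it once keeps both columns equal, while inlining it at each use site evaluates it twice. The sketch below uses plain `scala.util.Random` as a stand-in for `rand()`; `DeterminismDemo` and its methods are hypothetical.

```scala
import scala.util.Random

object DeterminismDemo {
  // Evaluate the random value once and reference it twice, like `select k, k`
  // where k is computed a single time per row.
  def evaluateOnce(rng: Random): (Double, Double) = {
    val k = rng.nextDouble()
    (k, k)
  }

  // What inlining the alias at both use sites amounts to: two separate draws.
  def inlineTwice(rng: Random): (Double, Double) =
    (rng.nextDouble(), rng.nextDouble())

  // Same seed for both, so the first draws are identical.
  val once: (Double, Double) = evaluateOnce(new Random(42))
  val twice: (Double, Double) = inlineTwice(new Random(42))
}
```

Both versions start from the same first draw, but only the evaluate-once version returns two equal values; that difference is what marking a random expression as deterministic would hide from the optimizer.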
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
@viirya @baibaichen
Thank you for reviewing it.
I ran a comparison test:
```
select k,k,sum(id) from (select d004 as id, floor(c010 * 1) as k,
ceil(c010) as cceila from …
```
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18725
@baibaichen I agree. Looks correct.
Github user baibaichen commented on the issue:
https://github.com/apache/spark/pull/18725
The `HiveTableScans` strategy needs a `CatalogRelation`, but it's a
`LogicalRelation` in my case. Actually, the Hive table in my test is an
external table; I guess that's the reason.
I believe …
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18725
I think it's a `HiveTableScan`, rather than `FileSourceScanExec`?
Github user baibaichen commented on the issue:
https://github.com/apache/spark/pull/18725
It's another issue related to non-determinism. When generating the SparkPlan in
`FileSourceStrategy`, `PhysicalOperation` is used to extract the projects and
filters on top of the relation. But with …
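To make this mechanism concrete, here is a deliberately simplified Scala model. All names (`Expr`, `Plan`, `Scan`, `Fn`, `ScanPlanning.collect`, `columnsRead`, `Demo`) are hypothetical and are not Spark's API. It assumes, as described above, that collecting projects and filters stops at the first Project or Filter containing a non-deterministic expression, and that a scan reached without any collected projection reads every column. Alias substitution and filter pushdown are omitted.

```scala
// Hypothetical, heavily simplified model for illustration only; NOT Spark's classes.
sealed trait Expr {
  def deterministic: Boolean
  def references: Set[String] // column names this expression reads
}
final case class Col(name: String) extends Expr {
  def deterministic: Boolean = true
  def references: Set[String] = Set(name)
}
final case class Lit(value: Double) extends Expr {
  def deterministic: Boolean = true
  def references: Set[String] = Set.empty
}
case object Rand extends Expr { // stand-in for rand()
  def deterministic: Boolean = false
  def references: Set[String] = Set.empty
}
final case class Fn(name: String, args: Seq[Expr]) extends Expr { // e.g. "*", ">"
  def deterministic: Boolean = args.forall(_.deterministic)
  def references: Set[String] = args.flatMap(_.references).toSet
}

sealed trait Plan
final case class Scan(columns: Seq[String]) extends Plan
final case class Filter(condition: Expr, child: Plan) extends Plan
final case class Project(fields: Seq[Expr], child: Plan) extends Plan

object ScanPlanning {
  // (collected project list, collected filter conditions, node where collection stopped)
  type Collected = (Option[Seq[Expr]], Seq[Expr], Plan)

  // Walk down from the top, absorbing deterministic Projects and Filters;
  // the first non-deterministic one (or the Scan) stops the walk.
  def collect(plan: Plan): Collected = plan match {
    case Project(fields, child) if fields.forall(_.deterministic) =>
      val (_, filters, rest) = collect(child)
      (Some(fields), filters, rest)
    case Filter(condition, child) if condition.deterministic =>
      val (fields, filters, rest) = collect(child)
      (fields, filters :+ condition, rest)
    case other =>
      (None, Nil, other)
  }

  // Columns the underlying Scan would read. If the walk reaches the Scan, only
  // referenced columns are kept (no project list means all columns). Otherwise the
  // stopping node is planned on its own and we continue below it.
  def columnsRead(plan: Plan): Seq[String] = collect(plan) match {
    case (None, _, Scan(cols)) => cols
    case (Some(fields), filters, Scan(cols)) =>
      val needed = (fields ++ filters).flatMap(_.references).toSet
      cols.filter(needed.contains)
    case (_, _, Project(_, child)) => columnsRead(child)
    case (_, _, Filter(_, child))  => columnsRead(child)
  }
}

object Demo {
  val scan = Scan(Seq("a", "b", "c"))
  // Project [rand() * 1.0, a] directly over the scan (non-deterministic Project).
  val randProject = Project(Seq(Fn("*", Seq(Rand, Lit(1.0))), Col("a")), scan)
  // Project [a] / Filter [rand() > 1] / TableScan [a, b, c]
  val randFilter = Project(Seq(Col("a")), Filter(Fn(">", Seq(Rand, Lit(1.0))), scan))
  // Same shape with a deterministic filter, for comparison.
  val detFilter = Project(Seq(Col("a")), Filter(Fn(">", Seq(Col("b"), Lit(1.0))), scan))
}
```

Under these assumptions, both the rand-in-Project case and the `Project [a]` / `Filter [rand() > 1]` / `TableScan [a, b, c]` case end up reading `a, b, c`, while the deterministic variant reads only `a, b`.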
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18725
Are you sure it is caused by rand?
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18725
This seems to be a corner case where column pruning doesn't work well. We
should investigate further why it happens, e.g. check the query plan and
how the `ColumnPruning` rule changes the plan.
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
@gatorsmile
Thank you for reviewing it.
Do you mean we can split it into two Projects?
Similar to:
```
+- Project [FLOOR((rand(8828525941469309371) * 1.0)) AS k#403L]
```
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/18725
cc, @gatorsmile, @cloud-fan