Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
rui-mo commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2999428736
I can reproduce this issue by setting `spark.sql.hive.convertMetastoreParquet`
to `false`. However, the generated plan, in which a Filter node is added after
the scan, is not a plan that Gluten is intended to create. I'll check whether
there is a bug in Velox for this kind of plan.
```
-- Project[2][expressions: (n2_2:INTEGER, "n0_0"), (n2_3:INTEGER, "n0_0")] -> n2_2:INTEGER, n2_3:INTEGER
  -- Filter[1][expression: and(isnotnull("n0_1"),equalto("n0_1","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
    -- TableScan[0][table: hive_table, data columns: ROW] -> n0_0:INTEGER, n0_1:VARCHAR
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
wenfang6 commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2998501738
@rui-mo In our scenario, we set `spark.sql.hive.convertMetastoreParquet=false`,
and the scan transformer is `HiveTableScanExecTransformer`.
Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
rui-mo commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2996687305
@wenfang6 I found a difference between the plans. In the plan you provided,
no filter is pushed down to the TableScan, and a separate Filter operator
handles the filtering. In my plan, filter pushdown works and there is no
Filter operator. Could you please check which scan transformer was used
in your testing? We have `HiveTableScanExecTransformer.scala`,
`BatchScanExecTransformer.scala` and `FileSourceScanExecTransformer.scala`.
The error plan:
```
-- Project[4][expressions: (n4_3:INTEGER, hash_with_seed(42,"n2_4")), (n4_4:INTEGER, "n2_2"), (n4_5:VARCHAR, "n2_3"), (n4_6:INTEGER, "n2_4")] -> n4_3:INTEGER, n4_4:INTEGER, n4_5:VARCHAR, n4_6:INTEGER
  -- TopNRowNumber[3][partition by (n2_4) order by (n2_3 DESC NULLS LAST) limit 1] -> n2_2:INTEGER, n2_3:VARCHAR, n2_4:INTEGER
    -- Project[2][expressions: (n2_2:INTEGER, "n0_0"), (n2_3:VARCHAR, "n0_1"), (n2_4:INTEGER, "n0_0")] -> n2_2:INTEGER, n2_3:VARCHAR, n2_4:INTEGER
      -- Filter[1][expression: and(isnotnull("n0_1"),equalto("n0_1","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
        -- TableScan[0][table: hive_table] -> n0_0:INTEGER, n0_1:VARCHAR
```
Plan in my testing:
```
-- Project[3][expressions: (n3_3:INTEGER, hash_with_seed(42,"n1_4")), (n3_4:INTEGER, "n1_2"), (n3_5:VARCHAR, "n1_3"), (n3_6:INTEGER, "n1_4")] -> n3_3:INTEGER, n3_4:INTEGER, n3_5:VARCHAR, n3_6:INTEGER
  -- TopNRowNumber[2][partition by (n1_4) order by (n1_3 DESC NULLS LAST) limit 1] -> n1_2:INTEGER, n1_3:VARCHAR, n1_4:INTEGER
    -- Project[1][expressions: (n1_2:INTEGER, "n0_0"), (n1_3:VARCHAR, "n0_1"), (n1_4:INTEGER, "n0_0")] -> n1_2:INTEGER, n1_3:VARCHAR, n1_4:INTEGER
      -- TableScan[0][table: hive_table, remaining filter: (and(isnotnull("name"),equalto("name","wukong"))), data columns: ROW] -> n0_0:INTEGER, n0_1:VARCHAR
```
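The difference between the two plans can also be checked mechanically. Below is a minimal sketch, not part of Gluten or Velox: the helper names `plan_operators` and `filter_pushed_down` are hypothetical, the `-- Name[id][` line shape is assumed from the plans quoted in this thread, and the two plan strings are abbreviated versions of those plans.

```python
import re

# Matches the start of one operator line in a printed Velox plan,
# e.g. "-- Filter[1][...". The shape is assumed from the plans in this thread.
_NODE = re.compile(r"--\s*(\w+)\[(\d+)\]\[")


def plan_operators(plan_text):
    """Return operator names in the order they appear in the plan text."""
    return [m.group(1) for m in _NODE.finditer(plan_text)]


def filter_pushed_down(plan_text):
    """True if the scan carries a 'remaining filter' and no Filter operator exists."""
    ops = plan_operators(plan_text)
    return "remaining filter:" in plan_text and "Filter" not in ops


# Abbreviated form of the failing plan: a standalone Filter above the scan.
error_plan = """
-- Project[2][expressions: (n2_2:INTEGER, "n0_0")] -> n2_2:INTEGER
  -- Filter[1][expression: equalto("n0_1","wukong")] -> n0_0:INTEGER, n0_1:VARCHAR
    -- TableScan[0][table: hive_table] -> n0_0:INTEGER, n0_1:VARCHAR
"""

# Abbreviated form of the working plan: the filter lives inside the scan.
pushed_plan = """
-- Project[1][expressions: (n1_2:INTEGER, "n0_0")] -> n1_2:INTEGER
  -- TableScan[0][table: hive_table, remaining filter: (equalto("name","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
"""

print(plan_operators(error_plan), filter_pushed_down(error_plan))
print(plan_operators(pushed_plan), filter_pushed_down(pushed_plan))
```

This only inspects the printed plan text; it says nothing about why a given scan transformer does or does not push the filter down.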
Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
rui-mo commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2996635599
@wenfang6 I tried the query above and got the same Velox plan as the one
in the issue description, but I still could not reproduce the issue.
```
-- Project[5][expressions: (n5_4:INTEGER, "n0_0"), (n5_5:VARCHAR, "n0_1"), (n5_6:INTEGER, "rk_9")] -> n5_4:INTEGER, n5_5:VARCHAR, n5_6:INTEGER
  -- Filter[4][expression: equalto("rk_9",1)] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_9:INTEGER
    -- Window[3][STREAMING partition by [n0_2] order by [n0_1 DESC NULLS LAST] rk_9 := row_number() ROWS between UNBOUNDED PRECEDING and CURRENT ROW] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_9:INTEGER
      -- OrderBy[2][n0_2 ASC NULLS FIRST, n0_1 DESC NULLS LAST] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
        -- TopNRowNumber[1][partition by (n0_2) order by (n0_1 DESC NULLS LAST) limit 1] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
          -- ValueStream[0][] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
```
Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
rui-mo commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2996638263
Hi @zml1206, actually I'm still trying to repro this issue. Do you happen to
have any clue? Thanks.
Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
wenfang6 commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2995095474
It works successfully with Spark 3.2.1. I think Spark 3.2.1 and Spark 3.5.4
generate different Velox plans for the same query, and this inconsistency
appears to be the root cause of the issue.
The Velox plan looks like this:
```
-- Project[2][expressions: (n2_2:INTEGER, hash_with_seed(42,"n0_0")), (n2_3:INTEGER, "n0_0"), (n2_4:VARCHAR, "n0_1"), (n2_5:INTEGER, "n0_0")] -> n2_2:INTEGER, n2_3:INTEGER, n2_4:VARCHAR, n2_5:INTEGER
  -- Filter[1][expression: and(isnotnull("n0_1"),equalto("n0_1","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
    -- TableScan[0][table: hive_table] -> n0_0:INTEGER, n0_1:VARCHAR

-- Project[4][expressions: (n4_4:INTEGER, "n0_0"), (n4_5:VARCHAR, "n0_1"), (n4_6:INTEGER, "rk_11")] -> n4_4:INTEGER, n4_5:VARCHAR, n4_6:INTEGER
  -- Filter[3][expression: equalto("rk_11",1)] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_11:INTEGER
    -- Window[2][STREAMING partition by [n0_2] order by [n0_1 DESC NULLS LAST] rk_11 := row_number() ROWS between UNBOUNDED PRECEDING and CURRENT ROW] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_11:INTEGER
      -- OrderBy[1][n0_2 ASC NULLS FIRST, n0_1 DESC NULLS LAST] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
        -- ValueStream[0][] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
```
Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
rui-mo commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2979842632
Yes, I will follow up, thanks.
Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]
FelixYBW commented on issue #9965:
URL:
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2977905680
@rui-mo It looks like this could be a common case. Could you fix it?
