Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-24 Thread via GitHub


rui-mo commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2999428736

   I can repro this issue by setting `spark.sql.hive.convertMetastoreParquet` 
to false. However, the generated plan, in which a Filter node is added after 
the scan, is not a plan that Gluten is intended to create. I'll check whether 
there is a bug in Velox for this kind of plan.
   
   ```
   -- Project[2][expressions: (n2_2:INTEGER, "n0_0"), (n2_3:INTEGER, "n0_0")] -> n2_2:INTEGER, n2_3:INTEGER
     -- Filter[1][expression: and(isnotnull("n0_1"),equalto("n0_1","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
       -- TableScan[0][table: hive_table, data columns: ROW] -> n0_0:INTEGER, n0_1:VARCHAR
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-23 Thread via GitHub


wenfang6 commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2998501738

   @rui-mo In our scenario, we set 
`spark.sql.hive.convertMetastoreParquet=false`, and the Scan transformer is 
`HiveTableScanExecTransformer`.





Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-23 Thread via GitHub


rui-mo commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2996687305

   @wenfang6 I found a plan difference. In the plan you provided, no filter 
is pushed down to the TableScan, and a separate Filter operator handles the 
filtering. In my plan, filter pushdown works and there is no Filter operator. 
Could you check which Scan transformer was used in your testing? We have 
`HiveTableScanExecTransformer.scala`, `BatchScanExecTransformer.scala` and 
`FileSourceScanExecTransformer.scala`.
   
   The error plan:
   ```
   -- Project[4][expressions: (n4_3:INTEGER, hash_with_seed(42,"n2_4")), (n4_4:INTEGER, "n2_2"), (n4_5:VARCHAR, "n2_3"), (n4_6:INTEGER, "n2_4")] -> n4_3:INTEGER, n4_4:INTEGER, n4_5:VARCHAR, n4_6:INTEGER
     -- TopNRowNumber[3][partition by (n2_4) order by (n2_3 DESC NULLS LAST) limit 1] -> n2_2:INTEGER, n2_3:VARCHAR, n2_4:INTEGER
       -- Project[2][expressions: (n2_2:INTEGER, "n0_0"), (n2_3:VARCHAR, "n0_1"), (n2_4:INTEGER, "n0_0")] -> n2_2:INTEGER, n2_3:VARCHAR, n2_4:INTEGER
         -- Filter[1][expression: and(isnotnull("n0_1"),equalto("n0_1","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
           -- TableScan[0][table: hive_table] -> n0_0:INTEGER, n0_1:VARCHAR
   ```
   
   Plan in my testing:
   
   ```
   -- Project[3][expressions: (n3_3:INTEGER, hash_with_seed(42,"n1_4")), (n3_4:INTEGER, "n1_2"), (n3_5:VARCHAR, "n1_3"), (n3_6:INTEGER, "n1_4")] -> n3_3:INTEGER, n3_4:INTEGER, n3_5:VARCHAR, n3_6:INTEGER
     -- TopNRowNumber[2][partition by (n1_4) order by (n1_3 DESC NULLS LAST) limit 1] -> n1_2:INTEGER, n1_3:VARCHAR, n1_4:INTEGER
       -- Project[1][expressions: (n1_2:INTEGER, "n0_0"), (n1_3:VARCHAR, "n0_1"), (n1_4:INTEGER, "n0_0")] -> n1_2:INTEGER, n1_3:VARCHAR, n1_4:INTEGER
         -- TableScan[0][table: hive_table, remaining filter: (and(isnotnull("name"),equalto("name","wukong"))), data columns: ROW] -> n0_0:INTEGER, n0_1:VARCHAR
   ```
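
The difference between the two plans above can be checked mechanically. The following is only a rough sketch, not part of Gluten or Velox: it assumes the plan text has one operator per line (no mail-client wrapping), that the plan is a single linear chain so consecutive lines are parent and child, and that a pushed-down filter shows up as `remaining filter` in the TableScan details, as in the plan from my testing. The function names and the abbreviated sample plans are hypothetical.

```python
import re

# Matches one operator line of a printed Velox plan, for example:
#   -- Filter[1][expression: ...] -> n0_0:INTEGER
NODE_RE = re.compile(r"--\s*(\w+)\[(\d+)\]\[(.*)")


def parse_nodes(plan_text):
    """Return (operator_name, node_id, details) tuples in printed order."""
    nodes = []
    for line in plan_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            nodes.append((match.group(1), int(match.group(2)), match.group(3)))
    return nodes


def has_filter_over_scan(plan_text):
    """True if a Filter is printed directly above a TableScan that has no
    'remaining filter' in its details (assumed to mean no pushdown)."""
    nodes = parse_nodes(plan_text)
    for (name, _, _), (child, _, child_details) in zip(nodes, nodes[1:]):
        if (
            name == "Filter"
            and child == "TableScan"
            and "remaining filter" not in child_details
        ):
            return True
    return False


# Abbreviated versions of the two plan shapes discussed in this thread.
ERROR_PLAN = """-- Project[2][expressions: (n2_2:INTEGER, "n0_0")] -> n2_2:INTEGER
  -- Filter[1][expression: and(isnotnull("n0_1"),equalto("n0_1","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
    -- TableScan[0][table: hive_table] -> n0_0:INTEGER, n0_1:VARCHAR"""

PUSHDOWN_PLAN = """-- Project[1][expressions: (n1_2:INTEGER, "n0_0")] -> n1_2:INTEGER
  -- TableScan[0][table: hive_table, remaining filter: (and(isnotnull("name"),equalto("name","wukong")))] -> n0_0:INTEGER, n0_1:VARCHAR"""

if __name__ == "__main__":
    print(has_filter_over_scan(ERROR_PLAN))
    print(has_filter_over_scan(PUSHDOWN_PLAN))
```

A check like this only flags the plan shape. It says nothing about whether that shape is what triggers the lazy vector error.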





Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-23 Thread via GitHub


rui-mo commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2996635599

   @wenfang6 I tried the query above and got the same Velox plan as the one 
in the issue description, but still couldn't repro the issue.
   
   ```
   -- Project[5][expressions: (n5_4:INTEGER, "n0_0"), (n5_5:VARCHAR, "n0_1"), (n5_6:INTEGER, "rk_9")] -> n5_4:INTEGER, n5_5:VARCHAR, n5_6:INTEGER
     -- Filter[4][expression: equalto("rk_9",1)] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_9:INTEGER
       -- Window[3][STREAMING partition by [n0_2] order by [n0_1 DESC NULLS LAST] rk_9 := row_number() ROWS between UNBOUNDED PRECEDING and CURRENT ROW] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_9:INTEGER
         -- OrderBy[2][n0_2 ASC NULLS FIRST, n0_1 DESC NULLS LAST] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
           -- TopNRowNumber[1][partition by (n0_2) order by (n0_1 DESC NULLS LAST) limit 1] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
             -- ValueStream[0][] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
   ```





Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-23 Thread via GitHub


rui-mo commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2996638263

   Hi @zml1206, actually I'm still trying to repro this issue. Do you happen to 
have any clue? Thanks.





Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-22 Thread via GitHub


wenfang6 commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2995095474

   It works successfully with Spark 3.2.1. I think Spark 3.2.1 and Spark 
3.5.4 generate different Velox plans for the same query, and this 
inconsistency appears to be the root cause of the issue.
   The Velox plan looks like this:
   ```
   -- Project[2][expressions: (n2_2:INTEGER, hash_with_seed(42,"n0_0")), (n2_3:INTEGER, "n0_0"), (n2_4:VARCHAR, "n0_1"), (n2_5:INTEGER, "n0_0")] -> n2_2:INTEGER, n2_3:INTEGER, n2_4:VARCHAR, n2_5:INTEGER
     -- Filter[1][expression: and(isnotnull("n0_1"),equalto("n0_1","wukong"))] -> n0_0:INTEGER, n0_1:VARCHAR
       -- TableScan[0][table: hive_table] -> n0_0:INTEGER, n0_1:VARCHAR


   -- Project[4][expressions: (n4_4:INTEGER, "n0_0"), (n4_5:VARCHAR, "n0_1"), (n4_6:INTEGER, "rk_11")] -> n4_4:INTEGER, n4_5:VARCHAR, n4_6:INTEGER
     -- Filter[3][expression: equalto("rk_11",1)] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_11:INTEGER
       -- Window[2][STREAMING partition by [n0_2] order by [n0_1 DESC NULLS LAST] rk_11 := row_number() ROWS between UNBOUNDED PRECEDING and CURRENT ROW] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER, rk_11:INTEGER
         -- OrderBy[1][n0_2 ASC NULLS FIRST, n0_1 DESC NULLS LAST] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
           -- ValueStream[0][] -> n0_0:INTEGER, n0_1:VARCHAR, n0_2:INTEGER
   ```





Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-17 Thread via GitHub


rui-mo commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2979842632

   Yes, I will follow up, thanks.





Re: [I] [VL] An unloaded lazy vector cannot be wrapped by two different top level vectors [incubator-gluten]

2025-06-16 Thread via GitHub


FelixYBW commented on issue #9965:
URL: 
https://github.com/apache/incubator-gluten/issues/9965#issuecomment-2977905680

   @rui-mo Looks like this can be a common case. Could you fix it?

