Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-23 Thread via GitHub


JkSelf commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2903822360

   @FelixYBW It appears that Spark uses `UnsafeOutput` for data 
serialization, which leads to significant stack memory usage. The 
`spark.kryo.unsafe` configuration option controls whether `UnsafeOutput` is 
enabled. How about testing with this option disabled?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-22 Thread via GitHub


FelixYBW commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2902687459

   The full stack:
   ```
   StackOverflowError: full depth = 4016
   frame[0] (where it recursed back to): java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   frame[end] (root of recursion): java.base/java.lang.Thread.run(Thread.java:829)
   head[0]: java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   head[1]: java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   head[2]: java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   head[3]: java.base/java.lang.reflect.Method.invoke(Method.java:566)
   head[4]: com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:62)
   head[5]: com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
   head[6]: com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
   head[7]: com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
   head[8]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   head[9]: com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   head[10]: com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   head[11]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557)
   head[12]: com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:65)
   head[13]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   head[14]: com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   head[15]: com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   head[16]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   head[17]: com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   head[18]: com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   head[19]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   tail[0]: org.apache.spark.broadcast.TorrentBroadcast$.$anonfun$blockifyObject$4(TorrentBroadcast.scala:319)
   tail[1]: org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1471)
   tail[2]: org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:321)
   tail[3]: org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:138)
   tail[4]: org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:91)
   tail[5]: org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
   tail[6]: org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:74)
   tail[7]: org.apache.spark.SparkContext.broadcast(SparkContext.scala:1564)
   tail[8]: org.apache.gluten.sql.shims.SparkShims.broadcastInternal(SparkShims.scala:182)
   tail[9]: org.apache.gluten.sql.shims.SparkShims.broadcastInternal$(SparkShims.scala:178)
   tail[10]: org.apache.gluten.sql.shims.spark32.Spark32Shims.broadcastInternal(Spark32Shims.scala:64)
   tail[11]: org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$3(ColumnarBroadcastExchangeExec.scala:84)
   tail[12]: org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
   tail[13]: org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
   tail[14]: org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$1(ColumnarBroadcastExchangeExec.scala:82)
   tail[15]: org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:194)
   tail[16]: java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   tail[17]: java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   tail[18]: java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   tail[19]: java.base/java.lang.Thread.run(Thread.java:829)
   25/05/22 17:12:28 INFO [broadcast-exchange-0] spark.SparkContext: Starting job: collect at VeloxSparkPlanExecApi.scala:629
   ```
   
   @JkSelf it's triggered here: 
https://github.com/apache/incubator-gluten/blob/414d82e459c91d1cdfeea358a62ed60da95c5f3a/gluten-substrait/src/main/scala/org/apache/spark/sql/execution/ColumnarBroadcastExchangeExec.scala#L86
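
A depth/head/tail summary like the one above can be produced with a small helper. The following is only a minimal sketch, not the code that generated that output; `StackSummary`, `summarize`, and `recurse` are hypothetical names. Note that HotSpot records at most 1024 frames per trace by default (`-XX:MaxJavaStackTraceDepth`), so a reported depth such as 4016 presumably requires raising that limit.

```java
import java.util.StringJoiner;

final class StackSummary {
    // Summarize a long stack trace as its total depth plus the first and last n frames.
    // tail[i] counts forward, so the last tail entry is the root frame of the thread.
    static String summarize(Throwable t, int n) {
        StackTraceElement[] frames = t.getStackTrace();
        int depth = frames.length;
        int k = Math.min(n, depth);
        StringJoiner out = new StringJoiner("\n");
        out.add(t.getClass().getSimpleName() + ": full depth = " + depth);
        for (int i = 0; i < k; i++) {
            out.add("head[" + i + "]: " + frames[i]);
        }
        for (int i = 0; i < k; i++) {
            out.add("tail[" + i + "]: " + frames[depth - k + i]);
        }
        return out.toString();
    }

    // Unbounded recursion, used only to provoke a StackOverflowError for the demo.
    static int recurse(int i) {
        return recurse(i + 1) + 1;
    }

    public static void main(String[] args) {
        try {
            recurse(0);
        } catch (StackOverflowError e) {
            System.out.println(summarize(e, 5));
        }
    }
}
```

The head frames show what the recursion looks like at the point of overflow, and the tail frames show who started it, which is usually enough to locate the entry point without reading thousands of repeated frames.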



Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-22 Thread via GitHub


zhztheplayer commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900753562

   It may be significant that the stack trace ends with 
`DefaultArraySerializers` / `ClosureSerializer` frames, which are not common 
in the bottom frames.
   
   ```
   at com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:62)
   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
   ```





Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-22 Thread via GitHub


wForget commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900580627

   You may be able to enable `TRACE` logging to get more information:
   ```
   com.esotericsoftware.minlog.Log.TRACE()
   ```





Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-22 Thread via GitHub


wForget commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900426659

   Could a circular reference be causing the stack overflow? However, I could 
not reproduce this issue on Spark 3.2.0. Which Kryo version are you using?
   
   
https://github.com/apache/spark/blob/7ee8f63c7e14ce5b27a051e22af2805cf63d5687/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L1168-L1170
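
To illustrate the circular-reference hypothesis: a recursive walk over an object graph never terminates on a cycle unless already-visited objects are tracked by identity. The sketch below uses only the JDK and does not call Kryo; `CycleDemo`, `Node`, and the method names are hypothetical. Kryo exposes a comparable switch as `Kryo.setReferences`, which Spark controls through `spark.kryo.referenceTracking`; whether reference tracking is actually off in the failing job is an assumption that would need checking.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class CycleDemo {
    static final class Node {
        Node next;
    }

    // Builds a ring of `size` nodes: the last node points back to the first.
    static Node ring(int size) {
        Node first = new Node();
        Node cur = first;
        for (int i = 1; i < size; i++) {
            cur.next = new Node();
            cur = cur.next;
        }
        cur.next = first;
        return first;
    }

    // Naive recursive walk: one frame per hop, so a cycle recurses until the stack overflows.
    static int countNaive(Node n) {
        return n == null ? 0 : 1 + countNaive(n.next);
    }

    // Returns true if the naive walk ends in a StackOverflowError.
    static boolean naiveOverflows(Node n) {
        try {
            countNaive(n);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // Walk with identity-based reference tracking: each object is visited at most once.
    static int countTracked(Node n, Set<Node> seen) {
        if (n == null || !seen.add(n)) {
            return 0;
        }
        return 1 + countTracked(n.next, seen);
    }

    static int countTracked(Node n) {
        return countTracked(n, Collections.newSetFromMap(new IdentityHashMap<>()));
    }

    public static void main(String[] args) {
        Node r = ring(3);
        System.out.println("naive overflows: " + naiveOverflows(r));
        System.out.println("tracked count: " + countTracked(r));
    }
}
```

Reference tracking only breaks cycles. A very deep but acyclic graph still uses one frame per level, so it can overflow the stack even with tracking enabled.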





Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-22 Thread via GitHub


JkSelf commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900170107

   Judging from the stack, it seems that `UnsafeColumnarBuildSideRelation` is 
being used to construct the hash table. The related code is 
https://github.com/apache/incubator-gluten/blob/main/backends-velox/src/main/scala/org/apache/spark/sql/execution/unsafe/UnsafeColumnarBuildSideRelation.scala#L90-L112





Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-21 Thread via GitHub


FelixYBW commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900127854

   The full stack is below. There are lots of recursive calls, but I'm not 
sure where it is triggered.
   
   ```
   25/05/20 16:33:15 INFO [broadcast-exchange-1] memory.MemoryStore: Block broadcast_11 stored as values in memory (estimated size 2.7 KiB, free 15.9 GiB)
   25/05/20 16:33:15 ERROR [broadcast-exchange-1] broadcast.TorrentBroadcast: Store broadcast broadcast_11 fail, remove all pieces of the broadcast
   25/05/20 16:33:15 ERROR [broadcast-exchange-1] execution.ColumnarBroadcastExchangeExec: Fatal error in broadcast exchange:
    |  Exception type: java.lang.StackOverflowError
    |  Message: 
    |  Cause: 
    |  Thread: broadcast-exchange-1
    |  Stacktrace:
   java.lang.StackOverflowError
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   at com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:62)
   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557)
   at com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:65)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   ```





Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-19 Thread via GitHub


FelixYBW commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2889930724

   > [@FelixYBW](https://github.com/FelixYBW) It appears this is a known issue 
[#7750](https://github.com/apache/incubator-gluten/issues/7750). 
`ColumnarBroadcastExchangeExec` utilizes on-heap memory to construct the hash 
table when using `ColumnarBuildSideRelation`, which can lead to significant 
memory usage and potentially cause an OOM error.
   
   No, it's not an on-heap OOM; it's a stack overflow, which means we 
allocated a large memory block on the stack.
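
One caveat on the JVM: a `StackOverflowError` can also come from call depth alone, with no large block on the stack, because Java objects and arrays are allocated on the heap. The sketch below is a minimal way to tell the two apart, assuming a HotSpot JVM that honors the `stackSize` hint of the `Thread` constructor; `StackDepthDemo`, `descend`, and `overflows` are hypothetical names. The same recursion is run on a small and a large thread stack. If a larger stack makes the error disappear, the recursion is deep but finite.

```java
final class StackDepthDemo {
    // Plain recursion: one small JVM frame per level, no large locals.
    static int descend(int remaining) {
        return remaining == 0 ? 0 : 1 + descend(remaining - 1);
    }

    // Runs descend(depth) on a new thread created with the requested stack size
    // (only a hint to the JVM) and reports whether it hit StackOverflowError.
    static boolean overflows(int depth, long stackBytes) {
        boolean[] overflowed = {false};
        Thread t = new Thread(null, () -> {
            try {
                descend(depth);
            } catch (StackOverflowError e) {
                overflowed[0] = true;
            }
        }, "stack-demo", stackBytes);
        t.start();
        try {
            t.join(); // join() makes the write to overflowed[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return overflowed[0];
    }

    public static void main(String[] args) {
        int depth = 100_000;
        System.out.println("256 KiB stack overflows: " + overflows(depth, 256L * 1024));
        System.out.println("256 MiB stack overflows: " + overflows(depth, 256L * 1024 * 1024));
    }
}
```

For the Spark driver, the closest equivalent would be raising `-Xss` through `spark.driver.extraJavaOptions`. Whether the broadcast-exchange thread pool picks up that setting in this deployment is an assumption to verify.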





Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-18 Thread via GitHub


JkSelf commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2889701318

   @FelixYBW It appears this is a known issue 
https://github.com/apache/incubator-gluten/issues/7750. 
`ColumnarBroadcastExchangeExec` utilizes on-heap memory to construct the hash 
table when using `ColumnarBuildSideRelation`, which can lead to significant 
memory usage and potentially cause an OOM error.





Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]

2025-05-16 Thread via GitHub


FelixYBW commented on issue #9671:
URL: 
https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2887875340

   @JkSelf Do you know where the large memory block is allocated on the stack?

