Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
JkSelf commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2903822360

@FelixYBW It appears that Spark utilizes `UnsafeOutput` for data serialization, leading to significant stack memory usage. The configuration option `spark.kryo.unsafe` controls whether `UnsafeOutput` is enabled. How about testing with this option disabled?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
FelixYBW commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2902687459

The full stack:

```
StackOverflowError: full depth = 4016
frame[0] (where it recursed back to): java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
frame[end] (root of recursion): java.base/java.lang.Thread.run(Thread.java:829)
head[0]: java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
head[1]: java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
head[2]: java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
head[3]: java.base/java.lang.reflect.Method.invoke(Method.java:566)
head[4]: com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:62)
head[5]: com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
head[6]: com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
head[7]: com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
head[8]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
head[9]: com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
head[10]: com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
head[11]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557)
head[12]: com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:65)
head[13]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
head[14]: com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
head[15]: com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
head[16]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
head[17]: com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
head[18]: com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
head[19]: com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
tail[0]: org.apache.spark.broadcast.TorrentBroadcast$.$anonfun$blockifyObject$4(TorrentBroadcast.scala:319)
tail[1]: org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1471)
tail[2]: org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:321)
tail[3]: org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:138)
tail[4]: org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:91)
tail[5]: org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
tail[6]: org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:74)
tail[7]: org.apache.spark.SparkContext.broadcast(SparkContext.scala:1564)
tail[8]: org.apache.gluten.sql.shims.SparkShims.broadcastInternal(SparkShims.scala:182)
tail[9]: org.apache.gluten.sql.shims.SparkShims.broadcastInternal$(SparkShims.scala:178)
tail[10]: org.apache.gluten.sql.shims.spark32.Spark32Shims.broadcastInternal(Spark32Shims.scala:64)
tail[11]: org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$3(ColumnarBroadcastExchangeExec.scala:84)
tail[12]: org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
tail[13]: org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
tail[14]: org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$1(ColumnarBroadcastExchangeExec.scala:82)
tail[15]: org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:194)
tail[16]: java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
tail[17]: java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
tail[18]: java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
tail[19]: java.base/java.lang.Thread.run(Thread.java:829)
25/05/22 17:12:28 INFO [broadcast-exchange-0] spark.SparkContext: Starting job: collect at VeloxSparkPlanExecApi.scala:629
```

@JkSelf It's triggered here: https://github.com/apache/incubator-gluten/blob/414d82e459c91d1cdfeea358a62ed60da95c5f3a/gluten-substrait/src/main/scala/org/apache/spark/sql/execution/ColumnarBroadcastExchangeExec.scala#L86
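For anyone who wants a head/tail summary like the one in this comment for their own failure, here is a minimal sketch in plain Java. It is not the tool that produced the listing in this thread; the class `StackSummary` and its methods are hypothetical names made up for illustration. Note that HotSpot limits how many frames it records (`-XX:MaxJavaStackTraceDepth`), so the reported depth may be truncated unless that limit is raised.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark, Gluten, or Kryo.
public class StackSummary {

    /** Up to n frames from the top of the trace (the most recent calls). */
    public static List<String> head(Throwable t, int n) {
        StackTraceElement[] frames = t.getStackTrace();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, frames.length); i++) {
            out.add("head[" + i + "]: " + frames[i]);
        }
        return out;
    }

    /** Up to n frames from the bottom of the trace (the root of the call chain). */
    public static List<String> tail(Throwable t, int n) {
        StackTraceElement[] frames = t.getStackTrace();
        int start = Math.max(0, frames.length - n);
        List<String> out = new ArrayList<>();
        for (int i = start; i < frames.length; i++) {
            out.add("tail[" + (i - start) + "]: " + frames[i]);
        }
        return out;
    }

    // Unbounded recursion, used only to provoke a StackOverflowError for the demo.
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    public static void main(String[] args) {
        try {
            recurse(0);
        } catch (StackOverflowError e) {
            // The length is the number of recorded frames, which the JVM may cap.
            System.out.println("StackOverflowError: recorded depth = " + e.getStackTrace().length);
            head(e, 5).forEach(System.out::println);
            tail(e, 5).forEach(System.out::println);
        }
    }
}
```

The head shows where the recursion is spinning and the tail shows which caller entered it, which is usually enough to locate the trigger without reading thousands of repeated frames.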
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
zhztheplayer commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900753562

It may be significant that the stack frames end with `DefaultArraySerializers` / `ClosureSerializer`, which are not common in the bottom frames.

```
at com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:62)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
```
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
wForget commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900580627

You may be able to enable `TRACE` logging to get more information:

```
com.esotericsoftware.minlog.Log.TRACE()
```
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
wForget commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900426659

Could a circular reference be causing the stack overflow? However, I have not been able to reproduce this issue on Spark 3.2.0. What is your Kryo version?

https://github.com/apache/spark/blob/7ee8f63c7e14ce5b27a051e22af2805cf63d5687/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L1168-L1170
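To illustrate the circular-reference hypothesis: a serializer that walks an object graph recursively without remembering which objects it has already visited never terminates on a cycle and ends in `StackOverflowError`. The sketch below uses plain Java with a hypothetical `Node` class and hand-written walkers, not Kryo itself, so it only shows the mechanism and says nothing about whether this is what happens in the issue. In Kryo, reference tracking is controlled by `setReferences`, and Spark exposes it as `spark.kryo.referenceTracking`.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo classes; none of these names come from Spark, Gluten, or Kryo.
public class CycleDemo {

    /** Minimal graph node; 'next' may point back to an earlier node. */
    public static final class Node {
        Node next;
    }

    /** Naive recursive walk with no visited set: a cycle recurses until the stack is exhausted. */
    public static int walkNaive(Node n) {
        if (n == null) return 0;
        return 1 + walkNaive(n.next);
    }

    /** Walk with identity-based reference tracking: each object is visited at most once. */
    public static int walkTracked(Node n, Set<Node> seen) {
        if (n == null || !seen.add(n)) return 0;
        return 1 + walkTracked(n.next, seen);
    }

    /** Returns true if the naive walk ends in a StackOverflowError. */
    public static boolean overflowsNaively(Node root) {
        try {
            walkNaive(root);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    /** Builds a -> b -> a. */
    public static Node twoNodeCycle() {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;
        return a;
    }

    public static void main(String[] args) {
        Node root = twoNodeCycle();
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        System.out.println("naive walk overflows: " + overflowsNaively(root));   // true
        System.out.println("tracked walk visits:  " + walkTracked(root, seen));  // 2
    }
}
```

Reference tracking only protects against cycles. A very deep but acyclic chain of objects can still overflow, because the recursion depth grows with the length of the chain.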
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
JkSelf commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900170107

Judging from the stack, it seems that `UnsafeColumnarBuildSideRelation` is being used to construct the hash table. The related code is: https://github.com/apache/incubator-gluten/blob/main/backends-velox/src/main/scala/org/apache/spark/sql/execution/unsafe/UnsafeColumnarBuildSideRelation.scala#L90-L112
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
FelixYBW commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2900127854

Here is the full stack. There are lots of recursive calls, but I'm not sure where it's triggered.

```
25/05/20 16:33:15 INFO [broadcast-exchange-1] memory.MemoryStore: Block broadcast_11 stored as values in memory (estimated size 2.7 KiB, free 15.9 GiB)
25/05/20 16:33:15 ERROR [broadcast-exchange-1] broadcast.TorrentBroadcast: Store broadcast broadcast_11 fail, remove all pieces of the broadcast
25/05/20 16:33:15 ERROR [broadcast-exchange-1] execution.ColumnarBroadcastExchangeExec: Fatal error in broadcast exchange:
 | Exception type: java.lang.StackOverflowError
 | Message:
 | Cause:
 | Thread: broadcast-exchange-1
 | Stacktrace: java.lang.StackOverflowError
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:62)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:557)
	at com.esotericsoftware.kryo.serializers.ClosureSerializer.write(ClosureSerializer.java:65)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
```
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
FelixYBW commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2889930724

> [@FelixYBW](https://github.com/FelixYBW) It appears this is a known issue [#7750](https://github.com/apache/incubator-gluten/issues/7750). `ColumnarBroadcastExchangeExec` utilizes on-heap memory to construct the hash table when using `ColumnarBuildSideRelation`, which can lead to significant memory usage and potentially cause an OOM error.

No, it's not the on-heap OOM. It's a stack overflow, which means we allocated a large memory block on the stack.
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
JkSelf commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2889701318

@FelixYBW It appears this is a known issue: https://github.com/apache/incubator-gluten/issues/7750. `ColumnarBroadcastExchangeExec` utilizes on-heap memory to construct the hash table when using `ColumnarBuildSideRelation`, which can lead to significant memory usage and potentially cause an OOM error.
Re: [I] [VL] BHJ caused stackoverflow [incubator-gluten]
FelixYBW commented on issue #9671:
URL: https://github.com/apache/incubator-gluten/issues/9671#issuecomment-2887875340

@JkSelf Do you know where the large memory block is allocated on the stack?
