Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/20184
Hi @jerryshao, we can reproduce this issue as follows:
```
$ bin/spark-shell --master local --conf spark.sql.windowExec.buffer.spill.threshold=1 --driver-memory 1G
scala> sc.range(1, 2000).toDF.registerTempTable("test_table")
scala> spark.sql("select row_number() over (partition by 1) from test_table").collect
```
This will cause an OOM.
The above is, of course, an extreme case.
Normally, spark.sql.windowExec.buffer.spill.threshold is 4096 by
default, and the read buffer used by each UnsafeSorterSpillReader is more
than 1 MB. So when a window contains more than 4,096,000 rows,
UnsafeExternalSorter.ChainedIterator will hold a queue containing 1000
UnsafeSorterSpillReader(s). The queue then pins more than 1 GB of read
buffers alone, which costs a lot of memory and is liable to cause an OOM.
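To make the arithmetic concrete, here is a rough sketch in plain Scala (the per-reader buffer size is an assumption taken from the description above, not a measured value):
```
// Back-of-envelope estimate of memory held by ChainedIterator's queue.
// Each spilled run is read back through an UnsafeSorterSpillReader,
// whose read buffer is assumed here to be at least 1 MB.
val spillThreshold = 4096      // spark.sql.windowExec.buffer.spill.threshold default
val rowsInWindow   = 4096000L  // rows in a single window partition
val readerBufferMB = 1L        // assumed lower bound per reader, in MB

val numSpillReaders = rowsInWindow / spillThreshold    // = 1000 readers
val minBufferMemMB  = numSpillReaders * readerBufferMB // >= 1000 MB held at once

println(s"$numSpillReaders spill readers, >= $minBufferMemMB MB of read buffers")
```
With a 1 GB driver, holding roughly 1 GB of spill-reader buffers at once is already enough to exhaust the heap.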