Github user liutang123 commented on the issue:

    https://github.com/apache/spark/pull/20184
  
    Hi @jerryshao, we can reproduce this issue as follows:
    ```
    $ bin/spark-shell --master local --conf spark.sql.windowExec.buffer.spill.threshold=1 --driver-memory 1G
    scala> sc.range(1, 2000).toDF.registerTempTable("test_table")
    scala> spark.sql("select row_number() over (partition by 1) from test_table").collect
    ```
    This will cause an OOM.
    The above case is an extreme one.
    Normally, spark.sql.windowExec.buffer.spill.threshold defaults to 4096, and the read cache used in each UnsafeSorterSpillReader is more than 1 MB. When a window contains more than 4,096,000 rows, UnsafeExternalSorter.ChainedIterator will hold a queue containing 1000 UnsafeSorterSpillReader(s). So the queue costs a lot of memory and is liable to cause an OOM.
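    For illustration, here is a back-of-the-envelope sketch of that memory cost. The constants come from the description above; the ~1 MB per-reader buffer is an assumption based on UnsafeSorterSpillReader's read-ahead cache, and the object name is hypothetical:
    ```
    object SpillBufferEstimate {
      def main(args: Array[String]): Unit = {
        // Default value of spark.sql.windowExec.buffer.spill.threshold.
        val spillThreshold = 4096
        // Rows in a single window partition (the extreme case described above).
        val rowsInWindow = 4096000L
        // Assumed read-ahead buffer per UnsafeSorterSpillReader (~1 MB).
        val bufferPerReaderBytes = 1L << 20

        // One spill file is written each time the threshold is reached, and
        // the chained iterator keeps a reader open for every one of them.
        val spillFiles = rowsInWindow / spillThreshold
        val totalBufferBytes = spillFiles * bufferPerReaderBytes

        println(s"spill readers queued: $spillFiles")                // 1000
        println(s"buffer memory held: ${totalBufferBytes >> 20} MB") // ~1000 MB
      }
    }
    ```
    With a 1 GB driver, holding ~1000 MB of read buffers alone is enough to exhaust the heap.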

