[GitHub] spark pull request: [SPARK-10474] [SQL] Aggregation fails to alloc...

andrewor14 Fri, 18 Sep 2015 13:21:12 -0700

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/8827


    [SPARK-10474] [SQL] Aggregation fails to allocate memory for pointer array

    When `TungstenAggregation` hits memory pressure, it switches from
    hash-based to sort-based aggregation in place. However, in the process we
    try to allocate the pointer array for writing to the new
    `UnsafeExternalSorter` *before* actually freeing the memory held by the
    hash map. This led to the following exception:
    ```
    java.io.IOException: Could not acquire 65536 bytes of memory
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220)
        at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126)
        at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257)
        at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435)
    ```
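
The ordering problem described above can be illustrated with a small, self-contained sketch. This is not Spark code: `MemoryBudget`, `AllocationOrderSketch`, and their methods are hypothetical names invented for illustration, and the budget is a simple counter standing in for the task's memory manager. The sketch assumes only what the description states, namely that a 65536-byte pointer array is requested while the hash map still holds its memory. The real code throws an `IOException` on failure; the sketch returns `false` instead to keep the example short.

```java
/** Hypothetical fixed-size memory pool; a stand-in, not Spark's memory manager. */
final class MemoryBudget {
    private final long capacity;
    private long used;

    MemoryBudget(long capacity) {
        this.capacity = capacity;
    }

    /** Reserves the bytes and returns true, or returns false if they do not fit. */
    boolean tryAcquire(long bytes) {
        if (used + bytes > capacity) {
            return false;
        }
        used += bytes;
        return true;
    }

    /** Returns bytes to the pool; never drops below zero. */
    void release(long bytes) {
        used = Math.max(0, used - bytes);
    }

    long used() {
        return used;
    }
}

final class AllocationOrderSketch {
    /** Size taken from the exception message in the description. */
    static final long POINTER_ARRAY_BYTES = 65536;

    /** Failing order: request the pointer array while the map still holds its memory. */
    static boolean allocateThenFree(MemoryBudget budget, long mapBytes) {
        if (!budget.tryAcquire(POINTER_ARRAY_BYTES)) {
            return false; // corresponds to "Could not acquire 65536 bytes of memory"
        }
        budget.release(mapBytes);
        return true;
    }

    /** Intended order: free the map first, then request the pointer array. */
    static boolean freeThenAllocate(MemoryBudget budget, long mapBytes) {
        budget.release(mapBytes);
        return budget.tryAcquire(POINTER_ARRAY_BYTES);
    }
}
```

With a budget of 100000 bytes of which the map holds 90000, `allocateThenFree` fails because only 10000 bytes are free at the time of the request, while `freeThenAllocate` succeeds because the map's 90000 bytes are returned first.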

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark allocate-pointer-array

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8827.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8827
    
----
commit f23370fc6f2860e2317b4f7b8841024c33052b89
Author: Andrew Or <[email protected]>
Date:   2015-09-18T19:17:31Z

    Allocate pointer array only after releasing memory
    
    Currently we allocate the pointer array after spilling, which,
    however, does not actually release any memory. Instead we need to
    do it after we free the map.
    
    This happens when we use TungstenAggregate and switch to sort
    based aggregation due to memory pressure.

commit f7dd17fca10d33e5afceb9df934d5a2f2290abae
Author: Andrew Or <[email protected]>
Date:   2015-09-18T20:15:05Z

    Add tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
