Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/6714#issuecomment-113055264
After this patch, it looks like `_next_limit` is called only once per
`ExternalSorter.sorted` call, and `sorted` is its only caller. I wonder
whether we should just remove the method and inline its body at the call
site to make things a bit easier to read.
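Concretely, inlining would amount to something like this (just a sketch of
the single call site; I'm assuming, not quoting, the surrounding code):

```python
# hypothetical inlined form inside ExternalSorter.sorted:
limit = max(self.memory_limit, get_used_memory() * 1.05)
```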
This makes me wonder, though: is it actually safe to no longer raise the
memory limit after spilling, as we did before? Here's the comment in
`_next_limit`:
```python
def _next_limit(self):
"""
Return the next memory limit. If the memory is not released
after spilling, it will dump the data only when the used memory
starts to increase.
"""
return max(self.memory_limit, get_used_memory() * 1.05)
```
If we no longer call `_next_limit()` after spilling, what will happen in
cases where spilling fails to actually free memory? Does this mean that
we'll spill every `batch` items and pay a huge cost during the merge?
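To make the concern concrete, here's a minimal, self-contained sketch of the
kind of spill loop I have in mind. `external_sorted`, `spill_to_disk`, and
the `get_used_memory` stand-in below are illustrative assumptions, not the
real `pyspark.shuffle` code:

```python
import heapq

def get_used_memory():
    # Stand-in for pyspark.shuffle.get_used_memory: peak resident set size,
    # roughly in MB on Linux (the real helper prefers psutil).
    import resource
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss >> 10

def spill_to_disk(chunk):
    # Hypothetical helper: the real code writes a sorted run to a temp file
    # and streams it back; here we just hand the sorted run back.
    return iter(chunk)

def external_sorted(iterator, memory_limit, key=None):
    """Sketch: sort a stream, spilling sorted runs when memory exceeds limit."""
    batch, limit = 100, memory_limit
    chunk, runs = [], []
    for i, item in enumerate(iterator, 1):
        chunk.append(item)
        if i % batch == 0 and get_used_memory() > limit:
            chunk.sort(key=key)
            runs.append(spill_to_disk(chunk))
            chunk = []
            # Pre-patch behavior: bump the limit after a spill so that, if
            # spilling freed no memory, we don't spill again until usage grows:
            # limit = max(limit, get_used_memory() * 1.05)
    chunk.sort(key=key)
    return heapq.merge(*(runs + [iter(chunk)]), key=key)
```

With the commented-out bump restored, a stuck memory reading causes at most
one extra spill until usage actually grows; without it, the check trips again
every `batch` items and the final merge has to combine a huge number of tiny
runs.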