Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/21977

@holdenk pyarrow uses a C++-based memory pool, so I'm not sure exactly how that interacts with rlimit, but I ran some tests and it looks like an error is thrown when the limit is set.

**with setrlimit**
```python
>>> import pyarrow as pa
>>> import resource
>>> resource.setrlimit(resource.RLIMIT_AS, (1000 * 1024 * 1024, 1000 * 1024 * 1024))
>>> a = list(range(1 << 20))
>>> b = [pa.array(a) for i in range(10)]
>>> c = [pa.array(a) for i in range(10)]
>>> pa.total_allocated_bytes()
170393600
>>> d = [pa.array(a) for i in range(100)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "pyarrow/array.pxi", line 186, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 26, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowMemoryError: malloc of size 8388608 failed
```

**no limit**
```python
>>> import pyarrow as pa
>>> a = list(range(1 << 20))
>>> b = [pa.array(a) for i in range(10)]
>>> c = [pa.array(a) for i in range(10)]
>>> pa.total_allocated_bytes()
170393600
>>> d = [pa.array(a) for i in range(100)]
>>> pa.total_allocated_bytes()
1022361600
```

One thing I wasn't expecting: importing pyarrow and its shared libraries after setting rlimit can fail if the limit is set too low, and the failure is not clean - is this expected?

```python
>>> import resource
>>> resource.setrlimit(resource.RLIMIT_AS, (100 * 1024 * 1024, 100 * 1024 * 1024))
>>> import pyarrow
Traceback (most recent call last):
  File "/home/bryan/miniconda2/envs/pa010py35/lib/python3.5/site-packages/numpy/core/__init__.py", line 16, in <module>
    from . import multiarray
ImportError: libopenblas.so.0: failed to map segment from shared object
```
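Since the unclean failure comes from mapping shared objects after the limit is already in place, one possible workaround is to import pyarrow before calling setrlimit, so later allocations fail with a catchable `ArrowMemoryError` instead. A minimal sketch (my own illustration, not code from this PR; the 1 GB cap is an arbitrary value matching the test above):

```python
# Sketch of a workaround (assumption, not part of this PR): import pyarrow
# while the address space is still unlimited, so its shared libraries map
# cleanly, and only then tighten RLIMIT_AS.
import resource

import pyarrow as pa  # import first, before any limit is in place

cap = 1000 * 1024 * 1024  # arbitrary 1 GB cap for illustration
resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

a = list(range(1 << 20))
try:
    d = [pa.array(a) for i in range(100)]
except pa.lib.ArrowMemoryError as e:
    # pyarrow's C++ memory pool surfaces the failed malloc as an
    # ArrowMemoryError rather than crashing the interpreter
    print("allocation failed under RLIMIT_AS:", e)
```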