This looks like a bug: the Python workers did not exit normally. Could you
file a JIRA for this?

Also, could you show how to reproduce this behavior?
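
For what it's worth, below is a minimal sketch of the kind of job that would
help reproduce it; the app name, data size, and partition count are made up,
not taken from your setup:

    # repro_sketch.py -- hypothetical, submit with spark-submit --master yarn
    from pyspark import SparkContext

    sc = SparkContext(appName="pyspark-daemon-count-repro")

    # Many small partitions so each executor runs several waves of Python tasks.
    rdd = sc.parallelize(range(1000000), 2000)
    print(rdd.map(lambda x: x * x).sum())

    sc.stop()

While it runs, counting pyspark.daemon processes on an executor node
(ps -ef | grep pyspark.daemon | wc -l) would show whether they keep
accumulating.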

On Fri, Jan 23, 2015 at 11:45 PM, Sven Krasser <kras...@gmail.com> wrote:
> Hey Adam,
>
> I'm not sure I understand just yet what you have in mind. My takeaway from
> the logs is that the container was indeed above its allotment of about
> 14G. Since 6G of that is for overhead, I assumed there would be plenty of
> space for Python workers, but there seem to be more of them than I'd
> expect.
>
> Does anyone know whether this is actually the intended behavior, i.e. in this
> case over 90 Python processes on a 2-core executor?
>
> Best,
> -Sven
>
>
> On Fri, Jan 23, 2015 at 10:04 PM, Adam Diaz <adam.h.d...@gmail.com> wrote:
>>
>> YARN only has the ability to kill containers, not to checkpoint or suspend
>> (SIGSTOP) them. If you use too much memory, it will simply kill tasks based
>> on the YARN config.
>> https://issues.apache.org/jira/browse/YARN-2172
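>>
>> For reference, the physical and virtual memory checks are governed by
>> yarn.nodemanager.pmem-check-enabled and yarn.nodemanager.vmem-check-enabled
>> on the NodeManager side, and the container limit for a Spark executor is
>> spark.executor.memory plus spark.yarn.executor.memoryOverhead. A rough
>> sketch of setting those from PySpark (the values are placeholders, not
>> tuned recommendations):
>>
>>     from pyspark import SparkConf, SparkContext
>>
>>     conf = (SparkConf()
>>             .set("spark.executor.memory", "8g")                  # executor JVM heap
>>             .set("spark.yarn.executor.memoryOverhead", "6144"))  # MB, covers Python workers
>>     sc = SparkContext(conf=conf)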
>>
>>
>> On Friday, January 23, 2015, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>>
>>> Hi Sven,
>>>
>>> What version of Spark are you running?  Recent versions have a change
>>> that allows PySpark to share a pool of processes instead of starting a new
>>> one for each task.
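>>>
>>> That sharing is controlled by spark.python.worker.reuse, which defaults to
>>> true in the versions that have it. Assuming a live SparkContext named sc,
>>> a quick way to check what a job is actually running with:
>>>
>>>     sc.getConf().get("spark.python.worker.reuse", "<not set>")
>>>
>>> or set it explicitly via SparkConf().set("spark.python.worker.reuse",
>>> "true") before creating the context.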
>>>
>>> -Sandy
>>>
>>> On Fri, Jan 23, 2015 at 9:36 AM, Sven Krasser <kras...@gmail.com> wrote:
>>>>
>>>> Hey all,
>>>>
>>>> I am running into a problem where YARN kills containers for being over
>>>> their memory allocation (which is about 8G for executors plus 6G for
>>>> overhead), and I noticed that in those containers there are tons of
>>>> pyspark.daemon processes hogging memory. Here's a snippet from a container
>>>> with 97 pyspark.daemon processes. The total RSS usage across all of
>>>> these is 1,764,956 pages (i.e. about 6.7GB on this system).
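>>>>
>>>> (Quick sanity check on that figure, assuming the usual 4 KiB page size:
>>>>
>>>>     1764956 * 4096 / 1024.0 ** 3   # ~6.73, i.e. roughly 6.7 GiB
>>>>
>>>> so the Python workers alone account for close to half of the ~14G
>>>> allotment.)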
>>>>
>>>> Any ideas what's happening here and how I can get the number of
>>>> pyspark.daemon processes back to a more reasonable count?
>>>>
>>>> 2015-01-23 15:36:53,654 INFO  [Reporter] yarn.YarnAllocationHandler
>>>> (Logging.scala:logInfo(59)) - Container marked as failed:
>>>> container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics:
>>>> Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is
>>>> running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB
>>>> physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing
>>>> container.
>>>> Dump of the process-tree for container_1421692415636_0052_01_000030 :
>>>>    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>>>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>>>>    |- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m
>>>> pyspark.daemon
>>>>    |- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m
>>>> pyspark.daemon
>>>>    |- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m
>>>> pyspark.daemon
>>>>
>>>>    [...]
>>>>
>>>>
>>>> Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c
>>>>
>>>> Thank you!
>>>> -Sven
>>>>
>>>> --
>>>> krasser
>>>
>>>
>
>
>
> --
> http://sites.google.com/site/krasser/?utm_source=sig

