Github user rajeshbalamohan commented on the issue:
https://github.com/apache/spark/pull/10846
SoftRef causes a lot of memory pressure on the Thrift server. To be precise, when
executing a query over a large dataset, it can very quickly reach 1200% CPU with
all threads doing nothing but GC. That comes from the HadoopRDD conf
caching: because of the SoftRefs, the cached entries pile up until the GC
threshold is reached and only then get cleared. It does not OOM, but it runs at
very high CPU due to GC.
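
To illustrate the mechanism described above, here is a minimal sketch of a cache
that holds its values through soft references. It is not the HadoopRDD code;
the class name SoftValueCache and its methods are hypothetical and exist only
for illustration. The point is that nothing evicts entries proactively: values
stay reachable until the collector decides memory is tight, so the heap tends to
fill up before anything is released.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a soft-reference cache; not Spark's actual code.
final class SoftValueCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> entries =
        new ConcurrentHashMap<>();

    // Store the value behind a SoftReference. The cache itself never evicts it;
    // only the garbage collector can clear it, typically under memory pressure.
    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    // Return the cached value, or null if the key is absent or the collector
    // has already cleared the referent.
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The referent was cleared by GC; drop the stale map entry.
            entries.remove(key, ref);
        }
        return value;
    }

    // Number of map entries, including any whose referent was already cleared.
    int size() {
        return entries.size();
    }
}
```

With a cache like this, a long-running process that keeps inserting large
values will usually let the heap grow close to its limit before the soft
references are cleared, which matches the "no OOM, but constant GC" behaviour
reported here.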
JobProgress* does not clean up its data fast enough in some cases (e.g. when too
many queries are executed continuously), and in such cases the memory pressure
on the Thrift server increases.
Both of them contribute to the high CPU usage. I am afraid that fixing only one
of them would still leave the high CPU usage issue.