GitHub user zuotingbing opened a pull request:

    https://github.com/apache/spark/pull/20029

    [SPARK-22793][SQL]Memory leak in Spark Thrift Server

    # What changes were proposed in this pull request?
    1. Start HiveThriftServer2.
    2. Connect to thriftserver through beeline.
    3. Close the beeline.
    4. Repeat steps 2 and 3 several times.
    After this, many directories are left under the paths configured by 
`hive.exec.local.scratchdir` and `hive.exec.scratchdir` and are never removed. 
Each scratch directory is added to the FileSystem `deleteOnExit` cache when it 
is created, so that cache keeps growing until the JVM terminates.
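
    Below is a minimal sketch (not the actual patch) of the kind of per-session 
cleanup this implies, assuming the session's scratch directory path is known 
when the session closes; `cleanupSessionScratchDir` is a hypothetical helper:

        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        // Hypothetical helper (not part of this patch): remove a session's
        // scratch directory as soon as the session closes, instead of leaving
        // it in the JVM-wide deleteOnExit cache until shutdown.
        def cleanupSessionScratchDir(scratchDir: String, hadoopConf: Configuration): Unit = {
          val path = new Path(scratchDir)
          val fs = path.getFileSystem(hadoopConf)
          fs.cancelDeleteOnExit(path)  // drop the entry from the deleteOnExit cache
          fs.delete(path, true)        // delete the directory recursively right away
        }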
    
    In addition, running `jmap -histo:live [PID]` against the 
HiveThriftServer2 process shows that the instance counts of 
`org.apache.spark.sql.hive.client.HiveClientImpl` and 
`org.apache.hadoop.hive.ql.session.SessionState` keep increasing even after 
all beeline connections are closed, which points to a memory leak.
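
    As a rough illustration only (the actual fix lives in the Thrift Server 
session handling, which is not shown in this mail), closing and detaching the 
Hive `SessionState` when a connection ends is the sort of cleanup that keeps 
these objects from accumulating:

        import org.apache.hadoop.hive.ql.session.SessionState

        // Hypothetical sketch: release the current thread's Hive SessionState
        // when a Thrift Server connection closes, so it can be garbage
        // collected instead of piling up per beeline connection.
        def closeHiveSessionState(): Unit = {
          val state = SessionState.get()
          if (state != null) {
            state.close()                 // release the session's resources
            SessionState.detachSession()  // clear the thread-local reference
          }
        }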
    
    # How was this patch tested?
    Manual tests.
    
    This PR is a follow-up to https://github.com/apache/spark/pull/19989.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zuotingbing/spark SPARK-22793

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20029
    
----
commit 2b1e166f4f43b300d272fc7ce1d9d7997f7ae3cd
Author: zuotingbing <zuo.tingbing9@...>
Date:   2017-12-20T07:52:21Z

    [SPARK-22793][SQL]Memory leak in Spark Thrift Server

----


---
