[GitHub] spark pull request #19989: [SPARK-22793][SQL]Memory leak in Spark Thrift Ser...

zuotingbing Fri, 15 Dec 2017 02:24:50 -0800

GitHub user zuotingbing opened a pull request:

    https://github.com/apache/spark/pull/19989


    [SPARK-22793][SQL]Memory leak in Spark Thrift Server

    ## What changes were proposed in this pull request?
    
    1. Start HiveThriftServer2.
    2. Connect to the Thrift server through beeline.
    3. Close the beeline connection.
    4. Repeat steps 2 and 3 several times. Memory usage of the server keeps 
growing and is never released.
    
    We found that many directories under `hive.exec.local.scratchdir` and 
`hive.exec.scratchdir` are never deleted. Each scratch directory is added to 
the FileSystem `deleteOnExit` cache when it is created, so the size of that 
cache keeps increasing until the JVM terminates.
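
To make the growth pattern concrete, here is a minimal Python sketch of a delete-on-exit registry. It is a toy model, not the Hadoop `FileSystem` API: the class `DeleteOnExitRegistry`, the function `run_sessions`, and the scratch paths are all hypothetical names chosen for illustration. The `cleanup_on_close` branch shows one possible remedy (dropping the entry when the session closes) and is not necessarily what this patch does.

```python
class DeleteOnExitRegistry:
    """Toy stand-in for a per-process delete-on-exit set (hypothetical, not Hadoop)."""

    def __init__(self):
        self.paths = set()

    def delete_on_exit(self, path):
        # Remember the path so it can be deleted when the process exits.
        self.paths.add(path)

    def cancel_delete_on_exit(self, path):
        # Forget the path, e.g. because it was already deleted explicitly.
        self.paths.discard(path)


def run_sessions(registry, count, cleanup_on_close):
    """Open and close `count` sessions; return how many entries remain registered."""
    for i in range(count):
        scratch_dir = f"/tmp/hive/session-{i}"  # hypothetical scratch path
        registry.delete_on_exit(scratch_dir)    # session opens
        if cleanup_on_close:
            registry.cancel_delete_on_exit(scratch_dir)  # session closes
    return len(registry.paths)


leaky = run_sessions(DeleteOnExitRegistry(), 1000, cleanup_on_close=False)
cleaned = run_sessions(DeleteOnExitRegistry(), 1000, cleanup_on_close=True)
print(leaky, cleaned)
```

Without cleanup, one entry per closed session stays in the set for the lifetime of the process, which matches the behavior described above for a long-running server.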
    
    In addition, we used `jmap -histo:live [PID]` to print the object 
histogram of the HiveThriftServer2 process. The number of 
`org.apache.spark.sql.hive.client.HiveClientImpl` and 
`org.apache.hadoop.hive.ql.session.SessionState` objects keeps increasing even 
after all beeline connections are closed, which causes the memory leak.
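
The comparison of two `jmap -histo:live` snapshots can be sketched in Python as follows. This assumes the usual histogram row layout (rank, instance count, byte count, class name). The helper names `instance_counts` and `growth` are hypothetical, and the numbers in `SAMPLE_BEFORE` and `SAMPLE_AFTER` are made-up placeholders, not measurements from this pull request.

```python
import re

# A histogram row is assumed to look like: "   3:   1200   57600  java.lang.String"
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

WATCHED = (
    "org.apache.spark.sql.hive.client.HiveClientImpl",
    "org.apache.hadoop.hive.ql.session.SessionState",
)


def instance_counts(histo_text, class_names=WATCHED):
    """Return {class name: instance count} for the watched classes."""
    counts = {name: 0 for name in class_names}
    for line in histo_text.splitlines():
        match = ROW.match(line)
        if match and match.group(3) in counts:
            counts[match.group(3)] = int(match.group(1))
    return counts


def growth(before_text, after_text):
    """Return the change in instance count per watched class between two snapshots."""
    before = instance_counts(before_text)
    after = instance_counts(after_text)
    return {name: after[name] - before[name] for name in before}


# Placeholder snapshots (illustrative numbers only).
SAMPLE_BEFORE = """
 num     #instances         #bytes  class name
----------------------------------------------
   1:            10            800  org.apache.hadoop.hive.ql.session.SessionState
   2:             5            400  org.apache.spark.sql.hive.client.HiveClientImpl
"""
SAMPLE_AFTER = """
 num     #instances         #bytes  class name
----------------------------------------------
   1:            30           2400  org.apache.hadoop.hive.ql.session.SessionState
   2:            15           1200  org.apache.spark.sql.hive.client.HiveClientImpl
"""

print(growth(SAMPLE_BEFORE, SAMPLE_AFTER))
```

Taking one snapshot before a round of connect/close cycles and one after all connections are closed, a positive difference for both classes would indicate the leak described above.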
    
    ## How was this patch tested?
    
    Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zuotingbing/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19989
    