Github user gaborgsomogyi commented on the issue:

    https://github.com/apache/spark/pull/19893
  
    I've analysed the Hive-related test flow and found the SparkSession and SQLContext sharing between suites that you mentioned. Here is the execution flow:
    
    1. The first Hive test suite instantiates TestHive, which creates the SparkSession and SQLContext
    2. SparkFunSuite.beforeAll creates a thread snapshot
    3. Test code runs
    4. TestHiveSingleton.afterAll resets SparkSession
    5. SparkFunSuite.afterAll prints out the possible leaks (see the sketch below)
    
    Step one is executed only by the first Hive suite and never again.
    
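    For reference, a minimal sketch of how such a pre/post thread-snapshot audit can be structured; the trait name, method names, and log format below are illustrative only, not necessarily the exact ones in SparkFunSuite:
    ```
    import scala.collection.JavaConverters._

    // Illustrative sketch of a thread-snapshot audit, not the exact SparkFunSuite code.
    trait ThreadAuditSketch {
      private var snapshot: Set[Long] = Set.empty

      // Called from beforeAll: remember the threads that already exist.
      protected def threadPreAudit(): Unit = {
        snapshot = runningThreads().map(_.getId)
      }

      // Called from afterAll: any thread not in the snapshot is a possible leak.
      protected def threadPostAudit(suiteName: String): Unit = {
        val remained = runningThreads().filterNot(t => snapshot.contains(t.getId))
        if (remained.nonEmpty) {
          println(s"===== POSSIBLE THREAD LEAK IN SUITE $suiteName, " +
            s"thread names: ${remained.map(_.getName).mkString(", ")} =====")
        }
      }

      private def runningThreads(): Set[Thread] =
        Thread.getAllStackTraces.keySet.asScala.toSet
    }
    ```
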
    Here I do not see false positives at a large scale. The only possible false-positive threads I foresee could come from lazy initialisation within SparkSession or SQLContext. On the leftover side, we are not tracking SparkSession and SQLContext threads, but because of their singleton nature my suggestion is to leave it like that.
    
    In the case you mentioned:
    ```
    $ grep 'POSSIBLE THREAD LEAK' unit-tests.log  | wc -l
    158
    ```
    I can imagine the following situations:
    1. Test doesn't call hiveContext.reset()
    2. Test creates a thread but does not free it up (see the sketch below)
    3. Production code issue
    4. ...
    Of course there could be other issues that I've not considered; please share your ideas.
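
    As an illustration of situation 2, here is a hedged sketch of a suite that starts a worker thread and the cleanup that keeps it out of the audit report; the class, thread, and field names are made up for the example:
    ```
    // Hypothetical example of situation 2: a suite that starts a thread in
    // beforeAll and must stop it in afterAll, otherwise the post-audit would
    // report "leaky-worker" as a possible leak.
    class LeakyWorkerSuite extends SparkFunSuite {
      @volatile private var worker: Thread = _

      override def beforeAll(): Unit = {
        super.beforeAll()
        worker = new Thread(new Runnable {
          override def run(): Unit = {
            try {
              while (!Thread.currentThread().isInterrupted) Thread.sleep(100)
            } catch {
              case _: InterruptedException => // exit on interrupt
            }
          }
        })
        worker.setName("leaky-worker")
        worker.setDaemon(true)
        worker.start()
      }

      override def afterAll(): Unit = {
        try {
          // Stop and join the worker so it is gone before the thread audit runs.
          worker.interrupt()
          worker.join()
        } finally {
          super.afterAll()
        }
      }
    }
    ```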


