Github user gaborgsomogyi commented on the issue:
https://github.com/apache/spark/pull/19893
I've analysed the Hive-related test flow and found that SparkSession and
SQLContext are shared between suites, as you mentioned. Here is the execution flow:
1. The first hive test suite instantiates TestHive which creates
SparkSession and SQLContext
2. SparkFunSuite.beforeAll creates a thread snapshot
3. Test code runs
4. TestHiveSingleton.afterAll resets SparkSession
5. SparkFunSuite.afterAll prints out the possible leaks
Step one is executed only by the first Hive suite and never again.
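For reference, here is a minimal sketch of the snapshot/diff idea behind steps 2 and 5. The trait and method names are made up for illustration; they are not the actual SparkFunSuite internals:
```scala
import scala.collection.JavaConverters._

// Illustrative snapshot/diff mechanism: remember the live threads in
// beforeAll, then report anything new that is still alive in afterAll.
trait ThreadSnapshotAudit {
  private var snapshot: Set[String] = Set.empty

  // beforeAll: record the names of all currently running threads
  def takeThreadSnapshot(): Unit = {
    snapshot = runningThreadNames()
  }

  // afterAll: any thread alive now that was not in the snapshot is
  // reported as a possible leak
  def reportPossibleLeaks(suiteName: String): Unit = {
    val leaked = runningThreadNames() -- snapshot
    if (leaked.nonEmpty) {
      println(s"POSSIBLE THREAD LEAK IN SUITE $suiteName: ${leaked.mkString(", ")}")
    }
  }

  private def runningThreadNames(): Set[String] =
    Thread.getAllStackTraces.keySet.asScala.map(_.getName).toSet
}
```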
I do not see false positives at a large scale here. The only possible false-positive
threads I foresee could come from lazy initialisation within SparkSession or
SQLContext. On the leftover side we're not tracking SparkSession and SQLContext
threads, but given their singleton nature my suggestion is to leave it like that.
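To make the lazy-initialisation point concrete, a hypothetical example (not actual SparkSession or SQLContext code): a lazily initialised field that starts a daemon thread on first access would only appear in the afterAll diff if that first access happens after the beforeAll snapshot, so it would read as a leak even though the test body leaked nothing:
```scala
import java.util.concurrent.{Executors, ThreadFactory}

// Hypothetical lazily initialised component: the scheduler thread is only
// started on first access. If that access happens after the beforeAll
// snapshot, the thread shows up in the afterAll diff as a possible leak.
object LazyComponent {
  lazy val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "lazy-component-scheduler")
      t.setDaemon(true)
      t
    }
  })
}
```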
In the case you mentioned:
```
$ grep 'POSSIBLE THREAD LEAK' unit-tests.log | wc -l
158
```
I can imagine the following situations:
1. Test doesn't call hiveContext.reset()
2. Test creates a thread but doesn't free it up (see the sketch after this list)
3. Production code issue
4. ...
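As a hypothetical illustration of situations 1 and 2 (the suite, test, and thread names below are made up, not taken from the PR):
```scala
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Made-up suite showing how situations 1 and 2 would surface as leaks.
class LeakySuiteExample extends FunSuite with BeforeAndAfterAll {

  test("situation 2: thread created but never freed up") {
    val worker = new Thread(new Runnable {
      override def run(): Unit = Thread.sleep(60000)
    }, "leaky-helper")
    worker.setDaemon(true)
    worker.start()
    // missing worker.interrupt()/worker.join(), so the thread outlives the
    // test and ends up in the afterAll thread diff
  }

  override protected def afterAll(): Unit = {
    // situation 1 would be forgetting the hiveContext.reset() call here in a
    // TestHiveSingleton-based suite, leaving shared state behind
    super.afterAll()
  }
}
```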
Of course there could be other issues I haven't considered; please share your ideas.