[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20029
  
`addJar ` is cross-session. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85736/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20029
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20029
  
**[Test build #85736 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85736/testReport)**
 for PR 20029 at commit 
[`2b1e166`](https://github.com/apache/spark/commit/2b1e166f4f43b300d272fc7ce1d9d7997f7ae3cd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-05 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20029
  
> The hiveClient created for the resourceLoader is only used to addJar, 
which is, in turn, to add Jar to the shared IsolatedClientLoader. Then we can 
just use the shared hive client for this purpose.

Shouldn't `addJar` be session-based? At least seems in Hive it is: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources

Although looks like in `SessionResourceLoader` for `SessionState`, `addJar` 
isn't session-based too. So at least seems we have consistent behavior.







---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20029
  
**[Test build #85736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85736/testReport)**
 for PR 20029 at commit 
[`2b1e166`](https://github.com/apache/spark/commit/2b1e166f4f43b300d272fc7ce1d9d7997f7ae3cd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-05 Thread liufengdb
Github user liufengdb commented on the issue:

https://github.com/apache/spark/pull/20029
  
lgtm!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2018-01-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20029
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-29 Thread liufengdb
Github user liufengdb commented on the issue:

https://github.com/apache/spark/pull/20029
  
By [this 
line](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L78),
 yes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-29 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20029
  
> The hiveClient created for the resourceLoader is only used to addJar, 
which is, in turn, to add Jar to the shared IsolatedClientLoader. Then we can 
just use the shared hive client for this purpose.

@liufengdb does it mean that we are creating more than one SparkSession in 
the thriftserver?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-28 Thread liufengdb
Github user liufengdb commented on the issue:

https://github.com/apache/spark/pull/20029
  
@zuotingbing I took a close look at the related code and thought the issue 
you raised is valid:

1. The hiveClient created for the 
[resourceLoader](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L45)
 is only used to addJar, which is, in turn, to add Jar to the shared 
[`IsolatedClientLoader`](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L817).
 Then we can just use the shared hive client for this purpose.

2. Another possible reason to use a new hive client is to run [this hive 
statement](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L818).
 But I think it just some leftovers from old spark and should be removed. So 
overall it is fined to use the shared `client` from `HiveExternalCatalog` 
without creating a new hive client.

3. Currrently, there are no ways to cleanup the resource created by a [new 
session of 
SQLContext/SparkSession](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L78).
 I couldn't understand the design tradeoff behind 
[this](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L716)
 (@srowen ). So it is not easy to remove the temp dirs when a session is closed.

4. To what extent, does spark need these scratch dirs? Is it possible we 
can make this step optional, if it is not used for all the deployment modes?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-28 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20029
  
What it seems is never closed by your analysis is the client used to 
interact with the metastore. This might be a problem which we are not aware of 
in normal SQL applications, since we have only one client in those cases.

What you are doing in your fix is avoiding creating a client for each 
`HiveSessionBuilder`, thus:

 1. this would mean that we are creating more than one `SessionBuilder`, 
ie. more than one `SparkSession`, which is not true as far as I know.
 2. any session would share the same client to connect to the metastore, 
which is wrong IMHO.

Please let me know if I misunderstood or I was wrong with something.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-27 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/20029
  
Could you please to check this PR? Thanks @liufengdb


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20029
  
cc @liufengdb 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-21 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/20029
  
`override protected lazy val resourceLoader: HiveSessionResourceLoader = {
val client: HiveClient = externalCatalog.client.newSession()
new HiveSessionResourceLoader(session, client)
  }`
Is it necessary to create a new HiveClient here since there has a 
hiveClient in externalCatalog already?
If it is necessary, we need to supply a method to close the hiveClient 
which be created here and in this method we alsO need to clean up the 
scratchdir(`hdfsSessionPath` and `localSessionPath`)  which are created by 
HiveClientImpl.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/20029
  
It seems each time when connect to thrift server through beeline, the 
`SessionState.start(state)` will be called two times. one is in 
`HiveSessionImpl:open` , another is in `HiveClientImpl.newSession()` for 
`sql("use default")` . When close the beeline connection, only close the 
HiveSession with `HiveSessionImpl.close()`, but the object of 
`HiveClientImpl.newSession()` will be left over.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20029
  
> we can find the object org.apache.spark.sql.hive.client.HiveClientImpl 
and org.apache.hadoop.hive.ql.session.SessionState keep increasing

Can you check the GC root and explain why they are increasing? The fix 
looks not correct to me as we should create new session.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20029
  
I'm asking you to respond to 
https://github.com/apache/spark/pull/19989#issuecomment-351985114


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/20029
  
Thanks @srowen , so whom could i ping to make sure this change has no side 
effects? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20029
  
This indeed is the primary change as it's open vs master. 
https://github.com/apache/spark/pull/19989 had some concerns about whether this 
affects correctness though?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20029: [SPARK-22793][SQL]Memory leak in Spark Thrift Server

2017-12-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20029
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org