GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/8909
[SPARK-10810] [WIP] [SQL] Improve session management in SQL
This PR improves session management by replacing the thread-local based
approach with one SQLContext per session, and introduces separate temporary
tables and UDFs/UDAFs for each session.
A new session of SQLContext can be created by either:
1) creating a new SQLContext
2) calling newSession() on an existing SQLContext
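
To make the intended isolation concrete, here is a minimal toy model in plain
Python. It is not Spark code: ToySQLContext, new_session, and the shared dict
are hypothetical names invented for illustration, and only the behaviour
(per-session temporary tables and UDFs, jars shared by all sessions) is taken
from the description in this PR.

```python
# Toy model only: hypothetical names, not Spark's implementation.
class ToySQLContext:
    """One instance per session, each with its own temp tables and UDFs."""

    def __init__(self, shared=None):
        # State shared by every session derived from the same root context.
        self.shared = shared if shared is not None else {"added_jars": []}
        self.temp_tables = {}  # session-local
        self.udfs = {}         # session-local

    def new_session(self):
        # Reuse the shared state, but start with empty per-session registries.
        return type(self)(self.shared)

    def register_temp_table(self, name, rows):
        self.temp_tables[name] = rows

    def register_udf(self, name, func):
        self.udfs[name] = func

    def add_jar(self, path):
        # Jars are recorded in the shared state, so every session sees them.
        self.shared["added_jars"].append(path)


root = ToySQLContext()
session = root.new_session()

root.register_temp_table("people", [("alice", 30)])
session.register_udf("plus_one", lambda x: x + 1)
root.add_jar("example.jar")  # hypothetical path

print("people" in session.temp_tables)  # False: temp tables are per session
print("plus_one" in root.udfs)          # False: UDFs are per session
print(session.shared["added_jars"])     # ['example.jar']: jars are shared
```

In this sketch a name registered in one session never leaks into another,
while anything placed in the shared state is visible to all of them.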
For HiveContext, in order to reduce the cost for each session, the
classloader and Hive client are shared across multiple sessions (created by
newSession).
CacheManager is also shared by multiple sessions, so caching a table multiple
times in different sessions will not create multiple copies in the in-memory
cache.
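
The effect of sharing one cache manager can be sketched with a small toy model
in plain Python. This is an illustration under assumptions, not Spark's
CacheManager: ToyCacheManager, ToySession, and CountingLoader are hypothetical
names, and entries are keyed by table name here purely for simplicity.

```python
# Toy model only: hypothetical names, not Spark's CacheManager.
class ToyCacheManager:
    def __init__(self):
        self.entries = {}  # table key -> materialized rows

    def cache(self, key, load):
        # Materialize only on the first request; later requests reuse the entry.
        if key not in self.entries:
            self.entries[key] = load()
        return self.entries[key]


class ToySession:
    def __init__(self, cache_manager):
        self.cache_manager = cache_manager  # the same object in every session

    def cache_table(self, name, load):
        return self.cache_manager.cache(name, load)


class CountingLoader:
    """Callable that counts how often the table is actually materialized."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return list(self.rows)


manager = ToyCacheManager()
first = ToySession(manager)
second = ToySession(manager)
loader = CountingLoader([("alice", 30), ("bob", 25)])

rows_a = first.cache_table("people", loader)
rows_b = second.cache_table("people", loader)

print(loader.calls)      # 1: the table was materialized once
print(rows_a is rows_b)  # True: both sessions see the same cached copy
```

Because both sessions hold the same manager object, the second request finds
the existing entry instead of building another in-memory copy.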
Added jars are still shared by all the sessions, because SparkContext does
not support sessions.
cc @marmbrus @yhuai @rxin
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark sessions
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8909.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8909
----
commit af8df73556b466fa68dd0689469cb465dd761a1e
Author: Davies Liu <[email protected]>
Date: 2015-09-24T19:08:54Z
sessions for SQLContext
commit bc9f06447482dac5cfc61fd9d8c21bacb2300431
Author: Davies Liu <[email protected]>
Date: 2015-09-24T21:18:00Z
improve executionHive
----