[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel closed the pull request at: https://github.com/apache/spark/pull/4382 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-82003000 Closing it since #4885 has been merged.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4885#issuecomment-82034566 Thank you very much @liancheng, I will create another PR for the requirements that we discussed above, and also the minor issues.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4885#discussion_r26504850
--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala ---
@@ -195,6 +195,146 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
       }
     }
   }
+
+  test("test multiple session") {
+    import org.apache.spark.sql.SQLConf
+    var defaultV1: String = null
+    var defaultV2: String = null
+
+    withMultipleConnectionJdbcStatement(
+      // create table
+      { statement =>
+
+        val queries = Seq(
+            "DROP TABLE IF EXISTS test_map",
+            "CREATE TABLE test_map(key INT, value STRING)",
+            s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test_map",
+            "CACHE TABLE test_table AS SELECT key FROM test_map ORDER BY key DESC")
+
+        queries.foreach(statement.execute)
+
+        val rs1 = statement.executeQuery("SELECT key FROM test_table ORDER BY KEY DESC")
+        val buf1 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs1.next()) {
+          buf1 += rs1.getInt(1)
+        }
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SELECT key FROM test_map ORDER BY KEY DESC")
+        val buf2 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs2.next()) {
+          buf2 += rs2.getInt(1)
+        }
+        rs2.close()
+
+        assert(buf1 === buf2)
+      },
+
+      // first session, we get the default value of the session status
+      { statement =>
+
+        val rs1 = statement.executeQuery(s"SET ${SQLConf.SHUFFLE_PARTITIONS}")
+        rs1.next()
+        defaultV1 = rs1.getString(1)
+        assert(defaultV1 != "200")
--- End diff --
Would be nice to add a comment to indicate that the expected value should be `undefined`. I was quite confused at first, as 200 should be the default value of spark.sql.shuffle.partitions :)
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4885#discussion_r26504872
--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala ---
@@ -195,6 +195,146 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
       }
     }
   }
+
+  test("test multiple session") {
+    import org.apache.spark.sql.SQLConf
+    var defaultV1: String = null
+    var defaultV2: String = null
+
+    withMultipleConnectionJdbcStatement(
+      // create table
+      { statement =>
+
+        val queries = Seq(
+            "DROP TABLE IF EXISTS test_map",
+            "CREATE TABLE test_map(key INT, value STRING)",
+            s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test_map",
+            "CACHE TABLE test_table AS SELECT key FROM test_map ORDER BY key DESC")
+
+        queries.foreach(statement.execute)
+
+        val rs1 = statement.executeQuery("SELECT key FROM test_table ORDER BY KEY DESC")
+        val buf1 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs1.next()) {
+          buf1 += rs1.getInt(1)
+        }
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SELECT key FROM test_map ORDER BY KEY DESC")
+        val buf2 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs2.next()) {
+          buf2 += rs2.getInt(1)
+        }
+        rs2.close()
+
+        assert(buf1 === buf2)
+      },
+
+      // first session, we get the default value of the session status
+      { statement =>
+
+        val rs1 = statement.executeQuery(s"SET ${SQLConf.SHUFFLE_PARTITIONS}")
+        rs1.next()
+        defaultV1 = rs1.getString(1)
+        assert(defaultV1 != "200")
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SET hive.cli.print.header")
+        rs2.next()
+
+        defaultV2 = rs2.getString(1)
+        assert(defaultV1 != "true")
--- End diff --
`defaultV2`?
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4885
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4885#discussion_r26505199
--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala ---
@@ -195,6 +195,146 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
       }
     }
   }
+
+  test("test multiple session") {
+    import org.apache.spark.sql.SQLConf
+    var defaultV1: String = null
+    var defaultV2: String = null
+
+    withMultipleConnectionJdbcStatement(
+      // create table
+      { statement =>
+
+        val queries = Seq(
+            "DROP TABLE IF EXISTS test_map",
+            "CREATE TABLE test_map(key INT, value STRING)",
+            s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test_map",
+            "CACHE TABLE test_table AS SELECT key FROM test_map ORDER BY key DESC")
+
+        queries.foreach(statement.execute)
+
+        val rs1 = statement.executeQuery("SELECT key FROM test_table ORDER BY KEY DESC")
+        val buf1 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs1.next()) {
+          buf1 += rs1.getInt(1)
+        }
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SELECT key FROM test_map ORDER BY KEY DESC")
+        val buf2 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs2.next()) {
+          buf2 += rs2.getInt(1)
+        }
+        rs2.close()
+
+        assert(buf1 === buf2)
+      },
+
+      // first session, we get the default value of the session status
+      { statement =>
+
+        val rs1 = statement.executeQuery(s"SET ${SQLConf.SHUFFLE_PARTITIONS}")
+        rs1.next()
+        defaultV1 = rs1.getString(1)
+        assert(defaultV1 != "200")
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SET hive.cli.print.header")
+        rs2.next()
+
+        defaultV2 = rs2.getString(1)
+        assert(defaultV1 != "true")
+        rs2.close()
+      },
+
+      // second session, we update the session status
+      { statement =>
+
+        val queries = Seq(
+            s"SET ${SQLConf.SHUFFLE_PARTITIONS}=291",
+            "SET hive.cli.print.header=true"
+            )
+
+        queries.map(statement.execute)
+        val rs1 = statement.executeQuery(s"SET ${SQLConf.SHUFFLE_PARTITIONS}")
+        rs1.next()
+        assert("spark.sql.shuffle.partitions=291" === rs1.getString(1))
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SET hive.cli.print.header")
+        rs2.next()
+        assert("hive.cli.print.header=true" === rs2.getString(1))
+        rs2.close()
+      },
+
+      // third session, we get the latest session status, supposed to be the
+      // default value
+      { statement =>
+
+        val rs1 = statement.executeQuery(s"SET ${SQLConf.SHUFFLE_PARTITIONS}")
+        rs1.next()
+        assert(defaultV1 === rs1.getString(1))
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SET hive.cli.print.header")
+        rs2.next()
+        assert(defaultV2 === rs2.getString(1))
+        rs2.close()
+      },
+
+      // accessing the cached data in another session
+      { statement =>
+
+        val rs1 = statement.executeQuery("SELECT key FROM test_table ORDER BY KEY DESC")
+        val buf1 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs1.next()) {
+          buf1 += rs1.getInt(1)
+        }
+        rs1.close()
+
+        val rs2 = statement.executeQuery("SELECT key FROM test_map ORDER BY KEY DESC")
+        val buf2 = new collection.mutable.ArrayBuffer[Int]()
+        while (rs2.next()) {
+          buf2 += rs2.getInt(1)
+        }
+        rs2.close()
+
+        assert(buf1 === buf2)
+        statement.executeQuery("UNCACHE TABLE test_table")
+
+        // TODO need to figure out how to determine if the data loaded from cache
--- End diff --
We may check the result of `EXPLAIN EXTENDED SELECT ...` for `InMemoryColumnarTableScan`.
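The `EXPLAIN EXTENDED` suggestion could be approached roughly as sketched below. This is only an illustration, not code from the PR: `CachePlanCheck`, `mentionsInMemoryScan`, `explainLines` and `readsFromCache` are hypothetical names, and the sketch assumes the plan text comes back over JDBC as rows whose first column holds the text. Only the operator name `InMemoryColumnarTableScan` is taken from the comment.

```scala
import java.sql.Statement
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of HiveThriftServer2Suites.
object CachePlanCheck {
  // Operator name as given in the review comment.
  val InMemoryScan = "InMemoryColumnarTableScan"

  /** Pure check: true if any piece of plan text mentions the in-memory scan operator. */
  def mentionsInMemoryScan(planLines: Seq[String]): Boolean =
    planLines.exists(_.contains(InMemoryScan))

  /** Runs EXPLAIN EXTENDED over JDBC and collects column 1 of every returned row.
    * Assumption: the server returns the plan as text rows. */
  def explainLines(statement: Statement, query: String): Seq[String] = {
    val rs = statement.executeQuery(s"EXPLAIN EXTENDED $query")
    val lines = ArrayBuffer.empty[String]
    try {
      while (rs.next()) lines += rs.getString(1)
    } finally rs.close()
    lines.toList
  }

  /** Combines the two steps above for use inside a JDBC-based test. */
  def readsFromCache(statement: Statement, query: String): Boolean =
    mentionsInMemoryScan(explainLines(statement, query))
}
```

Inside the last session block, a call such as `assert(CachePlanCheck.readsFromCache(statement, "SELECT key FROM test_table"))` placed before the `UNCACHE TABLE` statement would then stand in for the TODO, provided the assumption about the plan output holds.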
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4885#issuecomment-81812932 Hey @chenghao-intel, left another 3 minor comments. But I'm gonna merge this. Please fix them in another PR. Also verified locally that both session isolation and cache sharing work as expected. Thanks for the efforts!!
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-77792262
[Test build #28377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28377/consoleFull) for PR 4382 at commit [`197e806`](https://github.com/apache/spark/commit/197e806e14b5e9ad2201eb3f319222d890b6e2fc).
* This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-77798081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28377/ Test PASSed.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-77798078
[Test build #28377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28377/consoleFull) for PR 4382 at commit [`197e806`](https://github.com/apache/spark/commit/197e806e14b5e9ad2201eb3f319222d890b6e2fc).
* This patch **passes all tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-76874043 @chenghao-intel I'm a little confused about why people have to create multiple `HiveContext` or `SQLContext` instances. Could you provide some use cases?
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-76901265 @chenghao-intel I'm posting the summary of our offline discussion here for future reference: `SQLContext.CacheManager` maps to `SparkContext.CacheManager` in a one-to-one manner. Once the SQL cache manager is made a global object, users can only use a single `SparkContext` in a single process. This limitation is neither wanted nor necessary. On the other hand, moving session specific stuff to `SQLSession` and leaving stuff shared by multiple sessions in `SQLContext`/`HiveContext` avoids this problem.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-75908281

Hey @chenghao-intel, thanks for working on this. AFAIK this is a pain point for many Spark SQL users who would like to put HiveThriftServer2 into production. I also had a discussion with @marmbrus about this recently. As we've discussed offline, instead of changing `CacheManager` and `FunctionRegistry` to global instances, adding a `SQLSession` and moving per-session objects (configurations, temporary functions, etc.) to it could be preferable. To be more specific:

1. Add a new `SQLSession` class, which is responsible for maintaining all per-session objects, like configurations, temporary functions, etc.
2. Add a `session` field of type `SQLSession` in `SQLContext`, and override it in `HiveContext`, then put Hive specific per-session objects into it, like Hive client, Hive session state, etc.
3. Add the following session specific methods to `SQLContext`:
   - `createSession: SQLSession`
   - `currentSession: SQLSession`
   - `setSession(session: SQLSession): Unit`
   - `closeSession(session: SQLSession)`

   These methods should be `private[sql]` as they are subject to change.

Currently we can just mimic Hive behavior, for example, using thread-local instances just like what Hive session state does. (You may see the `SQLSession` object within `HiveContext` as a thin wrapper of `SessionState` together with other per-session components.)

The benefits of this approach are:

1. In the long run, we'd like to move `HiveContext` out of the main framework and make Hive a separate data source. With the above approach, it's more natural to build a separate Spark SQL server with multi-user support without depending on Hive specific code.
2. Making components like `CacheManager` global objects is not test friendly. Basically it's impossible to make Spark SQL tests run in parallel.
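The proposal above can be pictured with a small, Spark-free sketch. Everything here is an assumption made for illustration: `SessionAwareContext` is a hypothetical stand-in for `SQLContext`, `SQLSession` holds only a configuration map, and the shared state is reduced to a set of cached table names. It shows the thread-local `currentSession` idea, not the code that was eventually merged.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the proposed SQLSession: per-session state only.
class SQLSession {
  val conf: mutable.Map[String, String] = mutable.Map.empty
}

// Hypothetical stand-in for SQLContext: shared state plus a thread-local current session.
class SessionAwareContext {
  // Shared by every session (models the cache); guarded by its own lock.
  private val cachedTables = mutable.Set.empty[String]

  private val defaultSession = new SQLSession
  private val tlSession = new ThreadLocal[SQLSession] {
    // Threads that never call setSession fall back to the default session.
    override def initialValue(): SQLSession = defaultSession
  }

  def createSession(): SQLSession = new SQLSession
  def currentSession: SQLSession = tlSession.get()
  def setSession(session: SQLSession): Unit = tlSession.set(session)
  def closeSession(session: SQLSession): Unit = {
    session.conf.clear()
    // Only detach if the closed session is the one bound to this thread.
    if (tlSession.get() eq session) tlSession.remove()
  }

  // Configuration reads and writes go to the current thread's session.
  def setConf(key: String, value: String): Unit = currentSession.conf(key) = value
  def getConf(key: String): Option[String] = currentSession.conf.get(key)

  // Cache operations go to the shared state, visible from every session.
  def cacheTable(name: String): Unit = cachedTables.synchronized { cachedTables += name }
  def isCached(name: String): Boolean = cachedTables.synchronized { cachedTables.contains(name) }
}
```

With this shape, a server thread would call `setSession` before executing a client's statement, so a `SET` issued in one session stays invisible to another while a cached table remains visible to both.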
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-75884341 /cc @liancheng @marmbrus can you give some high level comments? and then I can start the rebasing (again)
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-75919566 @liancheng thank you very much for such a detailed comment! Actually I am quite torn between the two approaches: a single `HiveContext` instance (with a thread-local `SQLSession`) vs. multiple `HiveContext` instances (with a global `CacheManager`, Hive metastore instance, etc.). The main reason I chose the latter approach is that people can create `HiveContext` or `SQLContext` instances without any restriction when using the DF API, and they may also use the created instances within the same thread. With the `ThreadLocal` (former) approach, this will still cause unpredictable behavior unless we make `HiveContext` or `SQLContext` a globally unique instance, but that is not what we want, right? Whichever approach we take, it's intuitive that `CacheManager`, `HiveMetaStore`, and `FunctionRegistry` can be `Context` independent and thread-safe, and we can put the `Context` dependent information, like the `sqlconf` and `sessionState`, into `SQLContext` or `HiveContext`. What do you think?
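For comparison, the "multiple context instances with global shared components" alternative described in this comment might look like the sketch below. It is a minimal model under stated assumptions: `SharedCache` and `IsolatedContext` are hypothetical names, cached tables are reduced to `Seq[Int]`, and nothing here is taken from the actual patch.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical process-wide, thread-safe registry standing in for a global CacheManager.
object SharedCache {
  private val tables = new ConcurrentHashMap[String, Seq[Int]]()
  def put(name: String, rows: Seq[Int]): Unit = tables.put(name, rows)
  def get(name: String): Option[Seq[Int]] = Option(tables.get(name))
}

// Hypothetical context: configuration lives in the instance, cached data is global.
class IsolatedContext {
  private val conf = new ConcurrentHashMap[String, String]()
  def setConf(key: String, value: String): Unit = conf.put(key, value)
  def getConf(key: String): Option[String] = Option(conf.get(key))
  def cacheTable(name: String, rows: Seq[Int]): Unit = SharedCache.put(name, rows)
  def cachedTable(name: String): Option[Seq[Int]] = SharedCache.get(name)
}
```

Here each connection would get its own `IsolatedContext`, so no thread-local lookup is needed. The cost, as the offline-discussion summary elsewhere in this thread points out, is that a global cache manager ties the process to a single `SparkContext`.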
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-75535316 /cc @liancheng can you review this for me?
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-75418987
[Test build #27830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27830/consoleFull) for PR 4382 at commit [`197e806`](https://github.com/apache/spark/commit/197e806e14b5e9ad2201eb3f319222d890b6e2fc).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-75420900
[Test build #27830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27830/consoleFull) for PR 4382 at commit [`197e806`](https://github.com/apache/spark/commit/197e806e14b5e9ad2201eb3f319222d890b6e2fc).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-75420903 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27830/ Test PASSed.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74402603 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27501/ Test FAILed.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74401576
[Test build #27501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27501/consoleFull) for PR 4382 at commit [`9d3b296`](https://github.com/apache/spark/commit/9d3b29615b01c57b1af33666448c90cf5d749ae9).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74402600
[Test build #27501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27501/consoleFull) for PR 4382 at commit [`9d3b296`](https://github.com/apache/spark/commit/9d3b29615b01c57b1af33666448c90cf5d749ae9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user guowei2 commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74004209 I agree with liancheng: users want to share cached tables in many cases. I still think isolating all resources in `HiveContext` is not a good idea. Whether a user may use a cached table created by a different user should be the responsibility of authorization; we should not always keep such tables invisible. I still think a `ThreadLocal` `SessionState` is enough.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74026303 retest this please
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74026567
[Test build #27328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27328/consoleFull) for PR 4382 at commit [`73922ae`](https://github.com/apache/spark/commit/73922ae23f64997e6ee0d30fb74f6d80d38d385e).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74024944
[Test build #27326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27326/consoleFull) for PR 4382 at commit [`73922ae`](https://github.com/apache/spark/commit/73922ae23f64997e6ee0d30fb74f6d80d38d385e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait DataFrame extends RDDApi[Row] with Serializable`
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74024949 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27326/ Test FAILed.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74021473
[Test build #27326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27326/consoleFull) for PR 4382 at commit [`73922ae`](https://github.com/apache/spark/commit/73922ae23f64997e6ee0d30fb74f6d80d38d385e).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74024214 I agree with @liancheng too; both the code and the description are updated.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74026286 The failure seems to be caused by unrelated code.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74030756 [Test build #27328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27328/consoleFull) for PR 4382 at commit [`73922ae`](https://github.com/apache/spark/commit/73922ae23f64997e6ee0d30fb74f6d80d38d385e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-74030760 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27328/ Test FAILed.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73761370 @chenghao-intel The biggest issue I see in this PR is that users cannot share cached tables any more, which breaks many existing use scenarios. I'm afraid it may require major effort to support both multi-session and cache sharing. I can think of two alternatives:
1. All sessions share a single `HiveContext` to enable cache sharing, and provide one `SQLConf` per session to isolate session configurations.
2. Provide one `HiveContext` per session to isolate session configuration, but extract cached table metadata management out of `SQLContext` / `HiveContext` to some global place, so that different `HiveContext` instances can figure out the mapping between cached tables (actually query plans) and their underlying materialized in-memory columnar RDDs.
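For readers following the thread, the two alternatives above can be sketched roughly as below. This is a minimal illustration only, not code from the PR or from Spark: `SessionConf`, `SharedContext`, `SessionContext`, and `GlobalCacheRegistry` are hypothetical stand-ins for `SQLConf`, `HiveContext`, and the extracted cache metadata manager, and a plain `Seq[Int]` stands in for a materialized in-memory columnar RDD.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-session configuration (not Spark's SQLConf).
final class SessionConf {
  private val settings = mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = synchronized { settings.update(key, value) }
  def get(key: String, default: String): String = synchronized { settings.getOrElse(key, default) }
}

// Alternative 1: a single shared context holds the cache,
// and each session gets its own SessionConf.
final class SharedContext {
  private val cachedTables = mutable.Map.empty[String, Seq[Int]]
  private val sessionConfs = mutable.Map.empty[String, SessionConf]

  // Returns the configuration for a session, creating it on first use.
  def confFor(sessionId: String): SessionConf = synchronized {
    sessionConfs.getOrElseUpdate(sessionId, new SessionConf)
  }
  def cacheTable(name: String, rows: Seq[Int]): Unit = synchronized { cachedTables.update(name, rows) }
  def cached(name: String): Option[Seq[Int]] = synchronized { cachedTables.get(name) }
}

// Alternative 2: cache metadata lives in one global place,
// keyed by a (hypothetical) textual form of the query plan.
object GlobalCacheRegistry {
  private val entries = mutable.Map.empty[String, Seq[Int]]
  def put(planKey: String, rows: Seq[Int]): Unit = synchronized { entries.update(planKey, rows) }
  def get(planKey: String): Option[Seq[Int]] = synchronized { entries.get(planKey) }
}

// One context per session: configuration is isolated,
// cache lookups are delegated to the global registry.
final class SessionContext {
  val conf = new SessionConf
  def cacheQuery(planKey: String, rows: Seq[Int]): Unit = GlobalCacheRegistry.put(planKey, rows)
  def cached(planKey: String): Option[Seq[Int]] = GlobalCacheRegistry.get(planKey)
}

object SessionSketch {
  def main(args: Array[String]): Unit = {
    // Alternative 1: both sessions see the cached table, but only s1 sees its setting.
    val shared = new SharedContext
    shared.confFor("s1").set("spark.sql.shuffle.partitions", "10")
    shared.cacheTable("src", Seq(1, 2, 3))
    println(shared.confFor("s2").get("spark.sql.shuffle.partitions", "200"))
    println(shared.cached("src"))

    // Alternative 2: separate contexts, shared cache entry.
    val a = new SessionContext
    val b = new SessionContext
    a.cacheQuery("SELECT * FROM src", Seq(1, 2, 3))
    println(b.cached("SELECT * FROM src"))
  }
}
```

In both sketches the cache is visible to every session while configuration stays per session; the difference is only where the cache lives, which is the trade-off the two alternatives describe.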
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73813752 Yeah, I can understand that people want to share cached tables across sessions. Is there any potential requirement for keeping a 'temp' table visible only within its own HiveContext (that is, not shared with other sessions)? Option 2 is probably the better choice if we want to support that in the future. Anyway, thanks for reviewing; I will update the code.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73630755 retest this please
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73637897 [Test build #27158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27158/consoleFull) for PR 4382 at commit [`403d6ec`](https://github.com/apache/spark/commit/403d6ec4e11e0e815a2b2f10ebd4e530d857074b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73637900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27158/ Test PASSed.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73629899 @liancheng It seems `HiveThriftServer2Suite` still isn't being triggered. test this please.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73629292 [Test build #27143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27143/consoleFull) for PR 4382 at commit [`403d6ec`](https://github.com/apache/spark/commit/403d6ec4e11e0e815a2b2f10ebd4e530d857074b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73621115 The `HiveThriftServer2Suite` timeout is also fixed; please refer to #4484.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73621049 test this please.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73621169 Yeah, you should retest.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73621530 [Test build #27143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27143/consoleFull) for PR 4382 at commit [`403d6ec`](https://github.com/apache/spark/commit/403d6ec4e11e0e815a2b2f10ebd4e530d857074b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73631012 [Test build #27158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27158/consoleFull) for PR 4382 at commit [`403d6ec`](https://github.com/apache/spark/commit/403d6ec4e11e0e815a2b2f10ebd4e530d857074b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73620632 @chenghao-intel This was just fixed by #4486.
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73620715 Should I retest this?
[GitHub] spark pull request: [SPARK-2087] [SQL] Multiple thriftserver sessi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4382#issuecomment-73629300 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27143/ Test PASSed.