[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250251#comment-14250251 ]

Hive QA commented on HIVE-8843:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687754/HIVE-8843.3-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7236 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-564/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12687754 - PreCommit-HIVE-SPARK-Build

Release RDD cache when Hive query is done [Spark Branch]

Key: HIVE-8843
URL: https://issues.apache.org/jira/browse/HIVE-8843
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang
Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch

In some multi-insert cases, RDD.cache() is called to improve performance. An RDD is specific to its SparkContext, but the cached data is useful only for the query that created it. Thus, once the query has been executed, we need to release the cache by calling RDD.uncache().

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
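The lifecycle described in the issue can be illustrated with a toy model. This is a sketch under stated assumptions, not Hive or Spark code: `ToyContext` and `ToyRDD` are hypothetical stand-ins for a SparkContext and an RDD, and `run_multi_insert` is an invented helper. The release method is named `unpersist()` here because that is the call Spark's RDD API provides for dropping cached data.

```python
class ToyContext:
    """Hypothetical stand-in for a SparkContext: it outlives single queries."""

    def __init__(self):
        self.cached = set()  # names of datasets currently held in the cache


class ToyRDD:
    """Hypothetical stand-in for an RDD bound to one context."""

    def __init__(self, context, name):
        self.context = context
        self.name = name

    def cache(self):
        self.context.cached.add(self.name)
        return self

    def unpersist(self):
        self.context.cached.discard(self.name)
        return self


def run_multi_insert(context, query_id, release):
    """Cache a shared input, reuse it for two inserts, optionally release it."""
    source = ToyRDD(context, f"source-{query_id}").cache()
    try:
        # Both inserts read the same cached input instead of recomputing it.
        return [f"{target} <- {source.name}" for target in ("table_a", "table_b")]
    finally:
        if release:
            source.unpersist()


leaky = ToyContext()
clean = ToyContext()
for query_id in range(3):
    run_multi_insert(leaky, query_id, release=False)
    run_multi_insert(clean, query_id, release=True)

# Without a release step the long-lived context keeps one entry per query.
print(len(leaky.cached), len(clean.cached))
```

The point of the model is only that the cache belongs to the long-lived context, so anything a query caches stays there until something explicitly releases it.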
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250496#comment-14250496 ]

Jimmy Xiang commented on HIVE-8843:
---

These failures are not related to the patch.
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248489#comment-14248489 ]

Jimmy Xiang commented on HIVE-8843:
---

Thanks a lot for reviewing it. Good point. Yes, it is a little intrusive for the RSC one. Let me fix it.
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248536#comment-14248536 ]

Jimmy Xiang commented on HIVE-8843:
---

Thought about it again. The current solution seems to be the simplest one. Did I miss anything?
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249259#comment-14249259 ]

Hive QA commented on HIVE-8843:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687612/HIVE-8843.2-spark.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7235 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mergejoins
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/557/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/557/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-557/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12687612 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249434#comment-14249434 ]

Xuefu Zhang commented on HIVE-8843:
---

+1 pending on tests.
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249506#comment-14249506 ]

Hive QA commented on HIVE-8843:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687661/HIVE-8843.3-spark.patch

{color:red}ERROR:{color} -1 due to 480 failed/errored test(s), 7142 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_dboutput
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_avg
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_group_concat
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_max
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_max_n
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_min
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_min_n
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_bmj_schema_evolution
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_date_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_shufflejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_udf_local_resource
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.ql.TestCreateUdfEntities.testUdfWithDfsResource
org.apache.hadoop.hive.ql.TestCreateUdfEntities.testUdfWithLocalResource
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.initializationError
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClass
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClassComparison
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClassUnionAll
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testGetMethodInternal
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testGetTypeInfoForPrimitiveCategory
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testImplicitConversion
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testImpliesOrder
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testIsRankingFunction
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testPrintTypeCompatibility
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testTypeAffinity
org.apache.hadoop.hive.ql.exec.TestOperators.testFetchOperatorContext
org.apache.hadoop.hive.ql.exec.TestOperators.testScriptOperator
org.apache.hadoop.hive.ql.exec.TestUtilities.testgetDbTableName
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testGetNonDefaultSession
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testReturn
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testSessionPoolGetInOrder
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testBuildDag
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testClose
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testEmptyWork
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247597#comment-14247597 ]

Hive QA commented on HIVE-8843:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687309/HIVE-8843.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7235 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_spark4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/549/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/549/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-549/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12687309 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247735#comment-14247735 ]

Xuefu Zhang commented on HIVE-8843:
---

[~jxiang], thanks for working on this. The change made here seems a little more complicated and pervasive than I thought. A SparkPlan object holds references to all the RDDs, including those being cached. Once the plan is executed, these cached RDDs can therefore be released by accessing the SparkPlan object, so the changes will most likely be made in RemoteHiveSparkClient and LocalHiveSparkClient.
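The suggestion above (let the plan object hand back the cached RDDs so the client can release them after execution) might look roughly like the following. This is a minimal sketch, not the patch: `ToyRDD`, `ToyPlan`, `cached_rdds()` and `execute()` are hypothetical names that only model the idea, and none of them reflects the real SparkPlan, RemoteHiveSparkClient or LocalHiveSparkClient code.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD; only tracks whether it is cached."""

    def __init__(self, name):
        self.name = name
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self


class ToyPlan:
    """Hypothetical stand-in for SparkPlan: references every RDD it builds."""

    def __init__(self):
        self.rdds = []

    def add(self, rdd):
        self.rdds.append(rdd)
        return rdd

    def cached_rdds(self):
        return [rdd for rdd in self.rdds if rdd.is_cached]


def execute(plan, job):
    """What a client could do: run the job, then release the plan's caches."""
    try:
        return job(plan)
    finally:
        # Runs on success and on failure, so no cache outlives the query.
        for rdd in plan.cached_rdds():
            rdd.unpersist()


plan = ToyPlan()
plan.add(ToyRDD("shared_input").cache())  # reused by several inserts
plan.add(ToyRDD("single_use"))            # never cached

# The job sees the cached RDD while it runs; afterwards nothing is cached.
result = execute(plan, lambda p: [rdd.name for rdd in p.cached_rdds()])
print(result)
print(plan.cached_rdds())
```

Keeping the release in one place, right after plan execution, means the code that decides to cache an RDD does not also have to remember to clean up.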