[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-17 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250251#comment-14250251 ]

Hive QA commented on HIVE-8843:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687754/HIVE-8843.3-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7236 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_authorization_admin_almighty1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/564/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-564/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12687754 - PreCommit-HIVE-SPARK-Build

 Release RDD cache when Hive query is done [Spark Branch]
 

 Key: HIVE-8843
 URL: https://issues.apache.org/jira/browse/HIVE-8843
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Jimmy Xiang
 Attachments: HIVE-8843.1-spark.patch, HIVE-8843.2-spark.patch, 
 HIVE-8843.3-spark.patch, HIVE-8843.3-spark.patch


 In some multi-insert cases, RDD.cache() is called to improve performance. An RDD 
 is SparkContext-specific, but the caching is useful only for the query. Thus, 
 once the query is executed, we need to release the cache by calling 
 RDD.unpersist().
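
 The cache-then-release lifecycle described above can be sketched as follows. This is only an illustration under stated assumptions, not the patch attached to this issue: `QueryDataset` and `QueryScopedCache` are hypothetical stand-ins written with the Java standard library, and no Spark or Hive classes are used. In Spark's RDD API the release call is `unpersist()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cacheable dataset such as an RDD; not a Spark class.
final class QueryDataset {
    private final String name;
    private boolean cached;

    QueryDataset(String name) {
        this.name = name;
    }

    void cache() {
        cached = true;
    }

    // Plays the role of the release call (unpersist() on a real RDD).
    void release() {
        cached = false;
    }

    boolean isCached() {
        return cached;
    }

    String name() {
        return name;
    }
}

// Hypothetical helper: the cache lives exactly as long as one query.
final class QueryScopedCache {
    static List<String> runMultiInsert(QueryDataset shared, List<String> targets) {
        shared.cache();
        try {
            List<String> writes = new ArrayList<>();
            for (String target : targets) {
                // Every insert branch reuses the same cached dataset.
                writes.add(shared.name() + " -> " + target);
            }
            return writes;
        } finally {
            // Runs on success and on failure, so the cache never outlives the query.
            shared.release();
        }
    }
}
```

 Releasing in a `finally` block is a design choice of this sketch: a failed query would otherwise leave cached data behind in a context that outlives it.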



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-17 Thread Jimmy Xiang (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250496#comment-14250496 ]

Jimmy Xiang commented on HIVE-8843:
---

These failures are not related to the patch.



[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-16 Thread Jimmy Xiang (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248489#comment-14248489 ]

Jimmy Xiang commented on HIVE-8843:
---

Thanks a lot for reviewing it. Good point. Yes, the change is a little 
intrusive in the RSC case. Let me fix it.



[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-16 Thread Jimmy Xiang (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248536#comment-14248536 ]

Jimmy Xiang commented on HIVE-8843:
---

I thought about it again. The current solution seems to be the simplest one. 
Did I miss anything?



[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-16 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249259#comment-14249259 ]

Hive QA commented on HIVE-8843:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687612/HIVE-8843.2-spark.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7235 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mergejoins
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/557/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/557/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-557/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12687612 - PreCommit-HIVE-SPARK-Build



[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-16 Thread Xuefu Zhang (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249434#comment-14249434 ]

Xuefu Zhang commented on HIVE-8843:
---

+1 pending on tests.



[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-16 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249506#comment-14249506 ]

Hive QA commented on HIVE-8843:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687661/HIVE-8843.3-spark.patch

{color:red}ERROR:{color} -1 due to 480 failed/errored test(s), 7142 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_dboutput
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_avg
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_group_concat
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_max
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_max_n
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_min
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_min_n
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_bmj_schema_evolution
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_date_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_shufflejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_udf_local_resource
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.ql.TestCreateUdfEntities.testUdfWithDfsResource
org.apache.hadoop.hive.ql.TestCreateUdfEntities.testUdfWithLocalResource
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.initializationError
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClass
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClassComparison
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClassUnionAll
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testGetMethodInternal
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testGetTypeInfoForPrimitiveCategory
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testImplicitConversion
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testImpliesOrder
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testIsRankingFunction
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testPrintTypeCompatibility
org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testTypeAffinity
org.apache.hadoop.hive.ql.exec.TestOperators.testFetchOperatorContext
org.apache.hadoop.hive.ql.exec.TestOperators.testScriptOperator
org.apache.hadoop.hive.ql.exec.TestUtilities.testgetDbTableName
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testGetNonDefaultSession
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testReturn
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testSessionPoolGetInOrder
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testBuildDag
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testClose
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testEmptyWork

[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-15 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247597#comment-14247597 ]

Hive QA commented on HIVE-8843:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12687309/HIVE-8843.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7235 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_spark4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/549/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/549/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-549/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12687309 - PreCommit-HIVE-SPARK-Build



[jira] [Commented] (HIVE-8843) Release RDD cache when Hive query is done [Spark Branch]

2014-12-15 Thread Xuefu Zhang (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247735#comment-14247735 ]

Xuefu Zhang commented on HIVE-8843:
---

[~jxiang], thanks for working on this. The change made here seems a little more 
complicated and pervasive than I expected. A SparkPlan object holds references 
to all of its RDDs, including the cached ones. Once the plan is executed, those 
cached RDDs can therefore be released through the SparkPlan object, so the 
changes will most likely be made in RemoteHiveSparkClient and 
LocalHiveSparkClient.
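
A minimal sketch of that idea, under stated assumptions: `TrackedRdd`, `PlanSketch` and `ClientSketch` are hypothetical stand-ins written with the Java standard library only, and they do not reproduce the real SparkPlan, RemoteHiveSparkClient or LocalHiveSparkClient code. The point is only that a plan which already holds every RDD reference can release the cached ones itself, so the client needs a single cleanup call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one RDD tracked by a plan; not a Spark or Hive class.
final class TrackedRdd {
    private final String name;
    private final boolean cacheRequested;
    private boolean cached;

    TrackedRdd(String name, boolean cacheRequested) {
        this.name = name;
        this.cacheRequested = cacheRequested;
    }

    // Computing the RDD populates the cache only when caching was requested.
    void compute() {
        if (cacheRequested) {
            cached = true;
        }
    }

    void release() {
        cached = false;
    }

    boolean isCached() {
        return cached;
    }

    String name() {
        return name;
    }
}

// Hypothetical plan that keeps a reference to every RDD it creates.
final class PlanSketch {
    private final List<TrackedRdd> rdds = new ArrayList<>();

    TrackedRdd add(String name, boolean cacheRequested) {
        TrackedRdd rdd = new TrackedRdd(name, cacheRequested);
        rdds.add(rdd);
        return rdd;
    }

    void execute() {
        for (TrackedRdd rdd : rdds) {
            rdd.compute();
        }
    }

    // Releases every RDD that is still cached and returns how many were released.
    int releaseCached() {
        int released = 0;
        for (TrackedRdd rdd : rdds) {
            if (rdd.isCached()) {
                rdd.release();
                released++;
            }
        }
        return released;
    }
}

// Hypothetical client: the only caller that has to know about cache cleanup.
final class ClientSketch {
    static int run(PlanSketch plan) {
        int released;
        try {
            plan.execute();
        } finally {
            // Cleanup goes through the plan, whether or not execution succeeded.
            released = plan.releaseCached();
        }
        return released;
    }
}
```

Keeping the cleanup inside the plan means the operators that request caching do not need to change, which is the less pervasive shape suggested above.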
