[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297959#comment-14297959 ] Vikram Dixit K commented on HIVE-9436: -- Committed to RC for 1.0. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 1.0.0, 1.2.0 Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296096#comment-14296096 ] Sushanth Sowmyan commented on HIVE-9436: The test failure now reported is down to 3, and they're unconnected to the issue being fixed here, so I will go ahead and commit this. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296104#comment-14296104 ] Sushanth Sowmyan commented on HIVE-9436: [~vikram.dixit] - if you do respin an rc for 1.0, it would be useful to have this fix in as well - it's a simple fix which fixes retries from the client, and is a robustness fix. It isn't, however a breaking bug as I see it, since it is used only in the case of connection issues, where we give it a chance to retry instead of failing directly. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 1.2.0 Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292991#comment-14292991 ] Hive QA commented on HIVE-9436: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694681/HIVE-9436.3.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7397 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2528/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2528/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2528/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694681 - PreCommit-HIVE-TRUNK-Build RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292744#comment-14292744 ] Thejas M Nair commented on HIVE-9436: - +1 I agree in context of metastore HA setup, it does makes sense to retry on JDOException as well. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289628#comment-14289628 ] Hive QA commented on HIVE-9436: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694069/HIVE-9436.2.patch {color:red}ERROR:{color} -1 due to 67 failed/errored test(s), 7346 tests executed *Failed tests:* {noformat} TestCustomAuthentication - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_simple_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataOnlyOptimizer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_table_access_keys_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_null org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6_subq org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mrr org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_mapjoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289829#comment-14289829 ] Sushanth Sowmyan commented on HIVE-9436: Randomly sampling some of the test failures: {noformat} source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/udaf_histogram_numeric.q.out /home/hiveptest/54.205.215.38-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/udaf_histogram_numeric.q.out 9c9 [{x:139.16078431372554,y:255.0},{x:386.1428571428572,y:245.0}] --- [{x:135.0284552845532,y:246.0},{x:381.39370078740143,y:254.0}] {noformat} {noformat} Running: diff -a /home/hiveptest/54.159.206.72-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/auto_join12.q.out /home/hiveptest/54.159.206.72-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/auto_join12.q.out 32c32 $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src --- $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src 35c35 $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:src --- $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:$hdt$_1:src 39c39 $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src --- $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src 54c54 $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:src --- $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:$hdt$_1:src {noformat} Neither of those have anything whatsoever to do with retries. Also, I see some other errors reporting junit.framework.AssertionFailedError: Client Execution failed with error code = 4 running, which is similar to what's reported in HIVE-8997. Is there a test flakiness issue right now in trunk? RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match .\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289984#comment-14289984 ] Sushanth Sowmyan commented on HIVE-9436: As to the precommit tests, it looks like the majority of those tests were already failing before this build: {noformat} Test Result (66 failures / +3) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection ... {noformat} This indicates that tests were already flaky (probably due to an earlier commit?) and there were a total of 3 reported regressions. Diffing between this build(2495) and the previous one where tests ran (2493) to find changes, I get: {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5 --- org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} For org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric: {noformat} Running: diff -a /home/hiveptest/54.205.215.38-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/udaf_histogram_numeric.q.out /home/hiveptest/54.205.215.38-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/udaf_histogram_numeric.q.out 9c9 [{x:139.16078431372554,y:255.0},{x:386.1428571428572,y:245.0}] --- [{x:135.0284552845532,y:246.0},{x:381.39370078740143,y:254.0}] {noformat} This does not look connected to this issue. As for org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore and {noformat} java.lang.NullPointerException: null at org.apache.hadoop.hive.metastore.HiveMetaStore.getDelegationToken(HiveMetaStore.java:5596) at org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.getDelegationTokenStr(TestHadoop20SAuthBridge.java:318) at org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.obtainTokenAndAddIntoUGI(TestHadoop20SAuthBridge.java:339) at org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore(TestHadoop20SAuthBridge.java:231) {noformat} {noformat} java.lang.NullPointerException: null at org.apache.hadoop.hive.metastore.HiveMetaStore.getDelegationToken(HiveMetaStore.java:5596) at org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.getDelegationTokenStr(TestHadoop20SAuthBridge.java:318) at org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.access$100(TestHadoop20SAuthBridge.java:62) {noformat} Both of these seem to be errors getting a delegation token, again something not connected with this connection retry issue. As to the last one, org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection, that looks like an issue with derby setup? I'm not certain. It does look like a valid issue in and of itself, but again, not related to this patch. {noformat} java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.duplicateDescriptorException(Unknown Source) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.addDescriptor(Unknown Source) at org.apache.derby.impl.sql.execute.CreateTableConstantAction.executeConstantAction(Unknown Source) at org.apache.derby.impl.sql.execute.MiscResultSet.open(Unknown Source) at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source) at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) at org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:72) at org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:131) at org.apache.hive.hcatalog.streaming.TestStreaming.init(TestStreaming.java:157) {noformat} RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289961#comment-14289961 ] Sushanth Sowmyan commented on HIVE-9436: [~thejas]/[~hsubramaniyan] : I have a couple of thoughts about moving JDOException retries solely to the metastore: a) Firstly, we have had cases so far where a JDOException invalidates the connection on the metastore side, and retrying from the metastore has not helped. Retrying from the client-side, though, causes a fresh openTransaction() that clears the connection and all history, sometimes by hitting a different HMSHandler, and this causes the retry from client to be more successful than a retry from server. Admittedly, this is more likely because we need to clean up our metastore code to make sure that the retry from the metastore-side handles this properly, and thus, is something we should attempt to improve. b) Second, from a perspective of a loaded metastore, having a metastore thread do retries, thus using up valuable metastore resources/time is more wasteful than having the client do retries. We thus tend to keep our metastore-side retries to a low amount, but the fact that we have client-side retries as well gives us an ability to be fail-fast on the metastore, but retry a large number of times in particular clients if we find the need to do so. Particularly, in HA configurations, I've seen a large number of retries and longer retry-intervals on the client side that allow a connection to go through despite metastore HUPs. c) Thirdly, speaking of HA, retrying on the client-side allows us to hit alternate metastores as well, if configured, if we have scenarios where one metastore is getting bogged down. As you mention, client should ideally only be retrying connection exceptions, but JDOExceptions are frequently the result of connection exceptions raised by the connection pool from the metastore to the db. There is definitely scope for refactoring and improvement in all this, I will look into it further, but for now, this is a simpler bugfix to enable the already-existing regex to work correctly. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match .\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288623#comment-14288623 ] Sushanth Sowmyan commented on HIVE-9436: Hari, good point - I had a look at RetryingHMSHandler, and the code is rather similar. There seems to be two main differences though: First, on the HMSHandler side, the exact cause-chain of exceptions are still available, and we can compare using instanceof, whereas on the client-side, we have only the first level exception object, and all internal objects are serialized, so we have to look at the messages instead. Secondly, the client has thrift-related exceptions which are acceptable as well. Still, possibly, we should undertake to refactor this into MetastoreUtils with a parameter which asks whether to look strictly at types, or look inside the messages as well, and call that from both sides, and also pass in a list of acceptable exception types. In that scenario, this would be usable all over the place where we need to retry. I would, however, like to tackle the refactor as an improvement rather than in a bug-fix jira if that's okay. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match .\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288534#comment-14288534 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-9436: - A minor comment since I remember looking at RetryingHMSHandler code for catching Nucleus Exceptions a while back. I think RetryingHMSHandler::invoke and RetryingMetaStoreClient::invoke should have similar exceptions caught(except might be TException or something that might be client specific or HMS handler specific). Is it possible to factorize the catch part and have them called from RetryingMetaStoreClient::invoke and RetryingHMSHandler::invoke to avoid any redundancy/ missing cases. If this is not possible, manually comparing these two functions for the exception type handled might help in resolving any similar missing exception handling cases. Another point is comparing the exception messages for JDOException regex looks a bit hairy to me. Please correct me if this understanding is wrong. Thanks Hari RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match .\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288800#comment-14288800 ] Thejas M Nair commented on HIVE-9436: - [~hsubramaniyan] that is a good point regarding the two places where exception checks and retries are happening. IMO, the metastore client should ideally only retry any issue related to communication with metastore server. HMSHandler is the one that should retry on any temporary errors such as JDOException. As [~sushanth] said we have the real cause available there and we can do an instanceof check over there. It seems like a simple and cleaner change to do the check there. [~sushanth] What do you think ? RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match .\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286475#comment-14286475 ] Thejas M Nair commented on HIVE-9436: - +1 RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match .\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286974#comment-14286974 ] Hive QA commented on HIVE-9436: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12693708/HIVE-9436.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7347 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2470/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2470/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2470/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12693708 - PreCommit-HIVE-TRUNK-Build RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) e.getCause().getMessage().matches(JDO[a-zA-Z]*Exception)) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that java String.matches matches the entire string to the regex, and thus, that match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple, we should match .\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)