[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651643#comment-15651643 ]

Hive QA commented on HIVE-15090:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12838189/HIVE-15090.3-branch-2.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 10462 tests executed
*Failed tests:*
{noformat}
TestJdbcWithMiniHA - did not produce a TEST-*.xml file (likely timed out) (batchId=494)
TestJdbcWithMiniMr - did not produce a TEST-*.xml file (likely timed out) (batchId=491)
TestMsgBusConnection - did not produce a TEST-*.xml file (likely timed out) (batchId=362)
TestOperationLoggingAPIWithTez - did not produce a TEST-*.xml file (likely timed out) (batchId=484)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats (batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata (batchId=109)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 (batchId=87)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_schema_evol_3a (batchId=97)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_null_optimizer (batchId=154)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in (batchId=99)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_ppd_basic (batchId=521)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner (batchId=539)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_ppd_basic (batchId=187)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_ppd_schema_evol_3a (batchId=198)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_between_in (batchId=199)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant (batchId=183)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_complex_all (batchId=200)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in (batchId=233)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching (batchId=492)
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd (batchId=487)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2049/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2049/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2049/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12838189 - PreCommit-HIVE-Build

> Temporary DB failure can stop ExpiredTokenRemover thread
>
>
> Key: HIVE-15090
> URL: https://issues.apache.org/jira/browse/HIVE-15090
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 1.3.0, 2.1.0, 2.0.1, 2.2.0
> Reporter: Peter Vary
> Assignee: Peter Vary
> Fix For: 2.2.0
>
> Attachments: HIVE-15090.2-branch-2.1.patch, HIVE-15090.2.patch,
> HIVE-15090.2.patch, HIVE-15090.3-branch-2.1.patch, HIVE-15090.patch
>
>
> In HIVE-13090 we decided that we should not close the metastore if there is
> an unexpected exception during the expired token removal process, but that
> fix leaves a running metastore without ExpiredTokenRemover thread.
> To fix this I will move the catch inside the running loop, and hope the
> thread could recover from the exception

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
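The issue description's idea of moving the catch inside the running loop can be illustrated with a small sketch. This is not the HIVE-15090 patch itself: the class name, method names, and the simulated failure are all hypothetical, and the loop runs inline rather than in a background thread so the outcome is deterministic.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch; none of these names come from Hive.
public class CatchPlacementSketch {

    /** Catch outside the loop: the first failure ends the loop for good. */
    static int runWithCatchOutsideLoop(int iterations, IntConsumer task) {
        int completed = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                task.accept(i);
                completed++;
            }
        } catch (RuntimeException e) {
            System.out.println("loop terminated by: " + e.getMessage());
        }
        return completed;
    }

    /** Catch inside the loop: a failed iteration is logged and the next one still runs. */
    static int runWithCatchInsideLoop(int iterations, IntConsumer task) {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                task.accept(i);
                completed++;
            } catch (RuntimeException e) {
                System.out.println("iteration " + i + " failed: " + e.getMessage());
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Simulated transient outage: only the third iteration (i == 2) fails.
        IntConsumer flakyTask = i -> {
            if (i == 2) {
                throw new IllegalStateException("simulated DB outage");
            }
        };
        System.out.println("catch outside loop, completed: "
                + runWithCatchOutsideLoop(5, flakyTask));
        System.out.println("catch inside loop, completed: "
                + runWithCatchInsideLoop(5, flakyTask));
    }
}
```

With the catch outside the loop, iterations after the failure never run, which corresponds to a metastore left without a working remover thread. With the catch inside, only the failing iteration is lost.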
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651499#comment-15651499 ]

Peter Vary commented on HIVE-15090:

[~thejas] You are thinking like me :)

??Defining the exceptions that can be thrown by DelegationTokenStore that are not fatal and can be ignored.??
I chickened out of this since it is a compatibility change, at least in my inexperienced view. If I change the DelegationTokenStore interface to add the new type of exception, then anyone who has implemented their own DelegationTokenStore has to change it to work with the new version of Hive.

??Updating DBTokenStore to not throw what could be transient errors, and just log those??
ExpiredTokenRemover uses the following DelegationTokenStore methods: updateMasterKey, removeMasterKey, getAllDelegationTokenIdentifiers, removeToken, getToken. Changing the behavior of these methods could cause unexpected results.

So I leaned toward your first suggestion, but HIVE-13090 was a longstanding issue (introduced on Dec 7, 2011) with very visible effects and with only two jiras for it. I thought it was not common enough to warrant the compatibility change.

What do you think [~thejas]? Is it worth changing the DelegationTokenStore interface? You have more experience with Hive than I do.

Thanks,
Peter
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651344#comment-15651344 ]

Thejas M Nair commented on HIVE-15090:

Some options are:
* Defining the exceptions that can be thrown by DelegationTokenStore that are not fatal and can be ignored.
* Updating DBTokenStore to not throw what could be transient errors, and just log those.

[~pvary] What are your thoughts?
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651254#comment-15651254 ]

Peter Vary commented on HIVE-15090:

Hi [~thejas],

I was thinking along the same lines as you, but finally decided against it. My reasoning was that METASTORE_CLUSTER_DELEGATION_TOKEN_STORE_CLS is a configuration variable and can be set by the administrator to any class, so we will never be able to handle every future exception here correctly. In the end I decided to stick to a clean, easily understandable solution rather than create a partial solution for the DBTokenStore only.

Since this one is already committed to master, I think that if we find a better approach we should open another jira to handle it. I would be happy to help out there too.

Thanks again for taking a look at this!
Peter
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648745#comment-15648745 ]

Thejas M Nair commented on HIVE-15090:

We shouldn't be catching all throwables; that would include errors like OOM, where retrying doesn't make sense. Can we limit it to certain exceptions that can realistically be thrown?
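The concern about catching every throwable can be shown with a short sketch. It assumes, purely for illustration, that the removal step is a Runnable; the class and method names are hypothetical and are not taken from the patch. In Java, catch (Exception e) handles checked and unchecked exceptions but lets subclasses of Error, such as OutOfMemoryError, propagate.

```java
// Hypothetical sketch; not code from the HIVE-15090 patch.
public class NarrowCatchSketch {

    /** Runs one removal step, recovering from exceptions but not from errors. */
    static String runOnce(Runnable removalStep) {
        try {
            removalStep.run();
            return "ok";
        } catch (Exception e) {
            // Exceptions (for example a failed DB call) are reported and survived.
            return "recovered: " + e.getMessage();
        }
        // No catch for Throwable or Error: an OutOfMemoryError is not swallowed here.
    }

    public static void main(String[] args) {
        // A simulated transient failure is absorbed.
        System.out.println(runOnce(() -> {
            throw new IllegalStateException("simulated DB outage");
        }));

        // A simulated Error passes through runOnce untouched.
        try {
            runOnce(() -> {
                throw new OutOfMemoryError("simulated");
            });
        } catch (OutOfMemoryError e) {
            System.out.println("error propagated: " + e.getMessage());
        }
    }
}
```

Whether Exception is narrow enough is a separate question; a configurable token store class can throw any runtime exception, which is the trade-off discussed in this thread.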
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647466#comment-15647466 ]

Hive QA commented on HIVE-15090:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837943/HIVE-15090.2-branch-2.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 66 failed/errored test(s), 10462 tests executed
*Failed tests:*
{noformat}
TestJdbcWithMiniHA - did not produce a TEST-*.xml file (likely timed out) (batchId=494)
TestJdbcWithMiniMr - did not produce a TEST-*.xml file (likely timed out) (batchId=491)
TestMsgBusConnection - did not produce a TEST-*.xml file (likely timed out) (batchId=362)
TestOperationLoggingAPIWithTez - did not produce a TEST-*.xml file (likely timed out) (batchId=484)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats (batchId=92)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata (batchId=109)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_ppd_schema_evol_3a (batchId=97)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_part (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_table (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_part (batchId=142)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_table (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_part (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_table (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_part (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_table (batchId=126)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_part (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_table (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part (batchId=136)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_table (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part (batchId=112)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vec_mapwork_table (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_table (batchId=132)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_null_optimizer (batchId=154)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in (batchId=99)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_ppd_basic (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_table (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_table (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_table (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_nonvec_mapwork_table (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_table (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_table (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_text_vec_mapwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_text_vec_mapwork_table (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_part (batchId=521)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_schema_evol_text_vecrow_mapwork_table (batchId=521)
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15643551#comment-15643551 ]

Peter Vary commented on HIVE-15090:

The failed tests are flaky. See:
- HIVE-15084 Flaky test: TestMiniTezCliDriver:explainanalyze_2, 3, 4
- HIVE-15115 Flaky test: TestMiniLlapLocalCliDriver.testCliDriver union_fast_stats
- HIVE-15116 Flaky test: TestMiniLlapLocalCliDriver.testCliDriver join_acid_non_acid
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637991#comment-15637991 ]

Hive QA commented on HIVE-15090:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837282/HIVE-15090.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1973/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1973/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1973/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837282 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615594#comment-15615594 ]

Aihua Xu commented on HIVE-15090:

+1. The patch looks good to me.
[jira] [Commented] (HIVE-15090) Temporary DB failure can stop ExpiredTokenRemover thread
[ https://issues.apache.org/jira/browse/HIVE-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615471#comment-15615471 ]

Peter Vary commented on HIVE-15090:

I have tested with the following configuration:
{code}
hive.cluster.delegation.token.store.class    org.apache.hadoop.hive.thrift.DBTokenStore
hive.cluster.delegation.token.gc-interval    10
{code}
I started the Metastore and, after a while, stopped the database. While the database was down, the metastore logged the {{ExpiredTokenRemover thread received unexpected exception}} message. After the database was restarted, the messages stopped and everything returned to normal. In the debugger I verified that the thread is still running.
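The manual test above can be approximated in a self-contained simulation. This is only a sketch under stated assumptions: FlakyStore is a hypothetical stand-in for a token store whose database is unavailable for its first few calls, the 10 ms pause stands in for the gc interval, and none of the names are Hive classes. Only the logged message text is taken from the comment above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation; FlakyStore and simulate() are not part of Hive.
public class RemoverRecoverySketch {

    /** Stand-in store: the first failuresBeforeRecovery calls fail, later calls succeed. */
    static final class FlakyStore {
        private final int failuresBeforeRecovery;
        private final AtomicInteger calls = new AtomicInteger();

        FlakyStore(int failuresBeforeRecovery) {
            this.failuresBeforeRecovery = failuresBeforeRecovery;
        }

        void removeExpiredTokens() {
            if (calls.incrementAndGet() <= failuresBeforeRecovery) {
                throw new IllegalStateException("database unavailable");
            }
        }
    }

    /** Returns {failed rounds, successful rounds} observed by the remover thread. */
    static int[] simulate(int failuresBeforeRecovery, int successesWanted)
            throws InterruptedException {
        FlakyStore store = new FlakyStore(failuresBeforeRecovery);
        AtomicInteger failures = new AtomicInteger();
        AtomicInteger successes = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(successesWanted);

        Thread remover = new Thread(() -> {
            while (done.getCount() > 0) {
                try {
                    store.removeExpiredTokens();
                    successes.incrementAndGet();
                    done.countDown();
                } catch (Exception e) {
                    // The catch sits inside the loop, so the thread survives the outage.
                    failures.incrementAndGet();
                    System.out.println(
                        "ExpiredTokenRemover thread received unexpected exception: "
                        + e.getMessage());
                }
                try {
                    Thread.sleep(10); // stand-in for the gc interval
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "ExpiredTokenRemover-sketch");
        remover.setDaemon(true);
        remover.start();

        boolean recovered = done.await(5, TimeUnit.SECONDS);
        remover.join(1000);
        if (!recovered) {
            throw new IllegalStateException("remover did not recover in time");
        }
        return new int[] { failures.get(), successes.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = simulate(3, 2);
        System.out.println("failed rounds: " + result[0]
                + ", successful rounds: " + result[1]);
    }
}
```

As in the manual test, the error messages appear only while the simulated database is down, and the same thread goes on to complete later rounds once the store recovers.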