[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171208#comment-16171208 ]

Rajeshbabu Chintaguntla commented on PHOENIX-4027:
--

[~jamestaylor] I ran the tests and they pass with the addendum. What about increasing the default threshold from the current 30 minutes to 1 hour or more? Fixing HBase inconsistencies can sometimes take longer than that, and rebuilding an index also takes time.

> Mark index as disabled during partial rebuild after configurable amount of time
> ---
>
> Key: PHOENIX-4027
> URL: https://issues.apache.org/jira/browse/PHOENIX-4027
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: Samarth Jain
> Fix For: 4.12.0, 4.11.1
>
> Attachments: PHOENIX-4027_addendum_2.patch, PHOENIX-4027_addendum.patch, PHOENIX-4027.patch
>
>
> Instead of marking an index as permanently disabled in the partial index
> rebuilder when a failure occurs, we should let it try again up to a
> configurable amount of time. The reason is that the fail-fast approach with
> the lower RPC timeout will continue to cause a failure until the index region
> can be written to. This will allow us to ride out region moves without a long
> RPC time out and thus without holding handler threads for long periods of
> time. We can base the failure on the INDEX_DISABLE_TIMESTAMP value of an
> index as we walk through the scan results here in MetaDataRegionObserver. :
> {code}
> do {
>     results.clear();
>     hasMore = scanner.next(results);
>     if (results.isEmpty()) break;
>     Result r = Result.create(results);
>     byte[] disabledTimeStamp =
>         r.getValue(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
>             PhoenixDatabaseMetaData.INDEX_DISABLE_TIMESTAMP_BYTES);
>     byte[] indexState =
>         r.getValue(PhoenixDatabaseMetaData.TABLE_FAMILY_BYTES,
>             PhoenixDatabaseMetaData.INDEX_STATE_BYTES);
>     if (disabledTimeStamp == null || disabledTimeStamp.length == 0) {
>         continue;
>     }
>     // TODO: if System.currentTimeMillis() - disabledTimeStamp > configurableAmount
>     // then disable the index.
> {code}
> I'd propose we allow 30 minutes to get an index back online.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
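A minimal sketch of the check that the TODO in the description asks for, assuming the disable timestamp and the threshold are both in milliseconds. The class, method and constant names are hypothetical and are not Phoenix's own. The Math.abs call and the "0 means nothing to rebuild" convention follow the patch excerpt quoted elsewhere in this thread.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: a standalone version of the elapsed-time check, not the Phoenix code. */
final class IndexDisableCheck {

    /** Hypothetical default, matching the 30 minutes proposed in the issue description. */
    static final long DEFAULT_THRESHOLD_MS = TimeUnit.MINUTES.toMillis(30);

    /**
     * Returns true when more than thresholdMs has elapsed since the index was
     * marked for rebuild. A timestamp of 0 is treated as "no rebuild pending".
     */
    static boolean shouldDisablePermanently(long nowMs, long indexDisableTimestamp, long thresholdMs) {
        if (indexDisableTimestamp == 0) {
            return false;
        }
        // Elapsed time is "now minus disable time", never the other way round.
        return nowMs - Math.abs(indexDisableTimestamp) > thresholdMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long disabledAt = now - TimeUnit.MINUTES.toMillis(45);
        System.out.println("disable permanently: "
                + shouldDisablePermanently(now, disabledAt, DEFAULT_THRESHOLD_MS));
    }
}
```

Passing the current time in as a parameter, instead of reading the clock inside the method, keeps the check easy to test. Raising the threshold to 1 hour, as suggested in the comment above, would only change the third argument.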
[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168347#comment-16168347 ]

James Taylor commented on PHOENIX-4027:
---

That sounds reasonable, [~rajeshbabu]. How about kicking off a pre-commit run to make sure we don't need to update any tests?
[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167715#comment-16167715 ]

Rajeshbabu Chintaguntla commented on PHOENIX-4027:
--

With this patch, the partial index rebuild can very easily leave the index disabled forever in the situations below:
1) When we write past data with row timestamp columns.
2) When region inconsistencies take more than 30 minutes to resolve.

When the data is huge, creating an index or completely rebuilding one might take hours or days. In such cases it's better to rebuild the index in intervals or batches than to disable it completely.
[~samarthjain] [~jamestaylor] what do you think?

{noformat}
if (EnvironmentEdgeManager.currentTimeMillis()
        - Math.abs(indexDisableTimestamp) > indexDisableTimestampThreshold) {
    /*
     * It has been too long since the index has been disabled and any future
     * attempts to reenable it likely will fail. So we are going to mark the
     * index as disabled and set the index disable timestamp to 0 so that the
     * rebuild task won't pick up this index again for rebuild.
     */
    try {
        IndexUtil.updateIndexState(conn, indexTableFullName, PIndexState.DISABLE, 0l);
        LOG.error("Unable to rebuild index " + indexTableFullName
                + ". Won't attempt again since index disable timestamp is older than current time by "
                + indexDisableTimestampThreshold
                + " milliseconds. Manual intervention needed to re-build the index");
    } catch (Throwable ex) {
        LOG.error("Unable to mark index " + indexTableFullName + " as disabled.", ex);
    }
    continue; // don't attempt another rebuild irrespective of whether
              // updateIndexState worked or not
}
{noformat}
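The suggestion above of rebuilding "in intervals or batches" can be sketched as splitting the time range that needs replaying into fixed-size windows, each small enough to finish well inside the threshold. This is only an illustration of the idea under that assumption. The class and method names are hypothetical, and nothing in this thread confirms how, or whether, Phoenix implements batching.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a rebuild time range into bounded windows. */
final class RebuildWindows {

    /**
     * Splits the half-open range [startMs, endMs) into consecutive windows of
     * at most batchMs each. Every window is returned as {fromInclusive, toExclusive}.
     */
    static List<long[]> split(long startMs, long endMs, long batchMs) {
        if (batchMs <= 0) {
            throw new IllegalArgumentException("batchMs must be positive");
        }
        List<long[]> windows = new ArrayList<>();
        long lo = startMs;
        while (lo < endMs) {
            // Compare the remaining span first so that lo + batchMs cannot overflow.
            long hi = (endMs - lo > batchMs) ? lo + batchMs : endMs;
            windows.add(new long[] {lo, hi});
            lo = hi;
        }
        return windows;
    }

    public static void main(String[] args) {
        for (long[] w : split(0L, 250L, 100L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

A rebuild task built this way could advance the index disable timestamp after each window, so that progress is kept even if a later window fails.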
[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135040#comment-16135040 ]

Ankit Singhal commented on PHOENIX-4027:

[~samarthjain], can this be resolved?
[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090639#comment-16090639 ]

Hudson commented on PHOENIX-4027:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1692 (See [https://builds.apache.org/job/Phoenix-master/1692/])
PHOENIX-4027 Addendum - move testRebuildIndexConnectionProperties to its (samarth: rev 48341ae3fcc645aa7f559ae98606c522c563268d)
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/PhoenixRuntimeIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/util/QueryUtil.java
* (add) phoenix-core/src/it/java/org/apache/phoenix/end2end/RebuildIndexConnectionPropsIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089551#comment-16089551 ]

Hadoop QA commented on PHOENIX-4027:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12877530/PHOENIX-4027_addendum.patch
  against master branch at commit 18ea6edc00029e7e900ad95562fa73da0e5ccf51.
  ATTACHMENT ID: 12877530

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
  Please justify why no new tests are needed for this patch.
  Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 51 warning messages.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+serverProps.put(QueryServices.EXTRA_JDBC_ARGUMENTS_ATTRIB, QueryServicesOptions.DEFAULT_EXTRA_JDBC_ARGUMENTS);
+serverProps.put(QueryServices.INDEX_REBUILD_RPC_RETRIES_COUNTER, Long.toString(NUM_RPC_RETRIES));
+ MetaDataRegionObserver.getRebuildIndexConnection(hbaseTestUtil.getMiniHBaseCluster().getConfiguration())) {
+ConnectionQueryServices rebuildQueryServices = rebuildIndexConnection.getQueryServices();
+public static final int DEFAULT_INDEX_REBUILD_RPC_RETRIES_COUNTER = 0; // no retries at rpc level

{color:red}-1 core tests{color}. The patch failed these unit tests:
  ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT

Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1220//testReport/
Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1220//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1220//console

This message is automatically generated.
[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1600#comment-1600 ]

Hudson commented on PHOENIX-4027:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1689 (See [https://builds.apache.org/job/Phoenix-master/1689/])
PHOENIX-4027 Mark index as disabled during partial rebuild after (samarth: rev d541d6f2875a590580e8ccf05f26795083b06658)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataRegionObserver.java
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/PhoenixRuntimeIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/index/MutableIndexFailureIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
[jira] [Commented] (PHOENIX-4027) Mark index as disabled during partial rebuild after configurable amount of time
[ https://issues.apache.org/jira/browse/PHOENIX-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088341#comment-16088341 ]

James Taylor commented on PHOENIX-4027:
---

+1. Thanks, [~samarthjain].