[jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
[ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857994#comment-13857994 ]

Samir Ahmic commented on HBASE-7386:

Thanks for the review and comments, [~stack].

In terms of usage and documentation, these scripts follow the same logic as the scripts in the bin directory. For example, here is the output of the start-supervisord-hbase.sh command:
{code}
$ $HBASE_HOME/bin/supervisord/start-supervisord-hbase.sh
localhost: hbase-ZK: started
hbase-MASTER: started
localhost: hbase-RS: started
{code}
So, in terms of usage, these correspondences hold:
start-hbase.sh ~= start-supervisord-hbase.sh
stop-hbase.sh ~= stop-supervisord-hbase.sh
hbase-daemon.sh ~= hbase-supervisord.sh

I agree there is a danger that the scripts 'rot', but I also believe this approach can solve a number of issues for ops people and generally improve HBase MTTR. What is your suggestion for addressing the script 'rot' issue? graceful-stop.sh from the bin dir can be modified to avoid copy/paste. I will also check the rest of the scripts to try to reduce the amount of copy/paste.

migrate_to_supervisord.sh switches a running cluster that was started with the scripts from the bin directory over to supervisor. It stops the HBase daemons on the nodes using hbase-daemon.sh and then starts them using the hbase-supervisord.sh script (revert_to_scripts.sh does the opposite).

For the master, the znode is removed by the autostart method (patch in HMasterCommandLine.java) at startup. We have autorestart=true in the supervisor config, so if the master process dies unexpectedly, supervisor kicks off an autorestart, and at that moment the znode is removed, giving the backup master enough time to become active. An alternative is to craft a listener script, similar to mail_notification.py, that removes the master znode when it detects that the process is exiting.

Regarding RS znodes, the scripts do not remove them yet.
I was thinking about a listener script (similar to mail_notification.py) calling 'hbase zkcli rmr RSznode', or we can modify HRegionServerCommandLine.java and add 'autorestart' as in HMasterCommandLine.java. What is your suggestion for addressing this?

Basically, all these scripts are wrappers around the 'supervisord' and 'supervisorctl' commands, which are Python based. I hope I have clarified some details.

Cheers

Investigate providing some supervisor support for znode deletion

Key: HBASE-7386
URL: https://issues.apache.org/jira/browse/HBASE-7386
Project: HBase
Issue Type: Task
Components: master, regionserver, scripts
Reporter: Gregory Chanan
Assignee: stack
Priority: Blocker
Attachments: HBASE-7386-bin.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch

There are a couple of JIRAs for deleting the znode on a process failure:
HBASE-5844 (RS)
HBASE-5926 (Master)
which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster. These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode.
There are a few problems associated with this approach, as listed in the JIRAs below:
1) Hides startup output in script
https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
2) Two hbase processes listed per launched daemon
https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
3) Not run by a real supervisor
https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
4) Weird output after kill -9 of the actual process in standalone mode
https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
5) Can kill existing RS if called again
https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
6) Hides stdout/stderr[6]
https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832

I suspect running it via something like supervisor.d can solve these issues if we provide the right support.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10246) Wrap long lines in recently added source files
[ https://issues.apache.org/jira/browse/HBASE-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858009#comment-13858009 ] Gustavo Anatoly commented on HBASE-10246: - You're welcome. Wrap long lines in recently added source files -- Key: HBASE-10246 URL: https://issues.apache.org/jira/browse/HBASE-10246 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Gustavo Anatoly Priority: Trivial Fix For: 0.99.0 Attachments: HBASE-10246.patch Due to ineffective line length detection, several newly added files have long lines in them. The following is a partial list: IntegrationTestTableSnapshotInputFormat.java ClientSideRegionScanner.java TableSnapshotInputFormat.java PerformanceEvaluationScan.java TestTableSnapshotScanner.java TestTableSnapshotInputFormat.java Long lines in these files should be wrapped. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858022#comment-13858022 ]

Jean-Marc Spaggiari commented on HBASE-8912:

But this doesn't explain why we are calling this twice. Your fix might work, but shouldn't we try to figure out why we have so many calls? I can give your fix a try in my environment if you want, and play with my balancers...

[0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE

Key: HBASE-8912
URL: https://issues.apache.org/jira/browse/HBASE-8912
Project: HBase
Issue Type: Bug
Reporter: Enis Soztutar
Fix For: 0.94.16
Attachments: 8912-0.94.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt

AM throws this exception which subsequently causes the master to abort:
{code}
java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
{code}
This exception trace is from the failing test TestMetaReaderEditor, which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path.
https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858037#comment-13858037 ]

Feng Honghua commented on HBASE-10227:

[~gustavoanatoly]: glad to know you're also aware of this bug and interested in fixing it. :-)

Actually, this issue was already fixed in my patch for JIRA-8721 (there the mvcc can't be set to zero and needs to be kept across region move / regionserver failover / balance etc., so I noticed and fixed this 'logic' bug as part of that patch). Since JIRA-8721 has already gone through several rounds of close/reopen/close, I don't think it's a good time to reopen it again, but exposing this bug and providing its fix can be done in a separate JIRA.

If you can't schedule time for this fix, maybe I can re-assign it to myself and extract the fix from JIRA-8721's patch into this issue for discussion/review. What do you think, [~gustavoanatoly] / [~stack]?

When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay

Key: HBASE-10227
URL: https://issues.apache.org/jira/browse/HBASE-10227
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Assignee: Gustavo Anatoly

When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region, and then the split hlogs are replayed. In fact, the edits in the split hlogs have kvs with greater mvcc than every MemstoreTS in all store files, but replaying them does not increment the mvcc accordingly at all. From an overall perspective, this mvcc recovery is 'logically' incorrect/incomplete.

The reason this currently causes no problem is that no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can safely be set to zero. They are simply treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero (say 'earlier' since they have mvcc less than the ones with non-zero mvcc, but in fact they are put 'later'), and this has no incorrect impact only because no active scanners exist or are created during region opening.

This bug is only a 'logic' bug for the time being, but if later on we need mvcc to survive across the region's whole logical lifecycle (across regionservers) and never be set to zero, this bug needs to be fixed first.
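The recovery order described in HBASE-10227 can be sketched as follows. This is a minimal illustration under stated assumptions, not HBase code: the class `MvccRecoverySketch`, its two methods, and the use of plain `long` sequence numbers are all hypothetical stand-ins. It shows the initial mvcc taken as the max MemstoreTS across store files, and then advanced while replaying split-hlog edits instead of being left unchanged.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the region-open logic; not an HBase class.
class MvccRecoverySketch {
    // Initial mvcc: the largest MemstoreTS found in any store file.
    static long initialMvcc(List<Long> maxMemstoreTsPerStore) {
        long max = 0L;
        for (long ts : maxMemstoreTsPerStore) {
            max = Math.max(max, ts);
        }
        return max;
    }

    // Replays edits from split hlogs and advances mvcc past every replayed edit.
    static long replay(long mvcc, List<Long> replayedEditMvccs) {
        for (long editMvcc : replayedEditMvccs) {
            mvcc = Math.max(mvcc, editMvcc);
        }
        return mvcc;
    }

    public static void main(String[] args) {
        long mvcc = initialMvcc(Arrays.asList(5L, 9L, 7L));
        mvcc = replay(mvcc, Arrays.asList(10L, 12L, 11L));
        System.out.println("recovered mvcc = " + mvcc);
    }
}
```

With the assumed inputs, the store files alone would give an mvcc of 9, while the replayed edits carry values up to 12; keeping the larger value is the part the issue describes as missing.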
[jira] [Updated] (HBASE-10176) Canary#sniff() should close the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10176: --- Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Integrated to trunk. Thanks for the reviews. Canary#sniff() should close the HTable instance --- Key: HBASE-10176 URL: https://issues.apache.org/jira/browse/HBASE-10176 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.99.0 Attachments: 10176-v1.txt {code} table = new HTable(admin.getConfiguration(), tableDesc.getName()); {code} HTable instance should be closed by the end of the method. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
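The kind of fix HBASE-10176 asks for can be illustrated with a try/finally block. The attached patch is not shown in this message, so this is only a sketch of the general pattern: `FakeTable` is a hypothetical stand-in for HTable so that the example runs without HBase on the classpath, and `SniffSketch.sniff()` is an assumed, simplified shape for the method.

```java
import java.io.Closeable;

// Hypothetical stand-in for HTable; it only records whether close() was called.
class FakeTable implements Closeable {
    boolean closed = false;

    String read() {
        return "row";
    }

    @Override
    public void close() {
        closed = true;
    }
}

class SniffSketch {
    // Opens a table, uses it, and always closes it, even if the read throws.
    static FakeTable sniff() {
        FakeTable table = null;
        try {
            table = new FakeTable();
            table.read();
        } finally {
            if (table != null) {
                table.close();
            }
        }
        return table; // returned only so the caller can inspect the closed flag
    }

    public static void main(String[] args) {
        System.out.println("closed = " + sniff().closed);
    }
}
```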
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858107#comment-13858107 ]

Ted Yu commented on HBASE-10227:

There is considerable overlap between this JIRA and HBASE-10241.
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858159#comment-13858159 ]

Gustavo Anatoly commented on HBASE-10227:

Thanks for the reply, [~fenghh]. :)

I can schedule to start on 08/Jan/14; would that be possible? Is this date good? But if I can't start this task on 08/Jan, please re-assign it to yourself, [~fenghh].
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858193#comment-13858193 ]

Lars Hofhansl commented on HBASE-8912:

True. It does not explain two calls.

There *is* a race condition, though. Notice that in AssignmentManager, assign calls addToRegionsInTransition, which either creates a new OFFLINE RegionState or sets the RegionState to OFFLINE (unless hijack is passed), but in the actual assignment it is set back to PENDING_OPEN. This can only happen by another thread in the HMaster process interfering in between. So rechecking with the lock on the appropriate RegionState held is a good thing anyway.

If you could put it through the wringer that'd be cool.
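Lars Hofhansl's suggestion in HBASE-8912 to recheck the state while holding the lock on the RegionState can be sketched as below. This is a simplified, hypothetical model and not the contents of 8912-0.94.txt: `RegionStateSketch`, its `State` enum, and `tryOffline()` are names invented for illustration. The point it demonstrates is that the check and the transition happen under one lock, so a concurrent switch to PENDING_OPEN is detected rather than causing an IllegalStateException.

```java
// Hypothetical, simplified model of a region's in-transition state; not the HBase class.
class RegionStateSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSED }

    private State state;

    RegionStateSketch(State initial) {
        this.state = initial;
    }

    synchronized State getState() {
        return state;
    }

    synchronized void update(State next) {
        state = next;
    }

    // Re-checks the state while holding this object's lock and only then moves to OFFLINE.
    // Returns false, instead of throwing, when the region is neither CLOSED nor OFFLINE,
    // for example because another thread has already moved it to PENDING_OPEN.
    synchronized boolean tryOffline() {
        if (state != State.CLOSED && state != State.OFFLINE) {
            return false;
        }
        state = State.OFFLINE;
        return true;
    }

    public static void main(String[] args) {
        RegionStateSketch region = new RegionStateSketch(State.CLOSED);
        System.out.println("first attempt: " + region.tryOffline());
        region.update(State.PENDING_OPEN); // simulates another thread starting an assignment
        System.out.println("second attempt: " + region.tryOffline());
    }
}
```

Whether skipping the transition is the right reaction in the real AssignmentManager is a separate question; the sketch only shows where the recheck sits relative to the lock.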
[jira] [Updated] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-8912:

Attachment: org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt

Attaching full log from latest failed run, so it won't get lost.
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858204#comment-13858204 ]

Lars Hofhansl commented on HBASE-8912:

It seems what we really have to do is handle per-region ZK events completely and strictly in the order in which they are delivered.
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858205#comment-13858205 ]

Lars Hofhansl commented on HBASE-8912:

A per region queue would be best here, but we can't rewrite the AssignmentManager in 0.94.
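The per-region queue mentioned in the HBASE-8912 comments could look roughly like the sketch below. It is an illustration of the idea only, not a proposal for the 0.94 AssignmentManager: `PerRegionEventQueue`, `submit`, and `demo` are hypothetical names, and the use of one single-thread executor per region is an assumption chosen for brevity. Each region's handlers run in submission order, while different regions proceed independently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one FIFO queue (a single-thread executor) per region name.
class PerRegionEventQueue {
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    // Handlers for the same region run one at a time, in the order they were submitted.
    void submit(String regionName, Runnable handler) {
        queues.computeIfAbsent(regionName, r -> Executors.newSingleThreadExecutor())
              .execute(handler);
    }

    // Stops accepting work and waits for queued handlers to finish.
    void shutdownAndWait() {
        for (ExecutorService queue : queues.values()) {
            queue.shutdown();
        }
        try {
            for (ExecutorService queue : queues.values()) {
                queue.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Demo: events for "r1" are interleaved with events for "r2" on submission,
    // and the list records the order in which the "r1" handlers actually ran.
    static List<String> demo() {
        PerRegionEventQueue queue = new PerRegionEventQueue();
        List<String> seenForR1 = Collections.synchronizedList(new ArrayList<String>());
        String[] r1Events = {"CLOSED", "OFFLINE", "PENDING_OPEN", "OPENED"};
        for (String event : r1Events) {
            queue.submit("r1", () -> seenForR1.add(event));
            queue.submit("r2", () -> { });
        }
        queue.shutdownAndWait();
        return seenForR1;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A thread per region would not scale to a real cluster; a production design would more likely multiplex per-region queues over a shared pool. The sketch keeps the simplest structure that preserves per-region ordering.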
[jira] [Updated] (HBASE-9426) Make custom distributed barrier procedure pluggable
[ https://issues.apache.org/jira/browse/HBASE-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated HBASE-9426:

Attachment: HBASE-9426.patch.3

Attaching a new patch which
* Combined the previous two patches
* Rebased against the trunk
* Removed the generated protobuf files
* Added a unit test case

Make custom distributed barrier procedure pluggable

Key: HBASE-9426
URL: https://issues.apache.org/jira/browse/HBASE-9426
Project: HBase
Issue Type: Improvement
Affects Versions: 0.95.2, 0.94.11
Reporter: Richard Ding
Assignee: Richard Ding
Attachments: HBASE-9426.patch.1, HBASE-9426.patch.2, HBASE-9426.patch.3

Currently, if one wants to implement a custom distributed barrier procedure (e.g., distributed log roll or distributed table flush), the HBase core code needs to be modified in order for the procedure to work.

Looking into the snapshot code (especially on the region server side), most of the code that enables the procedure is generic life-cycle management (i.e., init, start, stop). We can make this part pluggable.

Here is the proposal. Following the coprocessor example, we define two properties:
{code}
hbase.procedure.regionserver.classes
hbase.procedure.master.classes
{code}
The values for both are comma-delimited lists of classes. On the region server side, the classes implement the following interface:
{code}
public interface RegionServerProcedureManager {
  public void initialize(RegionServerServices rss) throws KeeperException;
  public void start();
  public void stop(boolean force) throws IOException;
  public String getProcedureName();
}
{code}
While on the Master side, the classes implement the interface:
{code}
public interface MasterProcedureManager {
  public void initialize(MasterServices master) throws KeeperException, IOException, UnsupportedOperationException;
  public void stop(String why);
  public String getProcedureName();
  public void execProcedure(ProcedureDescription desc) throws IOException;
}
{code}
Where the ProcedureDescription is defined as
{code}
message ProcedureDescription {
  required string name = 1;
  required string instance = 2;
  optional int64 creationTime = 3 [default = 0];
  message Property {
    required string tag = 1;
    optional string value = 2;
  }
  repeated Property props = 4;
}
{code}
A generic API can be defined on HMaster to trigger a procedure:
{code}
public boolean execProcedure(ProcedureDescription desc) throws IOException;
{code}
_SnapshotManager_ and _RegionServerSnapshotManager_ are special examples of _MasterProcedureManager_ and _RegionServerProcedureManager_. They will be automatically included (users don't need to specify them in the conf file).
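How a comma-delimited class list such as the value of hbase.procedure.regionserver.classes might be turned into manager instances can be sketched with plain reflection. This is an assumption about one possible loading mechanism, not what the attached patch does: `ProcedureManagerSketch` is a cut-down, hypothetical version of the proposed interface, and `LogRollManagerSketch` and `ProcedureManagerLoader` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Cut-down, hypothetical version of the proposed RegionServerProcedureManager interface.
interface ProcedureManagerSketch {
    void start();
    String getProcedureName();
}

// Example plug-in; the name and behaviour are illustrative only.
class LogRollManagerSketch implements ProcedureManagerSketch {
    @Override
    public void start() {
        // a real manager would register its barrier procedure here
    }

    @Override
    public String getProcedureName() {
        return "log-roll";
    }
}

class ProcedureManagerLoader {
    // Instantiates every class named in a comma-delimited list via its no-arg constructor.
    static List<ProcedureManagerSketch> load(String commaDelimitedClasses) {
        List<ProcedureManagerSketch> managers = new ArrayList<>();
        for (String name : commaDelimitedClasses.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing commas and blank entries
            }
            try {
                Class<?> clazz = Class.forName(trimmed);
                managers.add(clazz.asSubclass(ProcedureManagerSketch.class)
                        .getDeclaredConstructor()
                        .newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load procedure manager " + trimmed, e);
            }
        }
        return managers;
    }

    public static void main(String[] args) {
        // Stands in for the configured property value.
        String configured = LogRollManagerSketch.class.getName();
        for (ProcedureManagerSketch manager : load(configured)) {
            manager.start();
            System.out.println("started " + manager.getProcedureName());
        }
    }
}
```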
[jira] [Commented] (HBASE-10239) Improve determinism and debugability of TestAccessController
[ https://issues.apache.org/jira/browse/HBASE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858210#comment-13858210 ] Hudson commented on HBASE-10239: FAILURE: Integrated in HBase-0.98 #40 (See [https://builds.apache.org/job/HBase-0.98/40/]) HBASE-10239. Improve determinism and debugability of TestAccessController (apurtell: rev 1553719) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestNamespaceCommands.java Improve determinism and debugability of TestAccessController Key: HBASE-10239 URL: https://issues.apache.org/jira/browse/HBASE-10239 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10239.patch, wip-10239.patch Separate grant and revoke API invocations to static helper methods in SecureTestUtils. Wait for permissions cache updates using a Predicate. Log the API calls, state checks, and waits. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
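The "wait for permissions cache updates using a Predicate" step in HBASE-10239 follows a common polling pattern, sketched below. The helper is hypothetical and is not the code in SecureTestUtil: `WaitSketch.waitFor` and its parameters are invented, and `java.util.function.BooleanSupplier` stands in for whatever predicate type the test utilities actually use.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; names and signature are assumptions for illustration.
class WaitSketch {
    // Polls the condition every intervalMs until it holds or timeoutMs has elapsed.
    // Returns true if the condition became true, false on timeout or interruption.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stands in for "the permissions cache has picked up the grant".
        boolean updated = waitFor(1000, 10, () -> System.currentTimeMillis() - start >= 50);
        System.out.println("cache updated: " + updated);
    }
}
```

Polling on an explicit condition, rather than sleeping for a fixed time after each grant or revoke, is what makes such a test deterministic: it proceeds as soon as the state is observed and fails clearly on timeout.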
[jira] [Commented] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes
[ https://issues.apache.org/jira/browse/HBASE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858209#comment-13858209 ] Hudson commented on HBASE-10175: FAILURE: Integrated in HBase-0.98 #40 (See [https://builds.apache.org/job/HBase-0.98/40/]) HBASE-10175. 2-thread ChaosMonkey steps on its own toes (Sergey Shelukhin) (apurtell: rev 1553717) * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/CompactRandomRegionOfTableAction.java * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/CompactTableAction.java * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/FlushRandomRegionOfTableAction.java * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/FlushTableAction.java * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/MergeRandomAdjacentRegionsOfTableAction.java * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/MoveRegionsOfTableAction.java * /hbase/branches/0.98/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/SplitRandomRegionOfTableAction.java 2-thread ChaosMonkey steps on its own toes -- Key: HBASE-10175 URL: https://issues.apache.org/jira/browse/HBASE-10175 Project: HBase Issue Type: Improvement Components: test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10175.patch ChaosMonkey with one destructive and one volatility (flush-compact-split-etc.) threads steps on its own toes and logs a lot of exceptions. A simple solution would be to catch most (or all), like NotServingRegionException, and log less (not a full callstack for example, it's not very useful anyway). A more complicated/complementary one would be to keep track which regions the destructive thread affects and use other regions for volatile one. 
[jira] [Commented] (HBASE-10229) Support OperationAttributes in Increment and Append in Shell
[ https://issues.apache.org/jira/browse/HBASE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858211#comment-13858211 ] Hudson commented on HBASE-10229: FAILURE: Integrated in HBase-0.98 #40 (See [https://builds.apache.org/job/HBase-0.98/40/]) HBASE-10229. Support OperationAttributes in Increment and Append in Shell (Ramkrishna. S. Vasudevan) (apurtell: rev 1553716) * /hbase/branches/0.98/hbase-shell/src/main/ruby/hbase/table.rb * /hbase/branches/0.98/hbase-shell/src/main/ruby/shell.rb * /hbase/branches/0.98/hbase-shell/src/main/ruby/shell/commands/append.rb * /hbase/branches/0.98/hbase-shell/src/main/ruby/shell/commands/incr.rb * /hbase/branches/0.98/hbase-shell/src/test/ruby/hbase/table_test.rb Support OperationAttributes in Increment and Append in Shell Key: HBASE-10229 URL: https://issues.apache.org/jira/browse/HBASE-10229 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10229_1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9426) Make custom distributed barrier procedure pluggable
[ https://issues.apache.org/jira/browse/HBASE-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858212#comment-13858212 ]
Richard Ding commented on HBASE-9426:
I am having trouble posting the patch to the review board; it fails with the error "No valid separator after the filename was found in the diff header".
Make custom distributed barrier procedure pluggable
Key: HBASE-9426 URL: https://issues.apache.org/jira/browse/HBASE-9426 Project: HBase Issue Type: Improvement Affects Versions: 0.95.2, 0.94.11 Reporter: Richard Ding Assignee: Richard Ding Attachments: HBASE-9426.patch.1, HBASE-9426.patch.2, HBASE-9426.patch.3
Currently, if one wants to implement a custom distributed barrier procedure (e.g., distributed log roll or distributed table flush), the HBase core code needs to be modified in order for the procedure to work. Looking into the snapshot code (especially on the region server side), most of the code that enables the procedure is generic life-cycle management (i.e., init, start, stop). We can make this part pluggable. Here is the proposal. Following the coprocessor example, we define two properties:
{code}
hbase.procedure.regionserver.classes
hbase.procedure.master.classes
{code}
The value of each is a comma-delimited list of classes. On the region server side, the classes implement the following interface:
{code}
public interface RegionServerProcedureManager {
  public void initialize(RegionServerServices rss) throws KeeperException;
  public void start();
  public void stop(boolean force) throws IOException;
  public String getProcedureName();
}
{code}
On the Master side, the classes implement this interface:
{code}
public interface MasterProcedureManager {
  public void initialize(MasterServices master) throws KeeperException, IOException, UnsupportedOperationException;
  public void stop(String why);
  public String getProcedureName();
  public void execProcedure(ProcedureDescription desc) throws IOException;
}
{code}
where ProcedureDescription is defined as
{code}
message ProcedureDescription {
  required string name = 1;
  required string instance = 2;
  optional int64 creationTime = 3 [default = 0];
  message Property {
    required string tag = 1;
    optional string value = 2;
  }
  repeated Property props = 4;
}
{code}
A generic API can be defined on HMaster to trigger a procedure:
{code}
public boolean execProcedure(ProcedureDescription desc) throws IOException;
{code}
_SnapshotManager_ and _RegionServerSnapshotManager_ are special examples of _MasterProcedureManager_ and _RegionServerProcedureManager_. They will be included automatically (users don't need to specify them in the conf file).
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
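The HBASE-9426 proposal above describes manager classes named in a comma-delimited property. A minimal sketch of how such a list could be turned into instances is shown here. It is an illustration only, not the patch: `ProcedureManagerSketch` is a simplified stand-in for the proposed interface with the HBase-specific types (RegionServerServices, KeeperException) left out, and `LogRollProcedureManagerSketch` and `ProcedureManagerLoaderSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the proposed manager interface (HBase types omitted). */
interface ProcedureManagerSketch {
  void start();
  void stop(boolean force);
  String getProcedureName();
}

/** Hypothetical plugin used only to exercise the loader. */
class LogRollProcedureManagerSketch implements ProcedureManagerSketch {
  @Override public void start() { }
  @Override public void stop(boolean force) { }
  @Override public String getProcedureName() { return "log-roll"; }
}

final class ProcedureManagerLoaderSketch {
  private ProcedureManagerLoaderSketch() { }

  /** Instantiates every class named in a comma-delimited property value. */
  static List<ProcedureManagerSketch> load(String commaDelimitedClasses) {
    List<ProcedureManagerSketch> managers = new ArrayList<>();
    if (commaDelimitedClasses == null) {
      return managers;
    }
    for (String name : commaDelimitedClasses.split(",")) {
      String trimmed = name.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas and whitespace
      }
      try {
        // Requires a no-argument constructor, as coprocessor-style plugins usually have.
        Class<? extends ProcedureManagerSketch> clazz =
            Class.forName(trimmed).asSubclass(ProcedureManagerSketch.class);
        managers.add(clazz.getDeclaredConstructor().newInstance());
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot load procedure manager " + trimmed, e);
      }
    }
    return managers;
  }
}
```

In this sketch a class that cannot be loaded aborts start-up; whether the real implementation should fail or merely log and continue is a design choice the proposal does not state.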
[jira] [Commented] (HBASE-9426) Make custom distributed barrier procedure pluggable
[ https://issues.apache.org/jira/browse/HBASE-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858213#comment-13858213 ]
Hadoop QA commented on HBASE-9426:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org against trunk revision . ATTACHMENT ID: http:
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8295//console
This message is automatically generated.
Make custom distributed barrier procedure pluggable
Key: HBASE-9426 URL: https://issues.apache.org/jira/browse/HBASE-9426 Project: HBase Issue Type: Improvement Affects Versions: 0.95.2, 0.94.11 Reporter: Richard Ding Assignee: Richard Ding Attachments: HBASE-9426.patch.1, HBASE-9426.patch.2, HBASE-9426.patch.3
Currently, if one wants to implement a custom distributed barrier procedure (e.g., distributed log roll or distributed table flush), the HBase core code needs to be modified in order for the procedure to work. Looking into the snapshot code (especially on the region server side), most of the code that enables the procedure is generic life-cycle management (i.e., init, start, stop). We can make this part pluggable. Here is the proposal. Following the coprocessor example, we define two properties:
{code}
hbase.procedure.regionserver.classes
hbase.procedure.master.classes
{code}
The value of each is a comma-delimited list of classes. On the region server side, the classes implement the following interface:
{code}
public interface RegionServerProcedureManager {
  public void initialize(RegionServerServices rss) throws KeeperException;
  public void start();
  public void stop(boolean force) throws IOException;
  public String getProcedureName();
}
{code}
On the Master side, the classes implement this interface:
{code}
public interface MasterProcedureManager {
  public void initialize(MasterServices master) throws KeeperException, IOException, UnsupportedOperationException;
  public void stop(String why);
  public String getProcedureName();
  public void execProcedure(ProcedureDescription desc) throws IOException;
}
{code}
where ProcedureDescription is defined as
{code}
message ProcedureDescription {
  required string name = 1;
  required string instance = 2;
  optional int64 creationTime = 3 [default = 0];
  message Property {
    required string tag = 1;
    optional string value = 2;
  }
  repeated Property props = 4;
}
{code}
A generic API can be defined on HMaster to trigger a procedure:
{code}
public boolean execProcedure(ProcedureDescription desc) throws IOException;
{code}
_SnapshotManager_ and _RegionServerSnapshotManager_ are special examples of _MasterProcedureManager_ and _RegionServerProcedureManager_. They will be included automatically (users don't need to specify them in the conf file).
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858214#comment-13858214 ] Lars Hofhansl commented on HBASE-8912: -- It can't just be a race, though, since it happens on master restart as well. [~jmspaggi], do you still have the exact state of the znodes that causes the trouble at master restart? That might be easy to capture in a unit test. [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Fix For: 0.94.16 Attachments: 8912-0.94.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt AM throws this exception which subsequently causes the master to abort: {code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. 
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858215#comment-13858215 ] Andrew Purtell commented on HBASE-10251: Should code in src/test/... carry an expectation it's for other than internal project use? Restore API Compat for PerformanceEvaluation.generateValue() Key: HBASE-10251 URL: https://issues.apache.org/jira/browse/HBASE-10251 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Labels: api_compatibility Observed: A couple of my client tests fail to compile against trunk because the method PerformanceEvaluation.generateValue was removed as part of HBASE-8496. This is an issue because it was used in a number of places, including unit tests. Since we did not explicitly label this API as private, it's ambiguous as to whether this could/should have been used by people writing apps against 0.96. If they used it, then they would be broken upon upgrade to 0.98 and trunk. Potential Solution: The method was renamed to generateData, but the logic is still the same. We can reintroduce it as deprecated in 0.98, as compat shim over generateData. The patch should be a few lines. We may also consider doing so in trunk, but I'd be just as fine with leaving it out. More generally, this raises the question about what other code is in this grey-area, where it is public, is used outside of the package, but is not explicitly labeled with an AudienceInterface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9426) Make custom distributed barrier procedure pluggable
[ https://issues.apache.org/jira/browse/HBASE-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858220#comment-13858220 ] Ted Yu commented on HBASE-9426: --- Generated files should be included for QA to run test suite. Without them, I got: {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-client: Compilation failure: Compilation failure: [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[95,61] cannot find symbol [ERROR] symbol : class ProcedureDescription [ERROR] location: class org.apache.hadoop.hbase.protobuf.generated.HBaseProtos [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[135,62] cannot find symbol [ERROR] symbol : class ExecProcedureRequest [ERROR] location: class org.apache.hadoop.hbase.protobuf.generated.MasterProtos [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[136,62] cannot find symbol [ERROR] symbol : class ExecProcedureResponse [ERROR] location: class org.apache.hadoop.hbase.protobuf.generated.MasterProtos [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[137,62] cannot find symbol [ERROR] symbol : class IsProcedureDoneRequest [ERROR] location: class org.apache.hadoop.hbase.protobuf.generated.MasterProtos [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[138,62] cannot find symbol [ERROR] symbol : class IsProcedureDoneResponse [ERROR] location: class org.apache.hadoop.hbase.protobuf.generated.MasterProtos [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2053,38] cannot find symbol [ERROR] symbol: class ExecProcedureRequest [ERROR] RpcController controller, ExecProcedureRequest request) [ERROR] 
/Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2052,15] cannot find symbol [ERROR] symbol: class ExecProcedureResponse [ERROR] public ExecProcedureResponse execProcedure( [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2060,12] cannot find symbol [ERROR] symbol: class IsProcedureDoneRequest [ERROR] IsProcedureDoneRequest request) throws ServiceException { [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2059,15] cannot find symbol [ERROR] symbol: class IsProcedureDoneResponse [ERROR] public IsProcedureDoneResponse isProcedureDone(RpcController controller, [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2051,8] method does not override or implement a method from a supertype [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:[2058,8] method does not override or implement a method from a supertype [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3014,24] package ProcedureDescription does not exist [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3014,43] cannot find symbol [ERROR] symbol : variable ProcedureDescription [ERROR] location: class org.apache.hadoop.hbase.client.HBaseAdmin [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3022,10] cannot find symbol [ERROR] symbol : class ExecProcedureRequest [ERROR] location: class org.apache.hadoop.hbase.client.HBaseAdmin [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3022,41] cannot find symbol [ERROR] symbol : variable ExecProcedureRequest [ERROR] location: class org.apache.hadoop.hbase.client.HBaseAdmin [ERROR] 
/Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3025,39] cannot find symbol [ERROR] symbol : class ExecProcedureResponse [ERROR] location: class org.apache.hadoop.hbase.client.HBaseAdmin [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3027,13] cannot find symbol [ERROR] symbol: class ExecProcedureResponse [ERROR] public ExecProcedureResponse call() throws ServiceException { [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3080,30] package ProcedureDescription does not exist [ERROR] /Users/tyu/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3080,49] cannot find symbol [ERROR] symbol : variable ProcedureDescription [ERROR] location: class org.apache.hadoop.hbase.client.HBaseAdmin [ERROR]
[jira] [Commented] (HBASE-7226) HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
[ https://issues.apache.org/jira/browse/HBASE-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858221#comment-13858221 ]
Feng Honghua commented on HBASE-7226:
thanks [~lhofhansl] [~yuzhih...@gmail.com] [~apurtell] [~stack] for prompt response/review/resolving. :-)
HRegion.checkAndMutate uses incorrect comparison result for <, <=, > and >=
Key: HBASE-7226 URL: https://issues.apache.org/jira/browse/HBASE-7226 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-7226-trunk-v2.patch, HBASE-7226-trunk.patch, HRegion_HBASE_7226_0.94.2.patch
in HRegion.checkAndMutate, incorrect comparison results are used for <, <=, > and >=, as below:
  switch (compareOp) {
  case LESS:
    matches = compareResult <= 0; // should be '<' here
    break;
  case LESS_OR_EQUAL:
    matches = compareResult < 0; // should be '<=' here
    break;
  case EQUAL:
    matches = compareResult == 0;
    break;
  case NOT_EQUAL:
    matches = compareResult != 0;
    break;
  case GREATER_OR_EQUAL:
    matches = compareResult > 0; // should be '>=' here
    break;
  case GREATER:
    matches = compareResult >= 0; // should be '>' here
    break;
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
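The "should be" comments in the HBASE-7226 description above can be read as the following mapping from operator to comparison. This is a sketch under assumptions, not the committed patch: `CompareOpSketch` is a local stand-in for HBase's own operator enum, and how `compareResult` is computed inside HRegion is not shown in the report and is not modelled here.

```java
/** Local stand-in for the comparison operators named in the report. */
enum CompareOpSketch { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

final class CompareMatchSketch {
  private CompareMatchSketch() { }

  /** Maps each operator to the comparison the report says it should use. */
  static boolean matches(CompareOpSketch op, int compareResult) {
    switch (op) {
      case LESS:
        return compareResult < 0;
      case LESS_OR_EQUAL:
        return compareResult <= 0;
      case EQUAL:
        return compareResult == 0;
      case NOT_EQUAL:
        return compareResult != 0;
      case GREATER_OR_EQUAL:
        return compareResult >= 0;
      case GREATER:
        return compareResult > 0;
      default:
        throw new IllegalArgumentException("Unknown operator " + op);
    }
  }
}
```

The boundary case `compareResult == 0` is where the reported code and this mapping differ, so it is the case worth covering in a unit test.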
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858222#comment-13858222 ]
Feng Honghua commented on HBASE-10227:
[~gustavoanatoly]: sounds good. I will also try to find time in the next few days to extract the original fix from jira-8721 and attach it here for your reference.
When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Gustavo Anatoly
When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region, and then the split hlogs are replayed. In fact, the edits in the split hlogs have kvs with greater mvcc than all MemstoreTS in all store files, but replaying them doesn't increment the mvcc accordingly at all. From an overall perspective, this mvcc recovery is 'logically' incorrect/incomplete. It currently causes no problem because no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can safely be set to zero. They are simply treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero (say 'earlier' since they have mvcc less than the ones with non-zero mvcc, but in fact they are put 'later'), and this has no incorrect impact only because no active scanners exist or are created during region opening. This bug is only a 'logical' one for the time being, but if later on we need to preserve mvcc across the region's whole logical lifecycle (across regionservers) and never set it to zero, this bug needs to be fixed first.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
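The recovery order described in HBASE-10227 above (seed the mvcc from the store files, then replay split hlogs) can be sketched as below. This is only an illustration of the intended behaviour, not HBase code: `MvccRecoverySketch` and its methods are hypothetical, and it assumes each replayed edit carries its own mvcc value, which the report implies but which depends on that value being persisted in the log.

```java
final class MvccRecoverySketch {
  private MvccRecoverySketch() { }

  /** Initial mvcc: the largest MemstoreTS found across all stores (0 if there are none). */
  static long initialMvcc(long[] maxMemstoreTsPerStore) {
    long max = 0L;
    for (long ts : maxMemstoreTsPerStore) {
      max = Math.max(max, ts);
    }
    return max;
  }

  /** Advances the mvcc past every replayed edit instead of leaving it at the initial value. */
  static long replay(long currentMvcc, long[] replayedEditMvccs) {
    long mvcc = currentMvcc;
    for (long editMvcc : replayedEditMvccs) {
      mvcc = Math.max(mvcc, editMvcc);
    }
    return mvcc;
  }
}
```

For example, with store maxima 3, 7 and 5 the region starts at 7, and replaying edits stamped 8, 10 and 9 leaves it at 10 rather than 7.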
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858224#comment-13858224 ]
Feng Honghua commented on HBASE-10227:
Thanks [~yuzhih...@gmail.com] for the pointer. I just took a look at 10241; it, together with 8763, seems a bit general, while this one is pretty specific. Its fix can be a part of those general jiras, so I think it's ok to fix this one separately. Opinions? :-)
When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Gustavo Anatoly
When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region, and then the split hlogs are replayed. In fact, the edits in the split hlogs have kvs with greater mvcc than all MemstoreTS in all store files, but replaying them doesn't increment the mvcc accordingly at all. From an overall perspective, this mvcc recovery is 'logically' incorrect/incomplete. It currently causes no problem because no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can safely be set to zero. They are simply treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero (say 'earlier' since they have mvcc less than the ones with non-zero mvcc, but in fact they are put 'later'), and this has no incorrect impact only because no active scanners exist or are created during region opening. This bug is only a 'logical' one for the time being, but if later on we need to preserve mvcc across the region's whole logical lifecycle (across regionservers) and never set it to zero, this bug needs to be fixed first.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10229) Support OperationAttributes in Increment and Append in Shell
[ https://issues.apache.org/jira/browse/HBASE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858230#comment-13858230 ] Hudson commented on HBASE-10229: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #27 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/27/]) HBASE-10229-Support OperationAttributes in Increment and Append in Shell (Ram) (ramkrishna: rev 1553623) * /hbase/trunk/hbase-shell/src/main/ruby/hbase/table.rb * /hbase/trunk/hbase-shell/src/main/ruby/shell.rb * /hbase/trunk/hbase-shell/src/main/ruby/shell/commands/append.rb * /hbase/trunk/hbase-shell/src/main/ruby/shell/commands/incr.rb * /hbase/trunk/hbase-shell/src/test/ruby/hbase/table_test.rb Support OperationAttributes in Increment and Append in Shell Key: HBASE-10229 URL: https://issues.apache.org/jira/browse/HBASE-10229 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10229_1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10246) Wrap long lines in recently added source files
[ https://issues.apache.org/jira/browse/HBASE-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858231#comment-13858231 ] Hudson commented on HBASE-10246: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #27 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/27/]) HBASE-10246 Wrap long lines in recently added source files (tedyu: rev 1553786) * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestTableSnapshotInputFormat.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestTableSnapshotScanner.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSnapshotInputFormat.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/PerformanceEvaluation.java Wrap long lines in recently added source files -- Key: HBASE-10246 URL: https://issues.apache.org/jira/browse/HBASE-10246 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Gustavo Anatoly Priority: Trivial Fix For: 0.99.0 Attachments: HBASE-10246.patch Due to ineffective line length detection, several newly added files have long lines in them. The following is a partial list: IntegrationTestTableSnapshotInputFormat.java ClientSideRegionScanner.java TableSnapshotInputFormat.java PerformanceEvaluationScan.java TestTableSnapshotScanner.java TestTableSnapshotInputFormat.java Long lines in these files should be wrapped. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10175) 2-thread ChaosMonkey steps on its own toes
[ https://issues.apache.org/jira/browse/HBASE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858227#comment-13858227 ] Hudson commented on HBASE-10175: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #27 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/27/]) HBASE-10175 2-thread ChaosMonkey steps on its own toes (sershe: rev 1553634) * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/CompactRandomRegionOfTableAction.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/CompactTableAction.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/FlushRandomRegionOfTableAction.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/FlushTableAction.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/MergeRandomAdjacentRegionsOfTableAction.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/MoveRegionsOfTableAction.java * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/SplitRandomRegionOfTableAction.java 2-thread ChaosMonkey steps on its own toes -- Key: HBASE-10175 URL: https://issues.apache.org/jira/browse/HBASE-10175 Project: HBase Issue Type: Improvement Components: test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: HBASE-10175.patch ChaosMonkey with one destructive and one volatility (flush-compact-split-etc.) threads steps on its own toes and logs a lot of exceptions. A simple solution would be to catch most (or all), like NotServingRegionException, and log less (not a full callstack for example, it's not very useful anyway). A more complicated/complementary one would be to keep track which regions the destructive thread affects and use other regions for volatile one. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10176) Canary#sniff() should close the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858228#comment-13858228 ] Hudson commented on HBASE-10176: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #27 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/27/]) HBASE-10176 Canary#sniff() should close the HTable instance (tedyu: rev 1553857) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java Canary#sniff() should close the HTable instance --- Key: HBASE-10176 URL: https://issues.apache.org/jira/browse/HBASE-10176 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.99.0 Attachments: 10176-v1.txt {code} table = new HTable(admin.getConfiguration(), tableDesc.getName()); {code} HTable instance should be closed by the end of the method. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
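The point of HBASE-10176 above, that the table handle should be closed by the end of the method, amounts to a try/finally around the work. The sketch below uses a hypothetical `TableHandleSketch` in place of HTable so that it runs without HBase on the classpath; it is not the Canary code or the attached patch.

```java
import java.io.Closeable;

/** Hypothetical stand-in for a table handle that must be closed after use. */
class TableHandleSketch implements Closeable {
  private final boolean failOnRead;
  private boolean closed;

  TableHandleSketch(boolean failOnRead) {
    this.failOnRead = failOnRead;
  }

  void read() {
    if (failOnRead) {
      throw new IllegalStateException("simulated read failure");
    }
  }

  @Override public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

final class SniffSketch {
  private SniffSketch() { }

  /** Uses the handle and closes it whether or not the read succeeds. */
  static void sniff(TableHandleSketch table) {
    try {
      table.read();
    } finally {
      table.close();
    }
  }
}
```

A try-with-resources statement would do the same on Java 7 and later; try/finally is used here because it also works on older runtimes.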
[jira] [Commented] (HBASE-10239) Improve determinism and debugability of TestAccessController
[ https://issues.apache.org/jira/browse/HBASE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858229#comment-13858229 ] Hudson commented on HBASE-10239: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #27 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/27/]) HBASE-10239. Improve determinism and debugability of TestAccessController (apurtell: rev 1553718) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestNamespaceCommands.java Improve determinism and debugability of TestAccessController Key: HBASE-10239 URL: https://issues.apache.org/jira/browse/HBASE-10239 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10239.patch, wip-10239.patch Separate grant and revoke API invocations to static helper methods in SecureTestUtils. Wait for permissions cache updates using a Predicate. Log the API calls, state checks, and waits. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
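HBASE-10239 above mentions waiting for permissions cache updates using a Predicate. A generic polling helper of that kind might look like the sketch below. It is an assumption-laden illustration: `WaitSketch.waitFor` is a hypothetical helper built on `java.util.function.BooleanSupplier`, not the utility the patch actually uses.

```java
import java.util.function.BooleanSupplier;

final class WaitSketch {
  private WaitSketch() { }

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition became true, false on timeout or interruption.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

A test would then replace a fixed sleep after a grant or revoke with something like `waitFor(10000, 100, () -> cacheShowsNewPermission())`, where `cacheShowsNewPermission` is whatever check the test defines.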
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858232#comment-13858232 ]
Feng Honghua commented on HBASE-10227:
[~yuzhih...@gmail.com]: actually, I changed the (de)serialization of WALEdit to persist mvcc and used it to correctly recover the region's mvcc during reopening; that change is part of jira-8721's patch :-)
When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Gustavo Anatoly
When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region, and then the split hlogs are replayed. In fact, the edits in the split hlogs have kvs with greater mvcc than all MemstoreTS in all store files, but replaying them doesn't increment the mvcc accordingly at all. From an overall perspective, this mvcc recovery is 'logically' incorrect/incomplete. It currently causes no problem because no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can safely be set to zero. They are simply treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero (say 'earlier' since they have mvcc less than the ones with non-zero mvcc, but in fact they are put 'later'), and this has no incorrect impact only because no active scanners exist or are created during region opening. This bug is only a 'logical' one for the time being, but if later on we need to preserve mvcc across the region's whole logical lifecycle (across regionservers) and never set it to zero, this bug needs to be fixed first.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858233#comment-13858233 ]
Lars Hofhansl commented on HBASE-8912:
So here's the hypothesis. A region bounces between PENDING_OPEN and FAILED_OPEN. Each time the state changes, the AssignmentManager is notified, but when it reads the state it always reads the latest state (FAILED_OPEN), so it gets two notifications for FAILED_OPEN. I did one more test: I started HBase and created a table with COMPRESSION='SNAPPY'. Since I do not have SNAPPY installed, the region keeps bouncing. Without the patch, the HMaster reliably aborts *every* time. With the patch, the HMaster stays up, and eventually the region stops bouncing and stays in PENDING_OPEN (which means that the master eventually gives up). So the patch definitely fixes one of the issues! Does anybody think it will cause other issues?
[0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Fix For: 0.94.16 Attachments: 8912-0.94.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
AM throws this exception which subsequently causes the master to abort:
{code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10241) implement mvcc-consistent scanners (across recovery)
[ https://issues.apache.org/jira/browse/HBASE-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858234#comment-13858234 ] Feng Honghua commented on HBASE-10241: -- I also encountered this issue when implementing HBASE-8721 (the fix for "deletes can mask puts that happen after the delete"). That patch already persists mvcc in the WAL, so the persisted values can be used to recover the region's correct mvcc during re-opening. Anyone who is interested can refer to it :-) It seems that setting mvcc (per hfile) to zero for an (arguably minor) performance benefit can't offset the correctness penalty it brings. Persisting mvcc values and having them survive across regionservers is a matter of semantic correctness; it seems most related issues can be resolved by making this correct, and combining mvcc and seqid is not as critical as this correction. implement mvcc-consistent scanners (across recovery) Key: HBASE-10241 URL: https://issues.apache.org/jira/browse/HBASE-10241 Project: HBase Issue Type: New Feature Components: HFile, regionserver, Scanners Affects Versions: 0.99.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: Consistent scanners.pdf Scanners currently use mvcc for consistency. However, mvcc is lost on server restart, or even a region move. This JIRA is to enable the scanners to transfer mvcc (or seqId, or some other number, see HBASE-8763) between servers. First, client scanner needs to get and store the readpoint. Second, mvcc needs to be preserved in WAL. Third, the mvcc needs to be stored in store files per KV and discarded when not needed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858236#comment-13858236 ] Ted Yu commented on HBASE-10227: MVCC persistence in WAL is a subtask of HBASE-10241 too. When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Gustavo Anatoly When opening a region, all stores are examined to get the max MemstoreTS and it's used as the initial mvcc for the region, and then split hlogs are replayed. In fact the edits in split hlogs have kvs with greater mvcc than all MemstoreTS in all store files, but replaying them don't increment the mvcc according at all. From an overall perspective this mvcc recovering is 'logically' incorrect/incomplete. Why currently it doesn't incur problem is because no active scanners exists and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the resulted hfiles from hlog replaying can be safely set to zero. They are just treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero(say 'earlier' since they have mvcc less than the ones with non-zero mvcc, but in fact they are put 'later'), and without any incorrect impact just because during region opening there are no active scanners existing / created. This bug is just in 'logic' sense for the time being, but if later on we need to survive mvcc in the region's whole logic lifecycle(across regionservers) and never set them to zero, this bug needs to be fixed first. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10243) store mvcc in WAL
[ https://issues.apache.org/jira/browse/HBASE-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858237#comment-13858237 ] Feng Honghua commented on HBASE-10243: -- this feature is implemented as part of [HBASE-8721|https://issues.apache.org/jira/browse/HBASE-8721] :-) store mvcc in WAL - Key: HBASE-10243 URL: https://issues.apache.org/jira/browse/HBASE-10243 Project: HBase Issue Type: Sub-task Components: HFile, regionserver, Scanners Reporter: Sergey Shelukhin Priority: Minor mvcc needs to be stored in WAL. Right now seqId is already stored, so if they are combined, it would be removed or deprecated. Might also happen before this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858238#comment-13858238 ] Feng Honghua commented on HBASE-10227: -- yes, you mean [HBASE-10243|https://issues.apache.org/jira/browse/HBASE-10243]? ...actually I've already implemented it as part of [HBASE-8721|https://issues.apache.org/jira/browse/HBASE-8721], and MVCC persistence is only the prerequisite of the fix for this issue. :-) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Gustavo Anatoly When opening a region, all stores are examined to get the max MemstoreTS and it's used as the initial mvcc for the region, and then split hlogs are replayed. In fact the edits in split hlogs have kvs with greater mvcc than all MemstoreTS in all store files, but replaying them don't increment the mvcc according at all. From an overall perspective this mvcc recovering is 'logically' incorrect/incomplete. Why currently it doesn't incur problem is because no active scanners exists and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the resulted hfiles from hlog replaying can be safely set to zero. They are just treated as kvs put 'earlier' than the ones in HFiles with mvcc greater than zero(say 'earlier' since they have mvcc less than the ones with non-zero mvcc, but in fact they are put 'later'), and without any incorrect impact just because during region opening there are no active scanners existing / created. This bug is just in 'logic' sense for the time being, but if later on we need to survive mvcc in the region's whole logic lifecycle(across regionservers) and never set them to zero, this bug needs to be fixed first. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10252) Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention)
Feng Honghua created HBASE-10252: Summary: Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention) Key: HBASE-10252 URL: https://issues.apache.org/jira/browse/HBASE-10252 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua When a user calls Increment with amount=0, we don't write the original value back to the WAL or memstore: adding 0 yields a 'new' value that is identical to the original one.
1. If the user provides a 0 amount to query rather than to update, this fix is ok; this is the most likely intention.
2. If the user provides a 0 amount as an update, this fix is also ok: there is no need to touch the back-end value if that value isn't changed.
3. In either case we return the correct value and keep subsequent query results correct: if the 0-amount Increment is the first update, a later query gives the same result whether it retrieves a 0 value or retrieves nothing.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
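The idea in HBASE-10252 above can be sketched in a few lines of Java. This is only a toy in-memory model under stated assumptions: `ZeroIncrementSketch`, its map-backed store, and the `writeCount` counter are hypothetical stand-ins for the WAL/memstore write path, not the code in the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory counter store; models the idea only, not HRegion. */
final class ZeroIncrementSketch {
    private final Map<String, Long> store = new HashMap<>();
    private int writes = 0; // counts simulated WAL/memstore write-backs

    /** Returns the value after the increment; skips the write when amount is 0. */
    long increment(String key, long amount) {
        long current = store.getOrDefault(key, 0L);
        if (amount == 0) {
            // Value is unchanged, so treat this as a read and write nothing back.
            return current;
        }
        long updated = current + amount;
        store.put(key, updated);
        writes++;
        return updated;
    }

    int writeCount() {
        return writes;
    }
}
```

A zero-amount call on a missing key returns 0 without creating an entry, which matches point 3 of the description: a later read sees the same result either way.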
[jira] [Updated] (HBASE-10252) Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention)
[ https://issues.apache.org/jira/browse/HBASE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10252: - Attachment: HBASE-10252-trunk-v0.patch patch for trunk is provided Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention) -- Key: HBASE-10252 URL: https://issues.apache.org/jira/browse/HBASE-10252 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10252-trunk-v0.patch When user calls Increment by providing amount=0, we don't write the original value to WAL or memstore : adding 0 yields a 'new' value just with the same value as the original one. 1. user provides 0 amount for query rather than for update, this fix is ok; this intention is the most possible case; 2. user provides 0 amount for an update, this fix is also ok : no need to touch back-end value if that value isn't changed; 3. either case we both return correct value, and keep subsequent query results correct : if the 0 amount Increment is the first update, the query is the same for retrieving a 0 value or retrieving nothing; -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9426) Make custom distributed barrier procedure pluggable
[ https://issues.apache.org/jira/browse/HBASE-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated HBASE-9426: Attachment: HBASE-9426.patch.4 New patch with generated files. Make custom distributed barrier procedure pluggable Key: HBASE-9426 URL: https://issues.apache.org/jira/browse/HBASE-9426 Project: HBase Issue Type: Improvement Affects Versions: 0.95.2, 0.94.11 Reporter: Richard Ding Assignee: Richard Ding Attachments: HBASE-9426.patch.1, HBASE-9426.patch.2, HBASE-9426.patch.3, HBASE-9426.patch.4 Currently if one wants to implement a custom distributed barrier procedure (e.g., distributed log roll or distributed table flush), the HBase core code needs to be modified in order for the procedure to work. Looking into the snapshot code (especially on region server side), most of the code to enable the procedure are generic life-cycle management (i.e., init, start, stop). We can make this part pluggable. Here is the proposal. Following the coprocessor example, we define two properties: {code} hbase.procedure.regionserver.classes hbase.procedure.master.classes {code} The values for both are comma delimited list of classes. 
On region server side, the classes implement the following interface:
{code}
public interface RegionServerProcedureManager {
  public void initialize(RegionServerServices rss) throws KeeperException;
  public void start();
  public void stop(boolean force) throws IOException;
  public String getProcedureName();
}
{code}
While on Master side, the classes implement the interface:
{code}
public interface MasterProcedureManager {
  public void initialize(MasterServices master) throws KeeperException, IOException, UnsupportedOperationException;
  public void stop(String why);
  public String getProcedureName();
  public void execProcedure(ProcedureDescription desc) throws IOException;
}
{code}
Where the ProcedureDescription is defined as
{code}
message ProcedureDescription {
  required string name = 1;
  required string instance = 2;
  optional int64 creationTime = 3 [default = 0];
  message Property {
    required string tag = 1;
    optional string value = 2;
  }
  repeated Property props = 4;
}
{code}
A generic API can be defined on HMaster to trigger a procedure:
{code}
public boolean execProcedure(ProcedureDescription desc) throws IOException;
{code}
_SnapshotManager_ and _RegionServerSnapshotManager_ are special examples of _MasterProcedureManager_ and _RegionServerProcedureManager_. They will be automatically included (users don't need to specify them in the conf file).
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
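As a rough illustration of the pluggable loading proposed in HBASE-9426 above, the Java sketch below instantiates managers from a comma-delimited list of class names, in the spirit of the `hbase.procedure.regionserver.classes` property. Everything here is an assumption made for the example: `ProcedureManagerSketch` is a cut-down stand-in for the proposed interface (no `RegionServerServices`, no checked exceptions), and `LogRollManagerSketch` and `ProcedureManagerLoader` are hypothetical names, not classes from the patch.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical stand-in for RegionServerProcedureManager. */
interface ProcedureManagerSketch {
    void start();
    void stop(boolean force);
    String getProcedureName();
}

/** Example plug-in; a real one would coordinate a distributed log roll. */
final class LogRollManagerSketch implements ProcedureManagerSketch {
    boolean running;
    public void start() { running = true; }
    public void stop(boolean force) { running = false; }
    public String getProcedureName() { return "log-roll"; }
}

final class ProcedureManagerLoader {
    /** Instantiates each class named in a comma-delimited property value. */
    static List<ProcedureManagerSketch> load(String commaDelimitedClasses) {
        List<ProcedureManagerSketch> managers = new ArrayList<>();
        for (String name : commaDelimitedClasses.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            try {
                Class<? extends ProcedureManagerSketch> clazz =
                    Class.forName(trimmed).asSubclass(ProcedureManagerSketch.class);
                Constructor<? extends ProcedureManagerSketch> ctor =
                    clazz.getDeclaredConstructor();
                ctor.setAccessible(true);
                managers.add(ctor.newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load procedure manager " + trimmed, e);
            }
        }
        return managers;
    }
}
```

The loader only covers instantiation; calling `start()` and `stop()` at the right points in the server life cycle would be the job of the hosting process.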
[jira] [Commented] (HBASE-9426) Make custom distributed barrier procedure pluggable
[ https://issues.apache.org/jira/browse/HBASE-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858248#comment-13858248 ] Richard Ding commented on HBASE-9426: - Thanks Ted. Posted on the review board: https://reviews.apache.org/r/16503/ Make custom distributed barrier procedure pluggable Key: HBASE-9426 URL: https://issues.apache.org/jira/browse/HBASE-9426 Project: HBase Issue Type: Improvement Affects Versions: 0.95.2, 0.94.11 Reporter: Richard Ding Assignee: Richard Ding Attachments: HBASE-9426.patch.1, HBASE-9426.patch.2, HBASE-9426.patch.3, HBASE-9426.patch.4 Currently if one wants to implement a custom distributed barrier procedure (e.g., distributed log roll or distributed table flush), the HBase core code needs to be modified in order for the procedure to work. Looking into the snapshot code (especially on region server side), most of the code to enable the procedure are generic life-cycle management (i.e., init, start, stop). We can make this part pluggable. Here is the proposal. Following the coprocessor example, we define two properties: {code} hbase.procedure.regionserver.classes hbase.procedure.master.classes {code} The values for both are comma delimited list of classes. 
On region server side, the classes implement the following interface:
{code}
public interface RegionServerProcedureManager {
  public void initialize(RegionServerServices rss) throws KeeperException;
  public void start();
  public void stop(boolean force) throws IOException;
  public String getProcedureName();
}
{code}
While on Master side, the classes implement the interface:
{code}
public interface MasterProcedureManager {
  public void initialize(MasterServices master) throws KeeperException, IOException, UnsupportedOperationException;
  public void stop(String why);
  public String getProcedureName();
  public void execProcedure(ProcedureDescription desc) throws IOException;
}
{code}
Where the ProcedureDescription is defined as
{code}
message ProcedureDescription {
  required string name = 1;
  required string instance = 2;
  optional int64 creationTime = 3 [default = 0];
  message Property {
    required string tag = 1;
    optional string value = 2;
  }
  repeated Property props = 4;
}
{code}
A generic API can be defined on HMaster to trigger a procedure:
{code}
public boolean execProcedure(ProcedureDescription desc) throws IOException;
{code}
_SnapshotManager_ and _RegionServerSnapshotManager_ are special examples of _MasterProcedureManager_ and _RegionServerProcedureManager_. They will be automatically included (users don't need to specify them in the conf file).
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10253) test-patch.sh should state the reason why patch is not tested if patch filename is not recognized
Ted Yu created HBASE-10253: -- Summary: test-patch.sh should state the reason why patch is not tested if patch filename is not recognized Key: HBASE-10253 URL: https://issues.apache.org/jira/browse/HBASE-10253 Project: HBase Issue Type: Test Reporter: Ted Yu Currently, if the patch filename is not recognized (e.g. an unknown file extension), we see the following in the post back: {code} -1 overall. Here are the results of testing the latest attachment http://issues.apache.org against trunk revision . ATTACHMENT ID: http: {code} In this situation, the post back should state the reason why the patch was not tested. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10252) Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention)
[ https://issues.apache.org/jira/browse/HBASE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10252: --- Status: Patch Available (was: Open) Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention) -- Key: HBASE-10252 URL: https://issues.apache.org/jira/browse/HBASE-10252 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10252-trunk-v0.patch When user calls Increment by providing amount=0, we don't write the original value to WAL or memstore : adding 0 yields a 'new' value just with the same value as the original one. 1. user provides 0 amount for query rather than for update, this fix is ok; this intention is the most possible case; 2. user provides 0 amount for an update, this fix is also ok : no need to touch back-end value if that value isn't changed; 3. either case we both return correct value, and keep subsequent query results correct : if the 0 amount Increment is the first update, the query is the same for retrieving a 0 value or retrieving nothing; -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10252) Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention)
[ https://issues.apache.org/jira/browse/HBASE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858267#comment-13858267 ] Hadoop QA commented on HBASE-10252: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620750/HBASE-10252-trunk-v0.patch against trunk revision . ATTACHMENT ID: 12620750 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8297//console This message is automatically generated. 
Don't write back to WAL/memstore when Increment amount is zero (mostly for query rather than update intention) -- Key: HBASE-10252 URL: https://issues.apache.org/jira/browse/HBASE-10252 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10252-trunk-v0.patch When user calls Increment by providing amount=0, we don't write the original value to WAL or memstore : adding 0 yields a 'new' value just with the same value as the original one. 1. user provides 0 amount for query rather than for update, this fix is ok; this intention is the most possible case; 2. user provides 0 amount for an update, this fix is also ok : no need to touch back-end value if that value isn't changed; 3. either case we both return correct value, and keep subsequent query results correct : if the 0 amount Increment is the first update, the query is the same for retrieving a 0 value or retrieving nothing; -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10239) Improve determinism and debugability of TestAccessController
[ https://issues.apache.org/jira/browse/HBASE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858270#comment-13858270 ] ramkrishna.s.vasudevan commented on HBASE-10239: What I meant was that I was trying to verify a scenario: I added a testcase and used verifyAllowed to see whether the test fails. The test was intended not to return any result, but verifyAllowed passed it. Hence I thought of adding a fail condition for when obj is null. Improve determinism and debugability of TestAccessController Key: HBASE-10239 URL: https://issues.apache.org/jira/browse/HBASE-10239 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 Attachments: 10239.patch, wip-10239.patch Separate grant and revoke API invocations to static helper methods in SecureTestUtils. Wait for permissions cache updates using a Predicate. Log the API calls, state checks, and waits. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-8912: - Priority: Critical (was: Major) Upping to critical. The same effect can also be achieved by replacing the abort with an error log. setOfflineInZooKeeper would then return -1, and the caller would stop the assignment. I think for 0.94 that is the best course of action, unless somebody has a better idea. The current state of affairs is unacceptable as it would actually lead to a cascading failure of all HMasters. [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Priority: Critical Fix For: 0.94.16 Attachments: 8912-0.94.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt AM throws this exception which subsequently causes the master to abort: {code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. 
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
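The alternative proposed in the HBASE-8912 update above (log an error and return -1 instead of aborting the master) might look roughly like the Java sketch below. It is a simplified model, not the attached patch: the `State` enum, the rule that only CLOSED or OFFLINE regions may be set OFFLINE, and the use of `currentVersion + 1` as the returned znode version are all assumptions made for illustration.

```java
/** Hypothetical, simplified model of the check discussed in HBASE-8912. */
final class OfflineTransitionSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSED }

    static final int FAILED = -1;

    /**
     * Returns the new (simulated) znode version, or -1 when the region is in
     * a state that cannot be moved to OFFLINE. No exception, no abort.
     */
    static int setOffline(State current, int currentVersion) {
        if (current != State.CLOSED && current != State.OFFLINE) {
            System.err.println("Unexpected state : " + current
                + " .. Cannot transit it to OFFLINE.");
            return FAILED;
        }
        return currentVersion + 1;
    }

    /** The caller stops the assignment when setOffline reports failure. */
    static boolean assign(State current, int currentVersion) {
        int version = setOffline(current, currentVersion);
        if (version == FAILED) {
            return false; // give up on this attempt; the master stays up
        }
        // ... a real implementation would continue the assignment here ...
        return true;
    }
}
```

In this model a region stuck in PENDING_OPEN simply makes `assign` return false, which corresponds to the observed behaviour of the master staying up while the region eventually stops bouncing.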
[jira] [Updated] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-8912: - Attachment: 8912-0.94-alt2.txt Something like this. Tested with a snappy table again. HMaster stays up, region stops bouncing after a while. [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Priority: Critical Fix For: 0.94.16 Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt AM throws this exception which subsequently causes the master to abort: {code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. 
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)