[jira] Commented: (HBASE-1537) Intra-row scanning
[ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880429#action_12880429 ]

stack commented on HBASE-1537:

Reapply to branch. We seem to have lost it (or I never applied it in the first place).

Intra-row scanning
Key: HBASE-1537
URL: https://issues.apache.org/jira/browse/HBASE-1537
Project: HBase
Issue Type: New Feature
Reporter: Jonathan Gray
Assignee: Andrew Purtell
Fix For: 0.21.0
Attachments: HBASE-1537-2.patch, HBASE-1537-v1.patch, HBASE-1537-v2-0.20.3.patch, HBASE-1537-v2.patch

To continue scaling numbers of columns or versions in a single row, we need a mechanism to scan within a row so we can return some columns at a time. Currently, an entire row must come back as one piece.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-1537) Intra-row scanning
[ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880442#action_12880442 ]

Andrew Purtell commented on HBASE-1537:

I noticed this was dropped in the pivot from 0.20_pre_durability to 0.20. But as a matter of fact we have an open internal ticket on this. Unit tests and bugfixes coming. Will submit to reviewboard when ready.
[jira] Commented: (HBASE-1537) Intra-row scanning
[ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880448#action_12880448 ]

ryan rawson commented on HBASE-1537:

stack just said he wanted to reapply to branch?
[jira] Commented: (HBASE-1537) Intra-row scanning
[ https://issues.apache.org/jira/browse/HBASE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880452#action_12880452 ]

Andrew Purtell commented on HBASE-1537:

Current situation where this was dropped from 0.20 but is in trunk is a reasonable outcome in my opinion.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880454#action_12880454 ]

Li Chongxin commented on HBASE-50:

Sure. We do need a branch for snapshot. Currently I'm working on TRUNK. Once the stuff is ready, I think we can create a new feature branch for commit. What do you think?

Snapshot of table
Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, snapshot-src.zip

Having an option to take a snapshot of a table would be very useful in production. What I would like to see this option do is a merge of all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code.

The other advantage would be to be able to export a table to multiple locations. Say I had a read_only table that must be online. I could take a snapshot of it when needed and export it to a separate data center and have it loaded there, and then I would have it online at multiple data centers for load balancing and failover.

I understand that Hadoop takes away the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.
[jira] Created: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans
Remove sorted() methods from Result now that Gets are Scans
Key: HBASE-2753
URL: https://issues.apache.org/jira/browse/HBASE-2753
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.21.0
Reporter: Jonathan Gray
Fix For: 0.21.0

With the old Get codepath, we used to sometimes get results sent to the client that weren't fully sorted. Now that Gets are Scans, results should always be sorted.

Confirm that we always get back sorted results and if so drop the Result.sorted() method and update javadoc accordingly.
[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans
[ https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880509#action_12880509 ]

Todd Lipcon commented on HBASE-2753:

Maybe as an interim step we can add:

  assert Ordering.from(new KVComparator()).isOrdered(result)

to the function, and then if we don't see any Hudson failures for a week, take out the sort call?
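The interim check suggested in this comment can be sketched without any HBase or Guava dependency. The block below is only an illustration under stated assumptions: `SortedCheck` and `isOrdered` are hypothetical names, plain strings stand in for KeyValues, and the natural string order stands in for KVComparator. The helper follows the usual meaning of an "is ordered" check: every element must compare greater than or equal to the one before it.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; a plain-Java stand-in for an ordering check on a result list.
class SortedCheck {
    /** Returns true if every element is >= its predecessor under cmp. */
    static <T> boolean isOrdered(Iterable<T> items, Comparator<? super T> cmp) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return true; // an empty sequence is trivially ordered
        }
        T prev = it.next();
        while (it.hasNext()) {
            T next = it.next();
            if (cmp.compare(prev, next) > 0) {
                return false; // found an element smaller than its predecessor
            }
            prev = next;
        }
        return true;
    }

    public static void main(String[] args) {
        // Strings stand in for the KeyValues of a Result (assumption for illustration).
        List<String> result = Arrays.asList("row1/cf:a", "row1/cf:b", "row1/cf:c");
        Comparator<String> cmp = Comparator.naturalOrder();
        // Fails fast if the results ever arrive unsorted, instead of re-sorting them.
        if (!isOrdered(result, cmp)) {
            throw new AssertionError("results arrived unsorted");
        }
        System.out.println("results are ordered");
    }
}
```

A check like this costs one linear pass, so it is cheap enough to leave in place while the builds are being watched, and it can be removed together with the sort call afterwards.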
[jira] Created: (HBASE-2754) Deprecate Result#sort; it sends the wrong message (its superfluous anyways)
Deprecate Result#sort; it sends the wrong message (its superfluous anyways)
Key: HBASE-2754
URL: https://issues.apache.org/jira/browse/HBASE-2754
Project: HBase
Issue Type: Bug
Reporter: stack
Fix For: 0.21.0
[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans
[ https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880523#action_12880523 ]

stack commented on HBASE-2753:

I like the idea of adding an assert for a while.
[jira] Resolved: (HBASE-2754) Deprecate Result#sort; it sends the wrong message (its superfluous anyways)
[ https://issues.apache.org/jira/browse/HBASE-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2754.

Resolution: Duplicate

Duplicate of HBASE-2753
[jira] Commented: (HBASE-2624) TestMultiParallelPuts flaky on trunk
[ https://issues.apache.org/jira/browse/HBASE-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880526#action_12880526 ]

stack commented on HBASE-2624:

OK. Committed logging and made it so we expect 8 regions instead of 10 only. Logging should help w/ narrowing where to look in the logs and with what is failing.

TestMultiParallelPuts flaky on trunk
Key: HBASE-2624
URL: https://issues.apache.org/jira/browse/HBASE-2624
Project: HBase
Issue Type: Bug
Components: client, regionserver, test
Reporter: Todd Lipcon

Saw this test failure on my Hudson:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Still had 11 puts left after retrying 4 times.
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1428)
  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:540)
  at org.apache.hadoop.hbase.TestMultiParallelPut.doATest(TestMultiParallelPut.java:93)
  at org.apache.hadoop.hbase.TestMultiParallelPut.testParallelPutWithRSAbort(TestMultiParallelPut.java:65)
[jira] Commented: (HBASE-2755) Duplicate assignment of a region after region server recovery
[ https://issues.apache.org/jira/browse/HBASE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880573#action_12880573 ]

Kannan Muthukkaruppan commented on HBASE-2755:

The race condition is as follows:

ProcessRegionOpen (in Master) thread, in process(). Note: this is invoked on the master when a region server sends a message to the master indicating it has successfully opened a region.

{code}
#a. Updates META to put newly-available region's location
#b. synchronized(master.getRegionManager()) {
      remove region from regionsInTransition list.
    }
{code}

BaseScanner thread, in checkAssigned():

{code}
#c. Gets/reads the region's current info from META (potentially stale).
    The info is stored in serverName (which is the hostname/port/startcode).
#d. synchronized(master.getRegionManager()) {
      if (regionIsInTransition() || regionIsFromDeadServer(serverName)) {
        return; --- we don't come here...
      }
      storedInfo = this.master.getServerManager().getServerInfo(serverName);
      if (storedInfo == null) {
        Log(Current assignment of region is not valid);
        set region to unassigned;
      }
    }
{code}

So if the sequence of events is #c, #a, #b, #d, then we will end up with this double assignment condition. In #d, we don't exit early via the return in the first if check, because the region has already been removed from the regionsInTransition list during step #b, and the serverName (corresponding to the crashed region server) was removed from the dead servers map much earlier (@ 10:29:51,317).

Duplicate assignment of a region after region server recovery
Key: HBASE-2755
URL: https://issues.apache.org/jira/browse/HBASE-2755
Project: HBase
Issue Type: Bug
Components: master
Reporter: Kannan Muthukkaruppan
Priority: Blocker

After a region server recovery, some regions may get assigned to duplicate region servers.

Note: I am based on a slightly older trunk (prior to HBASE-2694). Nevertheless, I think HBASE-2694 doesn't address this case.

Scenario:
* Three region server setup (store285,286,287), with about 500 regions in the table overall.
* kill -9 and restart one of the region servers (store286).
* The 170-odd regions in the failed region server got assigned out. Two of the regions got assigned to multiple region servers.
* Looking at the log entries for one such region, it appears that there is some race condition that happens between the ProcessRegionOpen (a RegionServerOperation) and BaseScanner which causes the BaseScanner to think this region needs to be reassigned.

Relevant Logs:

Master detects that the server start message (from the restarted RS) is from a server it already knows about, but the startcode is different. So, it triggers server recovery. Alternatively, the recovery will be triggered by ZNODE expiry in some cases, depending on whichever event (restart of RS or Znode expiry) happens first. After that it does log splits etc. for the failed RS; it then also removes the old region server/startcode from the deadservers map.

{code}
2010-06-17 10:26:06,420 INFO org.apache.hadoop.hbase.master.ServerManager: Server start rejected; we already have 10.138.95.182:60020 registered; existingServer=serverName=store286.xyz.com,60020,1276629467680, load=(requests=22, regions=171, usedHeap=6549, maxHeap=11993), newServer=serverName=store286.xyz.com,60020,1276795566511, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
2010-06-17 10:26:06,420 INFO org.apache.hadoop.hbase.master.ServerManager: Triggering server recovery; existingServer looks stale
2010-06-17 10:26:06,420 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=store286.xyz.com,60020,1276629467680 to dead servers, added shutdown processing operation
... split log processing...
2010-06-17 10:29:51,317 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation: Removed store286.xyz.com,60020,1276629467680 from deadservers Map
{code}

What follows is the relevant log snippet for one of the regions that gets double assigned.

Master tries to assign the region to store285. At 10:30:20,006, in ProcessRegionOpen, we update META with information about the new assignment. However, just around the same time, BaseScanner processes this entry (at 10:30:20,009), but finds that the region is still assigned to the old region server. There have been some fixes for double assignment in BaseScanner because BaseScanner might be doing a stale read depending on when it started. But looks like there is still another hole left. {code} 2010-06-17 10:30:10,186 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region
[jira] Updated: (HBASE-2755) Duplicate assignment of a region after region server recovery
[ https://issues.apache.org/jira/browse/HBASE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-2755:

Description:

After a region server recovery, some regions may get assigned to duplicate region servers.

Note: I am based on a slightly older trunk (prior to HBASE-2694). Nevertheless, I think HBASE-2694 doesn't address this case.

Scenario:
* Three region server setup (store285,286,287), with about 500 regions in the table overall.
* kill -9 and restart one of the region servers (store286).
* The 170-odd regions in the failed region server got assigned out. But two of the regions got assigned to multiple region servers.
* Looking at the log entries for one such region, it appears that there is some race condition that happens between the ProcessRegionOpen (a RegionServerOperation) and BaseScanner which causes the BaseScanner to think this region needs to be reassigned.

Relevant Logs:

Master detects that the server start message (from the restarted RS) is from a server it already knows about, but the startcode is different. So, it triggers server recovery. Alternatively, the recovery will be triggered by ZNODE expiry in some cases, depending on whichever event (restart of RS or Znode expiry) happens first. After that it does log splits etc. for the failed RS; it then also removes the old region server/startcode from the deadservers map.

{code}
2010-06-17 10:26:06,420 INFO org.apache.hadoop.hbase.master.ServerManager: Server start rejected; we already have 10.138.95.182:60020 registered; existingServer=serverName=store286.xyz.com,60020,1276629467680, load=(requests=22, regions=171, usedHeap=6549, maxHeap=11993), newServer=serverName=store286.xyz.com,60020,1276795566511, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
2010-06-17 10:26:06,420 INFO org.apache.hadoop.hbase.master.ServerManager: Triggering server recovery; existingServer looks stale
2010-06-17 10:26:06,420 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=store286.xyz.com,60020,1276629467680 to dead servers, added shutdown processing operation
... split log processing...
2010-06-17 10:29:51,317 DEBUG org.apache.hadoop.hbase.master.RegionServerOperation: Removed store286.xyz.com,60020,1276629467680 from deadservers Map
{code}

What follows is the relevant log snippet for one of the regions that gets double assigned.

Master tries to assign the region to store285. At 10:30:20,006, in ProcessRegionOpen, we update META with information about the new assignment. However, just around the same time, BaseScanner processes this entry (at 10:30:20,009), but finds that the region is still assigned to the old region server. There have been some fixes for double assignment in BaseScanner because BaseScanner might be doing a stale read depending on when it started. But looks like there is still another hole left.

{code}
2010-06-17 10:30:10,186 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. to store285.xyz.com,60020,1276629468460
2010-06-17 10:30:11,701 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_PROCESS_OPEN: test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. from store285.xyz.com,60020,1276629468460; 8 of
2010-06-17 10:30:12,800 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_PROCESS_OPEN: test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. from store285.xyz.com,60020,1276629468460; 7 of
2010-06-17 10:30:13,905 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_PROCESS_OPEN: test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. from store285.xyz.com,60020,1276629468460; 6 of
...
2010-06-17 10:30:20,001 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN: test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. from store285.xyz.com,60020,1276629468460; 1 of 3
2010-06-17 10:30:20,001 INFO org.apache.hadoop.hbase.master.RegionServerOperation: test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. open on store285.xyz.com,60020,1276629468460
2010-06-17 10:30:20,006 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. in region .META.,,1 with startcode=1276629468460, server=store285.xyz.com:60020
2010-06-17 10:30:20,009 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of test1,930176,1276657884012.acff724037d739bab9af61e3edef0cc9. is not valid; serverAddress=store286.xyz.com:60020, startCode=1276629467680 unknown.
{code}

At this point BaseScanner calls this.master.getRegionManager().setUnassigned(info, true) to set the region to be unassigned (even though it is assigned to store285). And later, this region is given to
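
The interleaving named in the HBASE-2755 comment (#c, #a, #b, #d) can be replayed deterministically in a single thread, which may make the stale-read hole easier to see. This is only a sketch under stated assumptions: `RaceReplay` and all of its fields and methods are hypothetical stand-ins for the master's state, not HBase classes, and the server and region names are shortened from the logs. It reproduces the reported ordering and does not show the fix that was eventually committed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for master state; none of these names are HBase APIs.
class RaceReplay {
    static final String REGION = "test1,930176";
    static final String OLD_SERVER = "store286,1276629467680"; // crashed instance
    static final String NEW_SERVER = "store285,1276629468460";

    final Map<String, String> meta = new HashMap<>();   // region -> server in META
    final Set<String> regionsInTransition = new HashSet<>();
    final Set<String> deadServers = new HashSet<>();    // already cleared after log split
    final Set<String> liveServers = new HashSet<>();
    final Set<String> unassigned = new HashSet<>();

    RaceReplay() {
        meta.put(REGION, OLD_SERVER);        // META still points at the crashed server
        regionsInTransition.add(REGION);     // region is being opened on NEW_SERVER
        liveServers.add(NEW_SERVER);
    }

    // Step #c: the scanner reads the region's server from META.
    String readMeta(String region) {
        return meta.get(region);
    }

    // Steps #a and #b: publish the new location, then clear the transition flag.
    void processRegionOpen(String region, String server) {
        meta.put(region, server);                    // #a
        synchronized (this) {
            regionsInTransition.remove(region);      // #b
        }
    }

    // Step #d: validate the server name that was read in step #c.
    void checkAssigned(String region, String serverName) {
        synchronized (this) {
            if (regionsInTransition.contains(region) || deadServers.contains(serverName)) {
                return;
            }
            if (!liveServers.contains(serverName)) {
                unassigned.add(region); // "Current assignment ... is not valid"
            }
        }
    }

    // Returns true if the region ends up marked unassigned although it is open.
    static boolean wronglyUnassigned(boolean staleRead) {
        RaceReplay m = new RaceReplay();
        if (staleRead) {
            String seen = m.readMeta(REGION);        // #c happens first
            m.processRegionOpen(REGION, NEW_SERVER); // #a, #b
            m.checkAssigned(REGION, seen);           // #d acts on the stale name
        } else {
            m.processRegionOpen(REGION, NEW_SERVER); // #a, #b
            m.checkAssigned(REGION, m.readMeta(REGION)); // #c, #d see fresh data
        }
        return m.unassigned.contains(REGION);
    }

    public static void main(String[] args) {
        System.out.println("stale read marks region unassigned: " + wronglyUnassigned(true));
        System.out.println("fresh read marks region unassigned: " + wronglyUnassigned(false));
    }
}
```

In the stale ordering the region is flagged as unassigned even though it has just been opened on store285, because neither guard in step #d still applies. In the other ordering the check sees the new server and leaves the region alone.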
[jira] Updated: (HBASE-2756) MetaScanner.metaScan doesn't take configurations
[ https://issues.apache.org/jira/browse/HBASE-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2756:

Attachment: HBASE-2756.patch

Patch that passes the configuration object to HTable, and that adds a unit test for the multi clusters case. It also requires a fix that will be included soon in HBASE-2741.

MetaScanner.metaScan doesn't take configurations
Key: HBASE-2756
URL: https://issues.apache.org/jira/browse/HBASE-2756
Project: HBase
Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Fix For: 0.21.0
Attachments: HBASE-2756.patch

HBASE-2468 added a bunch of code in MetaScanner.metaScan, and this particular line is wrong:

{code}
+// if row is not null, we want to use the startKey of the row's region as
+// the startRow for the meta scan.
+if (row != null) {
+  HTable metaTable = new HTable(HConstants.META_TABLE_NAME);
+  Result startRowResult = metaTable.getRowOrBefore(startRow,
+  HConstants.CATALOG_FAMILY);
+  if (startRowResult == null) {
{code}

If the user specified any new configuration in his code, like ZK's parent znode, then it will miss it. This should use the HTable constructor that takes a Configuration and pass the one it already has.

I found this with my TestReplication test in HBASE-2223.
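
The problem described in HBASE-2756 is a general one: a helper that builds its own client with default settings silently ignores whatever the caller configured. The sketch below illustrates that with hypothetical stand-ins only. `Conf` and `Table` are not the HBase Configuration or HTable classes, the key `zookeeper.znode.parent` and default `/hbase` are used purely as example values, and the method names are invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; not the HBase Configuration or HTable classes.
class ConfPropagation {
    static final String ZNODE_PARENT = "zookeeper.znode.parent"; // example key
    static final String DEFAULT_PARENT = "/hbase";               // example default

    /** Minimal key/value configuration. */
    static class Conf {
        private final Map<String, String> values = new HashMap<>();

        Conf set(String key, String value) {
            values.put(key, value);
            return this;
        }

        String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Minimal table handle that resolves its cluster from a configuration. */
    static class Table {
        final String znodeParent;

        Table() {
            this(new Conf()); // fresh defaults: caller settings are lost
        }

        Table(Conf conf) {
            this.znodeParent = conf.get(ZNODE_PARENT, DEFAULT_PARENT);
        }
    }

    // Mirrors the reported bug: the caller's configuration is available but unused.
    static String scanWithDefaultConf(Conf callerConf) {
        return new Table().znodeParent;
    }

    // Mirrors the proposed fix: pass along the configuration the caller already has.
    static String scanWithCallerConf(Conf callerConf) {
        return new Table(callerConf).znodeParent;
    }

    public static void main(String[] args) {
        Conf second = new Conf().set(ZNODE_PARENT, "/hbase-cluster2");
        System.out.println("default conf resolves: " + scanWithDefaultConf(second));
        System.out.println("caller conf resolves:  " + scanWithCallerConf(second));
    }
}
```

With two clusters in one process, as in a replication test, the first variant would quietly talk to the default cluster, which is the kind of mismatch the report describes.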
[jira] Commented: (HBASE-2720) TestFromClientSide fails for client region cache prewarm on Hudson
[ https://issues.apache.org/jira/browse/HBASE-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880578#action_12880578 ]

stack commented on HBASE-2720:

The Apache Hudson failure http://hudson.zones.apache.org/hudson/view/HBase/job/HBase-TRUNK/1339/testReport/ seems to be because of HBASE-2757. There is not enough in the meta for the cache to pre-warm itself on.

TestFromClientSide fails for client region cache prewarm on Hudson
Key: HBASE-2720
URL: https://issues.apache.org/jira/browse/HBASE-2720
Project: HBase
Issue Type: Bug
Components: client, test
Affects Versions: 0.21.0
Environment: hudson
Reporter: Mingjie Lai
Assignee: Mingjie Lai
Fix For: 0.21.0

TestFromClientSide failed after the HBASE-2468 patch:
http://hudson.zones.apache.org/hudson/job/HBase-TRUNK/1322/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide/testRegionCachePreWarm/

It seems the number of actual cached regions was less than expected (as configured) on Hudson.