[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900319#action_12900319 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li (lichong...@zju.edu.cn)

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review957

1. Rename SnapshotTracker to SnapshotSentinel.
2. Write a script (add_snapshot_family.rb) to add the snapshot family for META, and remove the method HMaster.addSnapshotFamily. The script is not tested yet (how?).

- Chongxin


Snapshot of table
-----------------

Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png

Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect against failed servers, but that does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900321#action_12900321 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li (lichong...@zju.edu.cn)

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/

(Updated 2010-08-19 08:35:37.043957)

Review request for hbase.

Summary
-------

This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of a snapshot via ZooKeeper.
2. Create a snapshot of an HBase table.
3. Modify some existing HBase functions to support snapshots.

Currently snapshots can be created as expected, but they cannot be restored or deleted yet.

This addresses bug HBASE-50.
http://issues.apache.org/jira/browse/HBASE-50

Diffs (updated)
---------------

  bin/add_snapshot_family.rb PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5
  src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
  src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
  src/main/java/org/apache/hadoop/hbase/SnapshotDescriptor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/TablePartiallyOpenException.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
  src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a
  src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a
  src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
  src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39
  src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d
  src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
  src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotSentinel.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
  src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
  src/main/java/org/apache/hadoop/hbase/regionserver/Snapshotter.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c
  src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286
  src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
  src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5
  src/main/resources/hbase-default.xml b73f0ff
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
  src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9
  src/test/java/org/apache/hadoop/hbase/master/TestLogsCleaner.java 8b7f60f
  src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 38ef520
  src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java PRE-CREATION

Diff: http://review.cloudera.org/r/467/diff

Testing
-------

Unit tests and integration tests with a mini cluster passed.

Thanks,
Chongxin
[jira] Resolved: (HBASE-58) [hbase] review and fix logging levels
[ https://issues.apache.org/jira/browse/HBASE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-58.
------------------------

Resolution: Invalid

Old and an ongoing project, one that we'll not address in a single JIRA. Closing.


[hbase] review and fix logging levels
-------------------------------------

Key: HBASE-58
URL: https://issues.apache.org/jira/browse/HBASE-58
Project: HBase
Issue Type: Improvement
Reporter: Jim Kellerman
Priority: Minor

Currently, the only way to tell what is really going on with an HBase cluster is to enable DEBUG-level logging. Unfortunately, this also generates a lot of 'noise' messages. We need to review the log messages and see which DEBUG messages should be promoted to INFO, and whether any current INFO messages should be demoted to DEBUG. In addition, some messages are very verbose and don't really need to be. This should be fixed too. A good starting point for review would be to look at the output from test-contrib. Although that is not everything, it is a place to start working from.
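Once such a review decides which messages move up or down, the mechanical change on the operator side is a per-logger level override. As a hedged illustration (HBase of this era is configured through log4j.properties; the specific logger names below are examples, not a recommendation from the issue):

```properties
# Keep the bulk of HBase at INFO to cut 'noise' messages
log4j.logger.org.apache.hadoop.hbase=INFO
# Selectively re-enable DEBUG for the component under investigation
log4j.logger.org.apache.hadoop.hbase.regionserver=DEBUG
```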
[jira] Assigned: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close
[ https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-2915:
-----------------------------------------

Assignee: Jean-Daniel Cryans


Deadlock between HRegion.ICV and HRegion.close
----------------------------------------------

Key: HBASE-2915
URL: https://issues.apache.org/jira/browse/HBASE-2915
Project: HBase
Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
Fix For: 0.90.0

HRegion.ICV gets a row lock, then gets the newScanner lock. HRegion.close gets the newScanner lock, then the splitsAndCloses lock, and finally waits for all row locks to finish. If the ICV got the row lock and then close got the newScannerLock, both end up waiting on the other. This was introduced when Get became a Scan. Stack thinks we can get rid of the newScannerLock in close since we set closing to true.
[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close
[ https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900367#action_12900367 ]

Jean-Daniel Cryans commented on HBASE-2915:
-------------------------------------------

There is another deadlock that needs fixing in the scope of this jira. Since the split code was redone, there's a deadlock when SplitTransaction acquires the splitAndCloses writelock while a flush is running for it. It looks like:

{noformat}
regionserver60021.compactor daemon prio=10 tid=0x7fc31845b800 nid=0x5f62 in Object.wait() [0x7fc31e9e7000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:493)
	- locked 0x7fc336877998 (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:213)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:186)
	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:157)
	at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:87)

regionserver60021.cacheFlusher daemon prio=10 tid=0x7fc31845a000 nid=0x5f61 waiting on condition [0x7fc31eae8000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for 0x7fc336561750 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:877)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1197)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:793)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:249)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:223)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
{noformat}
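Both deadlocks reported on this issue are lock-ordering cycles: two threads take the same pair of locks in opposite orders. A minimal sketch of the invariant that removes the cycle, acquiring the locks in one consistent order (the class and lock names are illustrative stand-ins, not the actual HRegion fields):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    // Hypothetical stand-ins for the row lock and the newScanner lock.
    static final ReentrantLock rowLock = new ReentrantLock();
    static final ReentrantLock newScannerLock = new ReentrantLock();

    // ICV path: row lock first, then scanner lock.
    static long icv(long value) {
        rowLock.lock();
        try {
            newScannerLock.lock();
            try {
                return value + 1; // the increment happens under both locks
            } finally {
                newScannerLock.unlock();
            }
        } finally {
            rowLock.unlock();
        }
    }

    // close path: taking the locks in the SAME order removes the cycle.
    // The reported deadlock arose because close() took newScannerLock first
    // while a concurrent icv() held a row lock and wanted newScannerLock.
    static boolean close() {
        rowLock.lock();
        try {
            newScannerLock.lock();
            try {
                return true; // close stores, etc. (elided)
            } finally {
                newScannerLock.unlock();
            }
        } finally {
            rowLock.unlock();
        }
    }
}
```

The actual fix discussed on the issue goes further, dropping newScannerLock from close entirely and relying on the closing flag, but consistent ordering is the underlying invariant either approach restores.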
[jira] Commented: (HBASE-1660) need a rolling restart script
[ https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900381#action_12900381 ]

Jean-Daniel Cryans commented on HBASE-1660:
-------------------------------------------

Since this was committed, I see this when using the normal startup scripts:

{noformat}
/data/jdcryans/git/hbase/bin/hbase-daemons.sh: line 49: /data/jdcryans/git/hbase/bin/master-backup.sh: No such file or directory
/data/jdcryans/git/hbase/bin/hbase-daemons.sh: line 49: exec: /data/jdcryans/git/hbase/bin/master-backup.sh: cannot execute: No such file or directory
{noformat}

I don't see that file anywhere.


need a rolling restart script
-----------------------------

Key: HBASE-1660
URL: https://issues.apache.org/jira/browse/HBASE-1660
Project: HBase
Issue Type: New Feature
Affects Versions: 0.20.0
Reporter: ryan rawson
Assignee: Nicolas Spiegelberg
Priority: Minor
Fix For: 0.90.0

Need a script that will do a rolling restart. It should be configurable in two ways:
- how long to keep the daemon down per host
- how long to wait between hosts for regionservers

In my own hacky command line I used 10/60.
[jira] Commented: (HBASE-1660) need a rolling restart script
[ https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900385#action_12900385 ]

Nicolas Spiegelberg commented on HBASE-1660:
--------------------------------------------

Do you have HBASE-2870 applied? That patch added the master-backup.sh file. It should have been committed to the public branch before this patch.
[jira] Commented: (HBASE-1660) need a rolling restart script
[ https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900386#action_12900386 ]

stack commented on HBASE-1660:
------------------------------

My fault. Fixing. I didn't svn add.
[jira] Resolved: (HBASE-2924) TestLogRolling doesn't use the right HLog half the time
[ https://issues.apache.org/jira/browse/HBASE-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-2924.
---------------------------------------

Hadoop Flags: [Reviewed]
Resolution: Fixed

I refactored TestScannerTimeout to use the new tool too, which is now better commented. Committed to trunk.


TestLogRolling doesn't use the right HLog half the time
-------------------------------------------------------

Key: HBASE-2924
URL: https://issues.apache.org/jira/browse/HBASE-2924
Project: HBase
Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Fix For: 0.90.0
Attachments: HBASE-2924.patch

Since HBASE-2868, TestLogRolling uses two region servers instead of one. The rest of the un-refactored code wasn't expecting that and only used the log from the first RS. This is why we get very inconsistent results. Fix by either going back to one RS or at least using the right HLog.
[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close
[ https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900492#action_12900492 ]

HBase Review Board commented on HBASE-2915:
-------------------------------------------

Message from: Jean-Daniel Cryans (jdcry...@apache.org)

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/

Review request for hbase.

Summary
-------

This patch removes newScannerLock and renames the splitAndClose lock to just "lock". Every operation is now required to obtain the read lock on "lock" before doing anything (including getting a row lock). This is done by calling openRegionTransaction inside a try statement and calling closeRegionTransaction in the finally block. flushcache got refactored some more in order to do the locking in the proper order: first get the read lock, then do the writestate handling. Finally, it removes the need to hold a write lock when flushing, since subclassers give atomic work to do via internalPreFlushcacheCommit. This means that this patch breaks external contribs; that is required to keep our whole locking mechanism simpler.

This addresses bug HBASE-2915.
http://issues.apache.org/jira/browse/HBASE-2915

Diffs
-----

  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300
  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300
  /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300

Diff: http://review.cloudera.org/r/691/diff

Testing
-------

5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in the process of deploying it on a cluster.

Thanks,
Jean-Daniel
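The open/closeRegionTransaction pairing described in the review summary amounts to a read/write lock bracketing every region operation in try/finally: operations share the read lock, and close takes the write lock so it waits for in-flight work instead of interleaving with it. A hedged sketch under those assumptions (simplified; the names and signatures here are illustrative, not the patch's actual code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile boolean closing = false;

    // Every operation takes the read lock first, then checks closing.
    public long incrementColumnValue(long current, long amount) {
        lock.readLock().lock();              // "openRegionTransaction"
        try {
            if (closing) {
                throw new IllegalStateException("region is closing");
            }
            return current + amount;          // row-lock work would happen here
        } finally {
            lock.readLock().unlock();         // "closeRegionTransaction", in finally
        }
    }

    // close() takes the write lock, so no operation can hold the read lock
    // concurrently -- the two-lock ordering cycle no longer exists.
    public void close() {
        closing = true;   // set before acquiring the write lock, to stop new entrants
        lock.writeLock().lock();
        try {
            // flush and close stores (elided)
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, an increment racing a close either finishes before close acquires the write lock or fails fast on the closing flag; neither side can wait on a lock the other holds.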
[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close
[ https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900510#action_12900510 ]

HBase Review Board commented on HBASE-2915:
-------------------------------------------

Message from: Ryan Rawson (ryano...@gmail.com)

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review966

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3144)
    oh wow, I can't believe this was ever here

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3145)
    I thought we agreed that the closing flag had to be set BEFORE the write lock was acquired, to prevent race conditions?

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3146)
    let's just excise this and break compile-time compatibility. Also remove internalPreFlushcacheCommit too.

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3147)
    ditto, remove this whole try/finally bit

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3148)
    maybe we shouldn't call this 'transaction'; it might confuse people into thinking we support real transactions... not sure what to call it at this moment though

- Ryan
[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close
[ https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900515#action_12900515 ]

HBase Review Board commented on HBASE-2915:
-------------------------------------------

Message from: Ted Yu (ted...@yahoo.com)

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review967

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3151)
    How about naming this method openRegionProlog?

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3149)
    Please add: "It has to be called inside the corresponding finally block"

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3150)
    How about naming this method closeRegionEpilog?

- Ted
[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close
[ https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900527#action_12900527 ]

HBase Review Board commented on HBASE-2915:
-------------------------------------------

Message from: Ted Yu (ted...@yahoo.com)

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review969

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3155)
    Or regionOperationProlog()

/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java (http://review.cloudera.org/r/691/#comment3156)
    and regionOperationEpilog()

- Ted
[jira] Commented: (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900535#action_12900535 ]

Pranav Khaitan commented on HBASE-2856:
---------------------------------------

I guess we could fix this by not updating the scanners after a flush. Currently, after every flush we notify the scanners (registered as observers) so that they update their heap. If we do not notify them about the flush, the scanner wouldn't encounter any inconsistencies. This should solve the specific problem you discussed above where flushing results in inconsistency. It seems like an easy change and maintains correctness. The only drawback is that we hold some memstore keys a little longer, which doesn't seem like too big a problem.


TestAcidGuarantee broken on trunk
---------------------------------

Key: HBASE-2856
URL: https://issues.apache.org/jira/browse/HBASE-2856
Project: HBase
Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: ryan rawson
Priority: Blocker
Fix For: 0.90.0

TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first column of N is different when it should be the same. This is a bug deep inside the scanner, whereby the first peek() of a row is done at time T, then the rest of the read is done at T+1 after a flush; thus the memstoreTS data is lost, and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to introduce the memstoreTS (or a similarly equivalent value) into the HFile, thus allowing us to preserve read consistency past flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
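Pranav's proposal is essentially to stop delivering flush notifications to scanner observers so their heaps never change mid-row. A rough sketch of the observer wiring being discussed, with the trade-off in the comments (the class and method names here are invented for illustration, not HBase's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class FlushNotifier {
    interface ScannerObserver {
        void storeFlushed(); // observer rebuilds its key-value heap
    }

    private final List<ScannerObserver> observers = new ArrayList<>();
    private final boolean notifyOnFlush; // the behavior under debate

    FlushNotifier(boolean notifyOnFlush) {
        this.notifyOnFlush = notifyOnFlush;
    }

    void register(ScannerObserver o) {
        observers.add(o);
    }

    // On flush, either update every open scanner's heap (the current code,
    // which loses memstoreTS info mid-row) or leave scanners reading the
    // snapshotted memstore (the proposal), at the cost of holding those
    // memstore keys in memory a little longer.
    int flush() {
        int notified = 0;
        if (notifyOnFlush) {
            for (ScannerObserver o : observers) {
                o.storeFlushed();
                notified++;
            }
        }
        return notified;
    }
}
```

In the proposed variant, flush() skips the notification loop entirely, so an in-progress scan keeps seeing exactly the memstore state it started with.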
[jira] Commented: (HBASE-2922) HLog cleanup is done under the updateLock, major slowdown
[ https://issues.apache.org/jira/browse/HBASE-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900548#action_12900548 ]

HBase Review Board commented on HBASE-2922:
-------------------------------------------

Message from: Jean-Daniel Cryans (jdcry...@apache.org)

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/694/

Review request for hbase.

Summary
-------

Simply moves the LogActionListeners and outputfiles handling out of the updateLock synchronization.

This addresses bug HBASE-2922.
http://issues.apache.org/jira/browse/HBASE-2922

Diffs
-----

  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 987355

Diff: http://review.cloudera.org/r/694/diff

Testing
-------

Unit tests and some PEs.

Thanks,
Jean-Daniel


HLog cleanup is done under the updateLock, major slowdown
---------------------------------------------------------

Key: HBASE-2922
URL: https://issues.apache.org/jira/browse/HBASE-2922
Project: HBase
Issue Type: Bug
Affects Versions: 0.20.6, 0.89.20100621
Reporter: Jean-Daniel Cryans

Something I've seen quite often in our production environment:

{quote}
2010-08-16 16:17:27,104 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000385321 whose highest sequence/edit id is 64837079950
2010-08-16 16:17:27,286 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000392770 whose highest sequence/edit id is 64837088260
2010-08-16 16:17:27,452 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000399300 whose highest sequence/edit id is 64837096566
2010-08-16 16:17:27,635 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000406997 whose highest sequence/edit id is 64837104865
2010-08-16 16:17:27,827 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000413803 whose highest sequence/edit id is 64837113153
2010-08-16 16:17:27,993 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000421709 whose highest sequence/edit id is 64837121467
2010-08-16 16:17:28,160 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000427333 whose highest sequence/edit id is 64837129775
2010-08-16 16:17:28,432 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000434365 whose highest sequence/edit id is 64837138074
2010-08-16 16:17:28,518 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000440347 whose highest sequence/edit id is 64837146376
2010-08-16 16:17:28,612 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 39 on 60020 took 1801ms appending an edit to hlog; editcount=0
2010-08-16 16:17:28,615 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 37 on 60020 took 1804ms appending an edit to hlog; editcount=1
2010-08-16 16:17:28,615 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 25 on 60020 took 1805ms appending an edit to hlog; editcount=2
...
2010-08-16 16:17:28,619 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 41 on 60020 took 1875ms appending an edit to hlog; editcount=50
2010-08-16 16:17:28,619 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 24 on 60020 took 1876ms appending an edit to hlog; editcount=51
2010-08-16 16:17:28,619 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 48 on 60020 took 1881ms appending an edit to hlog; editcount=54
{quote}

And looking at HLog.rollWriter, we roll and then clean up those unused hlog files under the updateLock, which blocks all the appenders (as shown). We should only do the first part under that lock.
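The fix direction stated above, doing only the roll under the lock, follows a common pattern: collect the slow work inside the critical section but perform it after releasing the lock. A hedged sketch of that pattern (class and field names are illustrative, not HLog's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class LogRoller {
    private final Object updateLock = new Object();
    private final List<String> oldLogs = new ArrayList<>();

    public void addOldLog(String name) {
        synchronized (updateLock) {
            oldLogs.add(name);
        }
    }

    // Swap in the new writer under updateLock, but only *collect* the files
    // to delete there; the slow filesystem deletions happen after the lock
    // is released, so appenders are not blocked for seconds at a time.
    public List<String> rollWriter() {
        List<String> toDelete;
        synchronized (updateLock) {
            // replace the current writer here (elided)
            toDelete = new ArrayList<>(oldLogs);
            oldLogs.clear();
        }
        // outside the lock: delete each file in toDelete (elided)
        return toDelete;
    }
}
```

The appenders only contend with the brief writer swap and list copy; the per-file delete latency that produced the "took 1801ms appending an edit" warnings moves outside the critical section.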
[jira] Commented: (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900553#action_12900553 ]

ryan rawson commented on HBASE-2856:
------------------------------------

That sounds possible... the extra memory held could be up to 64mb * block-size * # of families. Ie: a few hundred megs or even gigs.