[jira] Commented: (HBASE-50) Snapshot of table

2010-08-19 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900319#action_12900319
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review957
---


1. Rename SnapshotTracker to SnapshotSentinel
2. Write a script (add_snapshot_family.rb) to add snapshot family for META and 
remove method HMaster.addSnapshotFamily. The script is not tested yet (how?)

- Chongxin





 Snapshot of table
 -

 Key: HBASE-50
 URL: https://issues.apache.org/jira/browse/HBASE-50
 Project: HBase
  Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
 Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot 
 Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class 
 Diagram.png


 Having an option to take a snapshot of a table would be very useful in 
 production.
 What I would like to see this option do is merge all the data into 
 one or more files stored in the same folder on the DFS. This way we could 
 save data in case of a software bug in Hadoop or user code. 
 The other advantage would be the ability to export a table to multiple 
 locations. Say I had a read-only table that must be online. I could take a 
 snapshot of it when needed, export it to a separate data center and have it 
 loaded there, and then I would have it online at multiple data centers for 
 load balancing and failover.
 I understand that Hadoop removes the need for backups to protect 
 against failed servers, but this does not protect us from software bugs that 
 might delete or alter data in ways we did not plan. We should have a way to 
 roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-50) Snapshot of table

2010-08-19 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900321#action_12900321
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/
---

(Updated 2010-08-19 08:35:37.043957)


Review request for hbase.


Summary
---

This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshot via ZooKeeper
2. Create snapshot of an HBase table
3. Some existing functions of HBase are modified to support snapshot

Currently snapshots can be created as expected, but cannot be restored or 
deleted yet.


This addresses bug HBASE-50.
http://issues.apache.org/jira/browse/HBASE-50


Diffs (updated)
-

  bin/add_snapshot_family.rb PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5
  src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
  src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
  src/main/java/org/apache/hadoop/hbase/SnapshotDescriptor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/TablePartiallyOpenException.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
  src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a
  src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a
  src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
  src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39
  src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d
  src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
  src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/SnapshotSentinel.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
  src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
  src/main/java/org/apache/hadoop/hbase/regionserver/Snapshotter.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c
  src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286
  src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
  src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5
  src/main/resources/hbase-default.xml b73f0ff
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
  src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9
  src/test/java/org/apache/hadoop/hbase/master/TestLogsCleaner.java 8b7f60f
  src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 38ef520
  src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java PRE-CREATION

Diff: http://review.cloudera.org/r/467/diff


Testing
---

Unit tests and integration tests with mini cluster passed.


Thanks,

Chongxin





[jira] Resolved: (HBASE-58) [hbase] review and fix logging levels

2010-08-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-58.


Resolution: Invalid

Old and an ongoing project, one that we'll not address in a single JIRA.  
Closing.

 [hbase] review and fix logging levels
 -

 Key: HBASE-58
 URL: https://issues.apache.org/jira/browse/HBASE-58
 Project: HBase
  Issue Type: Improvement
Reporter: Jim Kellerman
Priority: Minor

 Currently, the only way to tell what is really going on with an HBase cluster 
 is to enable DEBUG level logging. Unfortunately, this also generates a lot of 
 'noise' messages. We need to review log messages and see which DEBUG messages 
 should be promoted to INFO and if any current INFO messages should be demoted 
 to debug.
 In addition, some messages are very verbose and don't really need to be. This 
 should be fixed too.
 A good starting point for review would be to look at the output from 
 test-contrib. Although that is not everything, it is a place to start working 
 from.




[jira] Assigned: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close

2010-08-19 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-2915:
-

Assignee: Jean-Daniel Cryans

 Deadlock between HRegion.ICV and HRegion.close
 --

 Key: HBASE-2915
 URL: https://issues.apache.org/jira/browse/HBASE-2915
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0


 HRegion.ICV gets a row lock then gets a newScanner lock.
 HRegion.close gets a newScanner lock, splitCloseLock and finally waits for all 
 row locks to finish.
 If the ICV got the row lock and then close got the newScannerLock, both end 
 up waiting on the other. This was introduced when Get became a Scan.
 Stack thinks we can get rid of the newScannerLock in close since we 
 setClosing to true.
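The lock-order inversion described above can be reproduced in miniature. The following is only a sketch under stated assumptions: two plain ReentrantLock objects stand in for the row lock and the newScanner lock, the class and method names are hypothetical, and tryLock with a timeout replaces the blocking acquire so the demonstration terminates instead of hanging. It is not the HRegion code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the two code paths; not taken from HBase.
class LockOrderSketch {

    // Takes 'first', waits until the other thread also holds its first lock,
    // then tries 'second' with a timeout. Returns true if 'second' was obtained.
    private static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch bothHoldFirst,
                                       CountDownLatch bothTried) {
        boolean gotSecond = false;
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            try {
                gotSecond = second.tryLock(100, TimeUnit.MILLISECONDS);
                if (gotSecond) {
                    second.unlock();
                }
            } finally {
                bothTried.countDown();
            }
            // Keep holding 'first' until the other thread has finished trying.
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
        return gotSecond;
    }

    // Returns true when neither thread could take its second lock, which is
    // the situation that becomes a deadlock with blocking lock() calls.
    static boolean oppositeOrderStalls() {
        final ReentrantLock rowLock = new ReentrantLock();
        final ReentrantLock newScannerLock = new ReentrantLock();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        final CountDownLatch bothTried = new CountDownLatch(2);
        final boolean[] got = new boolean[2];

        // "ICV" path: row lock first, then the newScanner lock.
        Thread icv = new Thread(() ->
            got[0] = holdThenTry(rowLock, newScannerLock, bothHoldFirst, bothTried));
        // "close" path: newScanner lock first, then waits on the row lock.
        Thread close = new Thread(() ->
            got[1] = holdThenTry(newScannerLock, rowLock, bothHoldFirst, bothTried));
        icv.start();
        close.start();
        try {
            icv.join();
            close.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !got[0] && !got[1];
    }

    public static void main(String[] args) {
        System.out.println("both paths stalled: " + oppositeOrderStalls());
    }
}
```

Removing one of the two locks from the close path, as suggested in the description, breaks the cycle because only one acquisition order remains.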




[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close

2010-08-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900367#action_12900367
 ] 

Jean-Daniel Cryans commented on HBASE-2915:
---

There is another deadlock that needs fixing in the scope of this jira. Since 
the split code was redone, there's a deadlock when SplitTransaction acquires 
the splitAndCloses writelock while a flush is running for it. It looks like:

{noformat}

regionserver60021.compactor daemon prio=10 tid=0x7fc31845b800 nid=0x5f62 in Object.wait() [0x7fc31e9e7000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:493)
    - locked 0x7fc336877998 (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:213)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:186)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:157)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:87)

regionserver60021.cacheFlusher daemon prio=10 tid=0x7fc31845a000 nid=0x5f61 waiting on condition [0x7fc31eae8000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  0x7fc336561750 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:877)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:793)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:249)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:223)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
{noformat}




[jira] Commented: (HBASE-1660) need a rolling restart script

2010-08-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900381#action_12900381
 ] 

Jean-Daniel Cryans commented on HBASE-1660:
---

Since this was committed, I see this when using the normal startup scripts:

{noformat}

/data/jdcryans/git/hbase/bin/hbase-daemons.sh: line 49: /data/jdcryans/git/hbase/bin/master-backup.sh: No such file or directory
/data/jdcryans/git/hbase/bin/hbase-daemons.sh: line 49: exec: /data/jdcryans/git/hbase/bin/master-backup.sh: cannot execute: No such file or directory
{noformat}

I don't see that file anywhere.

 need a rolling restart script
 -

 Key: HBASE-1660
 URL: https://issues.apache.org/jira/browse/HBASE-1660
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.20.0
Reporter: ryan rawson
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.90.0


 need a script that will do a rolling restart.
 It should be configurable in 2 ways:
 - how long to keep the daemon down per host
 - how long to wait between hosts
 for regionservers in my own hacky command line I used 10/60.
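The two knobs requested above can be sketched as a small planner that emits the ordered steps of a rolling restart. This is only an illustration: the class name, the step strings, and the reading of "10/60" as 10 seconds of downtime per host and 60 seconds between hosts are all assumptions, and the actual script attached to this issue is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical planner; it only lists the steps and does not touch any daemon.
class RollingRestartSketch {

    // Builds the step list for restarting 'hosts' one at a time.
    // downSeconds: how long each daemon stays down.
    // waitSeconds: pause between finishing one host and starting the next.
    static List<String> plan(List<String> hosts, int downSeconds, int waitSeconds) {
        List<String> steps = new ArrayList<>();
        for (int i = 0; i < hosts.size(); i++) {
            String host = hosts.get(i);
            steps.add("stop " + host);
            steps.add("sleep " + downSeconds);
            steps.add("start " + host);
            // No trailing pause after the last host.
            if (i < hosts.size() - 1) {
                steps.add("sleep " + waitSeconds);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : plan(Arrays.asList("rs1", "rs2", "rs3"), 10, 60)) {
            System.out.println(step);
        }
    }
}
```

Keeping the plan separate from its execution makes the two timing parameters easy to check before anything is actually stopped.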




[jira] Commented: (HBASE-1660) need a rolling restart script

2010-08-19 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900385#action_12900385
 ] 

Nicolas Spiegelberg commented on HBASE-1660:


Do you have HBASE-2870 applied?  That patch added the master-backup.sh file.  
It should have been committed to the public branch before this patch.




[jira] Commented: (HBASE-1660) need a rolling restart script

2010-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900386#action_12900386
 ] 

stack commented on HBASE-1660:
--

My fault.  Fixing.  I didn't svn add.




[jira] Resolved: (HBASE-2924) TestLogRolling doesn't use the right HLog half the time

2010-08-19 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-2924.
---

Hadoop Flags: [Reviewed]
  Resolution: Fixed

I refactored TestScannerTimeout to use the new tool too, which is now better 
commented. Committed to trunk.

 TestLogRolling doesn't use the right HLog half the time
 ---

 Key: HBASE-2924
 URL: https://issues.apache.org/jira/browse/HBASE-2924
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.90.0

 Attachments: HBASE-2924.patch


 Since HBASE-2868, TestLogRolling uses 2 region servers instead of 1. The rest 
 of the un-refactored code isn't expecting that, and only uses the log from 
 the first RS. This is why we get very inconsistent results. Fix by either 
 going back to 1 RS or at least using the right HLog.




[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close

2010-08-19 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900492#action_12900492
 ] 

HBase Review Board commented on HBASE-2915:
---

Message from: Jean-Daniel Cryans jdcry...@apache.org

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/
---

Review request for hbase.


Summary
---

This patch removes newScannerLock and renames the splitAndClose lock to just 
"lock". Every operation is now required to obtain the read lock on "lock" 
before doing anything (including getting a row lock). This is done by calling 
openRegionTransaction inside a try statement and by calling 
closeRegionTransaction in finally.

flushcache got refactored some more in order to do the locking in the proper 
order: first get the read lock, then do the writestate handling.

Finally, it removes the need to hold a writeLock when flushing when subclassers 
give atomic work to do via internalPreFlushcacheCommit. This means that this 
patch breaks external contribs. This is required to keep our whole locking 
mechanism simpler.
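The read-lock-per-operation pattern described in this summary can be sketched as follows. This is a minimal illustration, not the patch: only the names openRegionTransaction and closeRegionTransaction come from the summary, while the class, the closing flag, the counter and the increment method are hypothetical. The sketch also acquires the lock just before the try block rather than inside it, a variant chosen so that the finally clause never unlocks a lock that was not taken.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; not the HRegion implementation.
class RegionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicBoolean closing = new AtomicBoolean(false);
    private final AtomicLong value = new AtomicLong();

    // Every operation takes the shared (read) lock before doing anything else.
    private void openRegionTransaction() {
        if (closing.get()) {
            throw new IllegalStateException("region is closing");
        }
        lock.readLock().lock();
        // Re-check: close() may have set the flag while we waited for the lock.
        if (closing.get()) {
            lock.readLock().unlock();
            throw new IllegalStateException("region is closing");
        }
    }

    private void closeRegionTransaction() {
        lock.readLock().unlock();
    }

    // Example operation guarded by the read lock.
    long increment(long amount) {
        openRegionTransaction();
        try {
            return value.addAndGet(amount);
        } finally {
            closeRegionTransaction();
        }
    }

    // close() flags the region first, then waits for in-flight operations
    // by taking the exclusive (write) lock.
    void close() {
        closing.set(true);
        lock.writeLock().lock();
        try {
            // Flushing and resource release would go here.
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Because all operations share one lock in one order, there is no second lock for close to acquire in the opposite order.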


This addresses bug HBASE-2915.
http://issues.apache.org/jira/browse/HBASE-2915


Diffs
-

  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 987300
  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 987300
  /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java 987300

Diff: http://review.cloudera.org/r/691/diff


Testing
---

5 concurrent ICV threads + randomWrite 3 + scans on a single RS. I'm also in 
the process of deploying it on a cluster.


Thanks,

Jean-Daniel







[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close

2010-08-19 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900510#action_12900510
 ] 

HBase Review Board commented on HBASE-2915:
---

Message from: Ryan Rawson ryano...@gmail.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review966
---



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3144

oh wow i cant believe this was ever here



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3145

i thought we agreed that the closing flag had to be set BEFORE the write 
lock was acquired to prevent race conditions?



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3146

lets just excise this and break compile time compatibility.  Also remove 
internalPreFlushcacheCommit too



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3147

ditto remove this whole try/finally bit



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3148

maybe we shouldnt call this 'transaction' might confuse people into 
thinking we support real transactions... not sure what to call it at this 
moment tho


- Ryan








[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close

2010-08-19 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900515#action_12900515
 ] 

HBase Review Board commented on HBASE-2915:
---

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review967
---



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3151

How about naming this method openRegionProlog ?



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3149

Please add:
It has to be called inside the corresponding finally block



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3150

How about naming this method closeRegionEpilog ?


- Ted








[jira] Commented: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close

2010-08-19 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900527#action_12900527
 ] 

HBase Review Board commented on HBASE-2915:
---

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/691/#review969
---



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3155

Or regionOperationProlog()



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
http://review.cloudera.org/r/691/#comment3156

and regionOperationEpilog()


- Ted








[jira] Commented: (HBASE-2856) TestAcidGuarantee broken on trunk

2010-08-19 Thread Pranav Khaitan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900535#action_12900535
 ] 

Pranav Khaitan commented on HBASE-2856:
---

I guess we could fix this by not updating the scanners after a flush. 
Currently, after every flush we notify the scanners (registered as 
observers) so that they update their heap. If we do not notify them about the 
flush, the scanner wouldn't encounter any inconsistencies. This should solve 
the specific problem you discussed above where flushing results in 
inconsistency. This seems like an easy change and maintains correctness. The 
only drawback is that we are holding some memstore keys for a little longer 
which doesn't seem too big of a problem.

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: ryan rawson
Priority: Blocker
 Fix For: 0.90.0


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).
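Why the per-edit number matters for read consistency can be shown with a toy model. The sketch below is an assumption-laden illustration, not HBase's scanner: each cell carries a write number playing the role the description gives to memstoreTS, and a scan fixes a "read point" once and ignores anything newer. The class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of read-point filtering; names are illustrative only.
class ReadPointSketch {

    static final class Cell {
        final String column;
        final String value;
        final long writeNumber; // stand-in for the memstoreTS of the edit

        Cell(String column, String value, long writeNumber) {
            this.column = column;
            this.value = value;
            this.writeNumber = writeNumber;
        }
    }

    // Returns "column=value" for the newest version of each column that is
    // visible at readPoint, i.e. whose write number is <= readPoint.
    static List<String> scanRow(List<Cell> cells, long readPoint) {
        Map<String, Cell> newest = new LinkedHashMap<>();
        for (Cell cell : cells) {
            if (cell.writeNumber > readPoint) {
                continue; // written after this scan started: must stay invisible
            }
            Cell previous = newest.get(cell.column);
            if (previous == null || cell.writeNumber > previous.writeNumber) {
                newest.put(cell.column, cell);
            }
        }
        List<String> row = new ArrayList<>();
        for (Cell cell : newest.values()) {
            row.add(cell.column + "=" + cell.value);
        }
        return row;
    }

    // Two writes to the same row, each touching columns a and b.
    static List<Cell> sample() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("a", "v1", 1));
        cells.add(new Cell("b", "v1", 1));
        cells.add(new Cell("a", "v2", 2));
        cells.add(new Cell("b", "v2", 2));
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(scanRow(sample(), 1));
        System.out.println(scanRow(sample(), 2));
    }
}
```

If the write numbers were dropped partway through a scan, as the description says happens on flush, the filter in scanRow would have nothing left to compare against and a reader could see a mix of the two writes.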




[jira] Commented: (HBASE-2922) HLog cleanup is done under the updateLock, major slowdown

2010-08-19 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900548#action_12900548
 ] 

HBase Review Board commented on HBASE-2922:
---

Message from: Jean-Daniel Cryans jdcry...@apache.org

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/694/
---

Review request for hbase.


Summary
---

Simply moves the LogActionListeners and outputfiles handling out of the 
updateLock synchronization.


This addresses bug HBASE-2922.
http://issues.apache.org/jira/browse/HBASE-2922


Diffs
-

  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 987355

Diff: http://review.cloudera.org/r/694/diff


Testing
---

Unit tests and some PEs.


Thanks,

Jean-Daniel




 HLog cleanup is done under the updateLock, major slowdown
 -

 Key: HBASE-2922
 URL: https://issues.apache.org/jira/browse/HBASE-2922
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.20.6, 0.89.20100621
Reporter: Jean-Daniel Cryans

 Something I've seen quite often in our production environment:
 {quote}
 2010-08-16 16:17:27,104 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000385321 whose highest sequence/edit id is 64837079950
 2010-08-16 16:17:27,286 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000392770 whose highest sequence/edit id is 64837088260
 2010-08-16 16:17:27,452 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000399300 whose highest sequence/edit id is 64837096566
 2010-08-16 16:17:27,635 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000406997 whose highest sequence/edit id is 64837104865
 2010-08-16 16:17:27,827 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000413803 whose highest sequence/edit id is 64837113153
 2010-08-16 16:17:27,993 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000421709 whose highest sequence/edit id is 64837121467
 2010-08-16 16:17:28,160 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000427333 whose highest sequence/edit id is 64837129775
 2010-08-16 16:17:28,432 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000434365 whose highest sequence/edit id is 64837138074
 2010-08-16 16:17:28,518 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old hlog file /hbase/.logs/rs22,60020,1280909840873/hlog.dat.1282000440347 whose highest sequence/edit id is 64837146376
 2010-08-16 16:17:28,612 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 39 on 60020 took 1801ms appending an edit to hlog; editcount=0
 2010-08-16 16:17:28,615 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 37 on 60020 took 1804ms appending an edit to hlog; editcount=1
 2010-08-16 16:17:28,615 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 25 on 60020 took 1805ms appending an edit to hlog; editcount=2
 ...
 2010-08-16 16:17:28,619 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 41 on 60020 took 1875ms appending an edit to hlog; editcount=50
 2010-08-16 16:17:28,619 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 24 on 60020 took 1876ms appending an edit to hlog; editcount=51
 2010-08-16 16:17:28,619 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 48 on 60020 took 1881ms appending an edit to hlog; editcount=54
 {quote}
 And looking at HLog.rollWriter, we roll and then clean up those unused hlog 
 files under updateLock, which blocks all the appenders (as shown). We should 
 only do the first part (the roll) under that lock.
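The idea of doing only the roll under the lock can be sketched like this. It is a simplified model under stated assumptions, not HLog.rollWriter: the class, the map of finished files keyed by highest sequence id, and the no-op delete helper are hypothetical, and the sketch still picks the obsolete files while holding the lock, which may differ from what the actual patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified model of a write-ahead log roller; not the HLog class.
class LogRollSketch {
    private final Object updateLock = new Object();
    // Finished log files keyed by the highest sequence id they contain.
    private final TreeMap<Long, String> outputFiles = new TreeMap<>();
    private String currentFile;

    LogRollSketch(String firstFile) {
        this.currentFile = firstFile;
    }

    // Rolls to newFile and removes finished files whose edits are all flushed.
    // Returns the names of the files that were cleaned up.
    List<String> rollWriter(String newFile, long highestSeqIdInOldFile,
                            long oldestUnflushedSeqId) {
        List<String> obsolete;
        synchronized (updateLock) {
            // Part 1, under the lock: swap the writer and do cheap bookkeeping.
            outputFiles.put(highestSeqIdInOldFile, currentFile);
            currentFile = newFile;
            SortedMap<Long, String> flushed = outputFiles.headMap(oldestUnflushedSeqId);
            obsolete = new ArrayList<>(flushed.values());
            flushed.clear();
        }
        // Part 2, outside the lock: slow filesystem work no longer blocks appenders.
        for (String file : obsolete) {
            deleteFromFileSystem(file);
        }
        return obsolete;
    }

    // Placeholder for the slow delete call; a real version would hit the DFS.
    private void deleteFromFileSystem(String file) {
        // intentionally empty in this sketch
    }
}
```

With this split, an appender waits only for the writer swap, not for one delete round trip per old file.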




[jira] Commented: (HBASE-2856) TestAcidGuarantee broken on trunk

2010-08-19 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900553#action_12900553
 ] 

ryan rawson commented on HBASE-2856:


That sounds possible... the extra memory held could be up to 64mb *
block-size * # of families, i.e. a few hundred megs or even gigs.
