[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2011-12-20 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173012#comment-13173012
 ] 

Todd Lipcon commented on HBASE-5074:


bq. One minor disadvantage of this approach is that checksums would be computed 
twice, once by the hbase regionserver and once by the hdfs client. How bad is 
this cpu overhead?

You mean on write? The native CRC32C implementation in HDFS trunk right now can 
do somewhere around 6GB/sec - I clocked it at about 16% overhead compared to 
the non-checksummed path a while ago. So I think overhead is fairly minimal.

bq. I am proposing that HBase disk format V3 have a 4 byte checksum for every 
hbase block

4 byte checksum for 64KB+ of data seems pretty low. IMO we should continue to 
do chunked checksums - maybe a CRC32 for every 1KB in the block. This allows 
people to use larger block sizes without compromising checksum effectiveness. 
The reason to choose chunked CRC32 over a wider hash is that CRC32 has a very 
efficient hardware implementation in SSE4.2. Plus, we can share all the JNI 
code already developed for Hadoop to calculate and verify this style of 
checksum :)
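
To make the chunked idea concrete, here is a minimal sketch (not the HBASE-5074 
patch) of computing one CRC32 per 1KB chunk of a block using plain 
java.util.zip.CRC32; the class and method names are made up, and a real 
implementation would go through the hardware/JNI path mentioned above:

{code}
import java.util.zip.CRC32;

public class ChunkedChecksumSketch {
  // 1KB chunks as suggested above; purely an example value.
  static final int CHUNK_SIZE = 1024;

  // Returns one 4-byte CRC32 per 1KB chunk, so a 64KB block carries 64
  // checksums instead of a single checksum over the whole block.
  static int[] checksumBlock(byte[] block) {
    int nChunks = (block.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
    int[] crcs = new int[nChunks];
    CRC32 crc = new CRC32();
    for (int i = 0; i < nChunks; i++) {
      int off = i * CHUNK_SIZE;
      int len = Math.min(CHUNK_SIZE, block.length - off);
      crc.reset();
      crc.update(block, off, len);
      crcs[i] = (int) crc.getValue();
    }
    return crcs;
  }
}
{code}

Verification on read would recompute the per-chunk CRCs and compare them to the 
stored values, so a corrupt chunk is caught without trusting the whole block to 
a single 4-byte checksum.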

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5066) Upgrade to zk 3.4.1

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173022#comment-13173022
 ] 

Hudson commented on HBASE-5066:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5066 Upgrade to zk 3.4.1

stack : 
Files : 
* /hbase/trunk/pom.xml


 Upgrade to zk 3.4.1
 ---

 Key: HBASE-5066
 URL: https://issues.apache.org/jira/browse/HBASE-5066
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 5066.txt


 Currently we are shipping 0.92 with 3.4.1rc2, which is what became the release, 
 but we should change the pom to point at the release; it looks better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5062) Missing logons if security is enabled

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173026#comment-13173026
 ] 

Hudson commented on HBASE-5062:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5062 Missing logons if security is enabled

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/Main.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Strings.java


 Missing logons if security is enabled
 -

 Key: HBASE-5062
 URL: https://issues.apache.org/jira/browse/HBASE-5062
 Project: HBase
  Issue Type: Bug
  Components: rest, security, thrift
Affects Versions: 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.92.0

 Attachments: HBASE-5062-v2.patch, HBASE-5062.patch


 Somehow the attached changes are missing from the security integration. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5029) TestDistributedLogSplitting fails on occasion

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173023#comment-13173023
 ] 

Hudson commented on HBASE-5029:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5029 TestDistributedLogSplitting fails on occasion; Added catch of 
NPE and reenabled ignored test
HBASE-5029 TestDistributedLogSplitting fails on occasion; disabling failing 
test -- redo -- forgot to import @Ignore
HBASE-5029 TestDistributedLogSplitting fails on occasion; disabling failing test

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java

stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java

stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java


 TestDistributedLogSplitting fails on occasion
 -

 Key: HBASE-5029
 URL: https://issues.apache.org/jira/browse/HBASE-5029
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Attachments: 
 0001-HBASE-5029-jira-TestDistributedLogSplitting-fails-on.patch, 
 5029-addingignore.txt, 5029-catch-dfsclient-npe-v2.txt, 
 5029-catch-dfsclient-npe.txt, HBASE-5029.D891.1.patch, HBASE-5029.D891.2.patch


 This is how it usually fails: 
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testWorkerAbort/
 Assigning mighty Prakash since he offered to take a looksee.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5051) HBaseTestingUtility#getHBaseAdmin() creates a new HBaseAdmin instance at each call

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173024#comment-13173024
 ] 

Hudson commented on HBASE-5051:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5051 HBaseTestingUtility#getHBaseAdmin() creates a new HBaseAdmin 
instance at each call

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/constraint/TestConstraint.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/TestScannersWithFilters.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/TestTableResource.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java


 HBaseTestingUtility#getHBaseAdmin() creates a new HBaseAdmin instance at each 
 call
 --

 Key: HBASE-5051
 URL: https://issues.apache.org/jira/browse/HBASE-5051
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5051.patch, 5051.v2.patch, 5051.v2.patch, 5051.v2.patch, 
 5051.v2.patch


 As it's a new instance, it should be closed. But because the function name 
 seems to imply that the instance is managed by HBaseTestingUtility, most 
 users don't close it, so it leaks.
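 For illustration, a caller would have to do something like the hedged sketch 
 below to avoid the leak, which is exactly what the method name does not 
 suggest ("mytable" is a made-up name):
 {code}
 import java.io.IOException;
 import org.apache.hadoop.hbase.HBaseTestingUtility;
 import org.apache.hadoop.hbase.client.HBaseAdmin;

 // Hypothetical snippet, not a real test: getHBaseAdmin() hands back a
 // brand-new HBaseAdmin on every call, so each caller owns the cleanup.
 class AdminLeakSketch {
   static void checkTable(HBaseTestingUtility util) throws IOException {
     HBaseAdmin admin = util.getHBaseAdmin(); // new instance on every call
     try {
       admin.tableExists("mytable");
     } finally {
       admin.close(); // most callers skip this, hence the leak
     }
   }
 }
 {code}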

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5063) RegionServers fail to report to backup HMaster after primary goes down.

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173025#comment-13173025
 ] 

Hudson commented on HBASE-5063:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5063 RegionServers fail to report to backup HMaster after primary 
goes down

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java


 RegionServers fail to report to backup HMaster after primary goes down.
 ---

 Key: HBASE-5063
 URL: https://issues.apache.org/jira/browse/HBASE-5063
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5063.patch, hbase-5063.v2.0.92.patch, 
 hbase-5063.v2.trunk.patch


 # Setup cluster with two HMasters
 # Observe that HM1 is up and that all RS's are in the RegionServer list on 
 web page.
 # Kill (not even -9) the active HMaster
 # Wait for ZK to time out (default 3 minutes).
 # Observe that HM2 is now active.  Tables may show up but RegionServers never 
 report on web page.  Existing connections are fine.  New connections cannot 
 find regionservers.
 Note: 
 * If we bring up a new HM1 in the same place and kill HM2, the cluster 
 functions normally again after recovery.  This seems to indicate that 
 regionservers are stuck trying to talk to the old HM1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4935) hbase 0.92.0 doesn't work going against 0.20.205.0, its packaged hadoop

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173028#comment-13173028
 ] 

Hudson commented on HBASE-4935:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-4935 hbase 0.92.0 doesn't work going against 0.20.205.0, its packaged 
hadoop

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java


 hbase 0.92.0 doesn't work going against 0.20.205.0, its packaged hadoop
 ---

 Key: HBASE-4935
 URL: https://issues.apache.org/jira/browse/HBASE-4935
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4935-reverse.txt, 4935-v3-092.txt, 4935-v3.txt, 
 4935-v3.txt, 4935.txt


 See this Mikhail thread up on the list: 
 http://search-hadoop.com/m/WMUZR24EAJ1/%2522SequenceFileLogReader+uses+a+reflection+hack+resulting+in+runtime+failures%2522subj=Re+SequenceFileLogReader+uses+a+reflection+hack+resulting+in+runtime+failures
 Dig into it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5068) RC1 can not build its hadoop-0.23 profile

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173030#comment-13173030
 ] 

Hudson commented on HBASE-5068:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5068 RC1 can not build its hadoop-0.23 profile

stack : 
Files : 
* /hbase/trunk/pom.xml


 RC1 can not build its hadoop-0.23 profile
 -

 Key: HBASE-5068
 URL: https://issues.apache.org/jira/browse/HBASE-5068
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-5068.patch.txt


 The hadoop .23 version needs to be bumped to 0.23.1-SNAPSHOT

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5060) HBase client is blocked forever

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173027#comment-13173027
 ] 

Hudson commented on HBASE-5060:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5060  HBase client is blocked forever (Jinchao)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


 HBase client is blocked forever
 ---

 Key: HBASE-5060
 URL: https://issues.apache.org/jira/browse/HBASE-5060
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Critical
 Fix For: 0.92.0, 0.90.6

 Attachments: HBASE-5060_Branch90trial.patch, HBASE-5060_trunk.patch


 The client had a temporary network failure. After it recovered,
 I found my client thread was blocked. 
 Looking at the stack and logs below, it appears that we use an invalid 
 CatalogTracker in the tableExists function.
 Block stack:
 WriteHbaseThread33 prio=10 tid=0x7f76bc27a800 nid=0x2540 in 
 Object.wait() [0x7f76af4f3000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
  - locked 0x7f7a67817c98 (a 
 java.util.concurrent.atomic.AtomicBoolean)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
  at 
 org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:427)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:164)
  at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
 Source)
  at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
  - locked 0x7f7a4c5dc578 (a com.huawei.hdi.hbase.HbaseReOper)
  at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
  at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
 In ZooKeeperNodeTracker we don't throw the KeeperException to the higher level,
 so at the CatalogTracker level we think ZooKeeperNodeTracker started
 successfully and continue processing.
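 For illustration only, a hedged sketch of the kind of propagation that seems 
 to be missing (not the actual patch; the real start() signature differs and 
 the surrounding code is simplified):
 {code}
 // Hypothetical sketch: let ZooKeeperNodeTracker.start() surface the ZooKeeper
 // failure instead of only logging it, so CatalogTracker.start() can fail fast
 // rather than leave callers waiting forever on a half-started tracker.
 public void start() throws IOException {
   watcher.registerListener(this);
   try {
     byte[] currentData = ZKUtil.getDataAndWatch(watcher, node);
     if (currentData != null) {
       this.data = currentData; // remember the current znode content
     }
   } catch (KeeperException e) {
     throw new IOException("Failed to start tracker for " + node, e);
   }
 }
 {code}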
 [WriteHbaseThread33]2011-12-16 17:07:33,153[WARN ]  | 
 hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Unable to 
 get data of znode /hbase/root-region-server | 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:557)
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/root-region-server
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
  at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
  at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
  at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
 Source)
  at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
  at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
  at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
 [WriteHbaseThread33]2011-12-16 17:07:33,361[ERROR]  | 
 hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Received 
 unexpected KeeperException, re-throwing exception | 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.keeperException(ZooKeeperWatcher.java:385)
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/root-region-server
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
  at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
  at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
  at 
 

[jira] [Commented] (HBASE-5058) Allow HBaseAmin to use an existing connection

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173029#comment-13173029
 ] 

Hudson commented on HBASE-5058:
---

Integrated in HBase-TRUNK-security #38 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/38/])
HBASE-5058 Allow HBaseAmin to use an existing connection (Lars H)

larsh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java


 Allow HBaseAmin to use an existing connection
 -

 Key: HBASE-5058
 URL: https://issues.apache.org/jira/browse/HBASE-5058
 Project: HBase
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5058-v2.txt, 5058-v3.txt, 5058-v3.txt, 5058.txt


 What HBASE-4805 does for HTables, this should do for HBaseAdmin.
 Along with this the shared error handling and retrying between HBaseAdmin and 
 HConnectionManager can also be improved. I'll attach a first pass patch soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5063) RegionServers fail to report to backup HMaster after primary goes down.

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173063#comment-13173063
 ] 

Hudson commented on HBASE-5063:
---

Integrated in HBase-TRUNK #2562 (See 
[https://builds.apache.org/job/HBase-TRUNK/2562/])
HBASE-5063 RegionServers fail to report to backup HMaster after primary 
goes down

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java


 RegionServers fail to report to backup HMaster after primary goes down.
 ---

 Key: HBASE-5063
 URL: https://issues.apache.org/jira/browse/HBASE-5063
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5063.patch, hbase-5063.v2.0.92.patch, 
 hbase-5063.v2.trunk.patch


 # Setup cluster with two HMasters
 # Observe that HM1 is up and that all RS's are in the RegionServer list on 
 web page.
 # Kill (not even -9) the active HMaster
 # Wait for ZK to time out (default 3 minutes).
 # Observe that HM2 is now active.  Tables may show up but RegionServers never 
 report on web page.  Existing connections are fine.  New connections cannot 
 find regionservers.
 Note: 
 * If we bring up a new HM1 in the same place and kill HM2, the cluster 
 functions normally again after recovery.  This seems to indicate that 
 regionservers are stuck trying to talk to the old HM1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v6.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5067) HMaster uses wrong name for address (in stand-alone mode)

2011-12-20 Thread Eran Hirsch (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173079#comment-13173079
 ] 

Eran Hirsch commented on HBASE-5067:


To the best of my understanding, the problem is fixed in the trunk, but only to 
some extent.
It seems like the flow would work correctly, but it relies on the underlying VM 
implementation and assumes certain things that are not strictly guaranteed.

=

I'll explain...
1. The hostname is computed based on reverse DNS, like before.
2. An InetSocketAddress is built from this hostname and stored locally as 
'initialIsa'.
3. The RPC server is now created using, among others, 'initialIsa.getHostName()'.
4. The address the RPC server bound to is stored as the HMaster 
field 'isa'.
5. The server name is initialized with the 'isa' field's hostname.

=

Why is this problematic?

Because it assumes things about the socket implementation which are not 
strictly enforced:

We first call the 'bind' method of a ServerSocket object with an 
InetSocketAddress instance.
Later on we call ServerSocket's 'getLocalSocketAddress' to get this address 
instance back.
There is no way to know whether the same object is returned, or a new object 
is built based on the IP, or whatever other way the implementation chooses. 
Specific to our case, you cannot tell whether this would still hold the 
'hostname' field we gave it, with our fully qualified DNS name.



To conclude,
I think there is a semantic problem with the way the HMaster is initialized in 
its c'tor:
1. When creating the rpcServer, we should call the method with 
'initialIsa.getAddress().getHostAddress()' (instead of 
'initialIsa.getHostName()').
This would also be consistent with the comment written next to this parameter, 
which says that we are sending an IP (because right now we are sending a DNS name).
2. When setting the 'serverName' field, we need to use the local field 
'hostname' computed earlier (instead of 'this.isa.getHostName()').

==

Notes:
1. The same problem applies to HRegionServer, which uses almost the same 
initialization code in its c'tor. 
2. I am not an HBase developer, so I don't really know how to add these changes 
myself.
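
As a rough, hypothetical illustration of the difference between the two calls 
(this is not HMaster code; the port is a made-up example value):

{code}
import java.net.InetSocketAddress;

// Hypothetical demo of getHostName() vs getAddress().getHostAddress().
public class HostnameVsAddress {
  public static void main(String[] args) {
    InetSocketAddress initialIsa =
        new InetSocketAddress("machine1.our-dev-network.my-corp.com", 60000);

    // What the RPC server should arguably be handed: an IP string.
    // (Null-checked because an unresolvable name yields no address.)
    String ip = initialIsa.getAddress() == null
        ? null : initialIsa.getAddress().getHostAddress();

    // What it is handed today: the name obtained via reverse DNS.
    String name = initialIsa.getHostName();

    System.out.println("ip=" + ip + ", name=" + name);
  }
}
{code}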



 HMaster uses wrong name for address (in stand-alone mode)
 -

 Key: HBASE-5067
 URL: https://issues.apache.org/jira/browse/HBASE-5067
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Eran Hirsch

 In STANDALONE mode:
 When setting the configuration option hbase.master.dns.interface (and 
 optionally hbase.master.dns.nameserver) to non-default values,
 it is EXPECTED that the master node reports its fully qualified DNS name 
 when registering in ZooKeeper,
 BUT INSTEAD the machine's hostname is taken.
 For example, my machine is called (aka its hostname is...) machine1, but 
 its name on the network is machine1.our-dev-network.my-corp.com, so to 
 find this machine's IP anywhere on the network I would need to query for the 
 whole name (because trying to find machine1 is ambiguous on a network).
 Why is this a bug? Because when trying to connect to this stand-alone HBase 
 installation from outside the machine it is running on, when querying ZK for 
 /hbase/master we get only the machine1 part, and then fail with an 
 unresolvable address for the master (which later even gives a null pointer 
 because of a missing null check).
 This is the stack trace when calling HTable's c'tor:
 java.lang.IllegalArgumentException: hostname can't be null
   at java.net.InetSocketAddress.<init>(InetSocketAddress.java:139) 
 ~[na:1.7.0_02]
   at 
 org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:64) 
 ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:579)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:688)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
  

[jira] [Updated] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin

2011-12-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5073:
--

Attachment: HBASE-5073.patch

For branch patch

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5

 Attachments: HBASE-5073.patch


 HBaseAdmin APIs like tableExists(), flush, split, and closeRegion use the 
 catalog tracker.  Every time, the root node tracker and meta node tracker are 
 started and a listener is registered.  But after the operations are performed 
 the listeners are not getting removed. Hence if the admin APIs are used 
 consistently, it may lead to a memory leak.
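 Purely as an illustrative sketch of the cleanup being asked for (this is not 
 the attached patch, and the unregisterListener call is an assumption for 
 illustration):
 {code}
 // Hypothetical sketch: deregister the trackers' listeners once the catalog
 // tracker is no longer needed, instead of leaving them registered forever.
 void stop() {
   this.zookeeper.unregisterListener(this.rootRegionTracker);
   this.zookeeper.unregisterListener(this.metaNodeTracker);
 }
 {code}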

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5066) Upgrade to zk 3.4.1

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173131#comment-13173131
 ] 

Hudson commented on HBASE-5066:
---

Integrated in HBase-0.92-security #45 (See 
[https://builds.apache.org/job/HBase-0.92-security/45/])
HBASE-5066  Upgrade to zk 3.4.1
HBASE-5066 Upgrade to zk 3.4.1

stack : 
Files : 
* /hbase/branches/0.92/pom.xml

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt


 Upgrade to zk 3.4.1
 ---

 Key: HBASE-5066
 URL: https://issues.apache.org/jira/browse/HBASE-5066
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 5066.txt


 Currently we are shipping 0.92 with 3.4.1rc2, which is what became the release, 
 but we should change the pom to point at the release; it looks better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5029) TestDistributedLogSplitting fails on occasion

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173132#comment-13173132
 ] 

Hudson commented on HBASE-5029:
---

Integrated in HBase-0.92-security #45 (See 
[https://builds.apache.org/job/HBase-0.92-security/45/])
HBASE-5029 TestDistributedLogSplitting fails on occasion; Added catch of 
NPE and reenabled ignored test
HBASE-5029 TestDistributedLogSplitting fails on occasion; disabling failing 
test -- redo
HBASE-5029 TestDistributedLogSplitting fails on occasion; disabling failing 
test -- undoing an overcommit (patch -p0 -R < x.txt)
HBASE-5029 TestDistributedLogSplitting fails on occasion; disabling failing test

stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java

stack : 
Files : 
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java

stack : 
Files : 
* /hbase/branches/0.92/pom.xml
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java

stack : 
Files : 
* /hbase/branches/0.92/pom.xml
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java


 TestDistributedLogSplitting fails on occasion
 -

 Key: HBASE-5029
 URL: https://issues.apache.org/jira/browse/HBASE-5029
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Attachments: 
 0001-HBASE-5029-jira-TestDistributedLogSplitting-fails-on.patch, 
 5029-addingignore.txt, 5029-catch-dfsclient-npe-v2.txt, 
 5029-catch-dfsclient-npe.txt, HBASE-5029.D891.1.patch, HBASE-5029.D891.2.patch


 This is how it usually fails: 
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testWorkerAbort/
 Assigning mighty Prakash since he offered to take a looksee.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5060) HBase client is blocked forever

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173134#comment-13173134
 ] 

Hudson commented on HBASE-5060:
---

Integrated in HBase-0.92-security #45 (See 
[https://builds.apache.org/job/HBase-0.92-security/45/])
HBASE-5060  HBase client is blocked forever (Jinchao)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


 HBase client is blocked forever
 ---

 Key: HBASE-5060
 URL: https://issues.apache.org/jira/browse/HBASE-5060
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Critical
 Fix For: 0.92.0, 0.90.6

 Attachments: HBASE-5060_Branch90trial.patch, HBASE-5060_trunk.patch


 The client had a temporary network failure. After it recovered,
 I found my client thread was blocked. 
 Looking at the stack and logs below, it appears that we use an invalid 
 CatalogTracker in the tableExists function.
 Block stack:
 WriteHbaseThread33 prio=10 tid=0x7f76bc27a800 nid=0x2540 in 
 Object.wait() [0x7f76af4f3000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
  - locked 0x7f7a67817c98 (a 
 java.util.concurrent.atomic.AtomicBoolean)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
  at 
 org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:427)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:164)
  at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
 Source)
  at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
  - locked 0x7f7a4c5dc578 (a com.huawei.hdi.hbase.HbaseReOper)
  at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
  at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
 In ZooKeeperNodeTracker we don't throw the KeeperException to the higher level,
 so at the CatalogTracker level we think ZooKeeperNodeTracker started
 successfully and continue processing.
 [WriteHbaseThread33]2011-12-16 17:07:33,153[WARN ]  | 
 hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Unable to 
 get data of znode /hbase/root-region-server | 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:557)
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/root-region-server
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
  at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
  at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
  at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
 Source)
  at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
  at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
  at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
 [WriteHbaseThread33]2011-12-16 17:07:33,361[ERROR]  | 
 hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Received 
 unexpected KeeperException, re-throwing exception | 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.keeperException(ZooKeeperWatcher.java:385)
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/root-region-server
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
  at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
  at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
  

[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173157#comment-13173157
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508062/5064.v6.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/554//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/554//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/554//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5065) wrong IllegalArgumentException thrown when creating an 'HServerAddress' with an un-reachable hostname

2011-12-20 Thread Eran Hirsch (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173219#comment-13173219
 ] 

Eran Hirsch commented on HBASE-5065:


I am not an HBase developer, and I don't know how to provide a patch.

Anyhow, I checked the trunk and this class has been deprecated altogether, so 
there is no need to fix this anymore (I guess...?)

 wrong IllegalArgumentException thrown when creating an 'HServerAddress' with 
 an un-reachable hostname
 -

 Key: HBASE-5065
 URL: https://issues.apache.org/jira/browse/HBASE-5065
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.90.4
Reporter: Eran Hirsch
Priority: Trivial

 When trying to build an 'HServerAddress' object with an unresolvable hostname,
 e.g. new HServerAddress("www.IAMUNREACHABLE.com:80"),
 a call to 'getResolvedAddress' would cause the 'InetSocketAddress' c'tor to 
 throw an IllegalArgumentException because it is called with a null 'hostname' 
 parameter.
 This happens because there is no null check after the static 
 'getBindAddressInternal' method returns a null value when the hostname is 
 unresolved.
 This is a trivial bug because HServerAddress is expected to throw 
 this kind of exception when this error occurs, but it is thrown for the 
 wrong reason. The method 'checkBindAddressCanBeResolved' should be the one 
 throwing the exception (and with a slightly different reason). For that 
 reason the method call itself becomes redundant, as it will always 
 succeed in the current flow, because the case it checks is already checked 
 for by the preceding getResolvedAddress call.
 In short:
 an IllegalArgumentException is thrown with reason "hostname can't be null" 
 from the InetSocketAddress c'tor,
 INSTEAD OF
 an IllegalArgumentException with reason "Could not resolve the DNS name of 
 [BADHOSTNAME]:[PORT]" from HServerAddress's checkBindAddressCanBeResolved method.
 Stack trace:
 java.lang.IllegalArgumentException: hostname can't be null
   at java.net.InetSocketAddress.<init>(InetSocketAddress.java:139) 
 ~[na:1.7.0_02]
   at 
 org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:64) 
 ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:579)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:688)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:688)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
  ~[hbase-0.90.4.jar:0.90.4]
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
  ~[hbase-0.90.4.jar:0.90.4]
   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:173) 
 ~[hbase-0.90.4.jar:0.90.4]
   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147) 
 ~[hbase-0.90.4.jar:0.90.4]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173236#comment-13173236
 ] 

Zhihong Yu commented on HBASE-5009:
---

I think we should use threadPool.awaitTermination() where a timeout can be 
specified so that we don't wait indefinitely.
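
A minimal sketch of that suggestion (the 30-second timeout is an arbitrary 
example value, not taken from any patch):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of bounding the wait on the split thread pool.
class BoundedShutdown {
  static boolean shutdownWithTimeout(ExecutorService threadPool)
      throws InterruptedException {
    threadPool.shutdown();
    // Returns false if tasks are still running after the timeout, so the
    // caller can log and move on instead of waiting indefinitely.
    return threadPool.awaitTermination(30, TimeUnit.SECONDS);
  }
}
{code}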

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
     throws IOException {
   if (fs.exists(splitdir)) throw new IOException("Splitdir already exits? "
       + splitdir);
   if (!fs.mkdirs(splitdir)) throw new IOException("Failed create of " +
       splitdir);
 }
 {code}
 Correct me if I am wrong. If it is an issue, can we change the behaviour of 
 throwing an exception?
 Pls suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5076) HBase shell hangs when creating some 'illegal' tables.

2011-12-20 Thread Jonathan Hsieh (Created) (JIRA)
HBase shell hangs when creating some 'illegal' tables.
--

 Key: HBASE-5076
 URL: https://issues.apache.org/jira/browse/HBASE-5076
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Priority: Minor


In hbase shell. These commands hang:
{code}
create 'hbase.version','foo'
create 'splitlog','foo'
{code}

Interestingly

{code}
create 'hbase.id','foo'
create <existingtablename>, 'foo'
create '.META.','foo'
create '-ROOT-','foo'
{code}

are properly rejected.

We should probably either rename the files so they become illegal table names 
(hbase.version to .hbase.version and splitlog to .splitlog), or we could add 
more special cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173318#comment-13173318
 ] 

Zhihong Yu commented on HBASE-5064:
---

Apart from the 5 failed tests, TestReplication hung.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5066) Upgrade to zk 3.4.1

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173323#comment-13173323
 ] 

stack commented on HBASE-5066:
--

The tough part, IIRC, was that we use 3.4.x APIs because the 3.3.x ones have 
been removed, with no way to work around it.  Andrew?

 Upgrade to zk 3.4.1
 ---

 Key: HBASE-5066
 URL: https://issues.apache.org/jira/browse/HBASE-5066
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: Andrew Purtell
 Fix For: 0.92.0

 Attachments: 5066.txt


 Currently we are shipping 0.92 with 3.4.1rc2, which is what became the release, 
 but we should change the pom to point at the release; it looks better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173330#comment-13173330
 ] 

Zhihong Yu commented on HBASE-5073:
---

+1 on patch, if tests pass.

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5

 Attachments: HBASE-5073.patch


 HBaseAdmin APIs like tableExists(), flush, split, and closeRegion use the 
 catalog tracker.  Every time, the root node tracker and meta node tracker are 
 started and a listener is registered.  But after the operations are performed 
 the listeners are not getting removed. Hence if the admin APIs are used 
 consistently, it may lead to a memory leak.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173350#comment-13173350
 ] 

stack commented on HBASE-5073:
--

+1

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5

 Attachments: HBASE-5073.patch


 HBaseAdmin APIs like tableExists(), flush, split, and closeRegion use the 
 catalog tracker.  Every time, the root node tracker and meta node tracker are 
 started and a listener is registered.  But after the operations are performed 
 the listeners are not getting removed. Hence if the admin APIs are used 
 consistently, it may lead to a memory leak.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173357#comment-13173357
 ] 

Phabricator commented on HBASE-5072:


Kannan has commented on the revision [jira] [HBASE-5072] Support Max Value for 
Per-Store Metrics.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java:392
 minor: Since this is already a map with MutableDoubles, you can break this 
into two cases to avoid new allocations when possible. Something like:

  if (cur == null) {
    tmpMap.put(maxKey, new MutableDouble(val));
  } else if (cur.doubleValue() < val) {
    cur.setValue(val);
  }


REVISION DETAIL
  https://reviews.facebook.net/D945


 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5072:
---

Attachment: D945.2.patch

nspiegelberg updated the revision [jira] [HBASE-5072] Support Max Value for 
Per-Store Metrics.
Reviewers: JIRA, mbautin, Kannan

  Added Kannan's peer review optimization

REVISION DETAIL
  https://reviews.facebook.net/D945

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java


 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5058) Allow HBaseAmin to use an existing connection

2011-12-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173364#comment-13173364
 ] 

Lars Hofhansl commented on HBASE-5058:
--

@Stack: I think that about sums it up. The complexity of layers and timeout 
stories are alleviated somewhat by parent HBASE-4805 (no per HTable threadpool, 
HTablePool no longer needed).
I had a brief look at the first issue; unless I am missing something, this would 
require a nontrivial amount of refactoring. The simplest would be to do all 
network IO from the Connection thread rather than the application thread (as 
described in HBASE-4956). We would need to allow the client to synchronize on, 
and retrieve exceptions from, a Future.

Short term, should we take HBASE-4805 all the way and add a getTable(...) method 
to HConnection? (Or even further, and add put/get/scan/etc methods that take a 
table name to HConnection?)

Long term, a design based on asynchbase with a thin synchronous layer on top is 
probably the best option.
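
As a rough illustration of the short-term suggestion, the addition being 
discussed might look something like the following (hypothetical interface and 
method signature, not an existing API):

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTableInterface;

// Hypothetical sketch of a getTable(...) addition along the lines discussed
// above; the interface name and signature are illustrative only.
public interface HConnectionWithTables {
  /** Return a table handle that reuses this connection's resources. */
  HTableInterface getTable(String tableName) throws IOException;
}
{code}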

 Allow HBaseAmin to use an existing connection
 -

 Key: HBASE-5058
 URL: https://issues.apache.org/jira/browse/HBASE-5058
 Project: HBase
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5058-v2.txt, 5058-v3.txt, 5058-v3.txt, 5058.txt


 What HBASE-4805 does for HTables, this should do for HBaseAdmin.
 Along with this the shared error handling and retrying between HBaseAdmin and 
 HConnectionManager can also be improved. I'll attach a first pass patch soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5058) Allow HBaseAmin to use an existing connection

2011-12-20 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173364#comment-13173364
 ] 

Lars Hofhansl edited comment on HBASE-5058 at 12/20/11 6:12 PM:


@Stack: I think that about sums it up. The complexity of layers and timeout 
stories are alleviated somewhat by parent HBASE-4805 (no per HTable threadpool, 
HTablePool no longer needed).
I had a brief look at the first issue; unless I am missing something, this would 
require a nontrivial amount of refactoring. The simplest would be to do all 
network IO from the Connection thread rather than the application thread (as 
described in HBASE-4956). We would need to allow the client to synchronize on, 
and retrieve exceptions from, a Future.

Short term, should we take HBASE-4805 all the way and add a getTable(...) method 
to HConnection? (Or even further, and add put/get/scan/etc methods that take a 
table name to HConnection?)

Long term, a design based on asynchbase with a thin synchronous layer on top is 
probably the best option.

  was (Author: lhofhansl):
@Stack: I think that about sums it up. The complexity of layers and timeout 
stories are alleviated somewhat by parent HBASE-4805 (no per HTable threadpool, 
HTablePool no longer needed).
I had a brief look at the first issue, unless I am missing something this would 
require a nontrivial amount of refactoring. The simplest would be to do all 
network IO from the Connection thread rather than the application thread (as 
described in HBASE-4956). Would need allow for the client to synchronize and 
retrieved exceptions on/from a Future.

Short term, should we take HBASE-4805 all the way and a getTable(...) method to 
HConnection? (Or even further and add put/get/scan/etc methods that take a 
table name to HConnection?)

Long term a design based on asynchhbase with a thin synchronous layer on top is 
probably the best option.
  
 Allow HBaseAmin to use an existing connection
 -

 Key: HBASE-5058
 URL: https://issues.apache.org/jira/browse/HBASE-5058
 Project: HBase
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5058-v2.txt, 5058-v3.txt, 5058-v3.txt, 5058.txt


 What HBASE-4805 does for HTables, this should do for HBaseAdmin.
 Along with this the shared error handling and retrying between HBaseAdmin and 
 HConnectionManager can also be improved. I'll attach a first pass patch soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4895) Change tablename format in meta to be the UUID of the tablename rather than the tablename.

2011-12-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173370#comment-13173370
 ] 

jirapos...@reviews.apache.org commented on HBASE-4895:
--



bq.  On 2011-12-20 00:48:41, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 29
bq.   https://reviews.apache.org/r/3188/diff/3/?file=64523#file64523line29
bq.  
bq.   If this is an md5 under the wraps, maybe we should just do md5 
rather than do this uuid indirection?  But maybe the UUID class has some 
facility you like that makes it easier to work with?

I'm down with moving to an md5
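
For illustration, computing an MD5 of the table name could be as simple as the 
sketch below (a sketch of the "just do md5" alternative; the class and method 
names are made up):

{code}
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TableNameDigest {
  /** Hex-encoded MD5 of a table name's bytes. */
  static String md5Hex(byte[] tableName) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] digest = md.digest(tableName);
    StringBuilder sb = new StringBuilder(2 * digest.length);
    for (byte b : digest) {
      sb.append(String.format("%02x", b));  // two hex chars per byte
    }
    return sb.toString();
  }
}
{code}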


bq.  On 2011-12-20 00:48:41, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 340
bq.   https://reviews.apache.org/r/3188/diff/3/?file=64523#file64523line340
bq.  
bq.   Why line here?

woops


bq.  On 2011-12-20 00:48:41, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 352
bq.   https://reviews.apache.org/r/3188/diff/3/?file=64523#file64523line352
bq.  
bq.   Were we talking about uuids in original code?
bq.   
bq.   Should we cache tablename in HRI if we are passed it so can avoid a 
meta hit if absent?
bq.   
bq.   If a meta hit to get table name, its in the last HRI only?  Is that 
the plan?   The last HRI in a table has the table name?  Or if not this, where 
is it in the meta table?

That's true, I must have gotten that comment in my previous patch 
(https://reviews.apache.org/r/3186/)

I assumed the tablename was in the hregioninfo.
Not sure what the third question means.


bq.  On 2011-12-20 00:48:41, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 398
bq.   https://reviews.apache.org/r/3188/diff/3/?file=64523#file64523line398
bq.  
bq.   Whats UUID tablename?  And though its not you, whats the 1|2 about?

The 1 or 2 is how you know it's the last region. I can make it clearer, I'm 
sure.


bq.  On 2011-12-20 00:48:41, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 422
bq.   https://reviews.apache.org/r/3188/diff/3/?file=64523#file64523line422
bq.  
bq.   Something is wrong w/ this patch ?  We had a '@return The UUID of 
the Table name' in original src?

Woops


bq.  On 2011-12-20 00:48:41, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/MetaSearchRow.java, line 21
bq.   https://reviews.apache.org/r/3188/diff/3/?file=64525#file64525line21
bq.  
bq.   MetaSearchRow is not in src, its brought in by another related 
patch?  So this is a patch on top of that patch?

https://reviews.apache.org/r/3186/


- Alex


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3188/#review3992
---


On 2011-12-13 23:36:44, Alex Newman wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3188/
bq.  ---
bq.  
bq.  (Updated 2011-12-13 23:36:44)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  PART 2 of hbase-4616
bq.  
bq.  By uuiding the tablename in the meta row, we are able to use binary values 
for the end-of-table marker
bq.  
bq.  
bq.  This addresses bug HBASE-4895.
bq.  https://issues.apache.org/jira/browse/HBASE-4895
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 74cb821 
bq.src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java e5e60a8 
bq.src/main/java/org/apache/hadoop/hbase/client/MetaSearchRow.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/util/Merge.java 67d0fda 
bq.src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 95712dd 
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/SampleRegionWALObserver.java 
ff9c502 
bq.src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java 
368a0e5 
bq.src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java 
36dd289 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 
6e1211b 
bq.src/test/java/org/apache/hadoop/hbase/rest/TestStatusResource.java 
cffdcb6 
bq.src/test/ruby/hbase/admin_test.rb 0c2672b 
bq.  
bq.  Diff: https://reviews.apache.org/r/3188/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Alex
bq.  
bq.



 Change tablename format in meta to be the UUID of the tablename rather than 
 the tablename.
 --

 Key: HBASE-4895

[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2011-12-20 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173373#comment-13173373
 ] 

Andrew Purtell commented on HBASE-5074:
---

+1

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173377#comment-13173377
 ] 

Phabricator commented on HBASE-5072:


Kannan has commented on the revision [jira] [HBASE-5072] Support Max Value for 
Per-Store Metrics.

  Ok -- go it. You are tracking the max only across all CFs. Sounds good. 
Thanks for the clarification. I misread the code there.

REVISION DETAIL
  https://reviews.facebook.net/D945


 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173382#comment-13173382
 ] 

Phabricator commented on HBASE-5072:


Kannan has commented on the revision [jira] [HBASE-5072] Support Max Value for 
Per-Store Metrics.

  s/go it/got it.

REVISION DETAIL
  https://reviews.facebook.net/D945


 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173385#comment-13173385
 ] 

Jean-Daniel Cryans commented on HBASE-5074:
---

This jira's title makes it sound like you want to checksum when reading from the 
block cache.

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173388#comment-13173388
 ] 

stack commented on HBASE-5074:
--

Where in the read pipeline would we verify the checksum?  Down in hfile?  Where 
would we do the exception processing forcing reread with checksum=on?  Also 
down in hfile?

(Nice idea BTW)

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-20 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5033:
---

Attachment: D933.3.patch

Liyin updated the revision [jira][HBASE-5033][[89-fb]]Opening/Closing store in 
parallel to reduce region open/close time.
Reviewers: Kannan, mbautin, Karthik, JIRA

  Refactor the code

REVISION DETAIL
  https://reviews.facebook.net/D933

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch


 Region servers are opening/closing each store and each store file for every 
 store in sequential fashion, which may cause inefficiency to open/close 
 regions. 
 So this diff is to open/close each store in parallel in order to reduce 
 region open/close time. Also it would help to reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-5072:
---

Attachment: HBASE-5072.patch

note: patch applies cleanly to both 89-fb & trunk

 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch, HBASE-5072.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-5072:
---

Status: Patch Available  (was: Open)

 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch, HBASE-5072.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-5072:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch, HBASE-5072.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173407#comment-13173407
 ] 

Phabricator commented on HBASE-5072:


nspiegelberg has committed the revision [jira] [HBASE-5072] Support Max Value 
for Per-Store Metrics.

REVISION DETAIL
  https://reviews.facebook.net/D945

COMMIT
  https://reviews.facebook.net/rHBASE1221419


 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch, HBASE-5072.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.

2011-12-20 Thread Liyin Tang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-4698:
--

Status: Patch Available  (was: Open)

 Let the HFile Pretty Printer print all the key values for a specific row.
 -

 Key: HBASE-4698
 URL: https://issues.apache.org/jira/browse/HBASE-4698
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, 
 D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch


 When using HFile Pretty Printer to debug HBase issues, 
 it would be very nice to allow the Pretty Printer to seek to a specific row, and 
 only print all the key values for this row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173425#comment-13173425
 ] 

Hadoop QA commented on HBASE-4698:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508009/HBASE-4689-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/555//console

This message is automatically generated.

 Let the HFile Pretty Printer print all the key values for a specific row.
 -

 Key: HBASE-4698
 URL: https://issues.apache.org/jira/browse/HBASE-4698
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, 
 D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch


 When using HFile Pretty Printer to debug HBase issues, 
 it would be very nice to allow the Pretty Printer to seek to a specific row, and 
 only print all the key values for this row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173428#comment-13173428
 ] 

Phabricator commented on HBASE-5033:


lhofhansl has commented on the revision 
[jira][HBASE-5033][[89-fb]]Opening/Closing store in parallel to reduce region 
open/close time.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:536 Should 
these be daemon threads?
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:814 Same 
here: daemon threads?
  Does this have to be a separate pool from the opener pool? (I guess yes, but 
just want to make sure)
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:294 This is a 
bit confusing to me. We already have a thread pool to open the stores, now we 
have another pool to open storefiles in each store.
  So in the worst case with the default pool size of 10 we could open 10*10 
store files in parallel?
  Should there be different config options for the number of stores (i.e. CFs) 
in parallel and the number of store files per store to be opened in parallel?

REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch


 Region servers are opening/closing each store and each store file for every 
 store in sequential fashion, which may cause inefficiency to open/close 
 regions. 
 So this diff is to open/close each store in parallel in order to reduce 
 region open/close time. Also it would help to reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5070) Constraints implementation and javadoc changes

2011-12-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173439#comment-13173439
 ] 

jirapos...@reviews.apache.org commented on HBASE-5070:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3273/
---

Review request for hbase, Gary Helmling, Ted Yu, and Michael Stack.


Summary
---

Follow-up on changes to constraint as per stack's comments on HBASE-4605.


This addresses bug HBASE-5070.
https://issues.apache.org/jira/browse/HBASE-5070


Diffs
-

  src/docbkx/book.xml bd3f881 
  src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java 7ce6d45 
  src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java 2d8b4d7 
  src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java 7825466 
  src/main/java/org/apache/hadoop/hbase/constraint/package-info.java 6145ed5 
  
src/test/java/org/apache/hadoop/hbase/constraint/CheckConfigurationConstraint.java
 c49098d 

Diff: https://reviews.apache.org/r/3273/diff


Testing
---

mvn clean test -P localTests -Dtest=*Constraint* - all tests pass.


Thanks,

Jesse



 Constraints implementation and javadoc changes
 --

 Key: HBASE-5070
 URL: https://issues.apache.org/jira/browse/HBASE-5070
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu

 This is continuation of HBASE-4605
 See Stack's comments https://reviews.apache.org/r/2579/#review3980

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173457#comment-13173457
 ] 

Phabricator commented on HBASE-5033:


Liyin has commented on the revision [jira][HBASE-5033][[89-fb]]Opening/Closing 
store in parallel to reduce region open/close time.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:294 As you 
suggested, it already uses 2 separate parameters to control the number of 
stores and store files that will be opened or closed in parallel.

  For example:
  hbase.hregion.storeCloser.threads.max controls the number of stores closed in 
parallel.
  While hbase.hregion.storeFileCloser.threads.max controls the number of store 
files closed in parallel.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:814 It would 
be easier to control the life cycle of the thread pool and to decouple the open 
and close operations if we use separate thread pools. There is some detailed 
explanation in the previous comments :)

  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:536 I am not 
quite sure whether we need to set these threads as daemons, since the thread 
pool will be shut down in the finally block anyway.
  The main thread should never leave any tasks running in these thread pools 
after the finally block.

  Is there any specific reason?
  Or will it always be safe to set these threads as daemons?
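
For reference, a compact sketch of the pattern being discussed (pool size read 
from the quoted configuration property and the pool shut down in a finally 
block); the default value of 1, the class name, and the task shape are all 
illustrative, not the actual Store/HRegion code:

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class ParallelCloseSketch {
  /** Close store files on a bounded pool whose size comes from config. */
  static void closeStoreFiles(Configuration conf, List<Runnable> closeTasks)
      throws InterruptedException {
    int threads = conf.getInt("hbase.hregion.storeFileCloser.threads.max", 1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      for (Runnable task : closeTasks) {
        pool.submit(task);  // each task closes one store file
      }
    } finally {
      pool.shutdown();                               // stop accepting new tasks
      pool.awaitTermination(60, TimeUnit.SECONDS);   // wait for in-flight closes
    }
  }
}
{code}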


REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch


 Region servers are opening/closing each store and each store file for every 
 store in sequential fashion, which may cause inefficiency to open/close 
 regions. 
 So this diff is to open/close each store in parallel in order to reduce 
 region open/close time. Also it would help to reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173460#comment-13173460
 ] 

Phabricator commented on HBASE-5033:


lhofhansl has commented on the revision 
[jira][HBASE-5033][[89-fb]]Opening/Closing store in parallel to reduce region 
open/close time.

  Looks good then. Thanks for the explanation.
  Also please see my comment on the jira (this is only helping with request 
latency and might possibly be detrimental to aggregate throughput).

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:294 Oops... 
Didn't see ...storeOpener vs ...storeFileOpener.
  I think traditionally we'd name them store.opener... and storefile.opener...
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:814 Since the 
threadpool is shutdown in a finally clause it is probably ok.

REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch


 Region servers are opening/closing each store and each store file for every 
 store in sequential fashion, which may cause inefficiency to open/close 
 regions. 
 So this diff is to open/close each store in parallel in order to reduce 
 region open/close time. Also it would help to reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2011-12-20 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173461#comment-13173461
 ] 

dhruba borthakur commented on HBASE-5074:
-

Yes, the verification of the checksums would happen when the hfile block is 
loaded into the block cache. It will be entirely in hfile code. Also, the 
exception processing would happen in hfile too.
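
For illustration only, a minimal sketch of chunked CRC32 verification over an 
in-memory block buffer (one CRC per 1KB chunk, per the earlier suggestion on 
this issue); the on-disk layout and the fallback path are not defined here and 
all names are made up:

{code}
import java.util.zip.CRC32;

public class ChunkedChecksumSketch {
  static final int CHUNK_SIZE = 1024;  // e.g. one CRC32 per 1KB of block data

  /** Verify data against one stored CRC32 value per CHUNK_SIZE bytes. */
  static boolean verify(byte[] data, long[] storedCrcs) {
    CRC32 crc = new CRC32();
    int chunk = 0;
    for (int off = 0; off < data.length; off += CHUNK_SIZE, chunk++) {
      int len = Math.min(CHUNK_SIZE, data.length - off);
      crc.reset();
      crc.update(data, off, len);
      if (crc.getValue() != storedCrcs[chunk]) {
        return false;  // caller could then retry the read with HDFS checksums on
      }
    }
    return true;
  }
}
{code}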

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin

2011-12-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173462#comment-13173462
 ] 

Lars Hofhansl commented on HBASE-5073:
--

+1

Maybe in another jira we should either disallow passing a Watcher (since 
unremovable listeners will be added to it), or clean up the listeners. That 
applies to 0.92 and trunk as well.
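
The cleanup half of that could follow the usual register-then-unregister 
pattern; a generic sketch with made-up types (not the ZooKeeper watcher API or 
the attached patch):

{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical types; they only illustrate removing a listener when the
// operation that registered it is done.
class ListenerRegistry {
  interface Listener { void onEvent(String event); }

  private final List<Listener> listeners = new CopyOnWriteArrayList<Listener>();

  void register(Listener l)   { listeners.add(l); }
  void unregister(Listener l) { listeners.remove(l); }

  /** Per-operation usage: always remove the listener when the call finishes. */
  void withTemporaryListener(Listener l, Runnable adminOp) {
    register(l);
    try {
      adminOp.run();
    } finally {
      unregister(l);  // without this, repeated admin calls accumulate listeners
    }
  }
}
{code}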

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5

 Attachments: HBASE-5073.patch


 HBaseAdmin APIs like tableExists(), flush, split, and closeRegion use the 
 catalog tracker.  Every time, the root node tracker and meta node tracker are 
 started and a listener is registered.  But after the operations are performed, 
 the listeners are not removed. Hence, if the admin APIs are used repeatedly, it 
 may lead to a memory leak.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5070) Constraints implementation and javadoc changes

2011-12-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173466#comment-13173466
 ] 

jirapos...@reviews.apache.org commented on HBASE-5070:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3273/#review4022
---



src/docbkx/book.xml
https://reviews.apache.org/r/3273/#comment9121

Should read 'checking is enabled'



src/docbkx/book.xml
https://reviews.apache.org/r/3273/#comment9122

When would the URL be active ?
It is not available now.



src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java
https://reviews.apache.org/r/3273/#comment9123

Whitespace should be removed.



src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java
https://reviews.apache.org/r/3273/#comment9124

I think this should start with 'Constraint Class '


- Ted


On 2011-12-20 19:14:46, Jesse Yates wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3273/
bq.  ---
bq.  
bq.  (Updated 2011-12-20 19:14:46)
bq.  
bq.  
bq.  Review request for hbase, Gary Helmling, Ted Yu, and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Follow-up on changes to constraint as per stack's comments on HBASE-4605.
bq.  
bq.  
bq.  This addresses bug HBASE-5070.
bq.  https://issues.apache.org/jira/browse/HBASE-5070
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/docbkx/book.xml bd3f881 
bq.src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java 
7ce6d45 
bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java 2d8b4d7 
bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java 
7825466 
bq.src/main/java/org/apache/hadoop/hbase/constraint/package-info.java 
6145ed5 
bq.
src/test/java/org/apache/hadoop/hbase/constraint/CheckConfigurationConstraint.java
 c49098d 
bq.  
bq.  Diff: https://reviews.apache.org/r/3273/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn clean test -P localTests -Dtest=*Constraint* - all tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jesse
bq.  
bq.



 Constraints implementation and javadoc changes
 --

 Key: HBASE-5070
 URL: https://issues.apache.org/jira/browse/HBASE-5070
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu

 This is continuation of HBASE-4605
 See Stack's comments https://reviews.apache.org/r/2579/#review3980

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5063) RegionServers fail to report to backup HMaster after primary goes down.

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173498#comment-13173498
 ] 

Hudson commented on HBASE-5063:
---

Integrated in HBase-0.92-security #46 (See 
[https://builds.apache.org/job/HBase-0.92-security/46/])
HBASE-5063 RegionServers fail to report to backup HMaster after primary 
goes down

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java


 RegionServers fail to report to backup HMaster after primary goes down.
 ---

 Key: HBASE-5063
 URL: https://issues.apache.org/jira/browse/HBASE-5063
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5063.patch, hbase-5063.v2.0.92.patch, 
 hbase-5063.v2.trunk.patch


 # Setup cluster with two HMasters
 # Observe that HM1 is up and that all RS's are in the RegionServer list on 
 web page.
 # Kill (not even -9) the active HMaster
 # Wait for ZK to time out (default 3 minutes).
 # Observe that HM2 is now active.  Tables may show up but RegionServers never 
 report on web page.  Existing connections are fine.  New connections cannot 
 find regionservers.
 Note: 
 * If we replace a new HM1 in the same place and kill HM2, the cluster 
 functions normally again after recovery.  This seems to indicate that 
 regionservers are stuck trying to talk to the old HM1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173502#comment-13173502
 ] 

Phabricator commented on HBASE-5033:


Liyin has commented on the revision [jira][HBASE-5033][[89-fb]]Opening/Closing 
store in parallel to reduce region open/close time.

  Thanks Lars for the review :)
  I just read your comments in the jira and am sorry I missed them at first.

  I totally agree with you that we should be very careful not to over-parallelize 
and overwhelm the region server and namenode too much. So these configuration 
parameters really matter.

  Also, considering we are still processing each region open and region close 
message one at a time, we may not get much of a throughput win by parallelizing 
the store/store file open and close process.

REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch


 Region servers are opening/closing each store and each store file for every 
 store in sequential fashion, which may cause inefficiency to open/close 
 regions. 
 So this diff is to open/close each store in parallel in order to reduce 
 region open/close time. Also it would help to reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173514#comment-13173514
 ] 

Phabricator commented on HBASE-4218:


mbautin has commented on the revision [jira] [HBASE-4218] Delta encoding for 
keys in HFile.

  Replying to the rest of comments. A new version of the patch will follow.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoder.java:65 
Added missing javadoc for includingMemstoreTS.
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoder.java:126 
seekBefore only matters in case of an exact match. I will update the javadoc.
  
src/main/java/org/apache/hadoop/hbase/io/deltaencoder/PrefixKeyDeltaEncoder.java:34
 Updated.
  
src/main/java/org/apache/hadoop/hbase/io/deltaencoder/PrefixKeyDeltaEncoder.java:147
 Added an assertion.
  
src/test/java/org/apache/hadoop/hbase/io/deltaencoder/TestBufferedDeltaEncoder.java:34
 Fixed.
  
src/test/java/org/apache/hadoop/hbase/io/deltaencoder/TestDeltaEncoders.java:47 
Fixed (LargeTests -- runs in 2 minutes).
  
src/test/java/org/apache/hadoop/hbase/io/deltaencoder/TestBufferedDeltaEncoder.java:34
 Fixed (SmallTests).
  src/test/java/org/apache/hadoop/hbase/util/TestByteBufferUtils.java:35 Fixed 
(SmallTests)

REVISION DETAIL
  https://reviews.facebook.net/D447


 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, 
 D447.6.patch, D447.7.patch, D447.8.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms can provide.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression than LZO). 
 Moreover, it should allow far more efficient seeking, which should improve 
 performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
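
To make the prefix-compression part of the idea concrete, a toy sketch (not the 
actual encoder attached to this issue) that stores each sorted key as a 
shared-prefix length plus the remaining suffix:

{code}
import java.util.ArrayList;
import java.util.List;

public class PrefixEncodeSketch {
  /** Length of the common prefix of two byte arrays. */
  static int commonPrefix(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length), i = 0;
    while (i < n && a[i] == b[i]) i++;
    return i;
  }

  /** Encode keys (assumed sorted) as (prefixLen, suffix) pairs. */
  static List<Object[]> encode(List<byte[]> sortedKeys) {
    List<Object[]> out = new ArrayList<Object[]>();
    byte[] prev = new byte[0];
    for (byte[] key : sortedKeys) {
      int p = commonPrefix(prev, key);
      byte[] suffix = new byte[key.length - p];
      System.arraycopy(key, p, suffix, 0, suffix.length);
      out.add(new Object[] { p, suffix });  // sorted keys share long prefixes
      prev = key;
    }
    return out;
  }
}
{code}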

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Created) (JIRA)
DistributedLogSplitter failing to split file because it has edits for lots of 
regions
-

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack


Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for 
many regions, and just opening a file per region was taking so long that we were 
never updating our progress, so the split of the log just kept failing; in this 
case, the first 40 edits in a file required opening 35 files -- opening 35 files 
took longer than the hard-coded 25 seconds it's supposed to take to acquire the 
task.

First, here is master's view:

{code}
2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 ver = 0
...
2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 acquired by sv4r27s44,7003,1324365396664
...
2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
 ver = 3
{code}

Master then gives it elsewhere.

Over on the regionserver we see:

{code}
2011-12-20 17:54:09,233 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
sv4r27s44,7003,1324365396664 acquired task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679

2011-12-20 17:54:10,714 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
its/0278862.temp, syncFs=true, hflush=false


{code}

 and so on till:

{code}
2011-12-20 17:54:36,876 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 preempted from sv4r27s44,7003,1324365396664, current task state and 
owner=owned sv4r28s44,7003,1324365396678



2011-12-20 17:54:37,112 WARN 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679



{code}

When the above happened, we'd only processed 40 edits.  As written, we only 
heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Created) (JIRA)
SplitLogWorker fails to let go of a task, kills the RS
--

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1


I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:

{quote}
2011-12-20 03:06:19,838 FATAL 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 done failed because task doesn't exist
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
at java.lang.Thread.run(Thread.java:662)
{quote}

I'll post more logs in a moment. What I can see is that the master shuffled 
that task around a bit and one of the region servers died on this stack trace 
while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Attachment: 5078.txt

Check if we should heartbeat/report progress every time we open a file.

Removed unused local variable totalBytesToSplit

Added new openedNewFile boolean that is set every time we create a new file and 
then cleared each time we go to check if we should report progress.

Removed hard tabs.

Added some detail to the summary log message.
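Roughly, the set/clear pattern described above looks like this (an illustrative sketch, 
not the patch itself; field and method names are assumptions):

{code}
// Illustrative only: a flag that is set whenever a new per-region writer is
// created and cleared each time we decide whether to report progress.
class ProgressFlagSketch {
  private boolean openedNewFile = false;

  void onNewWriterCreated() {
    openedNewFile = true;            // set every time we create a new file
  }

  boolean shouldReportProgress(int editsSinceLastReport, int interval) {
    boolean report = openedNewFile || editsSinceLastReport >= interval;
    openedNewFile = false;           // cleared each time we check
    return report;
  }
}
{code}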

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
 Attachments: 5078.txt


 Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
 many regions and just opening the file per region was taking so long, we were 
 never updating our progress and so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173548#comment-13173548
 ] 

Jean-Daniel Cryans commented on HBASE-5077:
---

This is from the master's POV:

{quote}
2011-12-20 02:59:42,086 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
put up splitlog task at znode 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
2011-12-20 02:59:42,089 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 ver = 0
2011-12-20 02:59:42,113 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 acquired by sv4r13s38,62023,1324345934996
2011-12-20 03:00:09,244 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
resubmitting task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
2011-12-20 03:00:09,302 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 ver = 3
2011-12-20 03:02:53,072 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 acquired by sv4r28s44,62023,1324345934970
2011-12-20 03:03:21,117 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
resubmitting task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
2011-12-20 03:03:21,136 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 ver = 6
2011-12-20 03:04:40,421 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 acquired by sv4r6s38,62023,1324345935082
2011-12-20 03:05:09,133 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
resubmitting task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
2011-12-20 03:05:09,144 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 ver = 9
2011-12-20 03:05:09,193 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 acquired by sv4r30s44,62023,1324345935039
2011-12-20 03:05:36,137 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
Skipping resubmissions of task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 because threshold 3 reached
...
2011-12-20 03:05:47,139 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
Skipping resubmissions of task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 because threshold 3 reached
2011-12-20 03:05:50,320 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 entered state done sv4r30s44,62023,1324345935039
{quote}

The one that died is sv4r6s38, the 3rd one to acquire the task. Here's its log:

{quote}
2011-12-20 03:04:40,418 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
sv4r6s38,62023,1324345935082 acquired task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
2011-12-20 03:04:43,574 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 

[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Priority: Critical  (was: Major)

I think this is pretty critical; I couldn't successfully split a log for a long 
period of time, as the log splitting was moved about among machines, failing on 
each.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078.txt


 Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
 many regions and just opening the file per region was taking so long, we were 
 never updating our progress and so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173560#comment-13173560
 ] 

Zhihong Yu commented on HBASE-5078:
---

Nice finding.

{code}
+// timeout of if that not set, the split log DEFAULT_TIMEOUT)
{code}
The above should read 'timeout or if ...'
{code}
+// ignore edits from this region. It doesn't ezist anymore.
{code}
exist was spelled incorrectly.
{code}
        continue;
      } else {
        logWriters.put(region, wap);
      }
+     openedNewFile = true;
{code}
The assignment to openedNewFile depends on the continue statement. It would be 
better to move the assignment into the else block, or to remove the else block and 
put the logWriters.put() call together with the new assignment.
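A sketch of the suggested shape (the null check and types below are placeholders, not 
the actual patch; the surrounding loop is omitted):

{code}
// Sketch of the suggested restructuring: the flag is assigned on the same path
// that registers the writer, so the 'continue' branch can never skip it.
class OpenedNewFileSketch {
  void handleRegion(java.util.Map<String, Object> logWriters, String region,
                    Object wap, boolean[] openedNewFile) {
    if (wap == null) {          // stands in for the "region doesn't exist anymore" check
      return;                   // plays the role of 'continue' in the real loop
    }
    logWriters.put(region, wap);
    openedNewFile[0] = true;    // only set when a writer was actually registered
  }
}
{code}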

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078.txt


 Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
 many regions and just opening the file per region was taking so long, we were 
 never updating our progress and so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173567#comment-13173567
 ] 

Mubarak Seyed commented on HBASE-4720:
--

When I ran the tests, it failed at:

{code}
Running org.apache.hadoop.hbase.util.TestRegionSplitCalculator
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.077 sec <<< 
FAILURE!

Failed tests:   
testSplitCalculatorEq(org.apache.hadoop.hbase.util.TestRegionSplitCalculator): 
expected:<2> but was:<1>
{code}

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.v1.patch, HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.
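 For reference, this is the sort of atomic call available through the regular Java 
 client that the REST gateway would need to mirror (a usage sketch; the table and 
 column names are made up):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 public class CheckAndPutSketch {
   public static void main(String[] args) throws Exception {
     Configuration conf = HBaseConfiguration.create();
     HTable table = new HTable(conf, "sentinel");            // hypothetical table
     Put put = new Put(Bytes.toBytes("row1"));
     put.add(Bytes.toBytes("f"), Bytes.toBytes("state"), Bytes.toBytes("locked"));
     // Atomically write only if the current value of f:state is "free".
     boolean applied = table.checkAndPut(Bytes.toBytes("row1"),
         Bytes.toBytes("f"), Bytes.toBytes("state"), Bytes.toBytes("free"), put);
     System.out.println("checkAndPut applied: " + applied);
     table.close();
   }
 }
 {code}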

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173569#comment-13173569
 ] 

Zhihong Yu commented on HBASE-5077:
---

After the preemption log message, the following code should have run:
{code}
  void stopTask() {
    LOG.info("Sending interrupt to stop the worker thread");
    worker.interrupt(); // TODO interrupt often gets swallowed, do what else?
  }
{code}
I think the following method should have been called instead:
{code}
  public void stop() {
    exitWorker = true;
    stopTask();
  }
{code}

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173574#comment-13173574
 ] 

Jean-Daniel Cryans commented on HBASE-5077:
---

Won't exitWorker kill the SplitLogWorker fully? Not just the current task; the 
RS would actually stop serving log splitting.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173575#comment-13173575
 ] 

Jean-Daniel Cryans commented on HBASE-5077:
---

One problem I see is that in HLogSplitter.splitLogFileToTemp we do this in the 
finally:

{quote}
if ((progress_failed == false) && (reporter != null) &&
    (reporter.progress() == false)) {
  progress_failed = true;
}
{quote}

But at this point progress_failed isn't taken into account, so the method 
returns true. Looking at other parts of that method, it seems it's missing a 
'return false' which would be correctly handled by SplitLogWorker.
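In other words, the finally block would have to feed into the return value, roughly 
like this (a sketch of the intended shape, not the actual HLogSplitter code):

{code}
// Illustrative only: propagate a failed progress report out of the method so
// SplitLogWorker can treat the task as preempted instead of "done".
public class ProgressReturnSketch {
  interface Reporter { boolean progress(); }

  static boolean splitLogFileSketch(Reporter reporter) {
    boolean progress_failed = false;
    try {
      // ... read edits, write recovered.edits files, report progress periodically ...
    } finally {
      if (!progress_failed && reporter != null && !reporter.progress()) {
        progress_failed = true;
      }
    }
    return !progress_failed;   // without this, the caller sees success even after a failed heartbeat
  }
}
{code}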

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173575#comment-13173575
 ] 

Jean-Daniel Cryans edited comment on HBASE-5077 at 12/20/11 10:35 PM:
--

One problem I see is that in HLogSplitter.splitLogFileToTemp we do this in the 
finally:

{code}
if ((progress_failed == false) && (reporter != null) &&
    (reporter.progress() == false)) {
  progress_failed = true;
}
{code}

But at this point progress_failed isn't taken into account, so the method 
returns true. Looking at other parts of that method, it seems it's missing a 
'return false' which would be correctly handled by SplitLogWorker.

  was (Author: jdcryans):
One problem I see is that in HLogSplitter.splitLogFileToTemp we do this in 
the finally:

{quote}
if ((progress_failed == false) && (reporter != null) &&
    (reporter.progress() == false)) {
  progress_failed = true;
}
{quote}

But at this point progress_failed isn't taken into account, so the method 
returns true. Looking at other parts of that method, it seems it's missing a 
'return false' which would be correctly handled by SplitLogWorker.
  
 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173578#comment-13173578
 ] 

Hudson commented on HBASE-5072:
---

Integrated in HBase-TRUNK #2563 (See 
[https://builds.apache.org/job/HBase-TRUNK/2563/])
[jira] [HBASE-5072] Support Max Value for Per-Store Metrics

Summary:
We were bit in our multi-tenant cluster because one of our Stores
encountered a bug and grew its StoreFile count. We didn't notice this because
the StoreFile count currently reported by the RegionServer is an average of all
Stores in the region. For the per-Store metrics, we should also record the max
so we can notice outliers.

Test Plan: - mvn test -Dtest=TestRegionServerMetrics

Reviewers: JIRA, mbautin, Kannan

Reviewed By: Kannan

CC: stack, nspiegelberg, mbautin, Kannan

Differential Revision: 945

nspiegelberg : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
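
As a rough sketch of the per-Store max tracking described in the summary above 
(illustrative only; these are not the actual SchemaMetrics methods):

{code}
// Illustrative: keep both the mean and the max of storefile counts across
// Stores so a single outlier Store is visible in the metrics.
public class StoreFileMetricSketch {
  private long sum = 0;
  private long count = 0;
  private long max = Long.MIN_VALUE;

  public synchronized void record(long storefileCount) {
    sum += storefileCount;
    count++;
    if (storefileCount > max) {
      max = storefileCount;
    }
  }

  public synchronized double average() { return count == 0 ? 0 : (double) sum / count; }
  public synchronized long max() { return count == 0 ? 0 : max; }
}
{code}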


 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch, HBASE-5072.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173577#comment-13173577
 ] 

Zhihong Yu commented on HBASE-5077:
---

To answer J-D's question, let me reference the following code from taskLoop():
{code}
  } catch (InterruptedException e) {
    LOG.info("SplitLogWorker interrupted while waiting for task," +
      " exiting: " + e.toString());
    assert exitWorker == true;
    return;
  }
{code}
where exitWorker was expected to be true. I think the assertion wasn't 
triggered at runtime.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173584#comment-13173584
 ] 

Jean-Daniel Cryans commented on HBASE-5078:
---

ZK operations are quite expensive; instead of doing it for every file, it'd be 
better to do it every 2 or 3 files.
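Something along these lines (a sketch of the suggestion; the constant is arbitrary):

{code}
// Illustrative: amortize the ZooKeeper heartbeat over several newly opened
// writers instead of touching ZK on every single file.
public class BatchedHeartbeatSketch {
  private static final int FILES_PER_HEARTBEAT = 3;   // "every 2 or 3 files"
  private int newFilesSinceLastReport = 0;

  /** Returns true if the caller should call reporter.progress() now. */
  boolean onNewWriter() {
    newFilesSinceLastReport++;
    if (newFilesSinceLastReport >= FILES_PER_HEARTBEAT) {
      newFilesSinceLastReport = 0;
      return true;
    }
    return false;
  }
}
{code}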

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078.txt


 Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
 many regions and just opening the file per region was taking so long, we were 
 never updating our progress and so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5079) DistributedLogSplitter interrupt can be hazardous to regionserver health

2011-12-20 Thread stack (Created) (JIRA)
DistributedLogSplitter interrupt can be hazardous to regionserver health


 Key: HBASE-5079
 URL: https://issues.apache.org/jira/browse/HBASE-5079
 Project: HBase
  Issue Type: Bug
Reporter: stack


The DLS interrupt can kill the regionserver if it happens while a conversation w/ 
the namenode is going on.

The interrupt is used to end a task on the regionserver when it is done, whether 
successful or not, or to interrupt an ongoing split that is assumed by another server.

I saw this issue while testing because I was killing servers.  I was also suffering 
HBASE-5078 "DistributedLogSplitter failing to split file because it has edits 
for lots of regions", which made it more likely to happen.
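
Before the logs below, one way to picture a possible mitigation (purely an 
illustration, not a proposed fix): recognize interrupt-induced I/O failures during a 
split and turn them into a clean task failure instead of a fatal error.

{code}
import java.io.InterruptedIOException;
import java.nio.channels.ClosedByInterruptException;
import java.util.concurrent.Callable;

// Illustrative only: treat interrupt-induced I/O failures during a split as a
// preempted task rather than a fatal regionserver error.
public class InterruptHazardSketch {
  static boolean runSplit(Callable<Void> splitWork) {
    try {
      splitWork.call();
      return true;
    } catch (ClosedByInterruptException e) {
      return false;   // interrupted mid-RPC: give up the task, don't abort the RS
    } catch (InterruptedIOException e) {
      return false;   // same story for interrupted blocking I/O
    } catch (Exception e) {
      throw new RuntimeException(e);   // anything else is a genuine failure
    }
  }
}
{code}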

Here is what it looks like on the regionserver that died:

{code}
2011-12-20 17:54:58,009 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403495463
 preempted from sv4r13s38,7003,1324365396583, current task state and 
owner=owned sv4r27s44,7003,1324365396664
2011-12-20 17:54:58,009 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop 
the worker thread
2011-12-20 17:54:59,133 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403495463
 preempted from sv4r13s38,7003,1324365396583, current task state and 
owner=owned sv4r27s44,7003,1324365396664
2011-12-20 17:54:59,134 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop 
the worker thread
...
2011-12-20 17:55:25,505 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403495463
 preempted from sv4r13s38,7003,1324365396583, current task state and 
owner=unassigned sv4r11s38,7001,1324365395047
2011-12-20 17:55:25,505 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop 
the worker thread
{code}

Three interrupts are sent over a period of 31 seconds or so.

Eventually the interrupt has an effect and I get:

{code}
2011-12-20 17:55:25,505 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop 
the worker thread
2011-12-20 17:55:48,022 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: 
HLog roll requested
2011-12-20 17:55:58,070 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
Exception: java.io.IOException: Call to sv4r11s38/10.4.11.38:7000 failed on 
local exception: java.nio.channels.ClosedByInterruptException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy9.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy9.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:779)
at 

[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173592#comment-13173592
 ] 

Jean-Daniel Cryans commented on HBASE-5077:
---

I now understand how I got all the way to closing the files without aborting 
the splitting: the interrupt is being retried by the DFSClient:

{quote}

2011-12-20 03:05:09,194 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 preempted from sv4r6s38,62023,1324345935082, current task state and 
owner=owned sv4r30s44,62023,1324345935039
2011-12-20 03:05:09,194 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop 
the worker thread
2011-12-20 03:05:09,214 INFO org.apache.hadoop.hdfs.DFSClient: Failed to 
connect to /10.4.28.44:51010, add to deadNodes and continue
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:511)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.java:2354)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2033)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.seekToBlockSource(DFSClient.java:2483)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:2119)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2150)
at java.io.DataInputStream.read(DataInputStream.java:132)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at 
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
at 
org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1945)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:764)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:402)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
at java.lang.Thread.run(Thread.java:662)
2011-12-20 03:05:09,216 INFO org.apache.hadoop.hdfs.DFSClient: Failed to 
connect to /10.4.12.38:51010, add to deadNodes and continue
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
...
2011-12-20 03:05:09,220 INFO org.apache.hadoop.hdfs.DFSClient: Failed to 
connect to /10.4.14.38:51010, add to deadNodes and continue
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
...
2011-12-20 03:05:09,223 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain 
block blk_2118163224139708562_43382 from any node: java.io.IOException: No live 
nodes contain current block. Will get new block locations from namenode and 
retry...
{quote}
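
Given that the DFSClient swallows the interrupt and keeps retrying, the split loop 
itself would have to notice the preemption, e.g. by polling the interrupt status 
between edits (a sketch of the idea, not the actual fix):

{code}
// Illustrative only: bail out of the edit-reading loop as soon as the worker
// thread has been interrupted, instead of relying on the interrupt reaching a
// blocking HDFS call that quietly retries.
public class InterruptCheckSketch {
  static boolean splitLoop(java.util.Iterator<Object> edits) {
    while (edits.hasNext()) {
      if (Thread.currentThread().isInterrupted()) {
        return false;        // preempted: stop splitting, let the caller clean up
      }
      Object edit = edits.next();
      // ... write the edit to the right recovered.edits file ...
    }
    return true;
  }
}
{code}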

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 

[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173601#comment-13173601
 ] 

stack commented on HBASE-5078:
--

I'll make the changes, lads.  Thanks for the feedback.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078.txt


 Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
 many regions and just opening the file per region was taking so long, we were 
 never updating our progress and so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5021:
---

Attachment: D849.3.patch

nspiegelberg updated the revision [jira] [HBase-5021] Enforce upper bound on 
timestamp.
Reviewers: Kannan, Liyin, JIRA

  Talked with Kannan about the latest timestamp bug.  My last iteration didn't 
fix the original issue.  Fixing & adding it to the unit test.

REVISION DETAIL
  https://reviews.facebook.net/D849

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java


 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch


 We have been getting hit with performance problems on our time-series 
 database due to invalid timestamps being inserted by the timestamp.  We are 
 working on adding proper checks to the app server, but production performance 
 could be severely impacted with significant recovery time if something slips 
 past.  Since timestamps are considered a fundamental part of the HBase schema 
 & multiple optimizations use timestamp information, we should allow the 
 option to sanity check the upper bound on the server-side in HBase.
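 A sketch of the kind of server-side sanity check being proposed (the slop value and 
 method names are illustrative, not the actual HRegion change):
 {code}
 // Illustrative only: reject puts whose timestamp is unreasonably far in the
 // future, so a buggy client cannot pollute a time-series table.
 public class TimestampBoundSketch {
   private static final long MAX_CLOCK_SKEW_MS = 60000L;   // arbitrary slop

   static void checkTimestamp(long cellTimestamp) {
     long upperBound = System.currentTimeMillis() + MAX_CLOCK_SKEW_MS;
     if (cellTimestamp > upperBound) {
       throw new IllegalArgumentException("Timestamp " + cellTimestamp
           + " is beyond the allowed upper bound " + upperBound);
     }
   }
 }
 {code}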

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173610#comment-13173610
 ] 

Phabricator commented on HBASE-5021:


Kannan has accepted the revision [jira] [HBase-5021] Enforce upper bound on 
timestamp.

  looks good!

REVISION DETAIL
  https://reviews.facebook.net/D849


 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch


 We have been getting hit with performance problems on our time-series
 database due to invalid timestamps being inserted.  We are
 working on adding proper checks to the app server, but production performance
 could be severely impacted, with significant recovery time, if something slips
 past.  Since timestamps are considered a fundamental part of the HBase schema
 and multiple optimizations use timestamp information, we should allow the
 option to sanity check the upper bound on the server side in HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Status: Open  (was: Patch Available)

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Attachment: 5078-v2.txt

How is this?  Addresses Ted's and J-D's comments.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Status: Patch Available  (was: Open)

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173638#comment-13173638
 ] 

Jean-Daniel Cryans commented on HBASE-5078:
---

Oh also don't bother with progress_failed, I'm going to remove it in HBASE-5077.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173640#comment-13173640
 ] 

Jean-Daniel Cryans commented on HBASE-5078:
---

Oh also don't bother with progress_failed, I'm going to remove it in HBASE-5077.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173637#comment-13173637
 ] 

Jean-Daniel Cryans commented on HBASE-5078:
---

I don't see how everyNopenedFiles is defined in that patch, and I don't find it
in my file.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread Mubarak Seyed (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mubarak Seyed updated HBASE-4720:
-

Attachment: HBASE-4720.trunk.v1.patch

The attached file (HBASE-4720.trunk.v1.patch) contains the changes after rebasing on
TRUNK.

Thanks.

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v1.patch, 
 HBASE-4720.v1.patch, HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.
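
 For context, here is a sketch of the regular-client calls the REST gateway would need to mirror; the table and column names are made up, and this is not the attached patch:
 {code}
 // The regular-client checkAndPut the REST gateway would need to mirror; a
 // sketch only, with made-up table/column names, not the attached patch.
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.util.Bytes;

 public class CheckAndPutSketch {
   public static void main(String[] args) throws IOException {
     Configuration conf = HBaseConfiguration.create();
     HTable table = new HTable(conf, "sentinel");
     Put put = new Put(Bytes.toBytes("row1"));
     put.add(Bytes.toBytes("f"), Bytes.toBytes("state"), Bytes.toBytes("LOCKED"));
     // Atomically apply the Put only if f:state still holds the expected value "FREE".
     boolean applied = table.checkAndPut(Bytes.toBytes("row1"), Bytes.toBytes("f"),
         Bytes.toBytes("state"), Bytes.toBytes("FREE"), put);
     System.out.println("checkAndPut applied: " + applied);
     table.close();
   }
 }
 {code}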

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-5021:
---

Status: Patch Available  (was: Open)

 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch, 
 HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series
 database due to invalid timestamps being inserted.  We are
 working on adding proper checks to the app server, but production performance
 could be severely impacted, with significant recovery time, if something slips
 past.  Since timestamps are considered a fundamental part of the HBase schema
 and multiple optimizations use timestamp information, we should allow the
 option to sanity check the upper bound on the server side in HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-5021:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch, 
 HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series
 database due to invalid timestamps being inserted.  We are
 working on adding proper checks to the app server, but production performance
 could be severely impacted, with significant recovery time, if something slips
 past.  Since timestamps are considered a fundamental part of the HBase schema
 and multiple optimizations use timestamp information, we should allow the
 option to sanity check the upper bound on the server side in HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173643#comment-13173643
 ] 

Zhihong Yu commented on HBASE-5078:
---

How about renaming everyNopenedFiles to numOpenedFilesBeforeReporting?

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173648#comment-13173648
 ] 

Phabricator commented on HBASE-5021:


nspiegelberg has committed the revision [jira] [HBase-5021] Enforce upper 
bound on timestamp.

REVISION DETAIL
  https://reviews.facebook.net/D849

COMMIT
  https://reviews.facebook.net/rHBASE1221532


 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch, 
 HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series
 database due to invalid timestamps being inserted.  We are
 working on adding proper checks to the app server, but production performance
 could be severely impacted, with significant recovery time, if something slips
 past.  Since timestamps are considered a fundamental part of the HBase schema
 and multiple optimizations use timestamp information, we should allow the
 option to sanity check the upper bound on the server side in HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173650#comment-13173650
 ] 

Zhihong Yu commented on HBASE-4720:
---

{code}
[ERROR] 
/Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/rest/RootResource.java:[108,9]
 cannot find symbol
[ERROR] symbol  : class CheckAndPutTableResource
[ERROR] location: class org.apache.hadoop.hbase.rest.RootResource
[ERROR] 
[ERROR] 
/Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/rest/RootResource.java:[114,9]
 cannot find symbol
[ERROR] symbol  : class CheckAndDeleteTableResource
[ERROR] location: class org.apache.hadoop.hbase.rest.RootResource
[ERROR] 
{code}
I think some new files were not added as part of the TRUNK patch.

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v1.patch, 
 HBASE-4720.v1.patch, HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173656#comment-13173656
 ] 

Jean-Daniel Cryans commented on HBASE-5021:
---

Nicolas, as this changes some behaviors and adds a configuration option, would 
you mind adding a release note for this jira? Thanks.

 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch, 
 HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series
 database due to invalid timestamps being inserted.  We are
 working on adding proper checks to the app server, but production performance
 could be severely impacted, with significant recovery time, if something slips
 past.  Since timestamps are considered a fundamental part of the HBase schema
 and multiple optimizations use timestamp information, we should allow the
 option to sanity check the upper bound on the server side in HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-5077:
-

Assignee: Jean-Daniel Cryans

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.
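
 A sketch of the handling this failure suggests (not the attached HBASE-5077 patch): if the task znode has already been deleted because the master reassigned it, a NoNodeException while marking the task done should mean "release the task", not a fatal error. The helper below and its names are assumptions:
 {code}
 // Sketch of the handling suggested by the stack trace above, not the attached
 // patch: a missing task znode means the task was preempted and should simply
 // be released.  Method and argument names here are made up.
 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.ZooKeeper;

 class EndTaskSketch {
   /** Returns true if the task was marked done, false if it had been preempted. */
   static boolean tryEndTask(ZooKeeper zk, String taskZnode, byte[] doneState)
       throws KeeperException, InterruptedException {
     try {
       zk.setData(taskZnode, doneState, -1);   // -1: ignore the znode version
       return true;
     } catch (KeeperException.NoNodeException e) {
       // The znode is gone: the master handed the task to another worker.
       return false;
     }
   }
 }
 {code}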

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread Mubarak Seyed (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mubarak Seyed updated HBASE-4720:
-

Attachment: HBASE-4720.trunk.v1.patch

Sorry for the inconvenience; I forgot to 'svn add' the new files before generating the patch.
The attached file contains the updated patch. Thanks.

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v1.patch, 
 HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5077:
--

Attachment: HBASE-5077.patch

Adds the missing return false (I saw it was already fixed in 0.89-fb) and
removes progress_failed since it doesn't do anything.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5077:
--

Status: Patch Available  (was: Open)

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173664#comment-13173664
 ] 

Zhihong Yu commented on HBASE-5077:
---

Patch looks good.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4720:
--

Attachment: (was: HBASE-4720.trunk.v1.patch)

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4720:
--

Attachment: (was: HBASE-4720.trunk.v1.patch)

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Status: Patch Available  (was: Open)

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for
 many regions, and just opening a file per region was taking so long that we were
 never updating our progress, so the split of the log just kept failing; in
 this case, the first 40 edits in the file required our opening 35 files --
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.
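 A minimal sketch of the shape of a fix (hypothetical names; this is not the 
 attached patch): key the heartbeat to elapsed time rather than edit count, so 
 a stretch of slow per-region file opens still refreshes the task well inside 
 the 25 second acquisition timeout:
 {code}
 import java.util.Iterator;
 
 public class TimedProgressSketch {
   interface Reporter { boolean progress(); }  // assumed callback into the split-log task
 
   static <E> void copyEdits(Iterator<E> edits, Reporter reporter) {
     final long reportIntervalMs = 3000;       // assumed interval, well under 25s
     long lastReport = System.currentTimeMillis();
     while (edits.hasNext()) {
       E edit = edits.next();
       // ... write the edit to its region's recovered.edits file ...
       long now = System.currentTimeMillis();
       if (now - lastReport >= reportIntervalMs) {
         reporter.progress();                  // heartbeat regardless of edit count
         lastReport = now;
       }
     }
   }
 }
 {code}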

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Status: Open  (was: Patch Available)

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, ran into an interesting issue where a log file had edits for 
 many regions and just opening a file per region was taking so long that we 
 were never updating our progress, so the split of the log just kept failing; 
 in this case, the first 40 edits in the file required our opening 35 files -- 
 opening those 35 files took longer than the hard-coded 25 seconds it's 
 supposed to take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Attachment: 5078-v3.txt

Addressed Ted's comment and J-D's suggestion that I not use progress_failed. How's this?

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, ran into an interesting issue where a log file had edits for 
 many regions and just opening a file per region was taking so long that we 
 were never updating our progress, so the split of the log just kept failing; 
 in this case, the first 40 edits in the file required our opening 35 files -- 
 opening those 35 files took longer than the hard-coded 25 seconds it's 
 supposed to take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173679#comment-13173679
 ] 

Jean-Daniel Cryans commented on HBASE-5078:
---

+1 on v3.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, ran into an interesting issue where a log file had edits for 
 many regions and just opening a file per region was taking so long that we 
 were never updating our progress, so the split of the log just kept failing; 
 in this case, the first 40 edits in the file required our opening 35 files -- 
 opening those 35 files took longer than the hard-coded 25 seconds it's 
 supposed to take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5078:
--

Comment: was deleted

(was: Oh also don't bother with progress_failed, I'm going to remove it in 
HBASE-5077.)

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, ran into an interesting issue where a log file had edits for 
 many regions and just opening a file per region was taking so long that we 
 were never updating our progress, so the split of the log just kept failing; 
 in this case, the first 40 edits in the file required our opening 35 files -- 
 opening those 35 files took longer than the hard-coded 25 seconds it's 
 supposed to take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173678#comment-13173678
 ] 

Hadoop QA commented on HBASE-5078:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508154/5078-v2.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause mvn compile goal to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/557//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/557//console

This message is automatically generated.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, ran into an interesting issue where a log file had edits for 
 many regions and just opening a file per region was taking so long that we 
 were never updating our progress, so the split of the log just kept failing; 
 in this case, the first 40 edits in the file required our opening 35 files -- 
 opening those 35 files took longer than the hard-coded 25 seconds it's 
 supposed to take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173682#comment-13173682
 ] 

stack commented on HBASE-5077:
--

 Chatting w/ J-D, we shouldn't return out of the middle of the finally -- we 
 should go through to the end via the file closes.
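 To illustrate the point, a sketch with hypothetical names (not the attached 
 patch): a return from the middle of the finally skips the closes after it, so 
 the worker never fully lets go of its files; falling through to the end 
 guarantees both closes always run.
 {code}
 import java.io.Closeable;
 import java.io.IOException;
 
 public class FinallyShapeSketch {
   static boolean splitLog(Closeable reader, Closeable writer) {
     boolean progressFailed = false;
     try {
       // ... copy edits, possibly setting progressFailed = true ...
       return !progressFailed;
     } finally {
       // Do NOT return early here on progressFailed; fall through so both
       // closes below always execute before the task is given up.
       closeQuietly(writer);
       closeQuietly(reader);
     }
   }
 
   private static void closeQuietly(Closeable c) {
     try {
       if (c != null) c.close();
     } catch (IOException ignored) {
       // best-effort close
     }
   }
 }
 {code}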

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5077.patch


 I hope I didn't break the spacetime continuum, I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



