[jira] [Commented] (HBASE-7725) Add ability to block on a compaction request for a region
[ https://issues.apache.org/jira/browse/HBASE-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567445#comment-13567445 ]

Lars Hofhansl commented on HBASE-7725:
--------------------------------------

Lgtm

> Add ability to block on a compaction request for a region
> ---------------------------------------------------------
>
> Key: HBASE-7725
> URL: https://issues.apache.org/jira/browse/HBASE-7725
> Project: HBase
> Issue Type: Bug
> Components: Compaction, Coprocessors, regionserver
> Reporter: Jesse Yates
> Assignee: Jesse Yates
> Fix For: 0.96.0, 0.94.5
>
> Attachments: example.java, hbase-7725_0.94-v0.patch
>
> You can request that a compaction be started, but you can't be sure when that compaction request completes. This is a simple update to the CompactionRequest interface and the compact-split thread on the RS that doesn't actually impact the RS exposed interface. This is particularly useful for CPs so they can control starting/running a compaction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
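The idea above (letting a caller block until a requested compaction finishes) can be sketched with a latch. This is a minimal, hypothetical model: `BlockingCompactionRequest`, `markComplete`, and `waitForCompletion` are illustrative names, not the API from the attached patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a compaction request that callers can block on.
public class BlockingCompactionRequest {
    private final CountDownLatch done = new CountDownLatch(1);

    /** Called by the compact-split thread when the compaction finishes. */
    public void markComplete() {
        done.countDown();
    }

    /** Block until the compaction completes or the timeout elapses. */
    public boolean waitForCompletion(long timeout, TimeUnit unit)
            throws InterruptedException {
        return done.await(timeout, unit);
    }

    /** Simulate a region server completing the request on another thread. */
    public static boolean demo() {
        final BlockingCompactionRequest req = new BlockingCompactionRequest();
        new Thread(new Runnable() {
            public void run() {
                req.markComplete();
            }
        }).start();
        try {
            return req.waitForCompletion(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("compaction finished: " + demo());
    }
}
```

A coprocessor could hold such a handle, trigger the compaction, and then wait on it before proceeding, which is the use case the ticket describes.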
[jira] [Commented] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94
[ https://issues.apache.org/jira/browse/HBASE-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567447#comment-13567447 ]

Lars Hofhansl commented on HBASE-7607:
--------------------------------------

We could keep the RSTracker and just remove the interrupt nonsense. That way you test in a loop whether the RS's znode was removed and then end the test.

> Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94
> -------------------------------------------------------------------
>
> Key: HBASE-7607
> URL: https://issues.apache.org/jira/browse/HBASE-7607
> Project: HBase
> Issue Type: Bug
> Components: Client, test
> Affects Versions: 0.94.4
> Reporter: Himanshu Vashishtha
> Assignee: Himanshu Vashishtha
> Fix For: 0.94.5
>
> Attachments: HBASE-7607-v2.patch
>
> TestRegionServerCoprocessorExceptionWithAbort fails sometimes both on trunk and 0.94.x. The codebase is different in both. In 0.94.x, the client retries to look at the root region while the cluster is down and the /hbase znode is no longer present. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. I will file a separate jira for trunk as the code is different there.
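The polling approach suggested above (loop until the RS's znode is gone, then end the test) can be sketched as a generic helper. `pollUntil` is a hypothetical name, and the condition below merely stands in for a ZooKeeper `exists()` check on the znode.

```java
import java.util.concurrent.Callable;

// Illustrative sketch of "test in a loop": poll a condition (e.g. "has the
// RS's znode been removed?") until it holds or a deadline passes, instead of
// relying on thread interrupts. Not code from the patch.
public class PollUntil {
    /** Poll {@code condition} every {@code intervalMs} ms for up to {@code timeoutMs} ms. */
    public static boolean pollUntil(Callable<Boolean> condition,
                                    long timeoutMs, long intervalMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.call()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return false;
    }

    /** Stand-in condition: "znode removed" becomes true after ~100 ms. */
    public static boolean demo() {
        final long start = System.currentTimeMillis();
        try {
            return pollUntil(new Callable<Boolean>() {
                public Boolean call() {
                    return System.currentTimeMillis() - start > 100;
                }
            }, 5000, 10);
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("condition met: " + demo());
    }
}
```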
[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567448#comment-13567448 ]

Lars Hofhansl commented on HBASE-3996:
--------------------------------------

I might be able to deploy this on a test cluster to try tomorrow.

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Eran Kutner
> Assignee: Bryan Baugher
> Priority: Critical
> Fix For: 0.96.0, 0.94.5
>
> Attachments: 3996-v10.txt, 3996-v11.txt, 3996-v12.txt, 3996-v13.txt, 3996-v14.txt, 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, 3996-v8.txt, 3996-v9.txt, HBase-3996.patch
>
> It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs. I propose a new MultiTableInputFormat class that would allow doing this.
[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row
[ https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567450#comment-13567450 ]

Lars Hofhansl commented on HBASE-5664:
--------------------------------------

Hmm... Yes. Maybe something else slowed it down. Lemme try again.

> CP hooks in Scan flow for fast forward when filter filters out a row
> --------------------------------------------------------------------
>
> Key: HBASE-5664
> URL: https://issues.apache.org/jira/browse/HBASE-5664
> Project: HBase
> Issue Type: Improvement
> Components: Coprocessors, Filters
> Affects Versions: 0.92.1
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.5
>
> Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, HBASE-5664_94_V3.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch
>
> In HRegion.nextInternal(int limit, String metric) we have a while(true) loop so as to fetch the next result which satisfies the filter condition. When the Filter filters out the currently fetched row, we call nextRow(byte[] currentRow) before going on with the next row.
> {code}
> if (results.isEmpty() || filterRow()) {
>   // this seems like a redundant step - we already consumed the row
>   // there're no left overs.
>   // the reasons for calling this method are:
>   // 1. reset the filters.
>   // 2. provide a hook to fast forward the row (used by subclasses)
>   nextRow(currentRow);
> {code}
> Per "// 2. provide a hook to fast forward the row (used by subclasses)", we can provide the same fast-forward feature for the CP also.
[jira] [Commented] (HBASE-7403) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567457#comment-13567457 ]

Hadoop QA commented on HBASE-7403:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567312/hbase-7403-trunkv13.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 lineLengths{color}. The patch introduces lines longer than 100

{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.regionserver.wal.TestHLog

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//console

This message is automatically generated.
> Online Merge
> ------------
>
> Key: HBASE-7403
> URL: https://issues.apache.org/jira/browse/HBASE-7403
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.3
> Reporter: chunhui shen
> Assignee: chunhui shen
> Priority: Critical
> Fix For: 0.96.0, 0.94.6
>
> Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch, hbase-7403-trunkv11.patch, hbase-7403-trunkv12.patch, hbase-7403-trunkv13.patch, hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf
>
> The features of this online merge:
> 1. Online: no need to disable the table.
> 2. Few changes to the current code; it could be applied to trunk, 0.94, 0.92 or 0.90.
> 3. Easy to issue a merge request: no need to input a long region name, the encoded name is enough.
> 4. No limits during operation: you don't need to take care of events like Server Dead, Balance, Split, or Disabling/Enabling table, and no need to worry whether you sent a wrong merge request; it is already handled for you.
> 5. Only a little offline time for the two merging regions.
>
> Usage:
> 1. Tool: bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show] table-name region-encodedname-1 region-encodedname-2
> 2. API: static void MergeManager#createMergeRequest
>
> We need merge in the following cases:
> 1. Region hole or region overlap that can't be fixed by hbck.
> 2. Regions that become empty because of TTL and an unreasonable rowkey design.
> 3. Regions that are always empty or very small because of presplitting at table creation.
> 4. Too many empty or small regions, which reduce system performance (e.g. mslab).
>
> The current merge tool only supports offline merging and is not able to redo if an exception is thrown in the process of merging, leaving dirty data. For an online system, we need an online merge.
> The implementation logic of this patch for Online Merge is (for example, merging regionA and regionB into regionC):
> 1. Offline the two regions A and B.
> 2. Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files to regionC's directory, delete regionA's and regionB's directories).
> 3. Add the merged regionC to .META.
> 4. Assign the merged regionC.
> As designed in this patch, once we do the merge work in HDFS, we
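Step 2 above can be sketched on a local filesystem as a stand-in for HDFS (the real patch would use the Hadoop FileSystem API); the class, directory, and file names here are illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem sketch of the HDFS merge step: create regionC's directory,
// move regionA's and regionB's files into it, then delete the old directories.
public class RegionDirMerge {
    public static void mergeDirs(Path a, Path b, Path c) throws IOException {
        Files.createDirectories(c);
        for (Path src : new Path[] { a, b }) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
                for (Path f : files) {
                    Files.move(f, c.resolve(f.getFileName()));
                }
            }
            Files.delete(src); // directory is now empty
        }
    }

    /** Build two toy region directories, merge them, and verify the result. */
    public static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("merge-demo");
            Path a = Files.createDirectory(tmp.resolve("regionA"));
            Path b = Files.createDirectory(tmp.resolve("regionB"));
            Files.createFile(a.resolve("hfile1"));
            Files.createFile(b.resolve("hfile2"));
            Path c = tmp.resolve("regionC");
            mergeDirs(a, b, c);
            return Files.exists(c.resolve("hfile1"))
                    && Files.exists(c.resolve("hfile2"))
                    && !Files.exists(a) && !Files.exists(b);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("merged: " + demo());
    }
}
```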
[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567471#comment-13567471 ]

nkeywal commented on HBASE-7590:
--------------------------------

bq. How about clients watching the region server's ephemeral nodes.

We would also need to manage the new regionservers and the client disconnect. There could be extra cases with short-lived clients that could hammer ZK. Using a separate znode allows sharing a lot of code between a multicast mode and a ZK mode. Listening directly to all znodes from the client would mean having just a ZK mode imho (but it could be fine).

> Add a costless notifications mechanism from master to regionservers clients
> ---------------------------------------------------------------------------
>
> Key: HBASE-7590
> URL: https://issues.apache.org/jira/browse/HBASE-7590
> Project: HBase
> Issue Type: Bug
> Components: Client, master, regionserver
> Affects Versions: 0.96.0
> Reporter: nkeywal
>
> It would be very useful to add a mechanism to distribute some information to the clients and regionservers. Especially it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow:
> - lowering the load on the system, without clients using stale information and going to dead machines
> - making recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information separately about a region server's state, it can take the right decision, and continue/stop waiting accordingly.
> We can also send more information, for example instructions like 'slow down' to instruct the client to increase the retry delays and so on.
> Technically, the master could send this information. To lower the load on the system, we should:
> - have a multicast communication (i.e. the master does not have to connect to all servers by tcp), with one packet every 10 seconds or so.
> - receivers should not depend on this: if the information is available, great. If not, it should not break anything.
> - it should be optional.
> So at the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does not do anything. On the client side, when we receive information that a node is dead, we refresh the cache about it.
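The payload round trip for such a notification can be sketched as follows. The proposal uses a protobuf message on a multicast socket; this self-contained stand-in uses a plain delimited string instead, and `DeadServerMessage` plus the ';' encoding are assumptions for illustration (';' is chosen because HBase server names of the form host,port,startcode contain commas).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch of encoding/decoding a "these servers are dead" notification.
// Receivers must tolerate the message being absent: it is an optimization,
// not a source of truth.
public class DeadServerMessage {
    /** Encode a list of dead server names into a datagram payload. */
    public static byte[] encode(List<String> deadServers) {
        return String.join(";", deadServers).getBytes(StandardCharsets.UTF_8);
    }

    /** Decode a datagram payload back into the list of server names. */
    public static List<String> decode(byte[] payload) {
        return Arrays.asList(new String(payload, StandardCharsets.UTF_8).split(";"));
    }

    /** Round-trip two illustrative server names. */
    public static boolean demo() {
        List<String> dead = Arrays.asList(
                "rs1.example.com,60020,1359612345678",
                "rs2.example.com,60020,1359612349999");
        return decode(encode(dead)).equals(dead);
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + demo());
    }
}
```

In the proposed design the master would periodically write such a payload to a MulticastSocket if one is configured, and clients would refresh their cache for any server named in it.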
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-7495:
-----------------------------
    Attachment: HBASE-7495-v6.txt

> parallel seek in StoreScanner
> -----------------------------
>
> Key: HBASE-7495
> URL: https://issues.apache.org/jira/browse/HBASE-7495
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Affects Versions: 0.94.3, 0.96.0
> Reporter: Liang Xie
> Assignee: Liang Xie
>
> Attachments: HBASE-7495.txt, HBASE-7495.txt, HBASE-7495.txt, HBASE-7495-v2.txt, HBASE-7495-v3.txt, HBASE-7495-v4.txt, HBASE-7495-v4.txt, HBASE-7495-v5.txt, HBASE-7495-v6.txt
>
> seems there's a potentially improvable spot before doing scanner.next:
> {code:title=StoreScanner.java|borderStyle=solid}
> if (explicitColumnQuery && lazySeekEnabledGlobally) {
>   for (KeyValueScanner scanner : scanners) {
>     scanner.requestSeek(matcher.getStartKey(), false, true);
>   }
> } else {
>   for (KeyValueScanner scanner : scanners) {
>     scanner.seek(matcher.getStartKey());
>   }
> }
> {code}
> we can do scanner.requestSeek or scanner.seek in parallel, instead of the current serialization, to reduce latency for this special case. Any ideas on it? I'll have a try if the comments/suggestions are positive :)
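The serial-versus-parallel idea above can be sketched with an ExecutorService: submit one seek task per scanner and wait for all of them to finish. The `Scanner` interface below is a minimal stand-in for HBase's KeyValueScanner, and the whole class is an illustration, not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: seek all scanners concurrently instead of one after another.
public class ParallelSeek {
    interface Scanner {
        void seek(long key);
    }

    public static void seekAll(List<Scanner> scanners, final long key,
                               ExecutorService pool) throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
        for (final Scanner s : scanners) {
            tasks.add(new Callable<Void>() {
                public Void call() {
                    s.seek(key); // seeks run concurrently on the pool
                    return null;
                }
            });
        }
        pool.invokeAll(tasks); // blocks until every seek has finished
    }

    /** Seek four toy scanners in parallel and verify all of them ran. */
    public static boolean demo() {
        final AtomicInteger seeks = new AtomicInteger();
        List<Scanner> scanners = new ArrayList<Scanner>();
        for (int i = 0; i < 4; i++) {
            scanners.add(new Scanner() {
                public void seek(long key) {
                    seeks.incrementAndGet();
                }
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            seekAll(scanners, 42L, pool);
        } catch (InterruptedException e) {
            return false;
        } finally {
            pool.shutdown();
        }
        return seeks.get() == 4;
    }

    public static void main(String[] args) {
        System.out.println("all seeks done: " + demo());
    }
}
```

invokeAll gives the "wait for all seeks before returning" semantics the StoreScanner constructor needs; the win comes when each seek blocks on disk I/O, so the seeks overlap instead of adding up.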
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567496#comment-13567496 ]

Anoop Sam John commented on HBASE-7728:
---------------------------------------

The LogRoller thread is trying to roll over the current log file. It has captured the updateLock already.
{code}
HLog#rollWriter(boolean force)
  synchronized (updateLock) {
    // Clean up current writer.
    Path oldFile = cleanupCurrentWriter(currentFilenum);
    this.writer = nextWriter;
  }
{code}
As part of cleaning up the current writer, this thread tries to sync the pending writes:
{code}
HLog#cleanupCurrentWriter() {
  sync();
  ...
  this.writer.close();
}
{code}
At the same time the logSyncer thread was doing a deferred log sync operation:
{code}
HLog#syncer(long txid) {
  ...
  synchronized (flushLock) {
    try {
      logSyncerThread.hlogFlush(tempWriter, pending);
    } catch (IOException io) {
      synchronized (this.updateLock) {
        // HBASE-4387, HBASE-5623, retry with updateLock held
        tempWriter = this.writer;
        logSyncerThread.hlogFlush(tempWriter, pending);
      }
    }
  }
{code}
This thread is trying to grab the updateLock while holding the flushLock. At the same time the roller thread comes and, as part of the cleanup sync, tries to grab the flushLock. An IOException might have happened in the logSyncer thread (logSyncerThread.hlogFlush). At this point our assumption is that a log rollover already happened; that is why we try to write again with the updateLock held, getting the writer again. [The writer on which the IOE happened should have been closed.] In the roller thread the writer close happens after the cleanup operation, so I guess logSyncerThread.hlogFlush threw the IOE not because of a log roll. Without assuming a log roll in the catch block, can we check for tempWriter == this.writer? I am not an expert in this area; I am adding my observation from a quick code study, so if wrong, please correct me. Do you have any logs from when this happened?
> deadlock occurs between hlog roller and hlog syncer
> ---------------------------------------------------
>
> Key: HBASE-7728
> URL: https://issues.apache.org/jira/browse/HBASE-7728
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 0.94.2
> Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux
> Reporter: Wang Qiang
> Priority: Blocker
>
> The hlog roller thread and hlog syncer thread may deadlock on the 'flushLock' and 'updateLock', and then cause all 'IPC Server handler' threads to block on hlog append. The jstack info is as follows:
> regionserver60020.logRoller:
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
>   - waiting to lock 0x00067bf88d58 (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
>   - locked 0x00067d54ace0 (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>   at java.lang.Thread.run(Thread.java:662)
> regionserver60020.logSyncer:
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
>   - waiting to lock 0x00067d54ace0 (a java.lang.Object)
>   - locked 0x00067bf88d58 (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
>   at java.lang.Thread.run(Thread.java:662)
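The cycle in the jstack above exists because one thread holds updateLock and wants flushLock while the other holds flushLock and wants updateLock. One general fix direction, shown here as a toy model (not the HLog code), is to make both paths acquire the two locks in the same order, so the cycle cannot form:

```java
// Toy model of consistent lock ordering: both the "roller" and "syncer"
// paths take updateLock before flushLock, so neither can hold one lock
// while waiting for the other in the reverse order.
public class LockOrdering {
    private static final Object updateLock = new Object();
    private static final Object flushLock = new Object();

    static void rollerPath() {
        synchronized (updateLock) {     // both paths: updateLock first...
            synchronized (flushLock) {  // ...then flushLock
                // clean up writer, sync pending writes
            }
        }
    }

    static void syncerPath() {
        synchronized (updateLock) {
            synchronized (flushLock) {
                // flush pending edits
            }
        }
    }

    /** Run both paths concurrently many times; true if no deadlock occurred. */
    public static boolean demo() {
        Thread roller = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) rollerPath();
            }
        });
        Thread syncer = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) syncerPath();
            }
        });
        roller.start();
        syncer.start();
        try {
            roller.join(10000);
            syncer.join(10000);
        } catch (InterruptedException e) {
            return false;
        }
        return !roller.isAlive() && !syncer.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + demo());
    }
}
```

Whether reordering is actually feasible in HLog (or whether the retry-under-updateLock path should be changed instead, as discussed in the comments) is a separate question; the model only illustrates why the cycle forms and the standard way to break it.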
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-7495:
-----------------------------
    Attachment: (was: HBASE-7495-v6.txt)
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-7495:
-----------------------------
    Attachment: HBASE-7495-v6.txt
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567502#comment-13567502 ]

Anoop Sam John commented on HBASE-7495:
---------------------------------------

{code}
+    List<Callable<Void>> tasks = new ArrayList<Callable<Void>>(storeFileScannerCount);
{code}
Why do we need this list?
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567508#comment-13567508 ]

Liang Xie commented on HBASE-7495:
----------------------------------

Uploaded patch v6: moved the MVCC setThreadReadPoint from the ScannerSeekWorker constructor into the call block.

For passing a config object, it seems we would need to add a new parameter to StoreScanner's constructor, and probably repair many broken test cases. In my patch it is initialized just one time in a *static* block; I think that is fine as well.

[~yuzhih...@gmail.com], if we move the ExecutorService to the HRegionServer class, we need to expose it with a *static* getter, right? In the StoreScanner class we cannot easily get the current HRegionServer instance in the current codebase, but if we add a static getter method, it will bring several FindBugs warnings.
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567511#comment-13567511 ]

Liang Xie commented on HBASE-7495:
----------------------------------

Thanks to my colleague [~fenghh] for the MVCC code improvement :)
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567519#comment-13567519 ]

Anoop Sam John commented on HBASE-7728:
---------------------------------------

bq. Without assuming the log roll in the catch block, can we check for tempWriter == this.writer?

Not correct. Can we know whether the IOE was because of a parallel writer close?
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567534#comment-13567534 ]

Hadoop QA commented on HBASE-7495:
----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567321/HBASE-7495-v6.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//console
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567539#comment-13567539 ] Liang Xie commented on HBASE-7495: -- [~anoopsamjohn], good point; it's leftover code from a previous version. Let me remove it in the v6 file now.
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-7495: - Attachment: HBASE-7495-v6.txt
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567541#comment-13567541 ] Hadoop QA commented on HBASE-7495: --
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567324/HBASE-7495-v6.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//console
This message is automatically generated.
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-7495: - Attachment: (was: HBASE-7495-v6.txt)
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567542#comment-13567542 ] Hadoop QA commented on HBASE-7495: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567339/HBASE-7495-v6.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4272//console
This message is automatically generated.
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567546#comment-13567546 ] ramkrishna.s.vasudevan commented on HBASE-7728: --- Yes, logs will be needed. If sync is still going on from cleanupCurrentWriter, it means this.writer is still not null. If this.writer is not null, the IOE should not have happened either. If sync has happened, syncedTillHere should have changed. So this is a nice thing to analyse and debug. The updateLock is also needed while changing the writer. deadlock occurs between hlog roller and hlog syncer --- Key: HBASE-7728 URL: https://issues.apache.org/jira/browse/HBASE-7728 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.2 Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux Reporter: Wang Qiang Priority: Blocker
The hlog roller thread and the hlog syncer thread may deadlock on the 'flushLock' and 'updateLock', causing all 'IPC Server handler' threads to block on hlog append.
The jstack info is as follows:
regionserver60020.logRoller:
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
    - waiting to lock 0x00067bf88d58 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
    - locked 0x00067d54ace0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
regionserver60020.logSyncer:
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
    - waiting to lock 0x00067d54ace0 (a java.lang.Object)
    - locked 0x00067bf88d58 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
    at java.lang.Thread.run(Thread.java:662)
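The two stacks above show the classic inverted-lock-order pattern: the roller holds one monitor and waits for the other, while the syncer holds the second and waits for the first. Below is a minimal, self-contained sketch (invented names, not the actual HBase fix) of the standard cure, acquiring both locks in the same order on every path; which monitor corresponds to updateLock vs. flushLock is inferred from the dump.

```java
// Sketch of consistent lock ordering: both the roll path and the sync
// path take updateLock before flushLock, so neither can hold the second
// lock while waiting for the first.
public class LockOrderSketch {
  private final Object updateLock = new Object();
  private final Object flushLock = new Object();

  // Roller path: flush pending edits, then swap the writer.
  public void rollAndSync() {
    synchronized (updateLock) {
      synchronized (flushLock) {
        // ... sync current writer, close it, install new writer ...
      }
    }
  }

  // Syncer path: must use the SAME order. Taking flushLock first and
  // then waiting on updateLock is what produced the deadlock above.
  public void sync() {
    synchronized (updateLock) {
      synchronized (flushLock) {
        // ... flush pending edits to the current writer ...
      }
    }
  }
}
```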
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-7495: - Attachment: HBASE-7495-v7.txt
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567550#comment-13567550 ] Anoop Sam John commented on HBASE-7728: --- Yes Ram, the locks are correctly used as far as I have seen. Ideally logSyncerThread.hlogFlush should not throw an IOE, as it is clear from the thread dump that the roller is not even near the point where it closes and resets the current writer. Still, it seems the IOE happened. That is why it is waiting for the updateLock.
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567553#comment-13567553 ] Anoop Sam John commented on HBASE-7728: --- [~aaronwq] Any chance for logs?
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567551#comment-13567551 ] Anoop Sam John commented on HBASE-7728: --- Sorry.. thread dump ;)
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567554#comment-13567554 ] ramkrishna.s.vasudevan commented on HBASE-7728: --- I am checking with the latest 0.94 code. Maybe 0.94.2 has some changes, judging by the line numbers in the thread dump?
[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row
[ https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567557#comment-13567557 ] Anoop Sam John commented on HBASE-5664: --- Thanks Lars. Yes, the performance degradation in your test looks strange; this patch does not add any extra lines in the normal scan path. CP hooks in Scan flow for fast forward when filter filters out a row Key: HBASE-5664 URL: https://issues.apache.org/jira/browse/HBASE-5664 Project: HBase Issue Type: Improvement Components: Coprocessors, Filters Affects Versions: 0.92.1 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.96.0, 0.94.5 Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, HBASE-5664_94_V3.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch
In HRegion.nextInternal(int limit, String metric) we have a while(true) loop to fetch the next result which satisfies the filter condition. When the Filter filters out the currently fetched row, we call nextRow(byte[] currentRow) before going on to the next row.
{code}
if (results.isEmpty() || filterRow()) {
  // this seems like a redundant step - we already consumed the row
  // there're no left overs.
  // the reasons for calling this method are:
  // 1. reset the filters.
  // 2. provide a hook to fast forward the row (used by subclasses)
  nextRow(currentRow);
{code}
// 2. provide a hook to fast forward the row (used by subclasses) We can provide the same fast-forward feature for the CP also.
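The hook proposed here can be sketched roughly as follows. This is an illustrative stand-in, not the actual HBase coprocessor API: the simplified RegionObserver interface and the notifyFilterRow harness are invented, and only the idea (let observers react when the filter rejects a row) comes from the issue.

```java
import java.util.List;

public class FilterRowHookSketch {
  // Hypothetical, simplified observer: return false to tell the scan
  // not to fast-forward past the filtered row.
  public interface RegionObserver {
    boolean postScannerFilterRow(byte[] currentRow);
  }

  // Called from the scan loop when the filter rejects currentRow;
  // short-circuits on the first observer that votes false.
  public static boolean notifyFilterRow(List<RegionObserver> observers,
                                        byte[] currentRow) {
    for (RegionObserver o : observers) {
      if (!o.postScannerFilterRow(currentRow)) {
        return false;
      }
    }
    return true;
  }
}
```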
[jira] [Updated] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-7728: -- Fix Version/s: 0.94.5 0.96.0
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567570#comment-13567570 ] Anoop Sam John commented on HBASE-7728: --- I can see an IOE wrapping an NPE when a concurrent writer close happens. This is from here:
{code:title=SequenceFileLogWriter.java}
public void append(HLog.Entry entry) throws IOException {
  entry.setCompressionContext(compressionContext);
  try {
    this.writer.append(entry.getKey(), entry.getEdit());
  } catch (NullPointerException npe) {
    // Concurrent close...
    throw new IOException(npe);
  }
}
{code}
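The NPE-to-IOE translation above exists because the writer field can be nulled by a concurrent close mid-append. One common mitigation, shown as an illustrative sketch (invented names, not the actual HBase patch), is to snapshot the field once and null-check it, so the race surfaces as a clean IOException instead of an NPE:

```java
import java.io.IOException;

public class WriterGuardSketch {
  // Hypothetical, simplified stand-in for the underlying log writer.
  public interface Writer {
    void append(String key, String edit);
  }

  private volatile Writer writer;

  public WriterGuardSketch(Writer w) { this.writer = w; }

  // Concurrent close: another thread may null the field at any time.
  public void close() { this.writer = null; }

  public void append(String key, String edit) throws IOException {
    Writer w = this.writer; // single read of the shared field
    if (w == null) {
      throw new IOException("log writer closed concurrently");
    }
    w.append(key, edit);    // safe: w cannot become null under us
  }
}
```

The key point is reading the volatile field exactly once; checking `this.writer != null` and then dereferencing `this.writer` again would reintroduce the race.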
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567575#comment-13567575 ] Hadoop QA commented on HBASE-7495: --
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567341/HBASE-7495-v7.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//console
This message is automatically generated.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567589#comment-13567589 ] Hudson commented on HBASE-7717: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #386 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/386/]) HBASE-7717 Wait until regions are assigned in TestSplitTransactionOnCluster (Lars H and Ted Yu) (Revision 1440800) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet.
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567669#comment-13567669 ] Ted Yu commented on HBASE-7495: --- store.getHRegion() returns the region, which has the rsServices field. You can create a package-private getter in HRegion so that RegionServerServices can be accessed.
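Ted's suggestion amounts to a one-line accessor. A hypothetical sketch of the shape (the class bodies are illustrative stand-ins, not the actual HBase classes or the eventual patch):

```java
// Hypothetical stand-ins for the real HBase classes (illustrative only).
class RegionServerServices { }

class HRegion {
    private final RegionServerServices rsServices;

    HRegion(RegionServerServices rsServices) {
        this.rsServices = rsServices;
    }

    // Package-private getter: visible to StoreScanner and friends in the
    // same package, without widening HRegion's public interface.
    RegionServerServices getRegionServerServices() {
        return rsServices;
    }
}

class RsServicesGetterSketch {
    public static void main(String[] args) {
        RegionServerServices svc = new RegionServerServices();
        HRegion region = new HRegion(svc);
        System.out.println(region.getRegionServerServices() == svc); // true
    }
}
```

Package-private visibility keeps the accessor off the public API surface while still letting same-package scan code reach the region server services.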
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567679#comment-13567679 ] Ted Yu commented on HBASE-7495: --- Can you make TestCoprocessorScanPolicy a parameterized test, so that hbase.storescanner.parallel.seek.enable being true is also exercised?
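The parameterization Ted asks for would normally use JUnit's Parameterized runner; this dependency-free sketch just shows the shape of the idea: run the same scenario once per flag value. The scenario body is a placeholder, not the real TestCoprocessorScanPolicy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class ParallelSeekFlagHarness {
    // The real change would use JUnit's @RunWith(Parameterized.class) with
    // a parameters() factory; here we iterate the parameter list by hand.
    static final String FLAG = "hbase.storescanner.parallel.seek.enable";

    // The parameter values: the test runs once with the flag off, once on.
    static List<Boolean> parameters() {
        return Arrays.asList(false, true);
    }

    // Placeholder for the scan-policy scenario; it only records which
    // configuration it ran under.
    static String runScenario(Properties conf) {
        return FLAG + "=" + conf.getProperty(FLAG);
    }

    public static void main(String[] args) {
        for (boolean parallel : parameters()) {
            Properties conf = new Properties();
            conf.setProperty(FLAG, Boolean.toString(parallel));
            System.out.println(runScenario(conf));
        }
    }
}
```

The payoff is that both code paths (serial and parallel seek) are covered by the same assertions without duplicating the test body.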
[jira] [Assigned] (HBASE-7590) Add a costless notifications mechanism from master to regionservers & clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal reassigned HBASE-7590: -- Assignee: nkeywal Add a costless notifications mechanism from master to regionservers & clients - Key: HBASE-7590 URL: https://issues.apache.org/jira/browse/HBASE-7590 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal It would be very useful to add a mechanism to distribute some information to the clients and regionservers. In particular, it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow: - lowering the load on the system, without clients using stale information and going to dead machines - making recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information about a region server's state separately, it can take the right decision, and continue/stop waiting accordingly. We can also send more information, for example instructions like 'slow down' to tell the client to increase the retry delay and so on. Technically, the master could send this information. To lower the load on the system, we should: - have multicast communication (i.e. the master does not have to connect to all servers by tcp), with one packet every 10 seconds or so - not make receivers depend on this: if the information is available, great; if not, it should not break anything - make it optional. So in the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does not do anything. On the client side, when we receive information that a node is dead, we refresh the cache about it.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
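The master-side thread described above (periodic, optional, best-effort multicast of a dead-server message) could be sketched as follows. The wire format, group address, and port here are made-up placeholders; the proposal itself calls for a protobuf message.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.util.Arrays;
import java.util.List;

public class DeadServerNotifier {
    // Placeholder wire format: newline-joined server names
    // (the real proposal would encode a protobuf message instead).
    static byte[] encode(List<String> deadServers) {
        return String.join("\n", deadServers).getBytes();
    }

    static List<String> decode(byte[] data, int len) {
        return Arrays.asList(new String(data, 0, len).split("\n"));
    }

    // Best-effort sender loop: if no group is configured the feature is off,
    // and failures are swallowed so receivers never depend on the channel.
    static void broadcastLoop(String group, int port, List<String> deadServers,
                              int periodMillis, int iterations) {
        if (group == null) return;  // optional: not configured, do nothing
        try (MulticastSocket sock = new MulticastSocket()) {
            InetAddress addr = InetAddress.getByName(group);
            for (int i = 0; i < iterations; i++) {
                byte[] payload = encode(deadServers);
                sock.send(new DatagramPacket(payload, payload.length, addr, port));
                Thread.sleep(periodMillis);
            }
        } catch (Exception e) {
            // best effort: a lost packet must not break the master
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(Arrays.asList("rs1,60020,123", "rs2,60020,456"));
        System.out.println(decode(wire, wire.length));
    }
}
```

The design choice worth noting is the asymmetry: the master fires and forgets on a timer, while clients treat the messages purely as a cache-invalidation hint, so the system behaves identically (just slower to notice dead servers) when the channel is absent.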
[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers & clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567735#comment-13567735 ] nkeywal commented on HBASE-7590: Actually, one of the issues is that in the client code we don't really manage the server name. We use the hostname and the port, but we don't use the start code directly... There is a sequence number, but I need to find out whether it matches the start code. Despite this, I have something working for the server side, and the client receives the status. The point is to put the checks properly in the client (and this is unrelated to the communication protocol :-)
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567738#comment-13567738 ] Jonathan Hsieh commented on HBASE-7711: --- I'm +1 for this patch modulo the nit below, but not convinced this solves all the problems. The other exception trace adds the row name to the exception. Please add this detail to the exception message?
{code}
try {
  if (!existingLatch.await(this.rowLockWaitDuration, TimeUnit.MILLISECONDS)) {
    throw new IOException("Timed out on getting lock for row=" + Bytes.toStringBinary(row));
  }
} catch (InterruptedException ie) {
  // Empty
}
{code}
rowlock release problem with thread interruptions in batchMutate Key: HBASE-7711 URL: https://issues.apache.org/jira/browse/HBASE-7711 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: 0.96.0 Attachments: 7711.txt, 7711-v2.txt An earlier version of snapshots would thread-interrupt operations. In longer-term testing we ran into an exception stack trace that indicated that a rowlock was taken and never released. {code} 2013-01-26 01:54:56,417 ERROR org.apache.hadoop.hbase.procedure.ProcedureMember: Propagating foreign exception to subprocedure pe-1 org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@1cea3151:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:321) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:150) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$200(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:112) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2013-01-26 01:54:56,648 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock in batch put, row=0001558252 java.io.IOException: Timed out on getting lock for row=0001558252 at org.apache.hadoop.hbase.regionserver.HRegion.internalObtainRowLock(HRegion.java:3239) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:3315) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2150) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2021) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3511) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) .. every snapshot attempt that used this region for the next two days encountered this problem. {code} Snapshots will now bypass this problem with the fix in HBASE-7703. However, we should make sure hbase regionserver operations are safe when interrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
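The core of the fix Jon is reviewing is twofold: name the row in the timeout message, and stop swallowing the interrupt in the empty catch block. A sketch of the corrected pattern, using a plain CountDownLatch in place of HBase's internal lock latch:

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RowLockWaitSketch {
    // Wait for an existing lock holder's latch; on timeout or interrupt,
    // fail with an exception that names the row, instead of silently
    // continuing as if the lock had been acquired.
    static void waitForRowLock(CountDownLatch existingLatch, long waitMillis,
                               String row) throws IOException {
        try {
            if (!existingLatch.await(waitMillis, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out on getting lock for row=" + row);
            }
        } catch (InterruptedException ie) {
            // Do NOT swallow: restore the flag and surface the interrupt,
            // so no caller proceeds believing it holds the lock.
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("Interrupted waiting on lock for row=" + row);
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch held = new CountDownLatch(1);
        held.countDown();  // lock already released: wait succeeds immediately
        waitForRowLock(held, 10, "0001558252");
        System.out.println("acquired");
    }
}
```

Swallowing the InterruptedException is exactly what let an interrupted batchMutate fall through without the latch released, which matches the two days of "Timed out on getting lock for row=0001558252" in the report above.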
[jira] [Updated] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7711: -- Attachment: 7711-v3.txt Patch v3 addresses Jon's comment.
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567749#comment-13567749 ] Jonathan Hsieh commented on HBASE-7711: --- lovely. +1.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567782#comment-13567782 ] Lars Hofhansl commented on HBASE-7717: -- Yet another failure: https://builds.apache.org/job/HBase-0.94/810/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testTableExistsIfTheSpecifiedTableRegionIsSplitParent/. It seems we always have to wait a bit for the cluster to learn what regions it has. Sigh. Will make a quick addendum.
[jira] [Updated] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-7717: - Attachment: 7717-addendum-0.94.txt How's this?
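The addendum pattern discussed here (poll until the cluster reports the table's regions instead of asserting immediately) can be sketched like this, with the cluster accessor faked out; the interface and timings are illustrative, not the actual test code:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitRegionsSketch {
    // Stand-in for MiniHBaseCluster's region lookup.
    interface Cluster {
        List<String> getRegions(String tableName);
    }

    // Poll until at least one region is reported or the timeout elapses.
    static boolean awaitTableRegions(Cluster cluster, String tableName,
                                     long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (!cluster.getRegions(tableName).isEmpty()) {
                return true;
            }
            Thread.sleep(50);  // regions may not be assigned yet; retry
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger polls = new AtomicInteger();
        // Fake cluster: regions "appear" on the third poll.
        Cluster cluster = table -> polls.incrementAndGet() >= 3
            ? Collections.singletonList(table + ",,1")
            : Collections.<String>emptyList();
        System.out.println(awaitTableRegions(cluster, "t1", 5000)); // true
    }
}
```

Bounding the wait keeps a genuinely broken assignment from hanging the test forever, while the retry absorbs the benign race the issue describes.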
[jira] [Created] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
Lars Hofhansl created HBASE-7729: Summary: TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1278) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) at
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567807#comment-13567807 ] Ted Yu commented on HBASE-3787: --- bq. but server failed before returning response. Client retries on the new server HBaseClient should generate a new nonce when the request is sent to a new server. Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible for the same increment call to be applied twice at the server. For increment operations, would it be better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module so that the RPC server can correctly identify whether the RPC is a retry attempt and handle it accordingly.
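The nonce idea under discussion makes a retried RPC safe: the client attaches a unique id, the server remembers the ids it has applied, and a retry carrying a seen id returns the earlier result instead of being applied again. A toy sketch of the server side, not HBase's eventual nonce implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class NonceIncrementSketch {
    private final Map<Long, Long> applied = new HashMap<>();  // nonce -> result
    private long counter = 0;

    // Apply the increment only if this nonce has not been seen; otherwise
    // return the previously computed result, so retries are idempotent.
    synchronized long increment(long nonce, long delta) {
        Long prior = applied.get(nonce);
        if (prior != null) {
            return prior;  // duplicate RPC: do not apply twice
        }
        counter += delta;
        applied.put(nonce, counter);
        return counter;
    }

    public static void main(String[] args) {
        NonceIncrementSketch server = new NonceIncrementSketch();
        server.increment(42L, 1);          // first attempt applies
        long v = server.increment(42L, 1); // retry with same nonce is a no-op
        System.out.println(v); // 1
    }
}
```

The hard case Ted's comment touches is a retry against a *different* server after failover: the new server has no record of the nonce, so the nonce store must either move with the region or the client must accept possible double-application in that window. In production the applied map would also need expiry rather than unbounded growth.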
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567811#comment-13567811 ] Ted Yu commented on HBASE-7717: --- lgtm.
{code}
+assertTrue("Table not online", cluster.getRegions(tableName).size() != 0);
{code}
Mind including tableName in the assert message?
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567815#comment-13567815 ] Ted Yu commented on HBASE-7711: --- Integrated to trunk. Thanks for the reviews, Matteo and Jon. rowlock release problem with thread interruptions in batchMutate Key: HBASE-7711 URL: https://issues.apache.org/jira/browse/HBASE-7711 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: 0.96.0 Attachments: 7711.txt, 7711-v2.txt, 7711-v3.txt An earlier version of snapshots would thread interrupt operations. In longer-term testing we ran into an exception stack trace that indicated that a rowlock was taken and never released. {code} 2013-01-26 01:54:56,417 ERROR org.apache.hadoop.hbase.procedure.ProcedureMember: Propagating foreign exception to subprocedure pe-1 org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@1cea3151:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:321) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:150) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$200(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:112) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2013-01-26 01:54:56,648 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock in batch put, row=0001558252 java.io.IOException: Timed out on getting lock for row=0001558252 at org.apache.hadoop.hbase.regionserver.HRegion.internalObtainRowLock(HRegion.java:3239) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:3315) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2150) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2021) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3511) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) .. every snapshot attempt that used this region for the next two days encountered this problem. {code} Snapshots will now bypass this problem with the fix in HBASE-7703. However, we should make sure hbase regionserver operations are safe when interrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
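The interrupt-safety concern in the last sentence comes down to releasing the row lock on every exit path. A generic illustration of that pattern, using plain java.util.concurrent rather than HBase's internal row locks (this is not the HBASE-7703 fix or HRegion code; all names are invented for the example):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: a row lock that cannot be stranded by a thread
// interrupt, because release happens in a finally block. tryLock(timeout)
// throws InterruptedException *before* the lock is held, so an interrupted
// waiter never owns the lock; once the lock is held, finally releases it.
public class RowLockSketch {
    private final ReentrantLock rowLock = new ReentrantLock();

    /** Returns false on timeout, analogous to "Timed out on getting lock for row=...". */
    public boolean mutateRow(Runnable mutation, long timeoutMs) throws InterruptedException {
        if (!rowLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // lock not acquired, nothing to release
        }
        try {
            mutation.run();
            return true;
        } finally {
            rowLock.unlock(); // always released, even if the mutation fails
        }
    }
}
```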
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567817#comment-13567817 ] Hadoop QA commented on HBASE-7711: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567376/7711-v3.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//console This message is automatically generated.
[jira] [Updated] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7711: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567834#comment-13567834 ] Lars Hofhansl commented on HBASE-7729: -- Full test output: https://builds.apache.org/job/HBase-0.94/808/testReport/org.apache.hadoop.hbase.catalog/TestCatalogTrackerOnCluster/testBadOriginalRootLocation/ TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567851#comment-13567851 ] Lars Hofhansl commented on HBASE-7729: -- [~ghelmling], is this possible related to the client refactor?
[jira] [Comment Edited] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567851#comment-13567851 ] Lars Hofhansl edited comment on HBASE-7729 at 1/31/13 5:32 PM: --- [~ghelmling], is this possibly related to the client refactor? was (Author: lhofhansl): [~ghelmling], is this possible related to the client refactor?
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567862#comment-13567862 ] Lars Hofhansl commented on HBASE-7729: -- From the logs it looks like the old master's main thread has exited, but the ZK trackers in the Master's CatalogTracker's HConnection are still active and receiving events (including the fake root region). It seems we should stop the trackers upon HConnection.close().
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567864#comment-13567864 ] Andrew Purtell commented on HBASE-3787: --- I think the above comments all taken together are a reasonable thing to try: - Introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones. - nonce = hash(client address, table, row, timestamp) - HBaseClient should generate a new nonce whenever a request is sent to new server. - Server tracks nonces by (client address, nonce, timestamp) - Add the entry when op processing starts, remove it when finished or failed, refuse to process an op twice by sending back a DoNotRetryException. Perhaps we introduce a new exception type like OperationInProgressException which inherits from DoNotRetryException so the client understands the retry operation was failed because the previous attempt is still pending server side. - We should append the nonce to the WALEdit, and recover them along with the entry data. Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
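The nonce recipe proposed above (hash over client address, table, row, and timestamp; reused on retry, fresh for a new op) can be sketched as follows. This is a minimal illustration, not HBase's actual implementation; the class name, method name, and hash mixing are all hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch of the client-side nonce described in the comment above:
// nonce = hash(client address, table, row, timestamp).
public class NonceSketch {
    static long makeNonce(String clientAddress, String table, byte[] row, long timestamp) {
        // Deterministic hash over the identifying fields of the op.
        int h = Objects.hash(clientAddress, table, Arrays.hashCode(row));
        // Mix the timestamp in: a retry of the *same* op reuses the nonce,
        // while a new op (new timestamp) gets a fresh one.
        return ((long) h << 32) ^ timestamp;
    }

    public static void main(String[] args) {
        long n1 = makeNonce("10.0.0.1:4020", "t1", "row1".getBytes(), 1000L);
        long n2 = makeNonce("10.0.0.1:4020", "t1", "row1".getBytes(), 1000L);
        long n3 = makeNonce("10.0.0.1:4020", "t1", "row1".getBytes(), 2000L);
        System.out.println(n1 == n2); // same inputs: same nonce (a retry)
        System.out.println(n1 != n3); // new timestamp: new nonce (a new op)
    }
}
```

The key property is that the nonce is a pure function of the op's identity, so a client retry presents the same nonce and the server can recognize the duplicate.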
[jira] [Created] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
Jimmy Xiang created HBASE-7730: -- Summary: HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92 Key: HBASE-7730 URL: https://issues.apache.org/jira/browse/HBASE-7730 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.2 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.94.5 HBASE-4429 introduced synchronousBalanceSwitch to HMaster. HBaseAdmin uses this call (HBASE-5630). Therefore, hbck and hbase shell are not backward compatible with 0.92.
[jira] [Comment Edited] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567864#comment-13567864 ] Andrew Purtell edited comment on HBASE-3787 at 1/31/13 5:50 PM: I think the above comments all taken together are a reasonable thing to try: - Introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones. - nonce = hash(client address, table, row, timestamp) - HBaseClient should generate a new nonce whenever a new op is sent to a new server. Reuse the nonce for any retry. - Server tracks nonces by (client address, nonce, timestamp) - Add the entry when op processing starts, remove it when finished or failed, and refuse to process an op twice by sending back a DoNotRetryException. Perhaps we introduce a new exception type like OperationInProgressException, which inherits from DoNotRetryException, so the client understands the retry failed because the previous attempt is still pending server side. - We should append the nonce to the WALEdit and recover it along with the entry data.
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567872#comment-13567872 ] Ted Yu commented on HBASE-3787: --- bq. We should append the nonce to the WALEdit, and recover them along with the entry data. Is the above needed?
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567873#comment-13567873 ] Gary Helmling commented on HBASE-7729: -- [~lhofhansl] Certainly possible that the behavior here changed as a result of the client refactor, though I don't recall seeing the ZK trackers involved in the changed code paths. But maybe the refactor subtly changed some of the previous flow. I'll take a look. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at
[jira] [Comment Edited] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567864#comment-13567864 ] Andrew Purtell edited comment on HBASE-3787 at 1/31/13 5:52 PM: I think the above comments all taken together are a reasonable thing to try: - Introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones. - nonce = hash(client address, table, row, timestamp) - HBaseClient should generate a new nonce whenever a new op is sent to a new server. Reuse the nonce for any retry. - Server tracks nonces by (client address, nonce, timestamp). Expire entries after some grace period. Restart the expiration timer whenever the nonce is checked as part of op processing. Lazily clean up expired entries either as part of add/remove or via a chore. - Add the entry when op processing starts, remove it when finished or failed, and refuse to process an op twice by sending back a DoNotRetryException. Perhaps we introduce a new exception type like OperationInProgressException, which inherits from DoNotRetryException, so the client understands the retry failed because the previous attempt is still pending server side. - We should append the nonce to the WALEdit and recover it along with the entry data.
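The server-side bookkeeping in the edited comment above (track nonces per client, restart the expiration timer on every check, lazily clean up expired entries) might look roughly like this. All names are hypothetical; this is a sketch of the proposal, not HBase's actual nonce manager.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side nonce tracker: entries keyed by (client, nonce),
// expired after a grace period, with lazy cleanup piggybacked on add.
public class NonceTracker {
    private final long gracePeriodMs;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    NonceTracker(long gracePeriodMs) { this.gracePeriodMs = gracePeriodMs; }

    /** Returns false if this op is already in progress (caller refuses the retry). */
    boolean startOperation(String client, long nonce, long nowMs) {
        cleanup(nowMs); // lazy cleanup as part of add, as suggested above
        // put() always refreshes the timestamp, restarting the expiration timer
        // whenever the nonce is checked as part of op processing.
        Long prev = lastSeen.put(client + "/" + nonce, nowMs);
        return prev == null;
    }

    /** Remove the entry when the op finishes or fails, per the proposal. */
    void endOperation(String client, long nonce) {
        lastSeen.remove(client + "/" + nonce);
    }

    private void cleanup(long nowMs) {
        for (Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator(); it.hasNext();) {
            if (nowMs - it.next().getValue() > gracePeriodMs) it.remove();
        }
    }

    public static void main(String[] args) {
        NonceTracker tracker = new NonceTracker(100);
        System.out.println(tracker.startOperation("client-a", 42L, 0L));  // first attempt accepted
        System.out.println(tracker.startOperation("client-a", 42L, 50L)); // duplicate refused
        tracker.endOperation("client-a", 42L);
        System.out.println(tracker.startOperation("client-a", 42L, 60L)); // accepted again after completion
    }
}
```

On the refused path a real server would reply with the proposed OperationInProgressException (a DoNotRetryException) rather than just returning false.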
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567874#comment-13567874 ] Andrew Purtell commented on HBASE-3787: --- bq. Is the above needed? I think Enis is right. A server accepts an op and goes down mid-flight; another server takes over and is processing WAL entries; the client retries and is relocated to the new server. Without a nonce, the increment would be accepted twice.
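The failure scenario above is why the nonce would have to travel with the WAL entry: the recovering server replays edits, then the relocated client's retry arrives, and only a nonce recovered from the WAL lets the server refuse it. A toy illustration, with entirely hypothetical names (not HBase classes):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a region that records applied nonces, recovers them
// from replayed WAL entries, and refuses a retried increment it already saw.
public class WalNonceReplay {
    final Map<String, Long> counters = new HashMap<>();
    final Set<Long> appliedNonces = new HashSet<>();

    /** Replay recovered WAL entries, each carrying a {nonce, delta} pair. */
    void replay(List<long[]> walEntries, String row) {
        for (long[] e : walEntries) {
            applyIncrement(row, e[1], e[0]);
        }
    }

    /** Returns true if applied, false if refused as a duplicate nonce. */
    boolean applyIncrement(String row, long delta, long nonce) {
        if (!appliedNonces.add(nonce)) {
            return false; // nonce seen before (recovered from WAL): refuse the retry
        }
        counters.merge(row, delta, Long::sum);
        return true;
    }

    public static void main(String[] args) {
        WalNonceReplay region = new WalNonceReplay();
        // The increment was applied before the crash; its nonce is in the WAL.
        region.replay(Arrays.asList(new long[]{7L, 1L}), "row1");
        // The relocated client retries the same increment with the same nonce.
        System.out.println(region.applyIncrement("row1", 1L, 7L)); // false: refused
        System.out.println(region.counters.get("row1"));           // 1, not 2
    }
}
```

Without `appliedNonces` being rebuilt from the WAL, the retry would be indistinguishable from a fresh op and the counter would reach 2.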
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567877#comment-13567877 ] Ted Yu commented on HBASE-3787: --- bq. without having a nonce the increment would be accepted twice But there is this assumption: bq. HBaseClient should generate a new nonce whenever a new op is sent to new server
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567882#comment-13567882 ] Sergey Shelukhin commented on HBASE-7701: - The server B crashed after putting the info into meta, before updating ZK. In fact we are in SSH for server B... so it should not be expected Opening regions on dead server are not reassigned quickly - Key: HBASE-7701 URL: https://issues.apache.org/jira/browse/HBASE-7701 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Jimmy Xiang Attachments: TEST-org.apache.hadoop.hbase.IntegrationTestRebalanceAndKillServersTargeted.xml Closed regions are not removed from assignments. I am not sure if it's a general state problem, or just a small bug; for now, one manifestation is that a moved region is ignored by the SSH of the target server if the target server dies before updating ZK. {code} 2013-01-22 17:59:00,524 DEBUG [IPC Server handler 3 on 50658] master.AssignmentManager(1475): Sent CLOSE to 10.11.2.92,51231,1358906285048 for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. 2013-01-22 17:59:00,997 DEBUG [RS_CLOSE_REGION-10.11.2.92,51231,1358906285048-1] handler.CloseRegionHandler(167): set region closed state in zk successfully for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. sn name: 10.11.2.92,51231,1358906285048 2013-01-22 17:59:01,088 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.RegionStates(242): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.', STARTKEY => '6660', ENDKEY => '732c', ENCODED => 0200b366bc37c5afd1185f7d487c7dfb,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=CLOSED, ts=1358906341087, server=null} to {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=OFFLINE, ts=1358906341088, server=null} 2013-01-22 17:59:01,128 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.AssignmentManager(1596): Assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. to 10.11.2.92,50661,1358906192942 ... (50661 didn't update ZK to OPEN, only OPENING) 2013-01-22 17:59:06,605 INFO [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(202): Reassigning 7 region(s) that 10.11.2.92,50661,1358906192942 was carrying (skipping 0 regions(s) that are already in transition) 2013-01-22 17:59:06,605 DEBUG [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(219): Skip assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. because it has been opened in 10.11.2.92,51231,1358906285048 {code} Note the server in the last line - the one that has long closed the region.
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567883#comment-13567883 ] Sergey Shelukhin commented on HBASE-7701: - Not assigning region because it's on A is clearly not correct... why would master think it's on A when A already sent CloseRegionHandler long ago?
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567890#comment-13567890 ] Andrew Purtell commented on HBASE-3787: --- Good point Ted. So then the client should not retry an increment or append (or other non-idempotent op) if it has been relocated. See LarsH's comment at the top of this issue; sorry I missed it. It follows that the rest of your comment is valid too.
[jira] [Created] (HBASE-7731) Append/Increment methods in HRegion don't check whether the table is readonly or not
Devaraj Das created HBASE-7731: -- Summary: Append/Increment methods in HRegion don't check whether the table is readonly or not Key: HBASE-7731 URL: https://issues.apache.org/jira/browse/HBASE-7731 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das I bumped into this one - all the mutation calls like Put and Delete check whether the region in question is readonly. The append and increment calls don't.
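The missing guard described above amounts to performing, on the increment/append paths, the same read-only check the Put/Delete paths already do. A minimal sketch with hypothetical names (not the actual HRegion code):

```java
// Hypothetical sketch of a region-level read-only guard applied to increment,
// mirroring the check the Put/Delete paths already perform.
public class ReadOnlyGuard {
    private final boolean readOnly;

    ReadOnlyGuard(boolean readOnly) { this.readOnly = readOnly; }

    /** The check that, per the report, is missing from append/increment. */
    void checkReadOnly() {
        if (readOnly) {
            throw new UnsupportedOperationException("region is read only");
        }
    }

    long increment(long current, long amount) {
        checkReadOnly(); // reject the mutation before touching any state
        return current + amount;
    }

    public static void main(String[] args) {
        ReadOnlyGuard rw = new ReadOnlyGuard(false);
        System.out.println(rw.increment(41, 1)); // 42

        ReadOnlyGuard ro = new ReadOnlyGuard(true);
        try {
            ro.increment(41, 1);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```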
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567897#comment-13567897 ] Andrew Purtell commented on HBASE-3787: --- What we really need is a different model of interaction: a bidirectional event stream between clients and servers. Clients issue requests; servers (any server) acknowledge completion. This implies an async client. In the absence of that, we can at least give the client an indication the op has been processed even through a retry, as long as the region doesn't move. (Add to my OperationInProgressException also OperationAlreadyCompletedException.) If the region relocates, then we expose some uncertainty to the application by failing any additional retries. This will be less surprising than the current behavior because we won't have silent application of the same op more than once, but it punts to the app, which isn't great either.
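The retry policy discussed above - keep retrying a non-idempotent op against the same server, but surface uncertainty to the caller as soon as the region relocates - can be sketched like this. The exception and method names are hypothetical, not HBase's client API.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: retry transient failures on the same server, but fail
// fast with an "uncertain outcome" as soon as the region moves, since a retry
// on a new server could apply a non-idempotent op twice.
public class NonIdempotentRetry {
    static class RegionMovedException extends RuntimeException {}
    static class UncertainOutcomeException extends RuntimeException {
        UncertainOutcomeException(String m) { super(m); }
    }

    static <T> T callWithRetries(Callable<T> op, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (RegionMovedException e) {
                // Relocation: punt the uncertainty to the application.
                throw new UncertainOutcomeException("op may or may not have been applied");
            } catch (Exception e) {
                last = e; // transient failure on the same server: safe to retry
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Transient failure on the first attempt, then success on the same server.
        long result = callWithRetries(() -> {
            if (calls[0]++ == 0) throw new java.io.IOException("transient");
            return 42L;
        }, 3);
        System.out.println(result); // 42

        try {
            callWithRetries(() -> { throw new RegionMovedException(); }, 3);
        } catch (UncertainOutcomeException e) {
            System.out.println("uncertain: " + e.getMessage());
        }
    }
}
```

The trade-off matches the comment: the application may see an uncertain outcome, but never a silent double application.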
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567896#comment-13567896 ] Ted Yu commented on HBASE-3787: --- If client retries on region move, that would allow skipping the append of nonce to the WALEdit. I think that would reduce the complexity of the implementation.
[jira] [Commented] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
[ https://issues.apache.org/jira/browse/HBASE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567905#comment-13567905 ] Jonathan Hsieh commented on HBASE-7730: --- I'd think just changing the calls in hbck in 0.94 to use the now-deprecated method that is in 0.92 (and commenting why we didn't change it) would be sufficient. Would we expect cross-version shell accesses?
[jira] [Updated] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7701: --- Attachment: trunk-7701_v1.patch First version: https://reviews.apache.org/r/9200/
[jira] [Updated] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7701: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
[ https://issues.apache.org/jira/browse/HBASE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567927#comment-13567927 ] Jimmy Xiang commented on HBASE-7730: Good point. I was wondering if it will be used somewhere else in the future. I already have a simple fix, just need to make sure it works. HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92 --- Key: HBASE-7730 URL: https://issues.apache.org/jira/browse/HBASE-7730 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.2 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.94.5 HBASE-4429 introduced synchronousBalanceSwitch to HMaster. HBaseAdmin uses this call (HBASE-5630). Therefore, hbck and hbase shell are not backward compatible with 0.92.
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567930#comment-13567930 ] Andrew Purtell commented on HBASE-3787: --- [~ted_yu] From the client's point of view, it is still retrying the op even though the server handling the region has changed. So - HBaseClient should generate a new nonce for each new op. Reuse the nonce for any retry. Therefore if nonces are persisted to the WAL and recovered from it, the server will still do the right thing. Your concern is implementation complexity on the server. I think it is valid, but do you think this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? Would the app know if the op applied or not? Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly.
[jira] [Comment Edited] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567930#comment-13567930 ] Andrew Purtell edited comment on HBASE-3787 at 1/31/13 6:48 PM: [~ted_yu] From the client's point of view, it is still retrying the op even though the server handling the region has changed. So - HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry. Therefore if nonces are persisted to the WAL and recovered from it, the server will still do the right thing. Your concern is implementation complexity on the server. I think it is valid, but do you think this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? Would the app know if the op applied or not? was (Author: apurtell): [~ted_yu] From the client's point of view, it is still retrying the op even though the server handling the region has changed. So - HBaseClient should generate a new nonce for each a new op. Reuse the nonce for any retry. Therefore if nonces are persisted to the WAL and recovered from it, the server will still do the right thing. Your concern is implementation complexity on the server. I think it is valid, but do you think this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? Would the app know if the op applied or not?
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567944#comment-13567944 ] Ted Yu commented on HBASE-3787: --- For statement #1: bq. HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry. Agreed. bq. this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? I think statement #1 already achieves what persistence to WAL would achieve. bq. Would the app know if the op applied or not? The app would know when the response for the operation is not an IOException. Thanks
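The protocol the commenters converge on — one fresh nonce per logical operation, reused verbatim on every retry, deduplicated server-side — can be sketched as a toy model. `NonceSketch`, its inner classes, and the in-memory `seen` set are hypothetical; a real design would bound that set and recover it from the WAL, as Andrew notes:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Toy sketch of nonce-based idempotent increments: the client draws one
// nonce per operation and resends the same nonce on retries, so a server
// that remembers seen nonces can refuse to apply a duplicate.
class NonceSketch {

    /** Client side: one nonce per operation, stable across retries. */
    static final class Operation {
        private final long nonce = ThreadLocalRandom.current().nextLong();

        /** A retry of this same operation resends this same nonce. */
        long nonceForRetry() {
            return nonce;
        }
    }

    /** Server side: remembers seen nonces (unbounded here; WAL-backed in reality). */
    static final class Server {
        private final Set<Long> seen = new HashSet<>();
        private long counter = 0;

        /** Applies the increment only if this nonce was never seen before. */
        long increment(long nonce, long delta) {
            if (!seen.add(nonce)) {
                return counter; // duplicate retry: do not re-apply
            }
            counter += delta;
            return counter;
        }

        long value() {
            return counter;
        }
    }
}
```

The key property is the one Ted agrees with: because the retry carries the original nonce, a region move between the first attempt and the retry cannot cause a double apply, provided the new hosting server recovers the seen-nonce state.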
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567953#comment-13567953 ] Lars Hofhansl commented on HBASE-7729: -- Looking more I doubt it. I seem to recall having seen this before, too. I ran the test in a loop for an hour, didn't fail, so that could just be a weird issue on the jenkins machines. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567967#comment-13567967 ] Lars Hofhansl commented on HBASE-7717: -- Should be clear from the call stack, though. Anyway, I'll add the table name to the assertion message. Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567974#comment-13567974 ] Ted Yu commented on HBASE-7717: --- +1
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567990#comment-13567990 ] Lars Hofhansl commented on HBASE-7729: -- Failed here too: https://builds.apache.org/job/HBase-0.94/778/testReport/junit/org.apache.hadoop.hbase.catalog/TestCatalogTrackerOnCluster/testBadOriginalRootLocation/ So not related to the client refactor.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567997#comment-13567997 ] Lars Hofhansl commented on HBASE-7717: -- Committed. I very sincerely hope this is the last I'll ever see of this test. Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet.
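The remedy the HBASE-7717 patches apply — polling until the cluster actually reports the expected regions instead of asserting on the first read — amounts to a bounded wait loop. This sketch is illustrative only: the `regionCount` supplier, method names, and timeouts are made up, not the committed patch:

```java
import java.util.function.LongSupplier;

// Illustrative retry loop for the race described above: after creating a
// table, poll until the master reports the expected number of regions or
// a deadline passes, instead of asserting on the first (possibly empty) read.
class WaitForRegions {

    static boolean waitUntilAssigned(LongSupplier regionCount,
                                     long expected,
                                     long timeoutMs,
                                     long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (regionCount.getAsLong() >= expected) {
                return true; // all regions visible to the master
            }
            try {
                Thread.sleep(pollMs); // assignment still in flight
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false; // interrupted: give up
            }
        }
        return false; // caller fails the test with a descriptive message
    }
}
```

The caller turns a `false` return into an assertion failure that names the table, matching Lars's note about adding the table name to the assertion message.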
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568008#comment-13568008 ] ramkrishna.s.vasudevan commented on HBASE-7701: --- @Jimmy Went through the logs and also the patch. This is again HBASE-6060 or HBASE-7521. The region is still opening in an RS but that RS goes down before completing the transition. Jimmy your fix seems fine to me. Just a small comment on review board.
[jira] [Commented] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
[ https://issues.apache.org/jira/browse/HBASE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568011#comment-13568011 ] Lars Hofhansl commented on HBASE-7730: -- We've said in the past that 0.92 and 0.94 should be fully backward and forward compatible. That would imply supporting cross version shell access.
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568010#comment-13568010 ] Hadoop QA commented on HBASE-7701: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567403/trunk-7701_v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestHLog org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor org.apache.hadoop.hbase.master.TestAssignmentManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//console This message is automatically generated.
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568015#comment-13568015 ] Lars Hofhansl commented on HBASE-7729: -- The trackers are not threads unto themselves. So another theory is that the reference count of the connection used in the CatalogTracker did not go to 0 and hence the Connection's ZKW is never stopped. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568017#comment-13568017 ] ramkrishna.s.vasudevan commented on HBASE-7717: --- Yes, Lars. Thanks a lot for your patience on this. I try to spend time on these failures but don't find the time. Maybe next time you can just ping me to take a look at it, and I will do my part if it fails next time :( Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE
[ https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568019#comment-13568019 ] Rishit Shroff commented on HBASE-7404: -- Thanks! Any particular reason that it was used in combination with the LRU block cache and not as a replacement for it in the first use case? Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE -- Key: HBASE-7404 URL: https://issues.apache.org/jira/browse/HBASE-7404 Project: HBase Issue Type: New Feature Affects Versions: 0.94.3 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: 7404-trunk-v10.patch, 7404-trunk-v11.patch, 7404-trunk-v12.patch, 7404-trunk-v13.patch, 7404-trunk-v13.txt, 7404-trunk-v14.patch, BucketCache.pdf, hbase-7404-94v2.patch, hbase-7404-trunkv2.patch, hbase-7404-trunkv9.patch, Introduction of Bucket Cache.pdf First, thanks @neil from Fusion-IO for sharing the source code.

Usage:
1. Use bucket cache as the main memory cache, configured as follows:
   hbase.bucketcache.ioengine = heap
   hbase.bucketcache.size = 0.4 (size of the bucket cache; 0.4 is a percentage of max heap size)
2. Use bucket cache as a secondary cache, configured as follows:
   hbase.bucketcache.ioengine = file:/disk1/hbase/cache.data (the file path where the block data is stored)
   hbase.bucketcache.size = 1024 (size of the bucket cache; the unit is MB, so 1024 means 1 GB)
   hbase.bucketcache.combinedcache.enabled = false (default value is true)

See more configurations in org.apache.hadoop.hbase.io.hfile.CacheConfig and org.apache.hadoop.hbase.io.hfile.bucket.BucketCache. What's Bucket Cache?
It can greatly decrease CMS pauses and heap fragmentation caused by GC, and it supports a large cache space for high read performance by using a high-speed disk like Fusion-io.
1. An implementation of block cache, like LruBlockCache
2. Manages the blocks' storage positions itself, through the Bucket Allocator
3. The cached blocks can be stored in memory or on the file system
4. Bucket Cache can be used as the main block cache (see CombinedBlockCache), combined with LruBlockCache, to decrease CMS pauses and fragmentation caused by GC
5. BucketCache can also be used as a secondary cache (e.g. using Fusion-io to store blocks) to enlarge the cache space

How about SlabCache? We studied and tested SlabCache first, but the results were bad, because:
1. SlabCache uses SingleSizeCache, so its memory use ratio is low because of the many kinds of block sizes, especially when using DataBlockEncoding
2. SlabCache is used in DoubleBlockCache: a block is cached both in SlabCache and LruBlockCache, and on a SlabCache hit the block is put into LruBlockCache again, so CMS pauses and heap fragmentation don't get any better
3. Direct (off-heap) performance is not as good as heap, and it may cause OOM, so we recommend using the heap engine

See more in the attachment and in the patch.
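For reference, the two usage modes described above correspond to hbase-site.xml entries along these lines (a sketch only; property names are as given in the description, and the values are the illustrative ones from the example):

```xml
<!-- Mode 1 (sketch): bucket cache as the main memory cache, on heap. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>heap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>0.4</value> <!-- fraction of max heap size -->
</property>

<!-- Mode 2 (sketch): bucket cache as a secondary, file-backed cache. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/disk1/hbase/cache.data</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>1024</value> <!-- MB -->
</property>
<property>
  <name>hbase.bucketcache.combinedcache.enabled</name>
  <value>false</value>
</property>
```

As noted in the description, CacheConfig and BucketCache document the remaining knobs.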
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568022#comment-13568022 ] Andrew Purtell commented on HBASE-3787: --- How does the client know if the op failed before or after it was persisted to the WAL without a way to check? Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck
[ https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568028#comment-13568028 ] ramkrishna.s.vasudevan commented on HBASE-7698: --- Here we can set a boolean saying whether the transition to FAILED_OPEN happened. If openSuccess == false and the new flag is also false, then we can try doing the update to FAILED_OPEN once in the finally block. Anyway, any ZK exception while doing this FAILED_OPEN update has to be thrown out, I feel. race between RS shutdown thread and openregionhandler causes region to get stuck Key: HBASE-7698 URL: https://issues.apache.org/jira/browse/HBASE-7698 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin 2013-01-22 17:59:03,237 INFO [Shutdown of org.apache.hadoop.hbase.fs.HFileSystem@5984cf08] hbase.MiniHBaseCluster$SingleFileSystemShutdownThread(186): Hook closing fs=org.apache.hadoop.hbase.fs.HFileSystem@5984cf08 ... 2013-01-22 17:59:03,411 DEBUG [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1001): Closing IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.: disabling compactions & flushes 2013-01-22 17:59:03,411 DEBUG [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1023): Updates disabled for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.
2013-01-22 17:59:03,415 ERROR [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] executor.EventHandler(205): Caught throwable while processing event M_RS_OPEN_REGION java.io.IOException: java.io.IOException: java.io.IOException: Filesystem closed at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1058) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:974) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:945) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.cleanupFailedOpen(OpenRegionHandler.java:459) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:143) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) tryTransitionFromOpeningToFailedOpen or transitionToOpened below is never called and region can get stuck. As an added benefit, the meta is already written by that time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
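The finally-block guard suggested in the comment above can be sketched in plain Java. All class, field, and method names here are hypothetical illustrations, not HBase's actual OpenRegionHandler API; the point is only that a throwable escaping before either transition still triggers exactly one FAILED_OPEN update:

```java
// Sketch of the proposed flag: remember whether the FAILED_OPEN transition
// already happened, and fall back to it once in the finally block when the
// open neither succeeded nor was marked failed (e.g. a throwable such as
// "java.io.IOException: Filesystem closed" escaped before either transition).
class OpenRegionSketch {
    enum Outcome { SUCCESS, FAILURE, CRASH }

    boolean openSuccess = false;
    boolean failedOpenSet = false;   // the proposed new flag
    int failedOpenAttempts = 0;      // counts ZK updates to FAILED_OPEN

    void transitionToFailedOpen() {
        failedOpenSet = true;
        failedOpenAttempts++;
    }

    void process(Outcome outcome) {
        try {
            if (outcome == Outcome.CRASH) {
                // Cleanup blew up: neither transition below runs.
                return;
            }
            if (outcome == Outcome.FAILURE) {
                transitionToFailedOpen(); // normal failure path sets the flag
                return;
            }
            openSuccess = true;           // stands in for transitionToOpened
        } finally {
            // Attempt FAILED_OPEN exactly once if neither path ran.
            if (!openSuccess && !failedOpenSet) {
                transitionToFailedOpen();
            }
        }
    }
}
```

With this guard the crash path performs the transition once, and the normal failure path does not repeat it.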
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568043#comment-13568043 ] Ted Yu commented on HBASE-3787: --- clarification: bq. HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry. Here the nonce is reused when retrying against a new region server, right? If so, we're on the same page - WALEdit needs to accommodate the nonce.
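The nonce scheme under discussion can be sketched minimally as follows. All names here are invented for illustration (this is not HBase's client or RPC API): the client draws a nonce once per operation and reuses it on every retry, while the server remembers nonces it has already applied and returns the original result instead of re-applying; as noted above, the real design would persist the nonce via the WALEdit so it survives failover.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of nonce-based deduplication for a non-idempotent op.
class NonceSketch {
    // Server-side memory: nonce -> result of the first application.
    static final Map<Long, Long> applied = new HashMap<>();
    static long counter = 0; // the value being incremented

    // Client side: one nonce per logical operation, reused across retries.
    static long newNonce() {
        return ThreadLocalRandom.current().nextLong();
    }

    // Server side: apply the increment only if this nonce is unseen.
    static long increment(long nonce, long delta) {
        Long prev = applied.get(nonce);
        if (prev != null) {
            return prev; // retry of an op that was already persisted
        }
        counter += delta;
        applied.put(nonce, counter);
        return counter;
    }
}
```

A retried RPC carrying the same nonce then returns the first result instead of double-counting, which is exactly the failure mode the issue describes.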
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568046#comment-13568046 ] Jimmy Xiang commented on HBASE-7701: @Ram, you are right. The original fix is to let timeout monitor handle it. But that's not fast enough. Opening regions on dead server are not reassigned quickly - Key: HBASE-7701 URL: https://issues.apache.org/jira/browse/HBASE-7701 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Jimmy Xiang Attachments: TEST-org.apache.hadoop.hbase.IntegrationTestRebalanceAndKillServersTargeted.xml, trunk-7701_v1.patch Closed regions are not removed from assignments. I am not sure if it's a general state problem, or just a small bug; for now, one manifestation is that moved region is ignored by SSH of the target server if target server dies before updating ZK. {code} 2013-01-22 17:59:00,524 DEBUG [IPC Server handler 3 on 50658] master.AssignmentManager(1475): Sent CLOSE to 10.11.2.92,51231,1358906285048 for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. 2013-01-22 17:59:00,997 DEBUG [RS_CLOSE_REGION-10.11.2.92,51231,1358906285048-1] handler.CloseRegionHandler(167): set region closed state in zk successfully for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. sn name: 10.11.2.92,51231,1358906285048 2013-01-22 17:59:01,088 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.RegionStates(242): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.', STARTKEY => '6660', ENDKEY => '732c', ENCODED => 0200b366bc37c5afd1185f7d487c7dfb,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.
state=CLOSED, ts=1358906341087, server=null} to {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=OFFLINE, ts=1358906341088, server=null} 2013-01-22 17:59:01,128 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.AssignmentManager(1596): Assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. to 10.11.2.92,50661,1358906192942 ... (50661 didn't update ZK to OPEN, only OPENING) 2013-01-22 17:59:06,605 INFO [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(202): Reassigning 7 region(s) that 10.11.2.92,50661,1358906192942 was carrying (skipping 0 regions(s) that are already in transition) 2013-01-22 17:59:06,605 DEBUG [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(219): Skip assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. because it has been opened in 10.11.2.92,51231,1358906285048 {code} Note the server in the last line - the one that has long closed the region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7701: --- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568063#comment-13568063 ] Hudson commented on HBASE-7711: --- Integrated in HBase-TRUNK #3833 (See [https://builds.apache.org/job/HBase-TRUNK/3833/]) HBASE-7711 rowlock release problem with thread interruptions in batchMutate (Ted Yu) (Revision 1441066) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java rowlock release problem with thread interruptions in batchMutate Key: HBASE-7711 URL: https://issues.apache.org/jira/browse/HBASE-7711 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: 0.96.0 Attachments: 7711.txt, 7711-v2.txt, 7711-v3.txt An earlier version of snapshots would thread-interrupt operations. In longer-term testing we ran into an exception stack trace that indicated that a rowlock was taken and never released. {code} 2013-01-26 01:54:56,417 ERROR org.apache.hadoop.hbase.procedure.ProcedureMember: Propagating foreign exception to subprocedure pe-1 org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@1cea3151:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed!
Source:Timeout caused Foreign E xception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:321) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:150) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$200(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:112) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2013-01-26 01:54:56,648 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock in batch put, row=0001558252 java.io.IOException: Timed out on getting lock for row=0001558252 at org.apache.hadoop.hbase.regionserver.HRegion.internalObtainRowLock(HRegion.java:3239) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:3315) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2150) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2021) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3511) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) .. every snapshot attempt that used this region for the next two days encountered this problem. {code} Snapshots will now bypass this problem with the fix in HBASE-7703. However, we should make sure hbase regionserver operations are safe when interrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
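The invariant at stake here can be illustrated with java.util.concurrent primitives. This is a sketch only, not HRegion's actual row-lock code: acquire the lock with a timeout, run the mutation, and release in a finally block so that an interrupt or exception during the mutation cannot leak the lock for later callers.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of interrupt-safe row locking.
class RowLockSketch {
    final ReentrantLock rowLock = new ReentrantLock();

    // Returns true if the mutation ran; false on lock timeout or interrupt.
    boolean mutateRow(Runnable mutation, long timeoutMs) {
        try {
            if (!rowLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // cf. "Timed out on getting lock for row=..."
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore interrupt status
            return false; // interrupted before the lock was held: nothing to release
        }
        try {
            mutation.run();
            return true;
        } finally {
            rowLock.unlock(); // runs even if the mutation throws or is interrupted
        }
    }
}
```

If the release is not in a finally block, a single interrupted mutation leaves the lock held forever, and every later operation on that row times out, which matches the two-day failure pattern described above.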
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568066#comment-13568066 ] Lars Hofhansl commented on HBASE-7729: -- OK... Since I cannot reproduce locally at all... We can close, or I could opportunistically add a 1s wait between cluster shutdown and the subsequent restart of the cluster to give the ZKWs a chance to settle their affairs. (It looks like this might be a unique scenario, with an HMaster stopping and restarting in the same JVM.)
[jira] [Assigned] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha reassigned HBASE-7723: -- Assignee: Himanshu Vashishtha Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha When moving to HDFS HA or removing HA, we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up.
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568090#comment-13568090 ] Lars Hofhansl commented on HBASE-7729: -- Looks like it just happened again in the latest build. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure:
{code}
java.io.IOException: Shutting down
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650)
	at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:24)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds
	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206)
	at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216)
	... 32 more
{code}
Likely caused by this:
{code}
2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
	at $Proxy19.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291)
	at
{code}
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568091#comment-13568091 ] Himanshu Vashishtha commented on HBASE-7723: A patch which removes storing the NN URI in split log znodes. It keeps the path from .logs onward when creating the znode in the SplitLogManager, and the SplitLogWorker re-creates the full path to the actual log using hbase.rootdir. Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
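The approach described in the comment can be sketched as plain path manipulation: the master stores only the suffix starting at the .logs directory, and the worker re-qualifies it against its own hbase.rootdir. The class and method names below are hypothetical illustrations, not the actual HBASE-7723 patch code:

```java
// Sketch of the idea behind HBASE-7723 (hypothetical names, not the real patch):
// store only the NN-independent suffix of a WAL path in the splitlog znode,
// and rebuild the absolute path on the worker from the locally configured root dir.
public class SplitLogPathSketch {
    static final String LOGS_DIR = ".logs";

    // Master side: keep everything from "/.logs/" onward, dropping the NN URI.
    static String toZnodePath(String fullLogPath) {
        int idx = fullLogPath.indexOf("/" + LOGS_DIR + "/");
        if (idx < 0) {
            throw new IllegalArgumentException("no " + LOGS_DIR + " component in: " + fullLogPath);
        }
        return fullLogPath.substring(idx); // e.g. "/.logs/rs1,60020,1/wal.1"
    }

    // Worker side: resolve the stored suffix against hbase.rootdir as seen locally.
    static String toFullPath(String rootDir, String znodeSuffix) {
        return rootDir + znodeSuffix;
    }

    public static void main(String[] args) {
        String old = "hdfs://old-nn:8020/hbase/.logs/rs1,60020,1/wal.1";
        String stored = toZnodePath(old);
        // After an HA migration, rootdir points at a new nameservice, but the
        // stored suffix still resolves to the right log under the new FS.
        System.out.println(toFullPath("hdfs://new-nameservice/hbase", stored));
    }
}
```

With this scheme a namespace change only affects the rootdir half of the path, which is read from local configuration rather than from ZooKeeper.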
[jira] [Updated] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7723: --- Attachment: HBASE-7723-94.patch Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568096#comment-13568096 ] Hudson commented on HBASE-7717: --- Integrated in HBase-0.94 #812 (See [https://builds.apache.org/job/HBase-0.94/812/]) HBASE-7717 addendum, really wait for all tables in TestSplitTransactionOnCluster. (Revision 1441151) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
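The "wait until regions are assigned" fix amounts to replacing an immediate assertion with a bounded poll loop. A minimal sketch of that pattern, with hypothetical names (this is not the actual TestSplitTransactionOnCluster code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch of an HBASE-7717-style fix: instead of reading the
// region list right after createTable and asserting on it, poll until the
// expected number of regions is visible or a deadline passes.
public class WaitForAssignmentSketch {
    static void waitForRegionCount(Supplier<Integer> regionCount, int expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (regionCount.get() < expected) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError("regions not assigned within " + timeoutMs + " ms");
            }
            Thread.sleep(50); // back off briefly before re-checking
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate slow assignment: each poll observes one more region online.
        AtomicInteger assigned = new AtomicInteger(0);
        waitForRegionCount(() -> assigned.incrementAndGet(), 3, 5000);
        System.out.println("all regions assigned");
    }
}
```

The deadline keeps a genuinely broken assignment from hanging the test forever, while the loop absorbs the benign race between table creation and the AM learning about the new regions.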
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568097#comment-13568097 ] Himanshu Vashishtha commented on HBASE-7723: I tested this with a clean ZK slate. That is required because the patch removes the NN URI; otherwise, old znodes would point to non-existent log files. If this approach sounds good, I will make a similar change for replication znode handling too. Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568114#comment-13568114 ] Jimmy Xiang commented on HBASE-7723: How do you handle the compatibility/migration issue? Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568121#comment-13568121 ] Ted Yu commented on HBASE-7723: --- I got the following test failure:
{code}
testDelayedDeleteOnFailure(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) Time elapsed: 25.702 sec ERROR!
java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
	at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testDelayedDeleteOnFailure(TestDistributedLogSplitting.java:316)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1931)
	at java.lang.String.substring(String.java:1904)
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:260)
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:228)
	at org.apache.hadoop.hbase.master.TestDistributedLogSplitting$2.run(TestDistributedLogSplitting.java:297)
{code}
Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
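A StringIndexOutOfBoundsException with index -1 in SplitLogManager.splitLogDistributed is the classic symptom of feeding a failed indexOf straight into substring, e.g. when a path does not contain the component the new code expects. A minimal, hypothetical reproduction of that failure mode (not the actual SplitLogManager code):

```java
// Hypothetical reproduction of the "String index out of range: -1" failure:
// code searches a log path for a marker that some inputs do not contain and
// passes indexOf's -1 directly to substring. Not the real SplitLogManager code.
public class SubstringPitfallSketch {
    static String tailAfterLogsDir(String logPath) {
        int idx = logPath.indexOf("/.logs/");
        // Guarding the -1 case turns the opaque StringIndexOutOfBoundsException
        // into a clear error about the unexpected input.
        if (idx < 0) {
            throw new IllegalArgumentException("path lacks /.logs/ component: " + logPath);
        }
        return logPath.substring(idx);
    }

    public static void main(String[] args) {
        // Unguarded: substring(-1) throws StringIndexOutOfBoundsException.
        try {
            String s = "old-format-task-name";
            System.out.println(s.substring(s.indexOf("/.logs/")));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded call threw, as in the test failure");
        }
        // Guarded version works on well-formed paths and fails fast otherwise.
        System.out.println(tailAfterLogsDir("hdfs://nn/hbase/.logs/rs1,60020,1/wal.1"));
    }
}
```

This is also why Jimmy's compatibility question matters: znodes written in the old full-URI format could reach new parsing code that assumes the new layout.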
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568126#comment-13568126 ] Hudson commented on HBASE-7717: --- Integrated in HBase-TRUNK #3834 (See [https://builds.apache.org/job/HBase-TRUNK/3834/]) HBASE-7717 addendum, really wait for all tables in TestSplitTransactionOnCluster. (Revision 1441150) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7733) Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare
Jonathan Hsieh created HBASE-7733: - Summary: Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare Key: HBASE-7733 URL: https://issues.apache.org/jira/browse/HBASE-7733 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Sometimes this test fails with this error message:
{code}
Wanted but not invoked:
procedure.sendGlobalBarrierComplete();
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344)
However, there were other interactions with this mock:
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:183)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-7733) Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare
[ https://issues.apache.org/jira/browse/HBASE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-7733: - Assignee: Jonathan Hsieh Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare --- Key: HBASE-7733 URL: https://issues.apache.org/jira/browse/HBASE-7733 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Sometimes this test fails with this error message:
{code}
Wanted but not invoked:
procedure.sendGlobalBarrierComplete();
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344)
However, there were other interactions with this mock:
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:183)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira