date:20120116

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186768#comment-13186768
]

Hadoop QA commented on HBASE-5203:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510675/5203.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -145 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 82 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.master.TestDistributedLogSplitting
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/774//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/774//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/774//console

This message is automatically generated.

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss


[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186770#comment-13186770
 ] 

gaojinchao commented on HBASE-5179:
---

@chunhui
This may be a problem. The ServerShutdownHandler thread pool has 3 threads by 
default. If more than 3 dead servers are being processed, we will wait 
forever.



 Concurrent processing of processFaileOver and ServerShutdownHandler may cause 
 region to be assigned before log splitting is completed, causing data loss
 

 Key: HBASE-5179
 URL: https://issues.apache.org/jira/browse/HBASE-5179
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.92.0, 0.94.0, 0.90.6

 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 
 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 
 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, 
 hbase-5179v6.patch, hbase-5179v7.patch


 If the master's failover processing and ServerShutdownHandler's processing 
 happen concurrently, the following case may occur:
 1. The master completes splitLogAfterStartup().
 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
 dead server.
 4. The master starts assigning RegionserverA's regions because it was marked 
 dead in step 3.
 However, while step 4 (region assignment) is in progress, 
 ServerShutdownHandler may still be splitting the log, which can cause data loss.
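A minimal sketch of one way to close this window, assuming the master can see which dead servers still have a log split in progress. The shutdown handler registers a server before splitting and releases it afterwards, and assignment waits on that. SplitLogTracker and its methods are hypothetical names, not HBase classes, and the sketch assumes splitStarted() is always called before the master checks; it does not reproduce any of the attached patches.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of HBase): remembers which dead servers still
// have a log split in progress so region assignment can wait for it.
class SplitLogTracker {
  private final ConcurrentMap<String, CountDownLatch> inProgress =
      new ConcurrentHashMap<>();

  /** ServerShutdownHandler side: call before splitting the server's logs. */
  void splitStarted(String serverName) {
    inProgress.putIfAbsent(serverName, new CountDownLatch(1));
  }

  /** ServerShutdownHandler side: call once the split has completed. */
  void splitFinished(String serverName) {
    CountDownLatch latch = inProgress.remove(serverName);
    if (latch != null) {
      latch.countDown(); // wake any assignment waiting on this server
    }
  }

  /**
   * Master side: call before assigning the dead server's regions.
   * Returns true if no split is pending or it finished within the timeout;
   * false means the regions should not be assigned yet.
   */
  boolean awaitSplit(String serverName, long timeout, TimeUnit unit) {
    CountDownLatch latch = inProgress.get(serverName);
    if (latch == null) {
      return true;
    }
    try {
      return latch.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

In this model, step 4 would only assign RegionserverA's regions once awaitSplit("RegionserverA", ...) returns true, and would otherwise retry later.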

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-01-16 Thread chunhui shen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186781#comment-13186781
 ] 

chunhui shen commented on HBASE-5179:
-

@Jinchao
I think that is a different problem.
If we restart three ordinary region servers and then kill the META server, 
the first three ServerShutdownHandlers will wait for the META region. However, 
METAServerShutdownHandler will not be processed, because the shutdown handler 
thread pool stays full until one ServerShutdownHandler finishes. So the wait 
never ends.
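The hang described above is bounded-pool starvation. The simplified model below is not HBase code (the class and method names are made up for illustration): it submits N handlers that block until META is online, then submits the META handler last. With a shared pool of 3 threads and 3 blocked handlers, the META task never runs; giving the META handler its own pool, as the normal flow does for METAServerShutdownHandler, lets everything finish.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified model of the reported hang, not HBase code.
class ShutdownHandlerStarvation {

  /**
   * Returns true if every handler completes within timeoutMs.
   * separateMetaPool=false models one shared pool; true models a dedicated
   * pool for the META handler.
   */
  static boolean runsToCompletion(int poolSize, int deadServers,
                                  boolean separateMetaPool, long timeoutMs) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    ExecutorService metaPool =
        separateMetaPool ? Executors.newSingleThreadExecutor() : pool;
    CountDownLatch metaOnline = new CountDownLatch(1);
    CountDownLatch allDone = new CountDownLatch(deadServers + 1);
    try {
      // Ordinary shutdown handlers are submitted first and wait for META.
      for (int i = 0; i < deadServers; i++) {
        pool.submit(() -> {
          try {
            metaOnline.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
          allDone.countDown();
        });
      }
      // The META handler is submitted last; in a full shared pool it queues.
      metaPool.submit(() -> {
        metaOnline.countDown();
        allDone.countDown();
      });
      return allDone.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow();     // interrupts handlers still waiting for META
      metaPool.shutdownNow(); // harmless if metaPool == pool
    }
  }
}
```

runsToCompletion(3, 3, false, 200) times out and returns false, while runsToCompletion(3, 3, true, 5000) and runsToCompletion(3, 2, false, 5000) both return true.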


[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss


[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186786#comment-13186786
 ] 

gaojinchao commented on HBASE-5179:
---

@chunhui
In the normal flow, METAServerShutdownHandler uses a different thread pool. 
Only in the initialization flow are there some cases where we can't 
distinguish the META region server.



[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-01-16 Thread chunhui shen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186804#comment-13186804
 ] 

chunhui shen commented on HBASE-5179:
-

@Jinchao
So we need to ensure that a dead META server (which is considered an ordinary 
server) is processed by SSH while the master is initializing? 
Otherwise, either META data loss or a forever wait must happen?


[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-16 Thread Jieshan Bean (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-5153:


Attachment: HBASE-5153-V6-90-minorchange.patch

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch


 HBASE-4893 is related to this issue. As that issue shows, if multiple threads 
 share the same connection and the connection is aborted in one thread, the 
 other threads will get a 
 "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
 HBASE-4893 solved the problem of stale connections that could not be removed, 
 but the original HTable instance still cannot be used. The connection in the 
 HTable should be recreated.
 There are actually two approaches to solving this:
 1. In user code, once an IOException is caught, close the connection and 
 re-create the HTable instance. This can be used as a workaround.
 2. On the HBase client side, catch this exception and re-create the connection.
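A minimal sketch of workaround (1), written against hypothetical interfaces rather than the real HTable/HConnectionManager API: ClientFactory stands in for whatever code opens a new HTable (and with it a fresh connection), and RecreateOnFailure closes the broken instance and retries the operation once on a new one. Retrying exactly once, and on any IOException, are assumptions a real caller would want to tune.

```java
import java.io.Closeable;
import java.io.IOException;

/** Opens a fresh client, e.g. a new HTable with its own connection (assumed). */
interface ClientFactory<T extends Closeable> {
  T open() throws IOException;
}

/** One operation against the client, e.g. a get or put (assumed). */
interface ClientCall<T, R> {
  R apply(T client) throws IOException;
}

/**
 * Sketch of workaround (1): on an IOException, close the current client and
 * retry once with a newly created one. Not part of the HBase API.
 */
class RecreateOnFailure<T extends Closeable> {
  private final ClientFactory<T> factory;
  private T client;

  RecreateOnFailure(ClientFactory<T> factory) throws IOException {
    this.factory = factory;
    this.client = factory.open();
  }

  <R> R call(ClientCall<T, R> op) throws IOException {
    try {
      return op.apply(client);
    } catch (IOException first) {
      // Assume the shared connection was aborted; drop this client entirely.
      try {
        client.close();
      } catch (IOException ignored) {
        // The old client is already unusable; nothing more to do.
      }
      client = factory.open();
      return op.apply(client); // a second failure propagates to the caller
    }
  }
}
```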


[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-16 Thread ramkrishna.s.vasudevan (Updated) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186809#comment-13186809
 ] 

Hadoop QA commented on HBASE-5153:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510683/HBASE-5153-V6-90-minorchange.patch
against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/775//console

This message is automatically generated.


[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers


 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5153:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12510683/HBASE-5153-V6-90-minorchange.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/775//console

This message is automatically generated.)


[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread ramkrishna.s.vasudevan (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186819#comment-13186819
 ] 

ramkrishna.s.vasudevan commented on HBASE-5203:
---

@Lars
Regarding the coprocessor postPut and postDelete calls:
{code}
if (m instanceof Put) {
  coprocessorHost.postPut((Put) m, walEdits.get(i),
      m.getWriteToWAL());
} else if (m instanceof Delete) {
  coprocessorHost.postDelete((Delete) m, walEdits.get(i),
      m.getWriteToWAL());
}
{code}
Should these hooks still be called if internalPut() or internalDelete() fails? 
Please correct me if I am wrong.

 Group atomic put/delete operation into a single WALEdit to handle region 
 server failures.
 -

 Key: HBASE-5203
 URL: https://issues.apache.org/jira/browse/HBASE-5203
 Project: HBase
  Issue Type: Sub-task
  Components: client, coprocessors, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5203.txt


 HBASE-3584 does not provide fully atomic operations in case of region 
 server failures (see the explanation there).
 What should happen is that either (1) all edits are applied via a single 
 WALEdit, or (2) the WALEdits are applied in async mode and then sync'ed 
 together.
 For #1 it is not clear whether it is advisable to manage multiple *different* 
 operations (Put/Delete) via a single WAL edit. A quick check reveals that WAL 
 replay on region startup would work, but that replication would need to be 
 adapted. The refactoring needed would be non-trivial.
 #2 might not actually work, as another operation could request syncing a 
 later edit and hence flush these entries out as well.
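To illustrate why option (1) gives all-or-nothing recovery, here is a toy model, assuming replay applies each log record either entirely or not at all. WalModel and AtomicMutation are illustrative names, not HBase's WALEdit/HLog classes, and a region server "crash" is simulated by refusing further appends. Writing a separate edit per mutation can leave the Put durable without the Delete; grouping every cell into one edit cannot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy write-ahead log, not HBase's WALEdit/HLog.
class WalModel {
  /** One log record; replay applies a record entirely or not at all. */
  static final class Edit {
    final List<String> cells;
    Edit(List<String> cells) {
      this.cells = Collections.unmodifiableList(new ArrayList<>(cells));
    }
  }

  private final List<Edit> durable = new ArrayList<>();
  private final int crashAfterAppends; // simulated region server failure point
  private int appends;

  WalModel(int crashAfterAppends) { this.crashAfterAppends = crashAfterAppends; }

  /** Returns false if the "server" died before this record was persisted. */
  boolean append(Edit e) {
    if (appends >= crashAfterAppends) {
      return false;
    }
    appends++;
    durable.add(e);
    return true;
  }

  /** Cells recovered by replaying the durable log after the crash. */
  List<String> replay() {
    List<String> out = new ArrayList<>();
    for (Edit e : durable) {
      out.addAll(e.cells);
    }
    return out;
  }
}

class AtomicMutation {
  /** Option (1): every Put/Delete cell goes into one edit, appended once. */
  static void writeGrouped(WalModel wal, List<List<String>> mutations) {
    List<String> all = new ArrayList<>();
    for (List<String> m : mutations) {
      all.addAll(m);
    }
    wal.append(new WalModel.Edit(all)); // whole batch persisted, or none of it
  }

  /** One edit per mutation: a crash between appends tears the batch. */
  static void writePerMutation(WalModel wal, List<List<String>> mutations) {
    for (List<String> m : mutations) {
      if (!wal.append(new WalModel.Edit(m))) {
        return;
      }
    }
  }
}
```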


[jira] [Updated] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread stack (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5204:
-

 Priority: Blocker  (was: Major)
Fix Version/s: 0.92.0

 Backward compatibility fixes for 0.92
 -

 Key: HBASE-5204
 URL: https://issues.apache.org/jira/browse/HBASE-5204
 Project: HBase
  Issue Type: Bug
  Components: ipc
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Blocker
  Labels: backwards-compatibility
 Fix For: 0.92.0

 Attachments: 
 0001-Add-some-backward-compatible-support-for-reading-old.patch, 
 0002-Make-sure-that-a-connection-always-uses-a-protocol.patch, 
 0003-Change-the-code-used-when-serializing-HTableDescript.patch, 5204-92.txt, 
 5204-trunk.txt


 Attached are 3 patches that are necessary to allow compatibility between 
 HBase 0.90.x (and previous releases) and HBase 0.92.0.
 First of all, I'm well aware that 0.92.0 RC4 has been thumbed up by a lot of 
 people and would probably wind up being released as 0.92.0 tomorrow, so I 
 sincerely apologize for creating this issue so late in the process.  I spent 
 a lot of time trying to work around the quirks of 0.92 but once I realized 
 that with a few very quasi-trivial changes compatibility would be made 
 significantly easier, I immediately sent these 3 patches to Stack, who 
 suggested I create this issue.
 The first patch is required as without it clients sending a 0.90-style RPC to 
 a 0.92-style server causes the server to die uncleanly.  It seems that 0.92 
 ships with {{\-XX:OnOutOfMemoryError=kill \-9 %p}}, and when a 0.92 server 
 fails to deserialize a 0.90-style RPC, it attempts to allocate a large buffer 
 because it doesn't read fields of 0.90-style RPCs properly.  This allocation 
 attempt immediately triggers an OOME, which causes the JVM to die abruptly of 
 a {{SIGKILL}}.  So whenever a 0.90.x client attempts to connect to HBase, it 
 kills whichever RS is hosting the {{\-ROOT-}} region.
 The second patch fixes a bug introduced by HBASE-2002, which added support 
 for letting clients specify what protocol they want to speak.  If a client 
 doesn't properly specify what protocol to use, the connection's {{protocol}} 
 field will be left {{null}}, which causes any subsequent RPC on that 
 connection to trigger an NPE in the server, even though the connection was 
 successfully established from the client's point of view.  The fix is to 
 simply give the connection a default protocol, by assuming the client meant 
 to speak to a RegionServer.
 The third patch fixes an oversight that slipped into HBASE-451, where a change 
 to {{HbaseObjectWritable}} caused all the codes used to serialize 
 {{Writables}} to shift by one.  This was carefully avoided in other changes 
 such as HBASE-1502, which cleanly removed entries for {{HMsg}} and 
 {{HMsg[]}}, so I don't think this breakage in HBASE-451 was intended.
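The third patch's failure mode can be reproduced in miniature, assuming codes are handed out by a running counter in declaration order. The class names below are placeholders, not the real HbaseObjectWritable table. Deleting an entry outright renumbers everything after it, so a 0.90 peer decodes the wrong class; keeping the retired slot, which is how HBASE-1502 is described as removing HMsg, leaves later codes unchanged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a serialization table that hands out codes with a
// running counter; class names are placeholders, not the real HBase table.
class CodeTable {
  /** Assigns codes 1, 2, 3, ... in list order; a null entry is a retired slot. */
  static Map<String, Integer> build(List<String> classes) {
    Map<String, Integer> codes = new LinkedHashMap<>();
    int code = 1;
    for (String name : classes) {
      if (name != null) {
        codes.put(name, code);
      }
      code++; // advance even for retired slots so later codes never move
    }
    return codes;
  }
}
```

With [HMsg, Put, Delete], Put gets code 2. Dropping HMsg from the list shifts Put to 1, while replacing it with a null placeholder keeps Put at 2 and Delete at 3.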


[jira] [Commented] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186906#comment-13186906
 ] 

stack commented on HBASE-5204:
--

+1 on patch.  TestFromClientSide works locally for me.  I'm going to commit.


[jira] [Commented] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186907#comment-13186907
 ] 

stack commented on HBASE-5204:
--

Committed to branch and trunk.


[jira] [Created] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread Nicholas Telford (Created) (JIRA)

Allow setting Scan start/stop row individually in TableInputFormat
--

 Key: HBASE-5208
 URL: https://issues.apache.org/jira/browse/HBASE-5208
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nicholas Telford
Priority: Minor


Currently, TableInputFormat initializes a serialized Scan from 
hbase.mapreduce.scan. Alternatively, it will instantiate a new Scan using 
properties defined in hbase.mapreduce.scan.*. However, of these properties 
the start row and stop row (arguably the most pertinent) are missing.

TableInputFormat should permit the specification of a start/stop row as with 
the other fields using a new pair of properties: 
hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.end

The primary use-case for this is to permit Oozie and other job management tools 
that can't call TableMapReduceUtil.initTableMapperJob() to operate on a 
contiguous subset of rows.
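A minimal sketch of how TableInputFormat might read the two new properties. It assumes the key names used in the HBASE-5208-001.txt attachment (hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.stop; the description above says .row.end). A plain Map stands in for the Hadoop Configuration and ScanRange for the Scan, so this is not the patch itself. Row keys are encoded as UTF-8, as HBase's Bytes.toBytes(String) does, and an empty stop row means "scan to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch only: ScanRange stands in for org.apache.hadoop.hbase.client.Scan and
// a Map for the job Configuration.
class ScanFromProperties {
  static final String SCAN_ROW_START = "hbase.mapreduce.scan.row.start";
  static final String SCAN_ROW_STOP = "hbase.mapreduce.scan.row.stop";

  static final class ScanRange {
    byte[] startRow = new byte[0]; // empty = start from the first row
    byte[] stopRow = new byte[0];  // empty = scan to the last row (exclusive bound otherwise)
  }

  /** Applies the optional start/stop row properties, leaving unset ones empty. */
  static ScanRange fromConf(Map<String, String> conf) {
    ScanRange scan = new ScanRange();
    String start = conf.get(SCAN_ROW_START);
    if (start != null) {
      scan.startRow = start.getBytes(StandardCharsets.UTF_8);
    }
    String stop = conf.get(SCAN_ROW_STOP);
    if (stop != null) {
      scan.stopRow = stop.getBytes(StandardCharsets.UTF_8);
    }
    return scan;
  }
}
```

A job management tool such as Oozie would then only need to set these keys as job configuration properties, without calling TableMapReduceUtil.initTableMapperJob().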


[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss


[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186928#comment-13186928
 ] 

gaojinchao commented on HBASE-5179:
---

In patch v7, can we replace the expired-server processing with a call to 
public void splitLog(final String serverName)?


[jira] [Updated] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread Nicholas Telford (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Telford updated HBASE-5208:


Attachment: HBASE-5208-001.txt

Adds hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.stop 
options to TableInputFormat to permit defining start/stop row separately.
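A minimal sketch of how an input format might consume the two properties named in this comment (hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.stop; note the description below proposes ".end", while the patch uses ".stop"). This is not the patch itself: java.util.Properties stands in for Hadoop's Configuration, RowRange is a hypothetical stand-in for Scan, and encoding row keys as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical sketch: optional start/stop rows read from job properties. */
final class ScanRowRange {
    static final String SCAN_ROW_START = "hbase.mapreduce.scan.row.start";
    static final String SCAN_ROW_STOP = "hbase.mapreduce.scan.row.stop";

    /** Empty arrays mean "unbounded", like an empty start/stop row on an HBase Scan. */
    static final class RowRange {
        byte[] startRow = new byte[0];
        byte[] stopRow = new byte[0];
    }

    static RowRange fromProperties(Properties conf) {
        RowRange range = new RowRange();
        String start = conf.getProperty(SCAN_ROW_START);
        if (start != null) {
            // Assumption: row keys are given as UTF-8 strings.
            range.startRow = start.getBytes(StandardCharsets.UTF_8);
        }
        String stop = conf.getProperty(SCAN_ROW_STOP);
        if (stop != null) {
            // The stop row is exclusive in HBase scans.
            range.stopRow = stop.getBytes(StandardCharsets.UTF_8);
        }
        return range;
    }
}
```

Per the description, the intent is that a tool such as Oozie only has to set these two properties instead of serializing a whole Scan.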

 Allow setting Scan start/stop row individually in TableInputFormat
 --

 Key: HBASE-5208
 URL: https://issues.apache.org/jira/browse/HBASE-5208
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nicholas Telford
Priority: Minor
 Attachments: HBASE-5208-001.txt


 Currently, TableInputFormat initializes a serialized Scan from 
 hbase.mapreduce.scan. Alternatively, it will instantiate a new Scan using 
 properties defined in hbase.mapreduce.scan.*. However, of these properties 
 the start row and stop row (arguably the most pertinent) are missing.
 TableInputFormat should permit the specification of a start/stop row as with 
 the other fields using a new pair of properties: 
 hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.end
 The primary use-case for this is to permit Oozie and other job management 
 tools that can't call TableMapReduceUtil.initTableMapperJob() to operate on a 
 contiguous subset of rows.


[jira] [Updated] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread Nicholas Telford (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Telford updated HBASE-5208:


Release Note: Added hbase.mapreduce.scan.row.start and 
hbase.mapreduce.scan.row.stop for defining start and stop rows for a 
MapReduce job without having to serialize a Scan object.
  Status: Patch Available  (was: Open)


[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

[
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186936#comment-13186936
]

Hadoop QA commented on HBASE-5208:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510704/HBASE-5208-001.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/776//console

This message is automatically generated.


[jira] [Commented] (HBASE-4191) hbase load balancer needs locality awareness

[
https://issues.apache.org/jira/browse/HBASE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186941#comment-13186941
]

gaojinchao commented on HBASE-4191:
---

@Liyin
This is a good feature. What is the current status of your work on it?

hbase load balancer needs locality awareness

Key: HBASE-4191
URL: https://issues.apache.org/jira/browse/HBASE-4191
Project: HBase
Issue Type: New Feature
Reporter: Ted Yu
Assignee: Liyin Tang

Previously, HBASE-4114 implemented the metrics for HFile HDFS block locality,
which provide HFile-level locality information.
But in order to work with the load balancer and region assignment, we need
region-level locality information.
Let's define the region locality information first, which is almost the same
as the HFile locality index:
HRegion locality index (HRegion A, RegionServer B) =
(Total number of HDFS blocks of HRegion A that can be retrieved locally by
RegionServer B) / (Total number of HDFS blocks of HRegion A)
So the HRegion locality index tells us how much locality we can get if
the HMaster assigns HRegion A to RegionServer B.
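The definition above translates directly into code. A minimal sketch, assuming each HDFS block of the region is described by the set of hosts holding a replica of it; RegionLocality and localityIndex are hypothetical names, and returning 0 for a region with no blocks is an assumption.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper: HRegion locality index for one (region, server) pair. */
final class RegionLocality {
    /**
     * @param blockHosts one entry per HDFS block of the region, listing the hosts
     *                   that hold a replica of that block
     * @param server     the candidate region server host
     * @return fraction of the region's blocks readable locally on {@code server}, in [0, 1]
     */
    static double localityIndex(List<? extends Set<String>> blockHosts, String server) {
        if (blockHosts.isEmpty()) {
            return 0.0; // assumption: a region with no blocks has no locality
        }
        long local = blockHosts.stream().filter(hosts -> hosts.contains(server)).count();
        return (double) local / blockHosts.size();
    }
}
```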
So there are 2 steps involved in assigning regions based on locality.
1) During cluster start-up, the master scans HDFS to calculate the HRegion
locality index for each pair of HRegion and RegionServer. Scanning DFS is
pretty expensive, so we only need to do this once, at start-up time.
2) During cluster run time, each region server periodically updates the HRegion
locality index as a metric, as HBASE-4114 did. The RegionServer can expose
these values to the Master through ZK, the meta table, or plain RPC messages.
Based on the HRegion locality index, the assignment manager in the master
would have global knowledge of the region locality distribution and can
run a min-cost max-flow solver to reach the global optimum.
Let's construct the graph first:
[Graph]
Imagine a bipartite graph whose left side is the set of regions and whose
right side is the set of region servers.
There is a source node with an edge to each node in the region set.
There is a sink node with an edge from each node in the region server set.
[Capacity]
The capacity between the source node and each region node is 1.
The capacity between each region node and each region server node is also 1.
(The purpose is that each region can be assigned to ONLY one region server at
a time.)
The capacity between each region server node and the sink node is the average
number of regions that should be assigned to each region server.
(The purpose is to balance the load across region servers.)
[Cost]
The cost between each region and region server is the opposite of the locality
index: the higher the locality of region A on region server B, the lower the
cost of assigning A to B.
The cost function could be made more sophisticated by taking more metrics into
account.
So after running the min-cost max-flow solver, the master can assign the
regions based on the globally optimized locality.
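The graph above can be sketched with a textbook min-cost max-flow algorithm: successive shortest paths, using Bellman-Ford so that the negative edge costs need no extra handling. This is an illustrative sketch, not HBase code: LocalityAssigner and assign are hypothetical names, the per-server capacity is the average rounded up (an assumption, since the description leaves fractional averages open), and locality is scaled by 1000 to get integer costs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: region assignment via min-cost max-flow (successive shortest paths). */
final class LocalityAssigner {
    private static final class Edge {
        final int to, rev; // rev = index of the reverse edge in graph.get(to)
        int cap;           // remaining capacity
        final long cost;
        Edge(int to, int rev, int cap, long cost) { this.to = to; this.rev = rev; this.cap = cap; this.cost = cost; }
    }

    private final List<List<Edge>> graph = new ArrayList<>();

    private LocalityAssigner(int nodes) {
        for (int i = 0; i < nodes; i++) graph.add(new ArrayList<>());
    }

    private void addEdge(int from, int to, int cap, long cost) {
        graph.get(from).add(new Edge(to, graph.get(to).size(), cap, cost));
        graph.get(to).add(new Edge(from, graph.get(from).size() - 1, 0, -cost));
    }

    /** Repeatedly augments along a cheapest source-to-sink path until none is left. */
    private void minCostMaxFlow(int source, int sink) {
        int n = graph.size();
        while (true) {
            long[] dist = new long[n];
            int[] prevNode = new int[n], prevEdge = new int[n];
            Arrays.fill(dist, Long.MAX_VALUE);
            dist[source] = 0;
            boolean updated = true;
            for (int round = 0; round < n && updated; round++) { // Bellman-Ford relaxation
                updated = false;
                for (int u = 0; u < n; u++) {
                    if (dist[u] == Long.MAX_VALUE) continue;
                    for (int i = 0; i < graph.get(u).size(); i++) {
                        Edge e = graph.get(u).get(i);
                        if (e.cap > 0 && dist[u] + e.cost < dist[e.to]) {
                            dist[e.to] = dist[u] + e.cost;
                            prevNode[e.to] = u;
                            prevEdge[e.to] = i;
                            updated = true;
                        }
                    }
                }
            }
            if (dist[sink] == Long.MAX_VALUE) return; // no augmenting path left
            int push = Integer.MAX_VALUE;
            for (int v = sink; v != source; v = prevNode[v]) {
                push = Math.min(push, graph.get(prevNode[v]).get(prevEdge[v]).cap);
            }
            for (int v = sink; v != source; v = prevNode[v]) {
                Edge e = graph.get(prevNode[v]).get(prevEdge[v]);
                e.cap -= push;
                graph.get(v).get(e.rev).cap += push;
            }
        }
    }

    /** locality[r][s] in [0, 1]; returns assignment[r] = server index, or -1 if unplaced. */
    static int[] assign(double[][] locality, int numServers) {
        int regions = locality.length;
        int source = regions + numServers, sink = source + 1;
        LocalityAssigner g = new LocalityAssigner(sink + 1);
        int perServer = (regions + numServers - 1) / numServers; // average, rounded up
        for (int r = 0; r < regions; r++) {
            g.addEdge(source, r, 1, 0);
            for (int s = 0; s < numServers; s++) {
                // Cost = negated locality: the higher the locality, the lower the cost.
                g.addEdge(r, regions + s, 1, -Math.round(locality[r][s] * 1000));
            }
        }
        for (int s = 0; s < numServers; s++) g.addEdge(regions + s, sink, perServer, 0);
        g.minCostMaxFlow(source, sink);
        int[] assignment = new int[regions];
        Arrays.fill(assignment, -1);
        for (int r = 0; r < regions; r++) {
            for (Edge e : g.graph.get(r)) {
                boolean toServer = e.to >= regions && e.to < regions + numServers;
                if (toServer && e.cap == 0) assignment[r] = e.to - regions; // saturated = assigned
            }
        }
        return assignment;
    }
}
```

Each augmentation scans every region-to-server edge, so this shows the model rather than a solver sized for a large cluster.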
The master should also share this global view with the secondary master in
case a master failover happens.
In addition, HBASE-4491 (Locality Checker) is a tool, based on the same
metrics, that proactively scans DFS to calculate the global locality
information in the cluster. It will help us verify data locality
information at run time.

[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat


[ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186945#comment-13186945
 ] 

Zhihong Yu commented on HBASE-5208:
---

@Nicholas:
You should use --no-prefix to generate your patch so that Hadoop QA can run it. 

This is a useful feature. Can you add a unit test for it ?

Thanks

 Allow setting Scan start/stop row individually in TableInputFormat
 --

 Key: HBASE-5208
 URL: https://issues.apache.org/jira/browse/HBASE-5208
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nicholas Telford
Priority: Minor
 Attachments: HBASE-5208-001.txt, HBASE-5208-002.txt


 Currently, TableInputFormat initializes a serialized Scan from 
 hbase.mapreduce.scan. Alternatively, it will instantiate a new Scan using 
 properties defined in hbase.mapreduce.scan.*. However, of these properties 
 the start row and stop row (arguably the most pertinent) are missing.
 TableInputFormat should permit the specification of a start/stop row as with 
 the other fields using a new pair of properties: 
 hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.end
 The primary use-case for this is to permit Oozie and other job management 
 tools that can't call TableMapReduceUtil.initTableMapperJob() to operate on a 
 contiguous subset of rows.


[jira] [Updated] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread Nicholas Telford (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Telford updated HBASE-5208:


Attachment: HBASE-5208-002.txt

Git patches seem to break the QA bot.

Manually edited to remove the a/ prefixes.
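Regenerating the patch with git diff --no-prefix, as suggested above, avoids the problem at the source; the manual edit can also be scripted. A hypothetical sketch (StripGitPrefixes is not an existing tool) that rewrites the "--- a/" and "+++ b/" header lines of a unified diff so the paths carry no prefix. It does not track hunk boundaries, so a removed line whose text begins with "-- a/" would also be rewritten, and it leaves any "diff --git" header line untouched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: drops git's a/ and b/ path prefixes from unified-diff headers. */
final class StripGitPrefixes {
    static List<String> strip(List<String> patchLines) {
        List<String> out = new ArrayList<>();
        for (String line : patchLines) {
            if (line.startsWith("--- a/") || line.startsWith("+++ b/")) {
                // Keep the 4-character "--- " / "+++ " marker, drop the 2-character prefix.
                out.add(line.substring(0, 4) + line.substring(6));
            } else {
                out.add(line); // "--- /dev/null" and hunk lines pass through unchanged
            }
        }
        return out;
    }
}
```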


[jira] [Updated] (HBASE-5204) Backward compatibility fixes for 0.92


 [ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5204:
--

Fix Version/s: 0.94.0
 Hadoop Flags: Incompatible change,Reviewed  (was: Incompatible change)

 Backward compatibility fixes for 0.92
 -

 Key: HBASE-5204
 URL: https://issues.apache.org/jira/browse/HBASE-5204
 Project: HBase
  Issue Type: Bug
  Components: ipc
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Blocker
  Labels: backwards-compatibility
 Fix For: 0.92.0, 0.94.0

 Attachments: 
 0001-Add-some-backward-compatible-support-for-reading-old.patch, 
 0002-Make-sure-that-a-connection-always-uses-a-protocol.patch, 
 0003-Change-the-code-used-when-serializing-HTableDescript.patch, 5204-92.txt, 
 5204-trunk.txt


 Attached are 3 patches that are necessary to allow compatibility between 
 HBase 0.90.x (and previous releases) and HBase 0.92.0.
 First of all, I'm well aware that 0.92.0 RC4 has been thumbed up by a lot of 
 people and would probably wind up being released as 0.92.0 tomorrow, so I 
 sincerely apologize for creating this issue so late in the process.  I spent 
 a lot of time trying to work around the quirks of 0.92, but once I realized 
 that a few quasi-trivial changes would make compatibility significantly 
 easier, I immediately sent these 3 patches to Stack, who 
 suggested I create this issue.
 The first patch is required because without it, a client sending a 0.90-style 
 RPC to a 0.92-style server causes the server to die uncleanly.  It seems that 0.92 
 ships with {{\-XX:OnOutOfMemoryError=kill \-9 %p}}, and when a 0.92 server 
 fails to deserialize a 0.90-style RPC, it attempts to allocate a large buffer 
 because it doesn't read fields of 0.90-style RPCs properly.  This allocation 
 attempt immediately triggers an OOME, which causes the JVM to die abruptly from 
 a {{SIGKILL}}.  So whenever a 0.90.x client attempts to connect to HBase, it 
 kills whichever RS is hosting the {{\-ROOT-}} region.
 The second patch fixes a bug introduced by HBASE-2002, which added support 
 for letting clients specify what protocol they want to speak.  If a client 
 doesn't properly specify what protocol to use, the connection's {{protocol}} 
 field will be left {{null}}, which causes any subsequent RPC on that 
 connection to trigger an NPE in the server, even though the connection was 
 successfully established from the client's point of view.  The fix is to 
 simply give the connection a default protocol, by assuming the client meant 
 to speak to a RegionServer.
 The third patch fixes an oversight that slipped into HBASE-451, where a change 
 to {{HbaseObjectWritable}} caused all the codes used to serialize 
 {{Writables}} to shift by one.  This was carefully avoided in other changes 
 such as HBASE-1502, which cleanly removed entries for {{HMsg}} and 
 {{HMsg[]}}, so I don't think this breakage in HBASE-451 was intended.
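The shift described for the third patch is easy to reproduce with any table that assigns codes in registration order. A hypothetical illustration, not the real HbaseObjectWritable table: CodeTable uses int codes and an invented class list, and a null entry models one way of retiring a class without shifting later codes, namely by still advancing the counter for that slot.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of ordinal-based serialization codes. */
final class CodeTable {
    /** Assigns codes 1, 2, 3, ... in list order; a null entry reserves a retired slot. */
    static Map<String, Integer> build(List<String> classNames) {
        Map<String, Integer> codes = new LinkedHashMap<>();
        int code = 1;
        for (String name : classNames) {
            if (name != null) {
                codes.put(name, code);
            }
            code++; // always advance, so retiring an entry keeps later codes stable
        }
        return codes;
    }
}
```

Inserting a new entry in the middle of the list moves every later class to a new code, which is exactly what breaks peers that still use the old numbering.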


[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers


 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5153:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12510383/HBASE-5153-V4-90.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/747//console

This message is automatically generated.)

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch


 HBASE-4893 is related to this issue. As shown there, if multiple threads 
 share the same connection and that connection is aborted in one thread, the 
 other threads get a 
 "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
 HBASE-4893 solved the problem of stale connections not being removed, but the 
 original HTable instance still cannot be used; the connection in HTable 
 should be recreated.
 There are two approaches to solve this:
 1. In user code, catch the IOE, close the connection, and re-create the HTable 
 instance. This can be used as a workaround.
 2. On the HBase client side, catch this exception and re-create the connection.
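Approach 1 can be wrapped in a small helper. A minimal sketch, assuming a generic Closeable client handle built by a factory; RecreatingClient and Op are hypothetical names, not HBase client APIs. The helper is not thread-safe, so threads sharing one connection would still need their own coordination.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical helper: on IOException, close the handle, create a fresh one, and retry. */
final class RecreatingClient<C extends Closeable> {
    /** An operation against the client that may fail with an IOException. */
    interface Op<T, R> {
        R apply(T client) throws IOException;
    }

    private final Supplier<C> factory;
    private final int maxAttempts;
    private C client;

    RecreatingClient(Supplier<C> factory, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.client = factory.get();
    }

    <R> R call(Op<? super C, R> op) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.apply(client);
            } catch (IOException e) {
                last = e;
                try {
                    client.close(); // the handle may already be closed
                } catch (IOException ignored) {
                    // best effort only
                }
                client = factory.get(); // re-create instead of reusing the broken handle
            }
        }
        throw last; // non-null: maxAttempts >= 1 and every attempt failed
    }
}
```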


[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers


 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5153:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510179/HBASE-5153-V3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/727//console

This message is automatically generated.)


[jira] [Commented] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186957#comment-13186957
 ] 

Hudson commented on HBASE-5204:
---

Integrated in HBase-0.92 #245 (See 
[https://builds.apache.org/job/HBase-0.92/245/])
HBASE-5204 Backward compatibility fixes for 0.92

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java



[jira] [Commented] (HBASE-3724) Load balancer improvements


[ 
https://issues.apache.org/jira/browse/HBASE-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186959#comment-13186959
 ] 

gaojinchao commented on HBASE-3724:
---

I found that the balancing algorithm in branch 0.92 does not work for our 
scenario. So I am using this issue to collect all the issues related to 
balancing, so that anyone who wants to see them can find them easily.


 Load balancer improvements
 --

 Key: HBASE-3724
 URL: https://issues.apache.org/jira/browse/HBASE-3724
 Project: HBase
  Issue Type: Umbrella
Reporter: stack

 Umbrella issue under which we hang all issues related to the balancer


[jira] [Commented] (HBASE-3724) Load balancer improvements


[ 
https://issues.apache.org/jira/browse/HBASE-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186963#comment-13186963
 ] 

Zhihong Yu commented on HBASE-3724:
---

@Jinchao:
Can you describe your scenario ?
Then we will see which task can best accommodate your requirement.


[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186964#comment-13186964
]

Hadoop QA commented on HBASE-5208:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510705/HBASE-5208-002.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 javadoc. The javadoc tool appears to have generated -145 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 82 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/777//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/777//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/777//console

This message is automatically generated.


[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread Nicholas Telford (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186962#comment-13186962
 ] 

Nicholas Telford commented on HBASE-5208:
-

Tests were excluded from the patch as for now I'm unable to get the large 
tests to run in my environment, even from a clean trunk. I do have a patch with 
tests, but I'm not happy submitting them until I can get it working.


[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.


[ 
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186967#comment-13186967
 ] 

jirapos...@reviews.apache.org commented on HBASE-5203:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3510/#review4394
---


Nice work, Lars.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Delete.java
https://reviews.apache.org/r/3510/#comment9902

I think DoNotRetryIOException may be more appropriate here.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3510/#comment9897

White space.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3510/#comment9898

Please replace this parameter with clusterId.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3510/#comment9899

Please add clusterId parameter here.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3510/#comment9903

Should we allow caller to pass clusterId ?
That parameter would be used at line 4213.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3510/#comment9900

The original intent of this check being inside for loop was to populate 
walEdits.
Now we can lift this check to after line 4157.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3510/#comment9901

There is only one WALEdit now, right ?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
https://reviews.apache.org/r/3510/#comment9904

I think the original javadoc should be modified to indicate the support of 
Put and Delete.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
https://reviews.apache.org/r/3510/#comment9906

Good.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
https://reviews.apache.org/r/3510/#comment9907

I think 'to a map from key to values' may be clearer.
Otherwise people have to read the method body to fully understand.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
https://reviews.apache.org/r/3510/#comment9908

I don't see InterruptedException declared to be thrown by this method.
IE is caught at line 171.


- Ted


On 2012-01-16 07:58:33, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3510/
bq.  ---
bq.  
bq.  (Updated 2012-01-16 07:58:33)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Basically a rewrite (sorry about that) of HBASE-3584 (Allow atomic put/delete in one call).
bq.  This makes this actually correct in the case of RegionServer failures (HBASE-3584 was correct for all scenarios but RegionServer failures).
bq.  HRegion.mutateRow(...) now groups all edits into a single WALEdit and appends all edits in one call. Only then are the memstore edits applied.
bq.  This is the first time that WALEdits can contain KVs from different types of operations. So I also had to fix the replication code to understand that.
bq.  WAL recovery already handles this case.
bq.  
bq.  
bq.  This addresses bug HBASE-5203.
bq.  https://issues.apache.org/jira/browse/HBASE-5203
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Delete.java 1231744
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1231744
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java 1231744
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1231744
bq.  
bq.  Diff: https://reviews.apache.org/r/3510/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  * Tests added in HBASE-3584
bq.  * manual
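
The summary's point that a WALEdit may now mix KVs from different operation types means the replication side has to cut such an edit back into separate mutations. Below is a minimal sketch of one way to do that. It assumes that cells belonging to the same mutation sit next to each other in the edit, so a new Put or Delete is started whenever the row or the operation type changes between consecutive cells. `Cell`, `Group` and `MixedEditSplitter` are hypothetical stand-ins for KeyValue and Put/Delete, not the ReplicationSink code in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a mixed list of cells into consecutive Put/Delete groups,
// starting a new group whenever the row or the operation type changes.
// All types here are hypothetical stand-ins for KeyValue and Put/Delete.
final class MixedEditSplitter {
    enum Type { PUT, DELETE }

    static final class Cell {
        final String row;
        final Type type;
        final String column;

        Cell(String row, Type type, String column) {
            this.row = row;
            this.type = type;
            this.column = column;
        }
    }

    static final class Group {
        final String row;
        final Type type;
        final List<Cell> cells = new ArrayList<>();

        Group(String row, Type type) {
            this.row = row;
            this.type = type;
        }
    }

    static List<Group> split(List<Cell> edit) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (Cell c : edit) {
            // Open a new mutation when the row or the operation type changes.
            if (current == null || !current.row.equals(c.row) || current.type != c.type) {
                current = new Group(c.row, c.type);
                groups.add(current);
            }
            current.cells.add(c);
        }
        return groups;
    }
}
```

Splitting on consecutive changes, rather than grouping all cells by row and type, preserves the order in which the Put and Delete parts were logged.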

[jira] [Commented] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186977#comment-13186977
 ] 

Hudson commented on HBASE-5204:
---

Integrated in HBase-TRUNK #2634 (See 
[https://builds.apache.org/job/HBase-TRUNK/2634/])
HBASE-5204 Backward compatibility fixes for 0.92

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java


 Backward compatibility fixes for 0.92
 -

 Key: HBASE-5204
 URL: https://issues.apache.org/jira/browse/HBASE-5204
 Project: HBase
  Issue Type: Bug
  Components: ipc
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Blocker
  Labels: backwards-compatibility
 Fix For: 0.92.0, 0.94.0

 Attachments: 
 0001-Add-some-backward-compatible-support-for-reading-old.patch, 
 0002-Make-sure-that-a-connection-always-uses-a-protocol.patch, 
 0003-Change-the-code-used-when-serializing-HTableDescript.patch, 5204-92.txt, 
 5204-trunk.txt


 Attached are 3 patches that are necessary to allow compatibility between 
 HBase 0.90.x (and previous releases) and HBase 0.92.0.
 First of all, I'm well aware that 0.92.0 RC4 has been thumbed up by a lot of 
 people and would probably wind up being released as 0.92.0 tomorrow, so I 
 sincerely apologize for creating this issue so late in the process.  I spent 
 a lot of time trying to work around the quirks of 0.92, but once I realized 
 that a few near-trivial changes would make compatibility 
 significantly easier, I immediately sent these 3 patches to Stack, who 
 suggested I create this issue.
 The first patch is required as without it clients sending a 0.90-style RPC to 
 a 0.92-style server causes the server to die uncleanly.  It seems that 0.92 
 ships with {{\-XX:OnOutOfMemoryError=kill \-9 %p}}, and when a 0.92 server 
 fails to deserialize a 0.90-style RPC, it attempts to allocate a large buffer 
 because it doesn't read fields of 0.90-style RPCs properly.  This allocation 
 attempt immediately triggers an OOME, which causes the JVM to die abruptly of 
 a {{SIGKILL}}.  So whenever a 0.90.x client attempts to connect to HBase, it 
 kills whichever RS is hosting the {{\-ROOT-}} region.
 The second patch fixes a bug introduced by HBASE-2002, which added support 
 for letting clients specify what protocol they want to speak.  If a client 
 doesn't properly specify what protocol to use, the connection's {{protocol}} 
 field will be left {{null}}, which causes any subsequent RPC on that 
 connection to trigger an NPE in the server, even though the connection was 
 successfully established from the client's point of view.  The fix is to 
 simply give the connection a default protocol, by assuming the client meant 
 to speak to a RegionServer.
 The third patch fixes an oversight that slipped into HBASE-451, where a change 
 to {{HbaseObjectWritable}} caused all the codes used to serialize 
 {{Writables}} to shift by one.  This was carefully avoided in other changes 
 such as HBASE-1502, which cleanly removed entries for {{HMsg}} and 
 {{HMsg[]}}, so I don't think this breakage in HBASE-451 was intended.
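
The crash described for the first patch (a misread length field driving a huge allocation, an OOME, and then `kill -9` via `-XX:OnOutOfMemoryError`) is the kind of failure a bounds check on length prefixes guards against. Below is a minimal sketch of that generic defence. It assumes a simple 4-byte length-prefixed field and a hypothetical `MAX_FIELD_BYTES` limit. This is not the approach of the attached patch, which, per its name, adds backward-compatible support for reading the old format.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch: check a length prefix against an upper bound before allocating,
// so a misparsed request fails with an IOException instead of an
// OutOfMemoryError that would trigger the JVM's OnOutOfMemoryError action.
final class SafeFieldReader {
    // Hypothetical limit for a single field; not an actual HBase setting.
    static final int MAX_FIELD_BYTES = 1 << 20; // 1 MiB

    private SafeFieldReader() {}

    static byte[] readLengthPrefixed(DataInputStream in) throws IOException {
        int len = in.readInt(); // 4-byte big-endian length prefix
        if (len < 0 || len > MAX_FIELD_BYTES) {
            // Reject garbage lengths instead of calling new byte[len].
            throw new IOException("Invalid field length " + len
                + " (max " + MAX_FIELD_BYTES + "); incompatible client?");
        }
        byte[] buf = new byte[len];
        in.readFully(buf); // throws EOFException if the stream is short
        return buf;
    }
}
```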

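The breakage the third patch fixes comes from wire codes being assigned by registration order: inserting one entry in the middle renumbers every entry after it, so 0.90 and 0.92 peers disagree about what a code means. Below is a minimal sketch of that effect. It uses a hypothetical `CodeTable` and plain class-name strings in place of the real `HbaseObjectWritable` tables.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: codes handed out sequentially in registration order, the way a
// class-to-code table built with a running counter behaves.
final class CodeTable {
    private CodeTable() {}

    static Map<String, Integer> assign(List<String> classNames) {
        Map<String, Integer> codes = new LinkedHashMap<>();
        int code = 1;
        for (String name : classNames) {
            codes.put(name, code++); // position alone determines the wire code
        }
        return codes;
    }
}
```

Appending new entries at the end of the list keeps every existing code stable, which is why an insertion in the middle is what breaks old clients.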
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-16 Thread Jieshan Bean (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-5153:


Attachment: TestResults-hbase5153.out

Ran the tests again, got the same results: 5 tests failed due to the hostName 
problem. Please find the results from the attachment 
TestResults-hbase5153.out.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out


 HBASE-4893 is related to this issue. In that issue we learned that if multiple threads 
 share the same connection and that connection is aborted in one thread, the 
 other threads will get an 
 HConnectionManager$HConnectionImplementation@18fb1f7 closed exception.
 It solved the problem of stale connections that could not be removed. But the original 
 HTable instance can no longer be used; the connection in HTable should be 
 recreated.
 Actually, there are two approaches to solve this:
 1. In user code, catch the IOE, close the connection, and re-create the HTable 
 instance. We can use this as a workaround.
 2. On the HBase client side, catch this exception and re-create the connection.
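
Approach 2 amounts to a catch, recreate, retry loop on the client side. Below is a minimal sketch. It assumes a generic connection type produced by a caller-supplied factory. `ReconnectingCaller` and its `Call` interface are hypothetical and are not the HConnectionImplementation code or the patch under review. The sketch also retries immediately, with no backoff, which a real client would likely want.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Sketch of approach 2: when a call fails with an IOException (for example
// because a shared connection was closed by another thread), drop the
// connection, build a fresh one, and retry a bounded number of times.
final class ReconnectingCaller<C> {
    // An operation that uses a connection and may fail with an IOException.
    interface Call<T, R> {
        R apply(T connection) throws IOException;
    }

    private final Supplier<C> factory; // hypothetical connection factory
    private final int maxAttempts;
    private C connection;

    ReconnectingCaller(Supplier<C> factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.connection = factory.get();
    }

    <R> R call(Call<C, R> op) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.apply(connection);
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    connection = factory.get(); // never reuse the failed connection
                }
            }
        }
        throw last; // all attempts failed; surface the most recent error
    }
}
```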


[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186988#comment-13186988
]

Lars Hofhansl commented on HBASE-5203:
--

@Ram: are you looking at the right patch? The part you quote was removed by
this patch.
Or are you saying it should be possible to do that?

I think that would not be correct: I want an atomic operation, and I realized in
HBASE-3584 that I need to write a single WALEdit.

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186989#comment-13186989
 ] 

Hadoop QA commented on HBASE-5153:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12510715/TestResults-hbase5153.out
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/778//console

This message is automatically generated.



[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.


[ 
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186993#comment-13186993
 ] 

jirapos...@reviews.apache.org commented on HBASE-5203:
--



bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Delete.java, line 149
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68986#file68986line149
bq.  
bq.   I think DoNotRetryIOException may be more appropriate here.

Sure, although this is client-side code, so there is no notion of retry (Put 
does the same).


bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4170
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68987#file68987line4170
bq.  
bq.   Should we allow caller to pass clusterId ?
bq.   That parameter would be used at line 4213.

The clusterID is only used for replication. Only plain Puts and Deletes need to 
use an optional clusterId (when executed from the ReplicationSink). All other 
operations do (and should) use the local clusterID.
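
The rule described here reduces to a single fallback, which lets replication tell where an edit originated. Below is a minimal sketch. `ClusterIdRule.effectiveClusterId` is a hypothetical name, and representing the cluster id as a `java.util.UUID` is an assumption made for this illustration.

```java
import java.util.UUID;

// Sketch: only plain Puts/Deletes replayed by the ReplicationSink carry a
// caller-supplied cluster id; every other operation uses the local one.
final class ClusterIdRule {
    private ClusterIdRule() {}

    static UUID effectiveClusterId(UUID suppliedBySink, UUID localClusterId) {
        // A null supplied id means the edit originated on this cluster.
        return suppliedBySink != null ? suppliedBySink : localClusterId;
    }
}
```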


bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4184
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68987#file68987line4184
bq.
bq.   The original intent of this check being inside for loop was to populate walEdits.
bq.   Now we can lift this check to after line 4157.

Correct. But I still need to execute and check all preHooks before the 1st 
WALEdit is written.


bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4214
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68987#file68987line4214
bq.  
bq.   There is only one WALEdit now, right ?

Correct. Should read and apply edits (there are many edits in the one WALEdit)


bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java, line 98
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68988#file68988line98
bq.
bq.   I think the original javadoc should be modified to indicate the support of Put and Delete.

Agreed.


bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java, line 175
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68988#file68988line175
bq.  
bq.   I think 'to a map from key to values' may be clearer.
bq.   Otherwise people have to read the method body to fully understand.

I think this should be a static util method somewhere(?)
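
A generic helper of the kind suggested here (append a value to the list kept under a key, creating the list on first use) might look like the following minimal sketch. The name `MultiMaps.addToMultiMap` and its placement are hypothetical. In ReplicationSink the key would plausibly be a table and the values the rows to apply to it, but that mapping is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a reusable static helper for a "map from key to values".
final class MultiMaps {
    private MultiMaps() {}

    static <K, V> List<V> addToMultiMap(Map<K, List<V>> map, K key, V value) {
        // Create the list the first time a key is seen, then append.
        List<V> values = map.computeIfAbsent(key, k -> new ArrayList<>());
        values.add(value);
        return values; // returned so callers can inspect the updated list
    }
}
```

`Map.computeIfAbsent` requires Java 8. On the Java 6 toolchain HBase used at the time, the same logic would be written as an explicit get, null check, and put.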


bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java, line 202
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68988#file68988line202
bq.
bq.   I don't see InterruptedException declared to be thrown by this method.
bq.   IE is caught at line 171.

Argghh... HTable.batch() throws it, and my first attempt was to pass it on. 
This is a leftover and will be removed.
Thanks for the keen eyes.
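
Since `HTable.batch()` throws `InterruptedException`, the method has to handle it locally once the stray `throws` declaration is gone. One common way is to restore the thread's interrupt flag and rethrow as `java.io.InterruptedIOException`, as in the minimal sketch below. `BatchHelper` and `BatchCall` are hypothetical, and this is not necessarily what the patch does at line 171.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Sketch: translate InterruptedException from a batch call into an
// IOException subclass, restoring the interrupt flag so callers further
// up the stack can still observe the interruption.
final class BatchHelper {
    // Hypothetical stand-in for a call such as HTable.batch(...).
    interface BatchCall {
        void run() throws IOException, InterruptedException;
    }

    private BatchHelper() {}

    static void runBatch(BatchCall call) throws IOException {
        try {
            call.run();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            InterruptedIOException iioe = new InterruptedIOException("batch interrupted");
            iioe.initCause(ie);
            throw iioe;
        }
    }
}
```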


bq.  On 2012-01-16 15:27:14, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1788
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68987#file68987line1788
bq.  
bq.   Please replace this parameter with clusterId.

I knew you would find some Javadoc I missed :)
Will fix.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3510/#review4394
---



[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187001#comment-13187001
 ] 

jirapos...@reviews.apache.org commented on HBASE-5203:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3510/#review4397
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3510/#comment9917

What I meant was that if coprocessorHost == null, the for loop can be 
skipped.


- Ted


On 2012-01-16 07:58:33, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3510/
bq.  ---
bq.  
bq.  (Updated 2012-01-16 07:58:33)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Basically a rewrite (sorry about that) of HBASE-3584 (Allow atomic put/delete in one call).
bq.  This makes this actually correct in the case of RegionServer failures (HBASE-3584 was correct for all scenarios but RegionServer failures).
bq.  HRegion.mutateRow(...) now groups all edits into a single WALEdit and appends all edits in one call. Only then are the memstore edits applied.
bq.  This is the first time that WALEdits can contain KVs from different types of operations. So I also had to fix the replication code to understand that.
bq.  WAL recovery already handles this case.
bq.  
bq.  
bq.  This addresses bug HBASE-5203.
bq.  https://issues.apache.org/jira/browse/HBASE-5203
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Delete.java 1231744
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1231744
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java 1231744
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1231744
bq.  
bq.  Diff: https://reviews.apache.org/r/3510/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  * Tests added in HBASE-3584
bq.  * manual testing.
bq.  * getting a full test run right now
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Group atomic put/delete operation into a single WALEdit to handle region 
 server failures.
 -

 Key: HBASE-5203
 URL: https://issues.apache.org/jira/browse/HBASE-5203
 Project: HBase
  Issue Type: Sub-task
  Components: client, coprocessors, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5203.txt


 HBASE-3584 does not provide fully atomic operation in case of region 
 server failures (see explanation there).
 What should happen is that either (1) all edits are applied via a single 
 WALEdit, or (2) the WALEdits are applied in async mode and then sync'ed 
 together.
 For #1 it is not clear whether it is advisable to manage multiple *different* 
 operations (Put/Delete) via a single WAL edit. A quick check reveals that WAL 
 replay on region startup would work, but that replication would need to be 
 adapted. The refactoring needed would be non-trivial.
 #2 might actually not work, as another operation could request sync'ing a 
 later edit and hence flush these entries out as well.
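
Option #1 gives atomicity across a region server failure, and a toy model makes the reason concrete. In the model, each append writes one log record that a crash may drop entirely but never splits. Below is a minimal sketch under that assumption. `ToyWal` is hypothetical and ignores HLog details such as sequence ids and syncing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each append() writes one record atomically; a crash may lose
// any suffix of records but never tears a record. Replay applies whole
// records only. An assumed model for illustration, not HLog.
final class ToyWal {
    private final List<List<String>> records = new ArrayList<>();

    void append(List<String> edits) {
        records.add(new ArrayList<>(edits)); // one record == one WALEdit
    }

    // Simulate a crash that keeps only the first `survivingRecords` records,
    // then return every edit that replay would re-apply.
    List<String> replayAfterCrash(int survivingRecords) {
        List<String> applied = new ArrayList<>();
        for (int i = 0; i < Math.min(survivingRecords, records.size()); i++) {
            applied.addAll(records.get(i));
        }
        return applied;
    }
}
```

With one record per operation, replaying a log that lost its last record applies the Put without the Delete. With a single grouped record, replay applies both or neither.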


[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers


 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5153:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12510715/TestResults-hbase5153.out
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/778//console

This message is automatically generated.)



[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187011#comment-13187011
 ] 

Zhihong Yu commented on HBASE-5153:
---

@Jieshan:
Please find a machine with Internet access to run the test suite;
Maven needs to download artifacts.

I ran the 5 tests on a MacBook and they passed.
{code}
mt -Dtest=TestClockSkewDetection
mt -Dtest=TestScanner
mt -Dtest=TestCatalogTrackerOnCluster
mt -Dtest=TestCatalogTracker
{code}
+1 on latest patch.



[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.


[ 
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187023#comment-13187023
 ] 

jirapos...@reviews.apache.org commented on HBASE-5203:
--



bq.  On 2012-01-16 16:25:31, Ted Yu wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4184
bq.   https://reviews.apache.org/r/3510/diff/2/?file=68987#file68987line4184
bq.
bq.   What I meant was that if coprocessorHost == null, the for loop can be skipped.

Oh I see. You're right.
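
Two points from this thread combine into one ordering: skip the hook loop entirely when `coprocessorHost` is null, and otherwise run every pre-hook before the single WALEdit is written. Below is a minimal sketch of that shape. `PreHook`, `prepare` and the string-typed mutations are hypothetical stand-ins. Treating a bypass from any hook as "write nothing" is an assumption about the intended semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch: run all pre-hooks first (skipping the loop when no coprocessor
// host is loaded), and only then assemble the single batch to be logged.
final class MutateRowSketch {
    interface PreHook {
        // Returns true if the hook wants the operation bypassed.
        boolean preMutate(String mutation);
    }

    private MutateRowSketch() {}

    static Optional<List<String>> prepare(List<String> mutations, PreHook coprocessorHost) {
        if (coprocessorHost != null) { // no host: nothing to call
            for (String m : mutations) {
                if (coprocessorHost.preMutate(m)) {
                    return Optional.empty(); // bypassed: no WALEdit is written
                }
            }
        }
        // All hooks have run; now build the one WALEdit-like batch.
        List<String> batch = new ArrayList<>(mutations);
        return Optional.of(batch);
    }
}
```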


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3510/#review4397
---







[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187037#comment-13187037
]

jirapos...@reviews.apache.org commented on HBASE-5203:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3510/
---

(Updated 2012-01-16 17:28:09.639619)

Review request for hbase, Ted Yu and Michael Stack.

Changes
---

* Addresses Ted's comments.
* Passes all tests.

Summary
---

This addresses bug HBASE-5203.
https://issues.apache.org/jira/browse/HBASE-5203

Diffs (updated)
-

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Delete.java
1231744

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
1231744

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
1231744

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
1231744

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
1231744

Diff: https://reviews.apache.org/r/3510/diff

Testing
---

* Tests added in HBASE-3584
* manual testing.
* getting a full test run right now

Thanks,

Lars

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Updated] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread Nicholas Telford (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Telford updated HBASE-5208:


Attachment: HBASE-5208-003.txt

Adds tests for Scans defined by a Configuration.

Getting the largeTests suite running proved difficult, and I think this actually 
makes the test run too long: I had to comment out the old testScan() tests to 
get it to complete in a reasonable time (i.e. without being killed for taking 
too long).

Should I have moved this into a separate test file?

 Allow setting Scan start/stop row individually in TableInputFormat
 --

 Key: HBASE-5208
 URL: https://issues.apache.org/jira/browse/HBASE-5208
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nicholas Telford
Priority: Minor
 Attachments: HBASE-5208-001.txt, HBASE-5208-002.txt, 
 HBASE-5208-003.txt


 Currently, TableInputFormat initializes a serialized Scan from 
 hbase.mapreduce.scan. Alternatively, it will instantiate a new Scan using 
 properties defined in hbase.mapreduce.scan.*. However, of these properties 
 the start row and stop row (arguably the most pertinent) are missing.
 TableInputFormat should permit the specification of a start/stop row as with 
 the other fields using a new pair of properties: 
 hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.end
 The primary use-case for this is to permit Oozie and other job management 
 tools that can't call TableMapReduceUtil.initTableMapperJob() to operate on a 
 contiguous subset of rows.
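
The proposal amounts to reading two optional properties next to the existing hbase.mapreduce.scan.* settings. Below is a minimal sketch. It uses the property names exactly as given in this issue (the committed names may differ), a plain `Map` in place of Hadoop's `Configuration`, and a hypothetical `ScanRange` holder instead of HBase's `Scan`. An empty row stands for an open end, mirroring how HBase treats an empty start or stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch: build a start/stop row range from two optional properties.
final class ScanRangeFromConf {
    static final String SCAN_ROW_START = "hbase.mapreduce.scan.row.start";
    static final String SCAN_ROW_END = "hbase.mapreduce.scan.row.end";

    // Hypothetical holder; an empty array means "unbounded on this side".
    static final class ScanRange {
        final byte[] startRow;
        final byte[] stopRow;

        ScanRange(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    private ScanRangeFromConf() {}

    static ScanRange fromConf(Map<String, String> conf) {
        return new ScanRange(rowOrEmpty(conf.get(SCAN_ROW_START)),
                             rowOrEmpty(conf.get(SCAN_ROW_END)));
    }

    private static byte[] rowOrEmpty(String value) {
        // Assumes row keys are given as UTF-8 strings in the job config.
        return value == null ? new byte[0] : value.getBytes(StandardCharsets.UTF_8);
    }
}
```

A tool such as Oozie would then only need to set these two strings in the job configuration, with no call to TableMapReduceUtil.initTableMapperJob().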


[jira] [Created] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-01-16 Thread Aditya Acharya (Created) (JIRA)

HConnection/HMasterInterface should allow for way to get hostname of currently 
active master in multi-master HBase setup


 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Aditya Acharya


I have a multi-master HBase setup, and I'm trying to programmatically 
determine which of the masters is currently active. But the API does not allow 
me to do this. There is a getMaster() method in the HConnection class, but it 
returns an HMasterInterface, whose methods do not allow me to find out which 
master won the last race. The API should have a getActiveMasterHostname() or 
something to that effect.


[jira] [Updated] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-01-16 Thread Jonathan Hsieh (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5209:
--

Affects Version/s: 0.94.0
   0.92.0
   0.90.5

 HConnection/HMasterInterface should allow for way to get hostname of 
 currently active master in multi-master HBase setup
 

 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.92.0, 0.94.0, 0.90.5
Reporter: Aditya Acharya


[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187059#comment-13187059
 ] 

jirapos...@reviews.apache.org commented on HBASE-2600:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3466/
---

(Updated 2012-01-16 18:26:39.949854)


Review request for hbase and Michael Stack.


Changes
---

Updating the patch so that 
src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
uses the end key instead of the start key, as it is more often populated.

This fixes the occasional test breakage of 
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster#testShutdownSimpleFixup


Summary
---

This is an idea that Ryan and I have been kicking around on and off for a while 
now.

If regionnames were made of tablename+endrow instead of tablename+startrow, 
then, when searching the meta tables for the region that contains a wanted 
row, we'd just have to open a scanner at the passed row, and the first row 
found by the scan would be that of the region we need (if it is an offlined 
parent, we'd have to scan to the next row).

If we redid the meta tables in this format, we'd be using an access that is 
natural to hbase, a scan as opposed to the perverse, expensive 
getClosestRowBefore we currently have that has to walk backward in meta finding 
a containing region.
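The difference between the two lookups can be sketched with an in-memory TreeMap standing in for the meta table. This is an assumption of the sketch, not how meta is stored. When regions are keyed by start row, finding the containing region means looking backward for the greatest start key at or before the row, which is what getClosestRowBefore does. When they are keyed by end row, the answer is the first entry strictly after the row, which a forward scan reaches directly. Because end keys are exclusive, a row equal to a region's end key belongs to the next region, so the sketch uses higherEntry rather than ceilingEntry. The LAST sentinel for the final region's empty end key is also hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class MetaLookupSketch {
    // Hypothetical sentinel for the last region's empty end key; assumes no
    // real row sorts at or after it.
    static final String LAST = String.valueOf(Character.MAX_VALUE);

    // Keyed by start row: look backward for the greatest start key <= row,
    // the job getClosestRowBefore does today.
    static String byStartKey(TreeMap<String, String> startIndex, String row) {
        Map.Entry<String, String> e = startIndex.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    // Keyed by end row: the first end key strictly greater than row, i.e. the
    // first entry a forward scan from row reaches once an end key equal to
    // row is skipped (end keys are exclusive).
    static String byEndKey(TreeMap<String, String> endIndex, String row) {
        Map.Entry<String, String> e = endIndex.higherEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

For regions r1 = ["", "b"), r2 = ["b", "d") and r3 = ["d", end), both lookups return the same region for every row. The difference is that the end-key form only needs a forward, scan-shaped access.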

This issue is about changing the way we name regions.

If we were using scans, prewarming the client cache would be nearly costless (as 
opposed to what we currently have to do, which is first a getClosestRowBefore 
and then a scan from the closestrowbefore forward).

To convert to the new scheme, we'd have to run a migration on startup that 
changes the content in meta.

Up to now, the randomid component of a region name has been the timestamp of 
region creation. HBASE-2531 ("32-bit encoding of regionnames waaay too 
susceptible to hash clashes") proposes changing the randomid so that it contains 
the actual name of the directory in the filesystem that hosts the region. If we 
had this in place, I think it would help with the migration to this new way of 
doing meta because, as is, the region name in the fs is a hash of the regionname... 
changing the format of the regionname would mean we generate a different 
hash... so we'd need HBASE-2531 to be in place before we could do this change.


This addresses bug HBASE-2600.
https://issues.apache.org/jira/browse/HBASE-2600


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 904e2d2 
  src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 74cb821 
  src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 133759d 
  src/main/java/org/apache/hadoop/hbase/KeyValue.java be7e2d8 
  src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java e5e60a8 
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 88c381f 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 99f90b2 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f0c6828 
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 8f4f4b8 
  src/main/java/org/apache/hadoop/hbase/rest/RegionsResource.java bf85bc1 
  src/main/java/org/apache/hadoop/hbase/rest/model/TableRegionModel.java 67e7a04 
  src/test/java/org/apache/hadoop/hbase/TestKeyValue.java dc4ee8d 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java 5f97167 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 6e1211b 
  src/test/java/org/apache/hadoop/hbase/rest/TestStatusResource.java cffdcb6 
  src/test/java/org/apache/hadoop/hbase/rest/model/TestTableRegionModel.java b6f0ab5 

Diff: https://reviews.apache.org/r/3466/diff


Testing
---

Unit tests started table. 


Tests in error: 
  org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD: Table 'TestTable 
we searched for the StartKey: TestTable ,, startKey lastChar's int value: 32 
with the stopKey: TestTable#,, stopRow lastChar's int value: 35 with 
parentTable:.META.

I need to know how to update/recreate the tarball that is the source for that 
test.


Thanks,

Alex



 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch,

[jira] [Updated] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread Lars Hofhansl (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Lars Hofhansl updated HBASE-5203:
-

Status: Open (was: Patch Available)

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187060#comment-13187060
]

Hadoop QA commented on HBASE-5208:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510724/HBASE-5208-003.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -145 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 82 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.TestReplicationPeer
org.apache.hadoop.hbase.regionserver.TestSplitLogWorker
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/779//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/779//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/779//console

This message is automatically generated.

Allow setting Scan start/stop row individually in TableInputFormat
--

[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-01-16 Thread Alex Newman (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-2600:
---

Attachment: 
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch



[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187064#comment-13187064
]

ramkrishna.s.vasudevan commented on HBASE-5203:
---

@Lars
Sorry, I pasted the snippet from the code. In doMiniBatchPut, postPut() is
called only if the put is successful.
Here in mutateRow() we call postPut() in a finally block.
So I just wanted to know: if the log.append() in mutateRow() fails, do we still
execute postPut()? Please correct me if I am wrong. I get the intent behind
the patch, but I am not sure about this part.

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187065#comment-13187065
 ] 

Zhihong Yu commented on HBASE-5208:
---

Looks like testScan() is always followed by testScanFromConfiguration() with 
the same parameters:
{code}
 testScan(null, app, apo);
+testScanFromConfiguration(null, app, apo);
{code}
I suggest adding an intermediary method that calls both 
testScanFromConfiguration() and testScan().

So using the existing TestTableInputFormatScan should be fine.
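The suggested intermediary method might look roughly like the sketch below. Several details are assumptions made to keep it self-contained: the name testScanBothWays, the String parameter types, and the stub bodies, which only record the call order instead of running MapReduce jobs.

```java
import java.util.ArrayList;
import java.util.List;

class TableInputFormatScanSketch {
    // Records which variant ran, in order; the real tests start MR jobs.
    final List<String> calls = new ArrayList<>();

    // Hypothetical intermediary: one call per (start, stop, last) combination
    // exercises both the Scan-based and the configuration-based path.
    void testScanBothWays(String start, String stop, String last) {
        testScan(start, stop, last);
        testScanFromConfiguration(start, stop, last);
    }

    void testScan(String start, String stop, String last) {
        calls.add("scan(" + start + "," + stop + ")");
    }

    void testScanFromConfiguration(String start, String stop, String last) {
        calls.add("fromConfiguration(" + start + "," + stop + ")");
    }
}
```

Each testScan(...)/testScanFromConfiguration(...) pair in the test would then collapse into a single testScanBothWays(...) call with the same arguments.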

 Allow setting Scan start/stop row individually in TableInputFormat
 --

 Key: HBASE-5208
 URL: https://issues.apache.org/jira/browse/HBASE-5208
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nicholas Telford
Priority: Minor
 Attachments: HBASE-5208-001.txt, HBASE-5208-002.txt, 
 HBASE-5208-003.txt



[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.


[ 
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187067#comment-13187067
 ] 

jirapos...@reviews.apache.org commented on HBASE-5203:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3510/#review4400
---

Ship it!


- Ted


On 2012-01-16 17:28:09, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3510/
bq.  ---
bq.  
bq.  (Updated 2012-01-16 17:28:09)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Basically a rewrite (sorry about that) of HBASE-3485 Allow atomic 
put/delete in one call.
bq.  This makes this actually correct in the case of RegionServer failures 
(HBASE-3485 was correct for all scenarios but RegionServer failures).
bq.  HRegion.mutateRow(...) now groups all edits into a single WALEdit and 
appends all edits in one call. Only then are the memstore edits applied.
bq.  This is the first time that WALEdits can contain KVs from different types 
of operations. So I also had to fix the replication code to understand that.
bq.  WAL recovery already handles this case.
bq.  
bq.  
bq.  This addresses bug HBASE-5203.
bq.  https://issues.apache.org/jira/browse/HBASE-5203
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Delete.java
 1231744 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1231744 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
 1231744 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
 1231744 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
 1231744 
bq.  
bq.  Diff: https://reviews.apache.org/r/3510/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  * Tests added in HBASE-3485
bq.  * manual testing.
bq.  * getting a full test run right now
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Group atomic put/delete operation into a single WALEdit to handle region 
 server failures.
 -

 Key: HBASE-5203
 URL: https://issues.apache.org/jira/browse/HBASE-5203
 Project: HBase
  Issue Type: Sub-task
  Components: client, coprocessors, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5203.txt


 HBASE-3584 does not provide fully atomic operation in case of region 
 server failures (see explanation there).
 What should happen is that either (1) all edits are applied via a single 
 WALEdit, or (2) the WALEdits are applied in async mode and then sync'ed 
 together.
 For #1 it is not clear whether it is advisable to manage multiple *different* 
 operations (Put/Delete) via a single WAL edit. A quick check reveals that WAL 
 replay on region startup would work, but that replication would need to be 
 adapted. The refactoring needed would be non-trivial.
 #2 might actually not work, as another operation could request sync'ing a 
 later edit and hence flush these entries out as well.
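Option #1 can be sketched with simplified, hypothetical stand-ins: Cell, LogEntry, a list as the log, and a TreeMap as the memstore. None of these are HBase's WALEdit, HLog or MemStore classes. All cells of a mixed Put/Delete mutation go into one log entry, which is appended in a single call. Only then is the memstore touched. Assuming the append itself is atomic, replay after a crash therefore sees either the whole mutation or none of it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

final class RowMutationSketch {
    enum Type { PUT, DELETE }

    static final class Cell {
        final Type type;
        final String column;
        final String value; // unused for DELETE

        Cell(Type type, String column, String value) {
            this.type = type;
            this.column = column;
            this.value = value;
        }
    }

    // One log entry; with option #1 it may mix PUT and DELETE cells.
    static final class LogEntry {
        final List<Cell> cells;

        LogEntry(List<Cell> cells) {
            this.cells = Collections.unmodifiableList(new ArrayList<>(cells));
        }
    }

    final List<LogEntry> log = new ArrayList<>();
    final TreeMap<String, String> memstore = new TreeMap<>();

    // Cells are applied in the order given.
    void mutateRow(List<Cell> mutations) {
        LogEntry entry = new LogEntry(mutations);
        log.add(entry);               // single append for the whole mutation
        apply(memstore, entry.cells); // memstore is updated only afterwards
    }

    // Replay applies whole entries, exactly like the live path.
    static TreeMap<String, String> replay(List<LogEntry> log) {
        TreeMap<String, String> recovered = new TreeMap<>();
        for (LogEntry entry : log) {
            apply(recovered, entry.cells);
        }
        return recovered;
    }

    private static void apply(TreeMap<String, String> store, List<Cell> cells) {
        for (Cell c : cells) {
            if (c.type == Type.PUT) {
                store.put(c.column, c.value);
            } else {
                store.remove(c.column);
            }
        }
    }
}
```

The sketch ignores syncing, MVCC and locking. It only illustrates why one log entry per mutation makes replay all-or-nothing, and why a consumer of the log (such as replication) must now handle entries that mix operation types.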


[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat

2012-01-16 Thread Nicholas Telford (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187071#comment-13187071
 ] 

Nicholas Telford commented on HBASE-5208:
-

That was my intention. I can extract that out into an intermediary method if 
that's preferable; however, that doesn't really solve the problem that doubling 
the number of MR jobs spun up causes the test to time out. Any ideas on that one?

 Allow setting Scan start/stop row individually in TableInputFormat
 --

 Key: HBASE-5208
 URL: https://issues.apache.org/jira/browse/HBASE-5208
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nicholas Telford
Priority: Minor
 Attachments: HBASE-5208-001.txt, HBASE-5208-002.txt, 
 HBASE-5208-003.txt



[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187074#comment-13187074
]

Lars Hofhansl commented on HBASE-5203:
--

@Ram: I see what you mean. Good point.
Unlike doMiniBatchPut, there is no partial completion here, but the postHooks
should indeed only be run if the (entire) operation was successful. I'll have a
change soon.

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler


[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187081#comment-13187081
 ] 

ramkrishna.s.vasudevan commented on HBASE-5120:
---

I was just browsing through HBASE-4015, where the timeout monitor was refactored.
It was tested with a 5-second timeout period by balancing, killing, and bringing 
up region servers, and things came out fine.
But this disable scenario was missed.
Another change I noticed: when HBASE-4015 was done, the forceful unassign() 
checked whether the node was present in the CLOSING state and, if so, did not 
proceed with it.
In the recent code that check has been removed. Maybe that exposed the problem.
Thanks to JD for pointing this out. As per JD, if we don't run into this type of 
issue after reducing the timeout period, then we can say the TM is really
fixed.
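A hypothetical guard combining the removed CLOSING check with a re-check that the region is still in transition might look like this. ForcedUnassignGuard, its map and its method names only model the decision and are not AssignmentManager code. In the real master, the re-check would also need to be atomic with recreating the znode, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

final class ForcedUnassignGuard {
    enum State { PENDING_CLOSE, CLOSING }

    private final Map<String, State> regionsInTransition = new HashMap<>();

    synchronized void setState(String region, State state) {
        regionsInTransition.put(region, state);
    }

    // What the CLOSED handler does for a table being disabled: the region
    // leaves regions-in-transition for good.
    synchronized void closedWhileDisabling(String region) {
        regionsInTransition.remove(region);
    }

    // Evaluated when about to act, not when the timeout was first noticed.
    synchronized boolean shouldForceUnassign(String region, State timedOutState) {
        State current = regionsInTransition.get(region);
        if (current == null) {
            return false; // already processed, e.g. closed for a disabling table
        }
        if (current == State.CLOSING) {
            return false; // a close is already under way; do not start another
        }
        return current == timedOutState; // unchanged since the timeout fired
    }
}
```

In the log below, the CLOSED event was handled between the timeout and the forced unassign. A check like this at the moment of acting would have returned false instead of recreating the CLOSING znode.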

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract my statement that it used to be extremely racy 
 and caused more trouble than it fixed: on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed:
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused: it recreated the RIT znode, but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG

[jira] [Created] (HBASE-5210) HFiles are missing from an incremental load

2012-01-16 Thread Lawrence Simpson (Created) (JIRA)

HFiles are missing from an incremental load
---

 Key: HBASE-5210
 URL: https://issues.apache.org/jira/browse/HBASE-5210
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.2
 Environment: HBase 0.90.2 with Hadoop-0.20.2 (with durable sync).  
RHEL 2.6.18-164.15.1.el5.  4 node cluster (1 master, 3 slaves)
Reporter: Lawrence Simpson


We run an overnight map/reduce job that loads data from an external source and 
adds that data to an existing HBase table.  The input files have been loaded 
into HDFS.  The map/reduce job uses the HFileOutputFormat (and the 
TotalOrderPartitioner) to create HFiles which are subsequently added to the 
HBase table.  On at least two separate occasions (that we know of), a range of 
output was missing for a given day.  The range of keys for the missing 
values corresponded to those of a particular region.  This implied that a 
complete HFile somehow went missing from the job.  Further investigation 
revealed the following:

 * Two different reducers (running in separate JVMs and thus separate class
 * loaders) in the same server can end up using the same file names for
 * their HFiles.  The scenario is as follows:
 *  1.  Both reducers start at nearly the same time.
 *  2.  The first reducer reaches the point where it wants to write its
 *      first file.
 *  3.  It uses the StoreFile class, which contains a static Random object
 *      that is initialized by default using a timestamp.
 *  4.  The file name is generated using the random number generator.
 *  5.  The file name is checked against other existing files.
 *  6.  The file is written into temporary files in a directory named
 *      after the reducer attempt.
 *  7.  The second reduce task reaches the same point, but its StoreFile
 *      class (which is now in the file system's cache) gets loaded within
 *      the time resolution of the OS and thus initializes its Random()
 *      object with the same seed as the first task.
 *  8.  The second task also checks for an existing file with the name
 *      generated by the random number generator and finds no conflict,
 *      because each task is writing files in its own temporary folder.
 *  9.  The first task finishes and gets its temporary files committed
 *      to the real folder specified for output of the HFiles.
 * 10.  The second task then reaches its own conclusion and commits its
 *      files (moveTaskOutputs).  The released Hadoop code just overwrites
 *      any files with the same name.  No warning messages or anything.
 *      The first task's HFiles just go missing.
 *
 *  Note:  The reducers here are NOT different attempts at the same
 *  reduce task.  They are different reduce tasks, so data is
 *  really lost.
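Steps 3 to 8 hinge on two Random instances ending up with the same seed. How a default-constructed java.util.Random is seeded depends on the JDK version, so this sketch seeds both instances explicitly to model the scenario described above. nextFileName is a hypothetical naming scheme, not StoreFile's actual one.

```java
import java.util.Random;

final class SeedCollision {
    // Hypothetical naming scheme: a non-negative random long as a decimal string.
    static String nextFileName(Random rand) {
        return Long.toString(rand.nextLong() & Long.MAX_VALUE);
    }

    // Simulates two reducers whose Random objects were seeded with seedA and
    // seedB, and reports whether their first generated names are equal.
    static boolean namesCollide(long seedA, long seedB) {
        Random first = new Random(seedA);
        Random second = new Random(seedB);
        return nextFileName(first).equals(nextFileName(second));
    }
}
```

With equal seeds the two "random" names are always identical. The existence check in step 8 cannot catch this, because each task only looks inside its own temporary folder.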

I am currently testing a fix in which I have added code to the Hadoop 
FileOutputCommitter.moveTaskOutputs method to check for a conflict with
an existing file in the final output folder and to rename the HFile if
needed.  This may not be appropriate for all uses of FileOutputFormat.
So I have put this into a new class which is then used by a subclass of
HFileOutputFormat.  Subclassing of FileOutputCommitter itself was a bit 
more of a problem due to private declarations.
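The idea of refusing to overwrite at commit time can be sketched against the local filesystem with java.nio.file. This is not Hadoop's FileSystem or FileOutputCommitter API. The renaming scheme (appending a random UUID) is an assumption, and a real HFile name would have to follow whatever format the bulk-load step expects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

final class ConflictAwareMover {
    // Moves src into destDir. If a file with the same name is already there,
    // picks a fresh name instead of overwriting it.
    static Path moveWithoutClobbering(Path src, Path destDir) throws IOException {
        String name = src.getFileName().toString();
        Path target = destDir.resolve(name);
        while (Files.exists(target)) {
            // Hypothetical renaming scheme: append a random UUID.
            target = destDir.resolve(name + "-" + UUID.randomUUID());
        }
        // No REPLACE_EXISTING option: the move refuses to replace a target
        // that exists when it checks.
        return Files.move(src, target);
    }
}
```

The exists()/move sequence is still not atomic, so committers running truly concurrently into the same folder would need additional coordination. For the sequential commits described in steps 9 and 10, however, the second task's files get new names instead of replacing the first task's files.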

I don't know if my approach is the best fix for the problem.  If someone
more knowledgeable than myself deems that it is, I will be happy to share
what I have done and by that time I may have some information on the
results.


[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187089#comment-13187089
]

jirapos...@reviews.apache.org commented on HBASE-5203:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3510/
---

(Updated 2012-01-16 19:05:01.756466)

Review request for hbase, Ted Yu and Michael Stack.

Changes
---

* Addressing Ram's comments. Coprocessor postHooks are now only executed if all
operations were successful (threw no exceptions).

Summary
---

This addresses bug HBASE-5203.
https://issues.apache.org/jira/browse/HBASE-5203

Diffs (updated)
-

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Delete.java
1232110

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
1232110

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
1232110

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
1232110

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
1232110

Diff: https://reviews.apache.org/r/3510/diff

Testing (updated)
---

* Tests added in HBASE-3485
* manual testing.
* passes all tests.

Thanks,

Lars

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187091#comment-13187091
]

Lars Hofhansl commented on HBASE-5203:
--

@Ram: I added a new patch to RB. The only change is wrapping the whole operation in
try/finally and running the postHooks outside the finally blocks (but still
after the mvcc is rolled forward and the region lock is released). Otherwise
the patch is identical.
Please have a look. Thanks.
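The control flow described in this thread can be sketched roughly as below. This is a hypothetical, heavily simplified model: `AtomicMutateSketch`, `wal`, `readPoint` and the string-typed mutations are illustrative stand-ins, not HRegion's real fields or the WALEdit API. It shows the three points discussed: all mutations go into one log entry, the MVCC roll-forward and lock release happen in `finally`, and the post hooks run afterwards only when nothing threw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of an atomic multi-mutation; names are illustrative, not HRegion's. */
public class AtomicMutateSketch {
    final List<List<String>> wal = new ArrayList<>();      // one element per log entry (one "WALEdit")
    final List<String> postHookCalls = new ArrayList<>();  // records coprocessor-style post hooks
    final ReentrantLock regionLock = new ReentrantLock();
    long readPoint = 0;                                     // stand-in for the MVCC read point

    void mutateAtomically(List<String> mutations) {
        regionLock.lock();
        try {
            List<String> walEdit = new ArrayList<>();
            for (String m : mutations) {
                if (m.isEmpty()) {
                    throw new IllegalArgumentException("empty mutation");
                }
                walEdit.add(m);                 // every put/delete goes into the same edit
            }
            wal.add(walEdit);                   // one append: log replay sees all of it or none of it
        } finally {
            readPoint++;                        // roll MVCC forward (simplified) ...
            regionLock.unlock();                // ... and release the lock, even on failure
        }
        // Reached only if the try block did not throw, so post hooks run
        // after the lock is released and only when every mutation succeeded.
        postHookCalls.add("post:" + mutations);
    }
}
```

Because the hook call sits after the try/finally rather than inside it, an exception skips the hooks automatically without needing a separate success flag.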

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-01-16 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187093#comment-13187093
]

Hadoop QA commented on HBASE-2600:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12510727/0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 20 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -145 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 83 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
org.apache.hadoop.hbase.replication.TestReplicationPeer
org.apache.hadoop.hbase.replication.TestReplication
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/780//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/780//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/780//console

This message is automatically generated.

Change how we do meta tables; from tablename+STARTROW+randomid to instead,
tablename+ENDROW+randomid

Key: HBASE-2600
URL: https://issues.apache.org/jira/browse/HBASE-2600
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
Attachments:
0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch

This is an idea that Ryan and I have been kicking around on and off for a
while now.
If region names were made of tablename+endrow instead of tablename+startrow,
then to find the region in the meta tables that contains a wanted row, we'd
just have to open a scanner at that row: the first row found by the scan would
be that of the region we need (if it is an offlined parent, we'd have to scan
to the next row).
If we redid the meta tables in this format, we'd be using an access pattern
that is natural to HBase, a scan, as opposed to the perverse, expensive
getClosestRowBefore we currently have, which has to walk backward in meta to
find the containing region.
This issue is about changing the way we name regions.
If we were using scans, prewarming the client cache would be nearly costless (as
opposed to what we currently have to do, which is first a
getClosestRowBefore and then a scan forward from the closest row before).
Converting to the new method, we'd have to run a migration on startup
that changes the content of meta.
Up to now, the randomid component of a region name has been the timestamp of
region creation. HBASE-2531 ("32-bit encoding of regionnames waaay
too susceptible to hash clashes") proposes changing the randomid so that it
contains the actual name of the directory in the filesystem that hosts the
region. If we had this in place, I think it would help with the migration to
the new meta format, because as things stand the region's name in the fs is a
hash of the region name: changing the format of the region name would mean we
generate a different hash. So we'd need HBASE-2531 in place before we could make
this change.
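To make the lookup argument concrete, here is a small, hypothetical model that uses an in-memory `TreeMap` in place of the meta table. A region holds rows with start <= row < end, and an empty end key marks the table's last region. With start-row keys, finding a row's region means looking for the closest key at or before the row (`floorEntry`, the analogue of getClosestRowBefore); with end-row keys, it is the first key strictly after the row (`higherEntry`), which corresponds to a forward scan. This models only the key ordering, not HBase's actual meta row format or the offlined-parent case.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of meta lookups; not HBase's catalog code. */
public class MetaLookupSketch {
    // {start, end, name}; "" as a start means "first region", "" as an end means "last region".
    static final String[][] REGIONS = { {"", "g", "r1"}, {"g", "p", "r2"}, {"p", "", "r3"} };

    /** Current scheme: keyed by start row, so we need the closest key at or before the row. */
    static String lookupByStartRow(String row) {
        TreeMap<String, String> byStart = new TreeMap<>();
        for (String[] r : REGIONS) {
            byStart.put(r[0], r[2]);
        }
        return byStart.floorEntry(row).getValue();
    }

    /** Proposed scheme: keyed by end row, so the first key strictly after the row wins. */
    static String lookupByEndRow(String row) {
        TreeMap<String, String> byEnd = new TreeMap<>();
        String lastRegion = null;
        for (String[] r : REGIONS) {
            if (r[1].isEmpty()) {
                lastRegion = r[2];           // empty end key: no upper bound
            } else {
                byEnd.put(r[1], r[2]);
            }
        }
        Map.Entry<String, String> e = byEnd.higherEntry(row);  // end keys are exclusive
        return e != null ? e.getValue() : lastRegion;
    }
}
```

For every row both methods agree (for example, "k" maps to r2 and "z" to r3); the difference is that the end-row lookup only ever moves forward, which is what makes it expressible as an ordinary scan.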

[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-01-16 Thread Jai Kumar Singh (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187094#comment-13187094
]

Jai Kumar Singh commented on HBASE-5166:

Hi stack,
Thanks for the comment. I've modified the patch accordingly.
Added Executors.newFixedThreadPool(numberOfThreads) for executor part.

-- JK

MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
--

Key: HBASE-5166
URL: https://issues.apache.org/jira/browse/HBASE-5166
Project: HBase
Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
Labels: multithreaded, tablemapper
Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch

Original Estimate: 0.5h
Remaining Estimate: 0.5h

There is currently no MultiThreadedTableMapper in HBase analogous to the
MultithreadedMapper that Hadoop has for IO-bound jobs.
Use case, a web crawler: take input (urls) from an HBase table and put the
content (urls, content) back into HBase.
Running these kinds of HBase MapReduce jobs with the normal table mapper is
quite slow, as we are not fully utilizing the CPU (the jobs are network-IO bound).
Moreover, I want to know whether it would be a good or bad idea to use HBase for
these kinds of use cases.
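The revised patch mentions `Executors.newFixedThreadPool(numberOfThreads)` for the executor part. A minimal sketch of that idea outside of MapReduce follows: IO-bound per-row work is submitted to a fixed-size pool and results are collected in input order. `fetch` is a hypothetical stand-in for the crawler's network call; this is neither the attached patch nor Hadoop's `MultithreadedMapper`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of fanning IO-bound work out over a fixed thread pool. */
public class FixedPoolMapSketch {
    /** Stand-in for a network fetch; sleeps to simulate latency. */
    static String fetch(String url) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
        }
        return "content-of-" + url;
    }

    static List<String> mapAll(List<String> urls, int numberOfThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetch(url)));  // runs concurrently, at most numberOfThreads at once
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());                        // waits; keeps submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("mapping failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A fixed pool bounds the number of concurrent network calls per task, which is the main knob for an IO-bound job; waiting on the futures in submission order keeps output order deterministic.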

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-01-16 Thread Alex Newman (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187095#comment-13187095
]

Alex Newman commented on HBASE-2600:

I'll take a look at these broken tests. Weird that these didn't break on my
Jenkins.

Change how we do meta tables; from tablename+STARTROW+randomid to instead,
tablename+ENDROW+randomid

[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-01-16 Thread Jai Kumar Singh (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jai Kumar Singh updated HBASE-5166:
---

Attachment: 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

Modified patch

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187100#comment-13187100
]

ramkrishna.s.vasudevan commented on HBASE-5203:
---

+1. :)
Thanks Lars.

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Updated] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread Benoit Sigoure (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated HBASE-5204:
--

Affects Version/s: (was: 0.92.0)
Fix Version/s: (was: 0.94.0)

 Backward compatibility fixes for 0.92
 -

 Key: HBASE-5204
 URL: https://issues.apache.org/jira/browse/HBASE-5204
 Project: HBase
  Issue Type: Bug
  Components: ipc
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Blocker
  Labels: backwards-compatibility
 Fix For: 0.92.0

 Attachments: 
 0001-Add-some-backward-compatible-support-for-reading-old.patch, 
 0002-Make-sure-that-a-connection-always-uses-a-protocol.patch, 
 0003-Change-the-code-used-when-serializing-HTableDescript.patch, 5204-92.txt, 
 5204-trunk.txt


 Attached are 3 patches that are necessary to allow compatibility between 
 HBase 0.90.x (and previous releases) and HBase 0.92.0.
 First of all, I'm well aware that 0.92.0 RC4 has been thumbed up by a lot of 
 people and would probably wind up being released as 0.92.0 tomorrow, so I 
 sincerely apologize for creating this issue so late in the process.  I spent 
 a lot of time trying to work around the quirks of 0.92, but once I realized 
 that a few nearly trivial changes would make compatibility 
 significantly easier, I immediately sent these 3 patches to Stack, who 
 suggested I create this issue.
 The first patch is required because, without it, a client sending a 0.90-style 
 RPC to a 0.92-style server causes the server to die uncleanly.  It seems that 0.92 
 ships with {{\-XX:OnOutOfMemoryError=kill \-9 %p}}, and when a 0.92 server 
 fails to deserialize a 0.90-style RPC, it attempts to allocate a large buffer 
 because it doesn't read the fields of 0.90-style RPCs properly.  This allocation 
 attempt immediately triggers an OOME, which causes the JVM to die abruptly from 
 a {{SIGKILL}}.  So whenever a 0.90.x client attempts to connect to HBase, it 
 kills whichever RS is hosting the {{\-ROOT-}} region.
 The second patch fixes a bug introduced by HBASE-2002, which added support 
 for letting clients specify what protocol they want to speak.  If a client 
 doesn't properly specify what protocol to use, the connection's {{protocol}} 
 field will be left {{null}}, which causes any subsequent RPC on that 
 connection to trigger an NPE in the server, even though the connection was 
 successfully established from the client's point of view.  The fix is to 
 simply give the connection a default protocol, by assuming the client meant 
 to speak to a RegionServer.
 The third patch fixes an oversight that slipped into HBASE-451, where a change 
 to {{HbaseObjectWritable}} caused all the codes used to serialize 
 {{Writables}} to shift by one.  This was carefully avoided in other changes 
 such as HBASE-1502, which cleanly removed entries for {{HMsg}} and 
 {{HMsg[]}}, so I don't think this breakage in HBASE-451 was intended.
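The second and third patches both come down to simple rules that can be illustrated in a few lines. The sketch below is hypothetical: `assignCodes` assumes a registration loop that hands out codes 1, 2, 3, ... in declaration order (a generic technique, not a copy of `HbaseObjectWritable`), and the class names are placeholders. Inserting an entry in the middle renumbers everything after it, which is the shift-by-one described for HBASE-451; keeping a placeholder for a removed entry is one way to remove it "cleanly" so later codes do not move (the exact mechanism used in HBASE-1502 is not shown here). `protocolOrDefault` shows the second patch's fallback in its simplest form.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of wire-code stability; not HbaseObjectWritable's real table. */
public class WireCodeSketch {
    /**
     * Hands out codes 1, 2, 3, ... in list order. A null entry stands for a
     * removed class: it still consumes a code, so later codes do not move.
     */
    static Map<String, Integer> assignCodes(List<String> classes) {
        Map<String, Integer> codes = new HashMap<>();
        int code = 1;
        for (String c : classes) {
            if (c != null) {
                codes.put(c, code);
            }
            code++;
        }
        return codes;
    }

    /** Falls back to a default protocol when a client never named one. */
    static String protocolOrDefault(String requested, String defaultProtocol) {
        return requested != null ? requested : defaultProtocol;
    }
}
```

With `["A", "B", "C"]`, B gets code 2; inserting `"NEW"` after A moves B to 3, so an old peer would decode B's bytes as the wrong class. Appending new entries at the end, or leaving placeholders for removed ones, keeps existing codes stable.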


[jira] [Updated] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread Benoit Sigoure (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated HBASE-5204:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the quick turnaround.  And once again, sorry for submitting this so 
late in the release process.


[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-16 Thread Jonathan Hsieh (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187127#comment-13187127
]

Jonathan Hsieh commented on HBASE-5128:
---

@Ted sounds good.

[uber hbck] Enable hbck to automatically repair table integrity problems as
well as region consistency problems while online.
-

Key: HBASE-5128
URL: https://issues.apache.org/jira/browse/HBASE-5128
Project: HBase
Issue Type: New Feature
Components: hbck
Affects Versions: 0.92.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

The current (0.90.5, 0.92.0rc2) versions of hbck detect most region
consistency and table integrity invariant violations. However, with '-fix' it
can only automatically repair region consistency cases having to do with
deployment problems. This updated version should be able to handle all cases
(including a new orphan regiondir case). When complete, it will likely deprecate
the OfflineMetaRepair tool and subsume several open META-hole related issues.
Here's the approach (from the comment at the top of the new version of the
file).
{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.
 *
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance.
 *
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table. This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no
 * overlapping regions.
 *
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 *
 * For table integrity repairs, the tables' region directories are scanned
 * for .regioninfo files. Each table's integrity is then verified. If there
 * are any orphan regions (regions with no .regioninfo files), or holes, new
 * regions are fabricated. Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions. If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.
 *
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * hbase region servers or master do not need to be running. These phases can
 * be used to completely reconstruct the META table in an offline fashion.
 *
 * Region consistency requires three conditions -- 1) valid .regioninfo file
 * present in an hdfs region dir, 2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 *
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully. Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}
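The table-integrity rule above ("all possible row keys can resolve to exactly one region") can be checked with a single pass over the regions sorted by start key. The following is a hypothetical sketch, not HBaseFsck's code: regions are plain string ranges [start, end), an empty start or end key means unbounded, and it only reports holes, overlaps, degenerate and backwards regions instead of repairing them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical table-integrity check; not HBaseFsck's implementation. */
public class TableIntegritySketch {
    static final class Region {
        final String name, start, end;   // "" start = table start, "" end = table end
        Region(String name, String start, String end) {
            this.name = name; this.start = start; this.end = end;
        }
    }

    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> sorted = new ArrayList<>();
        for (Region r : regions) {
            boolean bounded = !r.end.isEmpty();
            if (bounded && r.end.equals(r.start)) {
                problems.add("degenerate: " + r.name);     // endkey == startkey
            } else if (bounded && r.end.compareTo(r.start) < 0) {
                problems.add("backwards: " + r.name);      // endkey < startkey
            } else {
                sorted.add(r);
            }
        }
        sorted.sort(Comparator.comparing((Region r) -> r.start));
        String expectedStart = "";        // the first region must start at the empty key
        boolean sawLast = false;          // seen a region with an empty (unbounded) end?
        for (Region r : sorted) {
            if (sawLast) {
                problems.add("overlap: " + r.name);         // anything after the last region overlaps it
                continue;
            }
            int cmp = r.start.compareTo(expectedStart);
            if (cmp > 0) {
                problems.add("hole: [" + expectedStart + ", " + r.start + ")");
            } else if (cmp < 0) {
                problems.add("overlap: " + r.name);
            }
            if (r.end.isEmpty()) {
                sawLast = true;
            } else if (r.end.compareTo(expectedStart) > 0) {
                expectedStart = r.end;                      // coverage so far ends here
            }
        }
        if (!sawLast) {
            problems.add("hole: [" + expectedStart + ", end of table)");
        }
        return problems;
    }
}
```

The repair side (fabricating regions for holes, merging overlaps, sidelining bad regions) would act on exactly these findings, but that part is out of scope for this sketch.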

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-16 Thread Jonathan Hsieh (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187128#comment-13187128
]

Jonathan Hsieh commented on HBASE-5128:
---

@Ted sounds good.

[uber hbck] Enable hbck to automatically repair table integrity problems as
well as region consistency problems while online.
-

[jira] [Commented] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-16 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187129#comment-13187129
 ] 

Hudson commented on HBASE-5204:
---

Integrated in HBase-0.92-security #77 (See 
[https://builds.apache.org/job/HBase-0.92-security/77/])
HBASE-5204 Backward compatibility fixes for 0.92

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java



[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat


[ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187132#comment-13187132
 ] 

Zhihong Yu commented on HBASE-5208:
---

From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2634/console:
{code}
Running org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 454.79 sec
{code}
It was already long, even without the new test cases.

Can you pick only a few of the test cases from TestTableInputFormatScan for 
your new tests?



 Allow setting Scan start/stop row individually in TableInputFormat
 --

 Key: HBASE-5208
 URL: https://issues.apache.org/jira/browse/HBASE-5208
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nicholas Telford
Priority: Minor
 Attachments: HBASE-5208-001.txt, HBASE-5208-002.txt, 
 HBASE-5208-003.txt


 Currently, TableInputFormat initializes a Scan by deserializing 
 hbase.mapreduce.scan. Alternatively, it will instantiate a new Scan using 
 properties defined in hbase.mapreduce.scan.*. However, among these properties 
 the start row and stop row (arguably the most pertinent) are missing.
 TableInputFormat should permit the specification of a start/stop row as with 
 the other fields using a new pair of properties: 
 hbase.mapreduce.scan.row.start and hbase.mapreduce.scan.row.end
 The primary use-case for this is to permit Oozie and other job management 
 tools that can't call TableMapReduceUtil.initTableMapperJob() to operate on a 
 contiguous subset of rows.
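A hypothetical sketch of how the two proposed properties could be read, falling back to an unbounded scan when either is absent. It uses a plain `java.util.Properties` as a stand-in for the Hadoop `Configuration` that TableInputFormat actually reads, and the property names are the ones proposed in this issue, not a confirmed API. An empty row key means "unbounded", matching HBase's empty start/stop row convention.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical reader for the proposed start/stop row properties. */
public class ScanRangeSketch {
    static final String START = "hbase.mapreduce.scan.row.start";  // as proposed in this issue
    static final String END = "hbase.mapreduce.scan.row.end";      // as proposed in this issue
    static final byte[] EMPTY = new byte[0];                        // empty row = no bound

    /** Returns {startRow, stopRow}; a missing property leaves that side unbounded. */
    static byte[][] startAndStop(Properties conf) {
        String start = conf.getProperty(START);
        String stop = conf.getProperty(END);
        return new byte[][] {
            start == null ? EMPTY : start.getBytes(StandardCharsets.UTF_8),
            stop == null ? EMPTY : stop.getBytes(StandardCharsets.UTF_8),
        };
    }
}
```

In a real input format the two arrays would be passed to `Scan#setStartRow` and `Scan#setStopRow`; a job tool such as Oozie would then only need to set two string properties.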


[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-16 Thread Jonathan Hsieh (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187134#comment-13187134
]

Jonathan Hsieh commented on HBASE-5128:
---

I've been testing with failed splits generated by cycling the HBase master
during a heavy write load with a high split frequency, prior to the HBASE-5196
patch. A subset of the problems is fixed automatically, but there seems to be a
class of region-split problems that isn't being handled properly.
This is probably the case we are most likely to encounter.

[uber hbck] Enable hbck to automatically repair table integrity problems as
well as region consistency problems while online.
-

[jira] [Commented] (HBASE-5208) Allow setting Scan start/stop row individually in TableInputFormat


[ 
https://issues.apache.org/jira/browse/HBASE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187143#comment-13187143
 ] 

Zhihong Yu commented on HBASE-5208:
---

Running the test with patch v3 applied timed out. Here is the stack trace:
{code}
"main" prio=5 tid=101801000 nid=0x100601000 waiting on condition [1005fe000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1295)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:498)
        at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan.testScanFromConfiguration(TestTableInputFormatScan.java:355)
        at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan.testScanYZYToEmpty(TestTableInputFormatScan.java:319)
{code}

You can use the following command to verify that a single new test case passes
(just an example):
{code}
mvn test -P localTests -Dtest=TestTableInputFormatScan#testScanEmptyToEmpty
{code}


[jira] [Assigned] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-01-16 Thread Jean-Daniel Cryans (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-5155:
-

Assignee: ramkrishna.s.vasudevan

Please don't forget to set the assignee.

 ServerShutDownHandler And Disable/Delete should not happen parallely leading 
 to recreation of regions that were deleted
 ---

 Key: HBASE-5155
 URL: https://issues.apache.org/jira/browse/HBASE-5155
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.90.6

 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, 
 HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch


 ServerShutDownHandler and the disable/delete table handlers race.  This is not an 
 issue due to TM.
 - A regionserver goes down.  In our cluster the regionserver holds a lot of 
 regions.
 - A region R1 has two daughters, D1 and D2.
 - The ServerShutdownHandler gets called, scans META and gets all the 
 user regions.
 - In parallel, a table is disabled. (No problem in this step.)
 - The table is then deleted.
 - The table and its regions, including R1, D1 and D2, are deleted (so META 
 is cleaned up).
 - Now ServerShutdownHandler starts to process the dead regions:
 {code}
 if (hri.isOffline() && hri.isSplit()) {
   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
     "; checking daughter presence");
   fixupDaughters(result, assignmentManager, catalogTracker);
   // ...
 }
 {code}
 As part of fixupDaughters, since the daughters D1 and D2 are missing for R1:
 {code}
 if (isDaughterMissing(catalogTracker, daughter)) {
   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
   MetaEditor.addDaughter(catalogTracker, daughter, null);
   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
   // there then something wonky about the split -- things will keep going
   // but could be missing references to parent region.
   // And assign it.
   assignmentManager.assign(daughter, true);
 }
 {code}
 we call assign on the daughters.
 After this, we continue with the code below.
 {code}
 if (processDeadRegion(e.getKey(), e.getValue(),
     this.services.getAssignmentManager(),
     this.server.getCatalogTracker())) {
   this.services.getAssignmentManager().assign(e.getKey(), true);
 }
 {code}
 When the SSH scanned META, it still had R1, D1 and D2.
 So D1 and D2, which were already assigned by fixupDaughters, are assigned 
 again by 
 {code}
 this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 This leads to a ZooKeeper bad-version error, which kills the master.
 The important part here is that the regions that were deleted are recreated, which 
 I think is more critical.
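One way to frame the race is that the handler acts on a stale META snapshot. The sketch below is hypothetical and is not the fix that went into HBASE-5155: before assigning, each region from the snapshot is re-checked against the current catalog (so regions of a deleted table are skipped instead of recreated) and against the set of regions already assigned in this pass (so daughters assigned by the fixup step are not assigned a second time). Region names are plain strings and all method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical guard against acting on a stale META snapshot; not the HBASE-5155 patch. */
public class StaleSnapshotSketch {
    /**
     * @param snapshot        regions read from META when the shutdown handler started
     * @param currentMeta     regions that are still in META right now
     * @param alreadyAssigned regions assigned earlier in this pass (e.g. by a daughter fixup)
     * @return regions that still need assignment, each listed once, in snapshot order
     */
    static List<String> regionsToAssign(List<String> snapshot, Set<String> currentMeta,
                                        Set<String> alreadyAssigned) {
        Set<String> result = new LinkedHashSet<>();   // keeps order, drops duplicates
        for (String region : snapshot) {
            boolean stillExists = currentMeta.contains(region);  // deleted since the scan?
            if (stillExists && !alreadyAssigned.contains(region)) {
                result.add(region);
            }
        }
        return new ArrayList<>(result);
    }
}
```

In the scenario above, the table's deletion empties `currentMeta` of R1, D1 and D2, so nothing is reassigned; without the deletion, D1 and D2 would be skipped because the fixup step already assigned them.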


[jira] [Commented] (HBASE-5207) Apply HBASE-5155 to trunk

2012-01-16 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187156#comment-13187156
 ] 

Jean-Daniel Cryans commented on HBASE-5207:
---

Collision with HBASE-5206?

 Apply HBASE-5155  to trunk
 --

 Key: HBASE-5207
 URL: https://issues.apache.org/jira/browse/HBASE-5207
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan

 The issue HBASE-5155 has been fixed on the 0.90 branch.  The same fix has to be 
 applied to trunk as well.


[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187163#comment-13187163
]

Lars Hofhansl commented on HBASE-5203:
--

Ok... Will commit later today.
@Stack: Wanna have a quick look?

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203.txt

[jira] [Commented] (HBASE-3489) .oldlogs not being cleaned out

2012-01-16 Thread Josh Wymer (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187172#comment-13187172
 ] 

Josh Wymer commented on HBASE-3489:
---

We are seeing this on our replication cluster using 0.90.4. The /hbase/.oldlogs 
is filled with logs that are ~ 1 month old.

 .oldlogs not being cleaned out
 --

 Key: HBASE-3489
 URL: https://issues.apache.org/jira/browse/HBASE-3489
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
 Environment: 10 Nodes Write Heavy Cluster
Reporter: Wayne
 Attachments: oldlog.txt


 The .oldlogs folder is never cleaned up. hbase.master.logcleaner.ttl has 
 been set to clean up the old logs, but the cleanup never kicks in. The limit 
 of 10 files is not the problem. After running for 5 days, not a single log 
 file has been deleted, even though the logcleaner TTL is set to 2 days (down 
 from the default of 7 days). We assume that the replication changes, which 
 make sure these logs are kept around in case they are needed, have blocked 
 the cleanup. No replication is (knowingly) defined.
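To make the suspected interaction concrete, here is a simplified, hypothetical model of a chained old-log cleaner in which a file is deleted only when every check in the chain agrees. The `Delegate` interface and the `ttl` / `replication` checks are illustrative assumptions, not HBase's actual cleaner classes; the point is only that a single replication veto keeps a log around no matter how old it is.

```java
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of a chained .oldlogs cleaner (not HBase's classes).
public class OldLogCleanerSketch {

    // One check in the chain; returning false vetoes deletion of the file.
    public interface Delegate {
        boolean isDeletable(String logName, long modTimeMillis, long nowMillis);
    }

    // TTL check: a log becomes deletable once it is older than ttlMillis.
    public static Delegate ttl(long ttlMillis) {
        return (name, mtime, now) -> now - mtime > ttlMillis;
    }

    // Replication check: keeps any log that is still queued for replication.
    public static Delegate replication(Set<String> stillQueued) {
        return (name, mtime, now) -> !stillQueued.contains(name);
    }

    // A log is deleted only if every delegate in the chain agrees.
    public static boolean shouldDelete(List<Delegate> chain, String name,
                                       long mtime, long now) {
        for (Delegate d : chain) {
            if (!d.isDeletable(name, mtime, now)) {
                return false; // a single veto keeps the file
            }
        }
        return true;
    }
}
```

With a 2-day TTL, a 5-day-old log that is still queued for replication is kept by this model, which is one way a replication check that never releases logs could produce the behaviour reported above.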
  

[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-16 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187217#comment-13187217
 ] 

Todd Lipcon commented on HBASE-5196:


Should this also be committed to the 0.90 branch?

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt


 If a region split fails after the PONR (point of no return), it relies on the 
 master's ServerShutdownHandler to fix it. However, if the master doesn't get a 
 chance to fix it, there will be a hole in the region chain.
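A hole of this kind can be spotted by walking a table's regions in start-key order and checking that each region starts exactly where the previous one ended. The sketch below assumes a hypothetical `Region` holder with `[startKey, endKey)` string keys, where the empty key means unbounded; it is not hbck's implementation, and it reports any break in the chain (gap or overlap) the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical region-chain check; not hbck's implementation.
public class RegionChainCheck {

    // Minimal region descriptor covering [startKey, endKey); "" means unbounded.
    public static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        public Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Returns one message per break in the chain; an empty list means no break found.
    // Note: an empty input list is not flagged.
    public static List<String> findBreaks(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> a.startKey.compareTo(b.startKey));
        List<String> breaks = new ArrayList<>();
        String expectedStart = ""; // the first region must start at the empty key
        for (Region r : sorted) {
            if (!r.startKey.equals(expectedStart)) {
                breaks.add("gap or overlap before region '" + r.name
                        + "': expected start '" + expectedStart
                        + "', found '" + r.startKey + "'");
            }
            expectedStart = r.endKey;
        }
        if (!expectedStart.isEmpty()) {
            breaks.add("last region ends at '" + expectedStart + "' instead of the empty key");
        }
        return breaks;
    }
}
```

With regions `[, g)` and `[p, )` and the middle region missing, the check reports the `[g, p)` gap as a single break.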

[jira] [Created] (HBASE-5211) org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication is flakey

2012-01-16 Thread Alex Newman (Created) (JIRA)

org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication
 is flakey
-

 Key: HBASE-5211
 URL: https://issues.apache.org/jira/browse/HBASE-5211
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: trunk.txt

I can't seem to get this test to pass consistently on my laptop. Also my hudson 
occasionally trips up on it.

[jira] [Updated] (HBASE-5211) org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication is flakey

2012-01-16 Thread Alex Newman (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-5211:
---

Attachment: trunk.txt

I attached a log file.

 org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication
  is flakey
 -

 Key: HBASE-5211
 URL: https://issues.apache.org/jira/browse/HBASE-5211
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: trunk.txt


 I can't seem to get this test to pass consistently on my laptop. Also my 
 hudson occasionally trips up on it.

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers


[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187277#comment-13187277
 ] 

Lars Hofhansl commented on HBASE-5153:
--

+1 on latest patch

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out


 HBASE-4893 is related to this issue. In that issue, we saw that if multiple 
 threads share the same connection and that connection is aborted in one 
 thread, the other threads will get an 
 HConnectionManager$HConnectionImplementation@18fb1f7 closed exception.
 That fix solved the problem of stale connections not being removed, but the 
 original HTable instance still can't be used. The connection in HTable should 
 be recreated.
 There are actually two approaches to solve this:
 1. In user code, once an IOE is caught, close the connection and re-create the 
 HTable instance. We can use this as a workaround.
 2. On the HBase client side, catch this exception and re-create the connection.
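Approach 1 could look roughly like the following sketch: on an IOException the caller closes the current handle, builds a fresh one, and retries a bounded number of times. `TableHandle`, `Op` and the factory are hypothetical stand-ins for HTable and its connection, not the real HBase client API; real code would also need to decide which IOExceptions are actually worth retrying.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Sketch of workaround 1; TableHandle and Op are hypothetical, not HBase's client API.
public class ReconnectOnFailure {

    // Stand-in for an HTable bound to one (possibly aborted) connection.
    public interface TableHandle {
        String get(String row) throws IOException;
        void close();
    }

    // One client operation against a table handle.
    public interface Op<T> {
        T apply(TableHandle table) throws IOException;
    }

    private final Supplier<TableHandle> factory;
    private TableHandle table;

    public ReconnectOnFailure(Supplier<TableHandle> factory) {
        this.factory = factory;
        this.table = factory.get();
    }

    // Runs op; on IOException, closes the handle, re-creates it and retries.
    public <T> T call(Op<T> op, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.apply(table);
            } catch (IOException e) {
                last = e;
                table.close();         // drop the handle tied to the aborted connection
                table = factory.get(); // re-create the table instance
            }
        }
        throw last; // every attempt failed; rethrow the most recent error
    }
}
```

Keeping the retry in a wrapper like this means callers sharing one wrapper per thread never keep using a handle whose connection was closed underneath them.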

[jira] [Updated] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread Lars Hofhansl (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Lars Hofhansl updated HBASE-5203:
-

Attachment: 5203-v3.txt

Double checking latest patch (same as on RB)

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203-v3.txt, 5203.txt

[jira] [Updated] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

2012-01-16 Thread Lars Hofhansl (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Lars Hofhansl updated HBASE-5203:
-

Status: Patch Available (was: Open)

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203-v3.txt, 5203.txt

[jira] [Created] (HBASE-5212) Fix test TestTableMapReduce against 0.23.

2012-01-16 Thread Mahadev konar (Created) (JIRA)

Fix test TestTableMapReduce against 0.23.
-

 Key: HBASE-5212
 URL: https://issues.apache.org/jira/browse/HBASE-5212
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Mahadev konar
 Fix For: 0.92.1


As reported by Andrew on the hadoop mailing list, mvn -Dhadoop.profile=23 clean 
test -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce fails on the 0.92 
branch. Fixing that requires minor changes to the HBase poms.

[jira] [Commented] (HBASE-5191) Fix compilation error against hadoop 0.23.1

2012-01-16 Thread Mahadev konar (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187298#comment-13187298
 ] 

Mahadev konar commented on HBASE-5191:
--

@Ted, 
 Thanks for running through this. Any further update? Can I help in any way? 

 Fix compilation error against hadoop 0.23.1
 ---

 Key: HBASE-5191
 URL: https://issues.apache.org/jira/browse/HBASE-5191
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0

 Attachments: 5191.txt


 From Mahadev:
 I just checked out 0.92 branch and tried running:
 mvn -Dhadoop.profile=23 clean test
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
 Looks like a compilation issue:
 
 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile
 (default-testCompile) on project hbase: Compilation failure
 [ERROR] 
 /Users/mahadev/workspace/hbase-workspace/hbase-git/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java:[341,33]
 cannot find symbol
 [ERROR] symbol  : variable dnRegistration
 [ERROR] location: class org.apache.hadoop.hdfs.server.datanode.DataNode
 [ERROR] - [Help 1]
 [ERROR]

[jira] [Commented] (HBASE-5191) Fix compilation error against hadoop 0.23.1


[ 
https://issues.apache.org/jira/browse/HBASE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187300#comment-13187300
 ] 

Zhihong Yu commented on HBASE-5191:
---

I haven't found out why the assertion doesn't fail in HBase trunk; I basically 
used an equivalent dfsCluster.stopDataNode() call.
Since any patch has to make TestLogRolling pass for Hadoop 1.0, I am still 
searching for the transformation.

This effort was partially sidetracked by work on 0.92.

 Fix compilation error against hadoop 0.23.1
 ---

 Key: HBASE-5191
 URL: https://issues.apache.org/jira/browse/HBASE-5191
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
 Fix For: 0.92.0, 0.94.0

 Attachments: 5191.txt


 From Mahadev:
 I just checked out 0.92 branch and tried running:
 mvn -Dhadoop.profile=23 clean test
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
 Looks like a compilation issue:
 
 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile
 (default-testCompile) on project hbase: Compilation failure
 [ERROR] 
 /Users/mahadev/workspace/hbase-workspace/hbase-git/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java:[341,33]
 cannot find symbol
 [ERROR] symbol  : variable dnRegistration
 [ERROR] location: class org.apache.hadoop.hdfs.server.datanode.DataNode
 [ERROR] - [Help 1]
 [ERROR]

[jira] [Commented] (HBASE-5211) org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication is flakey


[ 
https://issues.apache.org/jira/browse/HBASE-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187308#comment-13187308
 ] 

Lars Hofhansl commented on HBASE-5211:
--

Hmmm... This seems to be the crux...?
{code}
2012-01-16 13:15:39,965 ERROR [Thread-3] hbase.MiniHBaseCluster(201): Error starting cluster
java.lang.RuntimeException: Master not initialized after 200 seconds
	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206)
	at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:196)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:76)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:627)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:601)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:549)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:518)
	at org.apache.hadoop.hbase.replication.TestMultiSlaveReplication.testMultiSlaveReplication(TestMultiSlaveReplication.java:121)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
{code}

 org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication
  is flakey
 -

 Key: HBASE-5211
 URL: https://issues.apache.org/jira/browse/HBASE-5211
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: trunk.txt


 I can't seem to get this test to pass consistently on my laptop. Also my 
 hudson occasionally trips up on it.

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187309#comment-13187309
 ] 

Lars Hofhansl commented on HBASE-2600:
--

These three always fail it seems:
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat


 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (if it is an 
 offlined parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we currently have to do, which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to now, the randomid component of a region name has been the timestamp of 
 region creation. HBASE-2531 ("32-bit encoding of regionnames waaay too 
 susceptible to hash clashes") proposes changing the randomid so that it 
 contains the actual name of the directory in the filesystem that hosts the 
 region. If we had this in place, I think it would help with the migration to 
 this new way of doing the meta, because as is, the region name in the fs is a 
 hash of the regionname... changing the format of the regionname would mean we 
 generate a different hash... so we'd need HBASE-2531 to be in place before we 
 could do this change.
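As a rough illustration of why end-key naming turns the lookup into a plain forward scan, the sketch below models meta as a sorted map keyed by `tableName,endKey`. The class and method names are hypothetical, row keys are Strings rather than byte arrays, the last region's empty end key is stood in for by `\uffff`, and offlined split parents are ignored; it is not the encoding used by any HBase patch.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a meta table keyed by tableName + "," + endKey.
// Assumes table names contain no ',' and row keys never start with \uffff.
public class EndKeyMetaLookup {
    // The last region has an empty end key; "\uffff" stands in for it so that
    // it sorts after every real end key.
    private static final String LAST = "\uffff";

    private final TreeMap<String, String> meta = new TreeMap<>();

    // Registers a region by its exclusive end key; "" marks the table's last region.
    public void addRegion(String table, String endKey, String regionName) {
        meta.put(table + "," + (endKey.isEmpty() ? LAST : endKey), regionName);
    }

    // The region holding `row` is the first meta row whose end key is strictly
    // greater than `row`: one forward seek, no backward walk through meta.
    public String locate(String table, String row) {
        String prefix = table + ",";
        Map.Entry<String, String> e = meta.higherEntry(prefix + row);
        if (e == null || !e.getKey().startsWith(prefix)) {
            return null; // no region of this table covers the row
        }
        return e.getValue();
    }
}
```

With regions ending at `g`, `p` and the empty key, a lookup for row `m` lands on the region ending at `p`, and a lookup for row `g` also lands there because end keys are exclusive.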

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-01-16 Thread Alex Newman (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187311#comment-13187311
 ] 

Alex Newman commented on HBASE-2600:


On all jenkins?

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (if it is an 
 offlined parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we currently have to do, which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to now, the randomid component of a region name has been the timestamp of 
 region creation. HBASE-2531 ("32-bit encoding of regionnames waaay too 
 susceptible to hash clashes") proposes changing the randomid so that it 
 contains the actual name of the directory in the filesystem that hosts the 
 region. If we had this in place, I think it would help with the migration to 
 this new way of doing the meta, because as is, the region name in the fs is a 
 hash of the regionname... changing the format of the regionname would mean we 
 generate a different hash... so we'd need HBASE-2531 to be in place before we 
 could do this change.

[jira] [Updated] (HBASE-5211) org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication is flakey

2012-01-16 Thread Alex Newman (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-5211:
---

Attachment: log2.txt

Lars, you're right, although this is the other error I get.

 org.apache.hadoop.hbase.replication.TestMultiSlaveReplication#testMultiSlaveReplication
  is flakey
 -

 Key: HBASE-5211
 URL: https://issues.apache.org/jira/browse/HBASE-5211
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: log2.txt, trunk.txt


 I can't seem to get this test to pass consistently on my laptop. Also my 
 hudson occasionally trips up on it.

[jira] [Commented] (HBASE-5203) Group atomic put/delete operation into a single WALEdit to handle region server failures.

[
https://issues.apache.org/jira/browse/HBASE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187315#comment-13187315
]

Hadoop QA commented on HBASE-5203:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510770/5203-v3.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -145 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 82 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/781//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/781//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/781//console

This message is automatically generated.

Group atomic put/delete operation into a single WALEdit to handle region
server failures.
-

Attachments: 5203-v3.txt, 5203.txt

[jira] [Updated] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-16 Thread Jimmy Xiang (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5196:
---

Attachment: hbase-5196_0.90.txt

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt, hbase-5196_0.90.txt


 If a region split fails after the PONR (point of no return), it relies on the 
 master's ServerShutdownHandler to fix it. However, if the master doesn't get a 
 chance to fix it, there will be a hole in the region chain.

[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-16 Thread Jimmy Xiang (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187318#comment-13187318
 ] 

Jimmy Xiang commented on HBASE-5196:


I attached a patch for 0.90 branch: hbase-5196_0.90.txt

Could anyone please check it in?

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt, hbase-5196_0.90.txt


 If a region split fails after the PONR (point of no return), it relies on the 
 master's ServerShutdownHandler to fix it. However, if the master doesn't get a 
 chance to fix it, there will be a hole in the region chain.

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

[
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187320#comment-13187320
]

Lars Hofhansl commented on HBASE-2600:
--

Something to do with the Hadoop version on the jenkins machines.
Ted might know the details.

Change how we do meta tables; from tablename+STARTROW+randomid to instead,
tablename+ENDROW+randomid

[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187322#comment-13187322
 ] 

Zhihong Yu commented on HBASE-5196:
---

@Jimmy:
Have you run the 0.90 test suite against the new patch?

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt, hbase-5196_0.90.txt


 If a region split fails after the PONR (point of no return), it relies on the 
 master's ServerShutdownHandler to fix it. However, if the master doesn't get a 
 chance to fix it, there will be a hole in the region chain.

[jira] [Updated] (HBASE-5212) Fix test TestTableMapReduce against 0.23.

2012-01-16 Thread Mahadev konar (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated HBASE-5212:
-

Attachment: HBASE-5212.patch

Thanks to Hitesh for helping me out on this. This should fix most of the issues 
with the 0.23 tests.

 Fix test TestTableMapReduce against 0.23.
 -

 Key: HBASE-5212
 URL: https://issues.apache.org/jira/browse/HBASE-5212
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Mahadev konar
 Fix For: 0.92.1

 Attachments: HBASE-5212.patch


 As reported by Andrew on the hadoop mailing list, mvn -Dhadoop.profile=23 
 clean test -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce fails 
 on the 0.92 branch. Fixing that requires minor changes to the HBase poms.

[jira] [Updated] (HBASE-5212) Fix test TestTableMapReduce against 0.23.

2012-01-16 Thread Mahadev konar (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated HBASE-5212:
-

Status: Patch Available  (was: Open)

 Fix test TestTableMapReduce against 0.23.
 -

 Key: HBASE-5212
 URL: https://issues.apache.org/jira/browse/HBASE-5212
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Mahadev konar
 Fix For: 0.92.1

 Attachments: HBASE-5212.patch


 As reported by Andrew on the hadoop mailing list, mvn -Dhadoop.profile=23 
 clean test -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce fails 
 on the 0.92 branch. Fixing that requires minor changes to the HBase poms.

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

[
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187330#comment-13187330
]

jirapos...@reviews.apache.org commented on HBASE-2600:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3466/#review4418
---

This looks pretty good. Thanks for being persistent and patient, Alex!
Devil is probably still in the details.

All the getClosestBefore huh hah can now be removed from
HTable/Region[Server]/Store, right?

src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
https://reviews.apache.org/r/3466/#comment9924

! and
Although it's not very intuitive.
So the encoded region is now?
tableName!,endKey,...
tableName,endKey,...

Is that simpler than replacing the separator?
That could look like this:
tableName,endKey,...
tableName/endKey,...

src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
https://reviews.apache.org/r/3466/#comment9923

addEncoding does not use the startKey. Could just remove it from there, and
hence from here as well so that this method just needs to know the endKey.

src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
https://reviews.apache.org/r/3466/#comment9925

I like this. Captures what it is doing without being too complicated.

src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
https://reviews.apache.org/r/3466/#comment9926

Why is this needed?

src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java
https://reviews.apache.org/r/3466/#comment9927

Yeah... Be gone!

- Lars

On 2012-01-16 18:26:39, Alex Newman wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3466/
bq. ---
bq.
bq. (Updated 2012-01-16 18:26:39)
bq.
bq.
bq. Review request for hbase and Michael Stack.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. This is an idea that Ryan and I have been kicking around on and off for a
while now.
bq.
bq. If regionnames were made of tablename+endrow instead of
tablename+startrow, then in the metatables, doing a search for the region that
contains the wanted row, we'd just have to open a scanner using passed row and
the first row found by the scan would be that of the region we need (if it is an
offlined parent, we'd have to scan to the next row).
bq.
bq. If we redid the meta tables in this format, we'd be using an access that
is natural to hbase, a scan as opposed to the perverse, expensive
getClosestRowBefore we currently have that has to walk backward in meta finding
a containing region.
bq.
bq. This issue is about changing the way we name regions.
bq.
bq. If we were using scans, prewarming client cache would be near costless (as
opposed to what we'll currently have to do which is first a getClosestRowBefore
and then a scan from the closestrowbefore forward).
bq.
bq. Converting to the new method, we'd have to run a migration on startup
changing the content in meta.
bq.
bq. Up to this, the randomid component of a region name has been the timestamp
of region creation. HBASE-2531 32-bit encoding of regionnames waaay
too susceptible to hash clashes proposes changing the randomid so that it
contains actual name of the directory in the filesystem that hosts the region.
If we had this in place, I think it would help with the migration to this new
way of doing the meta because as is, the region name in fs is a hash of
regionname... changing the format of the regionname would mean we generate a
different hash... so we'd need hbase-2531 to be in place before we could do
this change.
bq.
bq.
bq. This addresses bug HBASE-2600.
bq. https://issues.apache.org/jira/browse/HBASE-2600
bq.
bq.
bq. Diffs
bq. -
bq.
bq.   src/main/java/org/apache/hadoop/hbase/HConstants.java 904e2d2
bq.   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 74cb821
bq.   src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 133759d
bq.   src/main/java/org/apache/hadoop/hbase/KeyValue.java be7e2d8
bq.   src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java e5e60a8
bq.   src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 88c381f
bq.   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 99f90b2
bq.   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f0c6828
bq.   src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 8f4f4b8
bq.   src/main/java/org/apache/hadoop/hbase/rest/RegionsResource.java bf85bc1
bq.

[jira] [Commented] (HBASE-5212) Fix test TestTableMapReduce against 0.23.

[
https://issues.apache.org/jira/browse/HBASE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187331#comment-13187331
]

Hadoop QA commented on HBASE-5212:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510776/HBASE-5212.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -145 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 82 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/782//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/782//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/782//console

This message is automatically generated.

Fix test TestTableMapReduce against 0.23.
-

Key: HBASE-5212
URL: https://issues.apache.org/jira/browse/HBASE-5212
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Mahadev konar
Fix For: 0.92.1

Attachments: HBASE-5212.patch

As reported by Andrew on the Hadoop mailing list, mvn -Dhadoop.profile=23
clean test -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce fails
on the 0.92 branch. Minor changes to the HBase poms are required to fix that.

[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-16 Thread Jimmy Xiang (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187332#comment-13187332
 ] 

Jimmy Xiang commented on HBASE-5196:


@Ted, I ran the test suite and verified the fix on CDH3u3.
I'll run the test suite on 0.90 now.


 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt, hbase-5196_0.90.txt


 If a region split fails after the PONR (point of no return), it relies on the
 master's ServerShutdownHandler to fix it. However, if the master doesn't get a
 chance to fix it, there will be a hole in the region chain.
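The "hole in the region chain" can be pictured as a break in the start/end key sequence. The following is a hypothetical sketch, not the check HBase or hbck actually runs: it sorts an in-memory list of regions by start key and reports every place where a region does not start exactly where the previous one ended, using "" for the empty first start key and last end key. The region names and keys are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RegionChainCheck {
    // Minimal region descriptor; "" stands for the empty first start key
    // and the empty last end key.
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Returns one message per break in the chain; an empty list means no hole.
    static List<String> findHoles(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.startKey));
        List<String> problems = new ArrayList<>();
        String expectedStart = ""; // the first region must start at the empty key
        for (Region r : sorted) {
            if (!r.startKey.equals(expectedStart)) {
                problems.add("chain broken before " + r.name + ": expected start '"
                        + expectedStart + "', found '" + r.startKey + "'");
            }
            expectedStart = r.endKey;
        }
        if (!expectedStart.isEmpty()) { // the last region must end at the empty key
            problems.add("no region covers rows from '" + expectedStart + "' to the end of the table");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> whole = Arrays.asList(
                new Region("A", "", "g"),
                new Region("B", "g", "t"),
                new Region("C", "t", ""));
        // Hypothetical failure: B split into [g, m) and [m, t), but the second
        // daughter never made it into meta.
        List<Region> withHole = Arrays.asList(
                new Region("A", "", "g"),
                new Region("B-1", "g", "m"),
                new Region("C", "t", ""));
        System.out.println(findHoles(whole)); // []
        System.out.println(findHoles(withHole));
    }
}
```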

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187333#comment-13187333
 ] 

Zhihong Yu commented on HBASE-2600:
---

See MAPREDUCE-3583 for background on test failures for:
{code}
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
{code}
TestMetaMigrationRemovingHTD needs attention for this feature.

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch


 This is an idea that Ryan and I have been kicking around on and off for a
 while now.
 If region names were made of tablename+endrow instead of tablename+startrow,
 then to find the region that contains the wanted row in the meta tables, we'd
 just have to open a scanner at the passed row, and the first row found by the
 scan would be that of the region we need (if that region is an offlined
 parent, we'd have to scan on to the next row).
 If we redid the meta tables in this format, we'd be using an access pattern
 that is natural to HBase, a scan, as opposed to the perverse, expensive
 getClosestRowBefore we currently have, which has to walk backward in meta to
 find the containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming the client cache would be nearly costless
 (as opposed to what we currently have to do, which is first a
 getClosestRowBefore and then a scan forward from the closest row before).
 Converting to the new method, we'd have to run a migration on startup that
 changes the content in meta.
 Up to now, the randomid component of a region name has been the timestamp of
 region creation. HBASE-2531 ("32-bit encoding of regionnames waaay too
 susceptible to hash clashes") proposes changing the randomid so that it
 contains the actual name of the directory in the filesystem that hosts the
 region. If we had this in place, I think it would help with the migration to
 this new way of doing meta, because as things stand the region's directory
 name in the filesystem is a hash of the region name. Changing the format of
 the region name would mean we generate a different hash, so we'd need
 HBASE-2531 to be in place before we could make this change.
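The migration concern in the last paragraph is that the filesystem directory name is derived from the region name, so renaming a region changes the directory it maps to. A minimal sketch, using java.util.zip.CRC32 purely as a stand-in 32-bit hash (it is not HBase's actual encoding function); the table name, keys and region id are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class RegionDirNameSketch {
    // Hypothetical 32-bit "encoded name": CRC32 of the region name as unsigned decimal.
    static String encode(String regionName) {
        CRC32 crc = new CRC32();
        crc.update(regionName.getBytes(StandardCharsets.UTF_8));
        return Long.toString(crc.getValue());
    }

    public static void main(String[] args) {
        String id = "1326750000000";       // hypothetical region id (creation timestamp)
        String byStartKey = "t,ccc," + id; // region [ccc, ggg) named by its start key
        String byEndKey = "t,ggg," + id;   // the same region named by its end key
        System.out.println(encode(byStartKey));
        System.out.println(encode(byEndKey));
        System.out.println(encode(byStartKey).equals(encode(byEndKey))); // false
    }
}
```

Because these two names have equal length and differ only in three adjacent bytes, CRC32 is guaranteed to produce different values for them; in general, though, a 32-bit hash can collide, which is the concern HBASE-2531 raises.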


[jira] [Commented] (HBASE-5212) Fix test TestTableMapReduce against 0.23.