[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-03 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6067:
--

Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Yu
 Fix For: 0.96.0

 Attachments: 6067-v2.txt, 6067.txt


 HBase currently doesn't work with HDFS federation (hbase.rootdir with a 
 client that uses viewfs) because HLog#init uses 
 FileSystem#getDefaultBlockSize and getDefaultReplication. These throw an 
 exception because there is no default filesystem in a viewfs client so 
 there's no way to determine a default block size or replication factor. They 
 could use the versions of these methods that take a path; however, these were 
 introduced in HADOOP-8014 and are not yet available in Hadoop 1.x.
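The path-taking overload can be probed for at runtime with reflection, which is the approach the later comments in this thread describe. Below is a minimal, self-contained sketch of that pattern; the StubFileSystem class and the String path parameter are illustrative stand-ins for the real org.apache.hadoop.fs.FileSystem and Path, which are deliberately not assumed here:

```java
import java.lang.reflect.Method;

/**
 * Sketch of the reflection fallback: probe for a path-taking
 * getDefaultBlockSize overload (the HADOOP-8014 shape) and fall back to the
 * no-arg variant when it is absent. StubFileSystem is a stand-in, not the
 * real Hadoop FileSystem.
 */
class BlockSizeProbe {

  /** Stand-in for a Hadoop 2.x FileSystem that has the new overload. */
  static class StubFileSystem {
    public long getDefaultBlockSize() { return 64L * 1024 * 1024; }
    public long getDefaultBlockSize(String path) { return 128L * 1024 * 1024; }
  }

  static long defaultBlockSize(Object fs, String path) {
    try {
      // Prefer the path-taking overload when the running Hadoop provides it.
      Method m = fs.getClass().getMethod("getDefaultBlockSize", String.class);
      return (Long) m.invoke(fs, path);
    } catch (NoSuchMethodException e) {
      // Older Hadoop: only the no-arg variant exists. On a viewfs client this
      // is the call that fails in real life, since there is no default fs.
      try {
        return (Long) fs.getClass().getMethod("getDefaultBlockSize").invoke(fs);
      } catch (ReflectiveOperationException inner) {
        throw new IllegalStateException(inner);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(defaultBlockSize(new StubFileSystem(), "/hbase"));
  }
}
```

The probe only needs to run once per filesystem instance; caching the resolved Method would avoid repeated lookups.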

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-03 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288166#comment-13288166
 ] 

Zhihong Yu commented on HBASE-6067:
---

I ran TestRegionRebalancing on a MacBook and it passed.

@Eli:
Do you think the patch should go to 0.92 or 0.94 as well?

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Yu
 Fix For: 0.96.0

 Attachments: 6067-v2.txt, 6067.txt







[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-03 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6060:
--

Fix Version/s: 0.92.3
   0.94.1
   0.96.0
 Hadoop Flags: Reviewed

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-94.patch


 we have seen a pattern in tests: the regions are stuck in the OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - master calls the rs to open the region. If the rs is offline, a new plan is 
 generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign()
  - RegionServer starts opening the region and changes the state in the znode. 
 But that znode is not ephemeral. (see ZkAssign)
  - rs transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process()
  - rs then opens the region, and changes the znode from OPENING to OPENED
  - when the rs is killed between the OPENING and OPENED states, zk shows 
 OPENING, and the master just waits for the rs to change the region state; but 
 since the rs is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
 this kind of condition. It periodically checks (every 10 sec by default) the 
 regions in transition to see whether they timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 
 min, which explains what you and I are seeing. 
  - ServerShutdownHandler in Master does not reassign regions in OPENING 
 state, although it handles other states. 
 Lowering that threshold in the configuration is one option, but I still think 
 we can do better. 
 Will investigate more. 
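As a stopgap while a proper reassignment fix lands, the timeout the description mentions can be lowered in hbase-site.xml. The 180000 ms value below is only an illustrative choice, not one proposed in this thread:

```xml
<!-- hbase-site.xml: lower the 30-minute default so regions stuck in
     OPENING are retried sooner (milliseconds; example value only) -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>180000</value>
</property>
```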





[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-03 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288168#comment-13288168
 ] 

Zhihong Yu commented on HBASE-6060:
---

When I tried to produce a patch for the 0.92 branch, I found that HBASE-5546 
wasn't integrated into 0.92.

@Ram:
Do you think HBASE-5546 and this JIRA should go to 0.92?

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-94.patch







[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-03 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288187#comment-13288187
 ] 

Zhihong Yu commented on HBASE-6060:
---

@Ram:
That is fine.

@Rajesh:
Can you prepare a patch for 0.92 (and run the test suite)? In the future, 
publishing the trunk patch on Review Board would allow reviewers to easily 
spot the differences across patches.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 
 HBASE-6060-94.patch







[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-03 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288190#comment-13288190
 ] 

Zhihong Yu commented on HBASE-6060:
---

I ran the two tests flagged by Hadoop QA and they passed.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.96.0, 0.94.1, 0.92.3

 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 
 HBASE-6060-94.patch







[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287937#comment-13287937
 ] 

Zhihong Yu commented on HBASE-6060:
---

Test suite passed.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, HBASE-6060-94.patch







[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287949#comment-13287949
 ] 

Zhihong Yu commented on HBASE-5974:
---

HRegionInterface.java doesn't exist in trunk, so patch v2 wouldn't apply to 
trunk.
I would suggest creating a patch for trunk and running it through Hadoop QA.
{code}
+  LOG.info("Seq number based scan API not present at RS side! Trying with API: "
{code}
I think the above log should be at warn level.
{code}
+} else if (ioe instanceof CallSequenceOutOfOrderException) {
+  // The callSeq from the client not matched with the one expected at 
the RS side
+  // This means the RS might have done extra scanning of data which is 
not received by the
+  // client.Throw a DNRE so that we close the current scanner and 
opens a new one with RS.
+  throw new DoNotRetryIOException("Reset scanner", ioe);
{code}
Should we disclose a little more detail in the message of the DNRIOE? The above 
is the same as the response to NotServingRegionException and 
RegionServerStoppedException.
'not matched with' - 'does not match'
'is not received' - 'has not been received'
'opens a new' - 'open a new'
{code}
+// if callSeq do not match throw Exception straight away. This needs to be 
performed even
{code}
'do not match' - 'does not match'
{code}
+public class TestClientScannerRPCTimesout {^M
{code}
Please add a short javadoc for the test class. I think it should be called 
TestClientScannerRPCTimeout.
Please use a utility such as dos2unix to remove the trailing ^M from the patch 
file.
{code}
+  public static class RegionServerWithScanTimesout extends 
MiniHBaseClusterRegionServer {^M
{code}
The above class can be made private. It should be named 
RegionServerWithScanTimeout.
{code}
+ * Thrown by a region server while scan related next() calls. Both client and 
server maintain a^M
+ * callSequence and if the both do not match, RS will throw this exception.^M
+ */^M
+public class CallSequenceOutOfOrderException extends IOException {^M
{code}
CallSequenceOutOfOrderException should extend DoNotRetryIOException so that we 
don't need to create a DoNotRetryIOException instance (shown above).
'while scan related next()' - 'while doing scan related next()'
'the both do not' - 'they do not'

It would be nice for Todd to take a look at the patch.
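The exception hierarchy suggested above can be sketched as follows. The parent class here is a stand-in for the real org.apache.hadoop.hbase.DoNotRetryIOException, which is not assumed:

```java
import java.io.IOException;

/**
 * Sketch of the review suggestion: if CallSequenceOutOfOrderException itself
 * extends DoNotRetryIOException, the client-side error handler can rethrow it
 * directly instead of wrapping it in a fresh DoNotRetryIOException.
 */
class CallSequenceOutOfOrderException extends DoNotRetryIOException {
  CallSequenceOutOfOrderException(String msg) {
    super(msg);
  }
}

/** Stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String msg) {
    super(msg);
  }
}
```

With this shape, the `instanceof CallSequenceOutOfOrderException` branch quoted above collapses to a plain rethrow, since the retry machinery already treats any DoNotRetryIOException as terminal.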

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch


 I'm seeing the following behavior:
 - set RPC timeout to a short value
 - call next() for some batch of rows, big enough so the client times out 
 before the result is returned
 - the HConnectionManager stuff will retry the next() call to the same server. 
 At this point, one of two things can happen: 1) the previous next() call will 
 still be processing, in which case you get a LeaseException, because it was 
 removed from the map during the processing, or 2) the next() call will 
 succeed but skip the prior batch of rows.





[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287953#comment-13287953
 ] 

Zhihong Yu commented on HBASE-5974:
---

w.r.t. keeping RegionScannerHolder: I posted a poll to dev@hbase on the use case 
of letting [pre,post]ScannerOpen() return a custom RegionScanner implementation.

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch







[jira] [Commented] (HBASE-6151) Master can die if RegionServer throws ServerNotRunningYet

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287971#comment-13287971
 ] 

Zhihong Yu commented on HBASE-6151:
---

ServerNotRunningYetException should be handled in the last catch block of 
getCachedConnection():
{code}
} catch (IOException ioe) {
{code}
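The suggested handling can be sketched as follows. All types below are stand-ins; the real method lives in org.apache.hadoop.hbase.catalog.CatalogTracker and its signature differs:

```java
import java.io.IOException;

/**
 * Illustrative shape of the suggestion above: have a getCachedConnection-style
 * method recognize ServerNotRunningYetException in its catch handling and
 * report the attempt as retryable instead of letting the exception propagate
 * and kill the master. Stubs only, not the real HBase classes.
 */
class CachedConnectionSketch {

  /** Stand-in for org.apache.hadoop.hbase.ipc.ServerNotRunningYetException. */
  static class ServerNotRunningYetException extends IOException {
    ServerNotRunningYetException(String msg) { super(msg); }
  }

  /** Returns true when the caller should simply retry the connection later. */
  static boolean connectOnce(boolean serverStillStarting) {
    try {
      if (serverStillStarting) {
        // Mimics an RS whose handler threads are up but whose openServer()
        // has not run yet.
        throw new ServerNotRunningYetException("Server is not running yet");
      }
      return false; // connected; no retry needed
    } catch (ServerNotRunningYetException e) {
      // The suggested fix: treat "not running yet" as a retryable condition
      // here; other IOExceptions would keep propagating to the caller.
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(connectOnce(true));  // retryable
    System.out.println(connectOnce(false)); // connected
  }
}
```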

 Master can die if RegionServer throws ServerNotRunningYet
 -

 Key: HBASE-6151
 URL: https://issues.apache.org/jira/browse/HBASE-6151
 Project: HBase
  Issue Type: Bug
  Components: ipc
Affects Versions: 0.90.7, 0.92.2, 0.96.0, 0.94.1
Reporter: Gregory Chanan
Assignee: Gregory Chanan

 See, for example:
 {noformat}
 2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
 org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
 yet
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1240)
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:444)
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:343)
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:540)
   at 
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:474)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:412)
 {noformat}
 The HRegionServer calls HBaseServer#start():
 {code}
   public void start() {
 startThreads();
 openServer();
   }
 {code}
 but the server can start accepting RPCs once the threads have been started; 
 until openServer() runs, those RPCs throw ServerNotRunningException. 
 We should probably
 1) Catch the remote exception and retry on the master
 2) Look into whether the start() behavior of HBaseServer makes any sense.  
 Why would you start accepting RPCs only to throw back 
 ServerNotRunningException?





[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6067:
--

Attachment: 6067.txt

Patch v1 introduces reflection to detect the presence of 
getDefaultBlockSize(Path f).

TestHLog passes.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: 6067.txt







[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6067:
--

Status: Patch Available  (was: Open)

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: 6067.txt







[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288017#comment-13288017
 ] 

Zhihong Yu commented on HBASE-6067:
---

@Eli:
Do you think the patch is Okay.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: 6067.txt







[jira] [Comment Edited] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288017#comment-13288017
 ] 

Zhihong Yu edited comment on HBASE-6067 at 6/2/12 10:11 PM:


@Eli:
Do you think the patch is Okay ?

  was (Author: zhi...@ebaysf.com):
@Eli:
Do you think the patch is Okay.
  
 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: 6067.txt







[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288057#comment-13288057
 ] 

Zhihong Yu commented on HBASE-6067:
---

fs.getDefaultBlockSize() is only called in one place:
{code}
this.blocksize = conf.getLong("hbase.regionserver.hlog.blocksize",
    getDefaultBlockSize());
{code}
So I didn't add a Method member.
I will upload a new patch with a setAccessible() call.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: 6067.txt







[jira] [Assigned] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-6067:
-

Assignee: Zhihong Yu  (was: Eli Collins)

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Yu
 Attachments: 6067.txt







[jira] [Updated] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6067:
--

Attachment: 6067-v2.txt

Added setAccessible() call.

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Yu
 Attachments: 6067-v2.txt, 6067.txt







[jira] [Commented] (HBASE-6067) HBase won't start when hbase.rootdir uses ViewFileSystem

2012-06-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13288059#comment-13288059
 ] 

Zhihong Yu commented on HBASE-6067:
---

findbugs functionality hasn't been fixed 
(https://builds.apache.org/job/PreCommit-HBASE-Build/2088//console):
{code}
[ERROR] Could not find resource 
'${parent.basedir}/../dev-support/findbugs-exclude.xml'. - [Help 1]
{code}

 HBase won't start when hbase.rootdir uses ViewFileSystem
 

 Key: HBASE-6067
 URL: https://issues.apache.org/jira/browse/HBASE-6067
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Eli Collins
Assignee: Zhihong Yu
 Attachments: 6067-v2.txt, 6067.txt







[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287526#comment-13287526
 ] 

Zhihong Yu commented on HBASE-6046:
---

I ran the new test and it passed.
{code}
   }
+  public void finishInitialization() {
+    finishInitialization(false);
{code}
Please add javadoc for the above method. Leave one empty line between the 
previous method and finishInitialization().

In test code:
{code}
+  public static class MockLoadBalancer extends DefaultLoadBalancer {
{code}
The above class can be private.


 Master retry on ZK session expiry causes inconsistent region assignments.
 -

 Key: HBASE-6046
 URL: https://issues.apache.org/jira/browse/HBASE-6046
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch


 1. A ZK session timeout in the HMaster leads to bulk assignment even though 
 all the RSs are online.
 2. While doing bulk assignment, if the master again goes down & restarts (or 
 a backup comes up), all the nodes created in ZK will now be reassigned to the 
 new RSs. This leads to double assignment.
 We had 2800 regions; among these, 1900 regions got double assignment, taking the 
 region count to 4700. 





[jira] [Comment Edited] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287489#comment-13287489
 ] 

Zhihong Yu edited comment on HBASE-6060 at 6/1/12 4:47 PM:
---

This patch is also a backport of HBASE-5396.  But this is more exhaustive and 
also tries to address HBASE-5816.
HBASE-6147 has been raised to solve other assign-related issues that come from 
SSH and joincluster.  Pls review and provide your comments.

  was (Author: rajesh23):
This patch is also a backport of HBASe-5396.  But this is more exhaustive 
and also tries to address HBASE-5816.
HBASE-6147 has been raised to solve other assign related issues that comes from 
SSH and joincluster.  Pls review and provide your comments.
  
 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-6060-94.patch


 We have seen a pattern in tests where regions are stuck in the OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - The master calls the RS to open the region. If the RS is offline, a new plan is 
 generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign()
  - The RegionServer starts opening the region and changes the state in the znode. 
 But that znode is not ephemeral. (see ZkAssign)
  - The RS transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process()
  - The RS then opens the region, and changes the znode from OPENING to OPENED
  - When the RS is killed between the OPENING and OPENED states, zk shows the 
 OPENING state, and the master just waits for the RS to change the region state, 
 but since the RS is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
 these kinds of conditions. It periodically checks (every 10 sec by 
 default) the regions in transition to see whether they timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
 which explains what you and I are seeing. 
  - ServerShutdownHandler in the Master does not reassign regions in the OPENING 
 state, although it handles other states. 
 Lowering that threshold in the configuration is one option, but I still 
 think we can do better. 
 Will investigate more. 





[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287547#comment-13287547
 ] 

Zhihong Yu commented on HBASE-6060:
---

{code}
+  if(!plan.canUsePlan()){
+    return;
{code}
Please insert a space after if. It would be helpful to add LOG.debug() before 
returning.
{code}
+  public void usePlan(boolean usePlan) {
+    this.usePlan = usePlan;
+  }
{code}
I would name the boolean 'usable'. The setter can be named setUsable().
A bigger question is:
{code}
+  if (newPlan) {
+    randomPlan.usePlan(false);
+    this.regionPlans.remove(randomPlan.getRegionName());
+  } else {
+    existingPlan.usePlan(false);
+    this.regionPlans.remove(existingPlan.getRegionName());
+  }
{code}
Why can't we return null above, so that we don't need to add the boolean member 
to RegionPlan?
At least we shouldn't return an unusable randomPlan.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-6060-94.patch







[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287561#comment-13287561
 ] 

Zhihong Yu commented on HBASE-6060:
---

I ran the tests in TestAssignmentManager and they passed.
{code}
 synchronized (this.regionPlans) {
+  regionsOnDeadServer = new RegionsOnDeadServer();
+  regionsFromRegionPlansForServer = new ConcurrentSkipListSet<HRegionInfo>();
+  this.deadServerRegionsFromRegionPlan.put(sn, regionsOnDeadServer);
{code}
Can the first two assignments be placed outside the synchronized block?
Before making the deadServerRegionsFromRegionPlan.put() call, I think we should 
check that sn isn't currently in deadServerRegionsFromRegionPlan.
For isRegionOnline(HRegionInfo hri):
{code}
+    return true;
+  } else {
+    // Remove the assignment mapping for sn.
+    Set<HRegionInfo> hriSet = this.servers.get(sn);
+    if (hriSet != null) {
+      hriSet.remove(hri);
+    }
{code}
The else keyword isn't needed.
What if hriSet contains other regions apart from hri; should they be removed as 
well?

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-6060-94.patch







[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287579#comment-13287579
 ] 

Zhihong Yu commented on HBASE-6060:
---

Thanks for working on this issue.

I will review the next version in more detail :-)

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-6060-94.patch







[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287595#comment-13287595
 ] 

Zhihong Yu commented on HBASE-5924:
---

For #2 above, I think we can remove the callback in 0.96

 In the client code, don't wait for all the requests to be executed before 
 resubmitting a request in error.
 --

 Key: HBASE-5924
 URL: https://issues.apache.org/jira/browse/HBASE-5924
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

 The client (in the function HConnectionManager#processBatchCallback) works in 
 two steps:
  - make the requests
  - collect the failures and successes and prepare for retry
 It means that when there is an immediate error (region moved, split, dead 
 server, ...) we still wait for all the initial requests to be executed before 
 resubmitting the failed requests. If we have a scenario with all the 
 requests taking 5 seconds, we have a final execution time of: 5 (initial 
 requests) + 1 (wait time) + 5 (final request) = 11s.
 We could improve this by analyzing the results immediately. This would lead 
 us, for the scenario mentioned above, to 6 seconds. 
 So we could have a performance improvement of nearly 50% in many cases, and 
 much more than 50% if the request execution times differ.
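The two-step vs. immediate contrast above can be sketched with a completion service that consumes results as they finish and resubmits a failure right away. This is a toy model, not the HConnectionManager code; the task bodies and the retry policy are invented for illustration.

```java
import java.util.concurrent.*;

// Toy model: consume results as they complete and resubmit a failed
// request immediately, instead of waiting for the whole batch first.
public class EagerRetryDemo {
    public static int runBatch(int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            cs.submit(() -> {
                if (id == 0) throw new RuntimeException("transient failure");
                return id;
            });
        }
        int completed = 0, pending = tasks;
        while (pending > 0) {
            Future<Integer> f = cs.take();  // first finished result, not batch order
            pending--;
            try {
                f.get();
                completed++;
            } catch (ExecutionException e) {
                // Resubmit immediately; the other requests may still be running.
                cs.submit(() -> -1);
                pending++;
            }
        }
        pool.shutdown();
        return completed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runBatch(5)); // prints 5: four successes plus one retried task
    }
}
```

The retry overlaps with the still-running requests, which is where the saved wall-clock time in the scenario above comes from.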





[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287599#comment-13287599
 ] 

Zhihong Yu commented on HBASE-6060:
---

Thinking more about the usable RegionPlan flag, we don't really need it.
We can introduce an 'unusable' RegionPlan singleton which signifies that the 
plan is not to be used.
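That singleton could look roughly like the sketch below; the class and field names are illustrative, not the actual patch. Callers compare against the shared instance instead of checking a per-plan boolean:

```java
// Illustrative stand-in for RegionPlan with a shared "no plan" sentinel.
final class RegionPlanSketch {
    static final RegionPlanSketch NO_SERVER_TO_ASSIGN = new RegionPlanSketch(null, null);

    private final String regionName;
    private final String destination;

    RegionPlanSketch(String regionName, String destination) {
        this.regionName = regionName;
        this.destination = destination;
    }

    String getRegionName() { return regionName; }
    String getDestination() { return destination; }
}

public class SentinelDemo {
    static String describe(RegionPlanSketch plan) {
        // Identity comparison: the sentinel is a single shared instance.
        if (plan == RegionPlanSketch.NO_SERVER_TO_ASSIGN) {
            return "skip assignment";
        }
        return "assign " + plan.getRegionName() + " to " + plan.getDestination();
    }

    public static void main(String[] args) {
        System.out.println(describe(RegionPlanSketch.NO_SERVER_TO_ASSIGN)); // skip assignment
        System.out.println(describe(new RegionPlanSketch("r1", "rs1:60020")));
    }
}
```

The sentinel avoids adding mutable state to every RegionPlan, and it cannot be used by accident, since callers must explicitly test for it.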

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-6060-94.patch







[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287632#comment-13287632
 ] 

Zhihong Yu commented on HBASE-6046:
---

Patch v2 looks good.
Minor comment:
{code}
-  .splitLogManagerTimeoutMonitor);
+  public void finishInitialization(boolean masterRecovery) {
{code}
Add javadoc for the method and masterRecovery parameter.

 Master retry on ZK session expiry causes inconsistent region assignments.
 -

 Key: HBASE-6046
 URL: https://issues.apache.org/jira/browse/HBASE-6046
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch, 
 HBASE_6046_0.94_2.patch







[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6060:
--

Attachment: 6060-94-v3.patch

Patch v3 illustrates my proposal.

I also created a singleton for the null RegionPlan that signifies there is no 
server to assign region.

TestAssignmentManager passes.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: 6060-94-v3.patch, HBASE-6060-94.patch







[jira] [Commented] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287753#comment-13287753
 ] 

Zhihong Yu commented on HBASE-6138:
---

Will integrate the suggested fix if there is no objection.

 HadoopQA not running findbugs [Trunk]
 -

 Key: HBASE-6138
 URL: https://issues.apache.org/jira/browse/HBASE-6138
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Anoop Sam John
 Fix For: 0.96.0


 HadoopQA shows something like
  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
 But there is no reports link to be seen.
 When I checked the console output for the build, I can see
 {code}
 [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
 ---
 [INFO] Fork Value is true
 [INFO] 
 
 [INFO] Reactor Summary:
 [INFO] 
 [INFO] HBase . SUCCESS [1.890s]
 [INFO] HBase - Common  FAILURE [2.238s]
 [INFO] HBase - Server  SKIPPED
 [INFO] HBase - Assembly .. SKIPPED
 [INFO] HBase - Site .. SKIPPED
 [INFO] 
 
 [INFO] BUILD FAILURE
 [INFO] 
 
 [INFO] Total time: 4.856s
 [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
 [INFO] Final Memory: 23M/154M
 [INFO] 
 
 [ERROR] Could not find resource 
 '${parent.basedir}/dev-support/findbugs-exclude.xml'. - [Help 1]
 [ERROR] 
 {code}
 Because of this error Findbugs is not getting run!





[jira] [Updated] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6138:
--

Attachment: 6138.txt

Patch that I am going to apply.

 HadoopQA not running findbugs [Trunk]
 -

 Key: HBASE-6138
 URL: https://issues.apache.org/jira/browse/HBASE-6138
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Anoop Sam John
 Fix For: 0.96.0

 Attachments: 6138.txt







[jira] [Commented] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287771#comment-13287771
 ] 

Zhihong Yu commented on HBASE-6138:
---

Integrated to trunk.

Thanks for the patch Anoop.

 HadoopQA not running findbugs [Trunk]
 -

 Key: HBASE-6138
 URL: https://issues.apache.org/jira/browse/HBASE-6138
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.96.0

 Attachments: 6138.txt


 HadoopQA shows like
  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
 But not able to see any reports link
 When I checked the console output for the build I can see
 {code}
 [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
 ---
 [INFO] Fork Value is true
 [INFO] 
 
 [INFO] Reactor Summary:
 [INFO] 
 [INFO] HBase . SUCCESS [1.890s]
 [INFO] HBase - Common  FAILURE [2.238s]
 [INFO] HBase - Server  SKIPPED
 [INFO] HBase - Assembly .. SKIPPED
 [INFO] HBase - Site .. SKIPPED
 [INFO] 
 
 [INFO] BUILD FAILURE
 [INFO] 
 
 [INFO] Total time: 4.856s
 [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
 [INFO] Final Memory: 23M/154M
 [INFO] 
 
 [ERROR] Could not find resource 
 '${parent.basedir}/dev-support/findbugs-exclude.xml'. - [Help 1]
 [ERROR] 
 {code}
 Because of this error Findbugs is not getting run!





[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287813#comment-13287813
 ] 

Zhihong Yu commented on HBASE-5936:
---

I can easily reproduce one of the test failures seen on Jenkins 
(https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2972/testReport/org.apache.hadoop.hbase.master/TestHMasterRPCException/testRPCException/):
{code}
Failed tests:   
testRPCException(org.apache.hadoop.hbase.master.TestHMasterRPCException): 
Unexpected throwable: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running 
yet
{code}

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
 HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
 HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6060:
--

Attachment: 6060-94-v4.patch

Patch v4 addresses Rajesh's comment and some of my own comments.

TestAssignmentManager passes.

Running test suite.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, HBASE-6060-94.patch


 We have seen a pattern in tests: regions are stuck in the OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - The master calls the RS to open the region. If the RS is offline, a new plan 
 is generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign()
  - The RegionServer starts opening the region and changes the state in the 
 znode. But that znode is not ephemeral. (see ZkAssign)
  - The RS transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process()
  - The RS then opens the region and changes the znode from OPENING to OPENED
  - When the RS is killed between the OPENING and OPENED states, zk shows the 
 OPENING state and the master just waits for the RS to change the region state; 
 but since the RS is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
 these kinds of conditions. It periodically checks (every 10 sec by default) 
 whether the regions in transition have timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 
 min, which explains what you and I are seeing. 
  - ServerShutdownHandler in the Master does not reassign regions in the OPENING 
 state, although it handles other states. 
 Lowering that timeout in the configuration is one option, but I still think we 
 can do better. 
 Will investigate more. 
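The TimeoutMonitor idea described above can be sketched roughly as follows. This is a toy illustration with made-up names (`TimeoutMonitorSketch`, `markInTransition`, `timedOut`), not the actual AssignmentManager code: regions are stamped when they enter transition, and a periodic check flags any that have exceeded the timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the TimeoutMonitor idea; names are illustrative only.
public class TimeoutMonitorSketch {
    private final Map<String, Long> regionsInTransition = new HashMap<>();
    private final long timeoutMillis;

    public TimeoutMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Stamp a region when it enters a transition state (e.g. OPENING).
    public void markInTransition(String region, long nowMillis) {
        regionsInTransition.put(region, nowMillis);
    }

    // Clear the stamp once the region reaches OPENED.
    public void markDone(String region) {
        regionsInTransition.remove(region);
    }

    // Periodic check: return regions that have been in transition too long.
    public List<String> timedOut(long nowMillis) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : regionsInTransition.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

With the default 30-minute timeout, a region stuck in OPENING is only flagged on the first check after half an hour, which matches the long recovery delays described above.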





[jira] [Updated] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5936:
--

Attachment: 5936-addendum.txt

Looks like the snippet from patch v6 for TestHMasterRPCException wasn't applied 
to trunk.
Addendum attached.

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: 5936-addendum.txt, HBASE-5936-v3.patch, 
 HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
 HBASE-5936-v6.patch, HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Updated] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5936:
--

Attachment: (was: 5936-addendum.txt)

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
 HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
 HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Updated] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5936:
--

Attachment: 5936-addendum-v2.txt

The exception came out of the HBaseRPC.getProxy() call.
Addendum v2 passes TestHMasterRPCException.

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: 5936-addendum-v2.txt, HBASE-5936-v3.patch, 
 HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
 HBASE-5936-v6.patch, HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287855#comment-13287855
 ] 

Zhihong Yu commented on HBASE-5936:
---

Addendum v2 integrated to trunk.

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: 5936-addendum-v2.txt, HBASE-5936-v3.patch, 
 HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
 HBASE-5936-v6.patch, HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286691#comment-13286691
 ] 

Zhihong Yu commented on HBASE-6134:
---

From 
https://builds.apache.org/job/PreCommit-HBASE-Build/2074//testReport/org.apache.hadoop.hbase.regionserver.wal/TestHLog/testAppendClose/:
{code}
java.net.BindException: Problem binding to localhost/127.0.0.1:55283 : Address 
already in use
{code}

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch, HBASE-6134v2.patch, HBASE-6134v3.patch


 First, we did a test comparing local-master-split and distributed-split-log.
 Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RS do 
 the splitting work)
 local-master-split: 60s+
 distributed-split-log: 165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (the regionservers may be under high 
 load).
 We found the split-worker took about 20s to split one log file.
 I think we should improve this.
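As a rough sanity check of the numbers above (illustrative arithmetic only, not HBase code): 34 logs spread over 4 surviving regionservers at about 20s per log predicts roughly 180s of wall-clock time, close to the observed 165s+.

```java
// Illustrative arithmetic only: estimate wall-clock time for distributed
// log splitting when each worker handles its share of logs sequentially.
public class SplitEstimate {
    public static int estimateSeconds(int logs, int workers, int secondsPerLog) {
        int logsPerWorker = (logs + workers - 1) / workers; // ceiling division
        return logsPerWorker * secondsPerLog;
    }
}
```

estimateSeconds(34, 4, 20) gives 180s; the measured 165s+ fits, since not every worker gets the full ceiling share of logs.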





[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286718#comment-13286718
 ] 

Zhihong Yu commented on HBASE-6109:
---

In ZKUtil.java, the only change was:
{code}
+  LOG.debug("Deleting node " + node);
{code}
Can I take this out at the time of integration?

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 
 6109.v24.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copies of the RIT list
  - lowering the number of synchronizations
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some test-flakiness corrections, actually unrelated to this patch.
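The "synchronizing on a region rather than on everything" point can be sketched like this (a hypothetical class, not the actual patch): keep one lock object per region, so threads working on unrelated region transitions no longer contend on a single monitor.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not the actual HBASE-6109 patch): one lock object
// per region, so two threads assigning different regions never block
// each other, while two threads touching the same region still serialize.
public class PerRegionLocks {
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    // Always returns the same lock object for the same region name.
    public Object lockFor(String regionName) {
        return locks.computeIfAbsent(regionName, k -> new Object());
    }

    public void withRegionLock(String regionName, Runnable action) {
        synchronized (lockFor(regionName)) {
            action.run();
        }
    }
}
```

The design trade-off is memory for one small object per region in exchange for much finer-grained contention than a single global lock.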





[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286725#comment-13286725
 ] 

Zhihong Yu commented on HBASE-6109:
---

Integrated to trunk.

Thanks for the patch, N.

Thanks for the review, Stack and Ram.

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v23.patch, 
 6109.v24.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copy of the RIT list
  - lowering the number of synchronization
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some tests flakiness correction, actually unrelated to this patch.





[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286763#comment-13286763
 ] 

Zhihong Yu commented on HBASE-4050:
---

@Otis:
It has been 11 months.
Please take this if you have time.

 Update HBase metrics framework to metrics2 framework
 

 Key: HBASE-4050
 URL: https://issues.apache.org/jira/browse/HBASE-4050
 Project: HBase
  Issue Type: New Feature
  Components: metrics
Affects Versions: 0.90.4
 Environment: Java 6
Reporter: Eric Yang
Assignee: Shaneal Manek
Priority: Critical

 Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, 
 and it might get removed in future Hadoop release.  Hence, HBase needs to 
 revise the dependency of MetricsContext to use Metrics2 framework.





[jira] [Created] (HBASE-6140) Make distributed log splitting faster by changing call site of tmp log renaming

2012-05-31 Thread Zhihong Yu (JIRA)
Zhihong Yu created HBASE-6140:
-

 Summary: Make distributed log splitting faster by changing call 
site of tmp log renaming
 Key: HBASE-6140
 URL: https://issues.apache.org/jira/browse/HBASE-6140
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu


For 1 regions, current distributed log splitting took 22 minutes.

After moving the call site of the tmp log renaming, we observed a duration of 7 minutes.





[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286821#comment-13286821
 ] 

Zhihong Yu commented on HBASE-5936:
---

No Hadoop QA activity so far.

Probably you need to run the suite yourself.

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
 HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Commented] (HBASE-6140) Make distributed log splitting faster by changing call site of tmp log renaming

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286825#comment-13286825
 ] 

Zhihong Yu commented on HBASE-6140:
---

There are 105 machines.

We'll try Chunhui's solution after HBASE-6134 is finalized.

 Make distributed log splitting faster by changing call site of tmp log 
 renaming
 ---

 Key: HBASE-6140
 URL: https://issues.apache.org/jira/browse/HBASE-6140
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu

 For 1 regions, current distributed log splitting took 22 minutes.
 After moving the call site of the tmp log renaming, we observed a duration of 7 minutes.





[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-05-31 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286847#comment-13286847
 ] 

Zhihong Yu commented on HBASE-5936:
---

Still no activity.

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
 HBASE-5936-v4.patch, HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Created] (HBASE-6144) Master mistakenly splits live server's HLog file

2012-05-31 Thread Zhihong Yu (JIRA)
Zhihong Yu created HBASE-6144:
-

 Summary: Master mistakenly splits live server's HLog file
 Key: HBASE-6144
 URL: https://issues.apache.org/jira/browse/HBASE-6144
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu


RS abcdn0590 is live, but the Master does not have it on its online-server 
list. So the Master put up the hlog for splitting, as shown in the Master log below:
2012-05-17 21:43:57,692 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
 acquired by abcdn0770.xyz.com,60020,1337315956278. 
 
After splitting succeeded, Master deleted the file:
2012-05-17 21:43:58,721 DEBUG 
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
/hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
 
RS abcdn0590 lost the lease to RS abcdn0770 and tried to roll its log, which 
closes the current hlog and creates a new one, as shown in the namenode log:
2012-05-17 21:43:58,422 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
commitBlockSynchronization(newblock=blk_2867982016684075739_12741027, 
file=/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711,
 newgenerationstamp=12911920, newlength=134, newtargets=[10.115.13.24:50010, 
10.115.15.46:50010, 10.115.15.23:50010]) successful
2012-05-17 21:43:59,883 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
 blk_3811725326431482476_12913541{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.115.13.24:50010|RBW], 
ReplicaUnderConstruction[10.115.17.18:50010|RBW], 
ReplicaUnderConstruction[10.115.17.15:50010|RBW]]}
 
When RS abcdn0590 tried to close the old hlog 1337315957711, it received the 
fatal error below because the original hlog had already been deleted. The fatal 
error later causes RS abcdn0590 to shut itself down.
2012-05-17 21:43:58,889 ERROR org.apache.hadoop.hbase.master.HMaster: Region 
server ^@^@abcdn0590.xyz.com,60020,1337315957185 reported a fatal error:
ABORTING region server abcdn0590.xyz.com,60020,1337315957185: IOE in log roller
Cause:
java.io.FileNotFoundException: File does not exist: 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:583)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
 
 
RS abcdn0590 shut down at around 21:44, but in the /hbase/.logs dir it left two 
sub-folders for RS abcdn0590 with the same startcode 1337315957185:
- /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/
- /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/
 
Later, at around 21:46:30, the Master retried log splitting. It still 
considered RS abcdn0590 a dead RS and tried to put up its hlog for others to 
grab and split. It found the folder 
/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/, and the first step is to 
rename it by adding the suffix "-splitting". However, that folder already 
exists. The rename function does not handle the case where the destination 
folder already exists; instead it puts the src folder under the dst folder, so 
the path structure looks like dst/src/file. In our case, it is 
/hbase/.logs.20120518.1204/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
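The rename hazard described above can be modeled with a toy in-memory filesystem. This is illustrative only: `rename`, `renameForSplitting`, and `baseName` are made-up names, not HDFS or HBase code. If the destination directory already exists, the source is nested under it, so a guard must check for the "-splitting" folder before renaming.

```java
import java.util.Set;

// Toy in-memory model of the rename semantics described above; a
// "filesystem" here is just a set of absolute paths.
public class RenameGuard {
    // If dst already exists, src is moved *under* dst (dst/src), mirroring
    // the nesting behavior observed in the master log.
    public static boolean rename(Set<String> fs, String src, String dst) {
        if (!fs.contains(src)) {
            return false;
        }
        String target = fs.contains(dst) ? dst + "/" + baseName(src) : dst;
        fs.remove(src);
        fs.add(target);
        return true;
    }

    // Guarded variant: refuse to rename when the -splitting folder already
    // exists, instead of silently nesting one server directory in the other.
    public static boolean renameForSplitting(Set<String> fs, String serverDir) {
        String dst = serverDir + "-splitting";
        if (fs.contains(dst)) {
            return false;
        }
        return rename(fs, serverDir, dst);
    }

    private static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

The guarded variant turns the silent dst/src/file nesting into an explicit failure the caller can handle (for example, by splitting the existing -splitting folder first).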
 
From the master log, we can see that two folders for the same RS abcdn0590 at 
the same startcode exist:
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1329941607395-splitting
 doesn't belong to a known region server, splitting
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185 
doesn't belong to a known region server, splitting
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting
 doesn't belong to a known region server, splitting
 
2012-05-17 21:46:30,962 DEBUG org.apache.hadoop.hbase.master.MasterFileSystem: 
Renamed region 

[jira] [Updated] (HBASE-6144) Master mistakenly splits live server's HLog file

2012-05-31 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6144:
--

Description: 
RS abcdn0590 is live, but the Master does not have it on its online-server 
list. So the Master put up the hlog for splitting, as shown in the Master log below:
{code}
2012-05-17 21:43:57,692 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
 acquired by abcdn0770.xyz.com,60020,1337315956278. 
{code}

After splitting succeeded, Master deleted the file:
{code}
2012-05-17 21:43:58,721 DEBUG 
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
/hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
{code}

RS abcdn0590 lost the lease to RS abcdn0770 and tried to roll its log, which 
closes the current hlog and creates a new one, as shown in the namenode log:
{code}
2012-05-17 21:43:58,422 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
commitBlockSynchronization(newblock=blk_2867982016684075739_12741027, 
file=/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711,
 newgenerationstamp=12911920, newlength=134, newtargets=[10.115.13.24:50010, 
10.115.15.46:50010, 10.115.15.23:50010]) successful
2012-05-17 21:43:59,883 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
 blk_3811725326431482476_12913541{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.115.13.24:50010|RBW], 
ReplicaUnderConstruction[10.115.17.18:50010|RBW], 
ReplicaUnderConstruction[10.115.17.15:50010|RBW]]}
{code}
 
When RS abcdn0590 tried to close the old hlog 1337315957711, it received the 
fatal error below because the original hlog had already been deleted. The fatal 
error later causes RS abcdn0590 to shut itself down.
{code}
2012-05-17 21:43:58,889 ERROR org.apache.hadoop.hbase.master.HMaster: Region 
server ^@^@abcdn0590.xyz.com,60020,1337315957185 reported a fatal error:
ABORTING region server abcdn0590.xyz.com,60020,1337315957185: IOE in log roller
Cause:
java.io.FileNotFoundException: File does not exist: 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:583)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
{code}
 
RS abcdn0590 shut down at around 21:44, but in the /hbase/.logs dir it left two 
sub-folders for RS abcdn0590 with the same startcode 1337315957185:
- /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/
- /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/
 
Later, at around 21:46:30, the Master retried log splitting. It still 
considered RS abcdn0590 a dead RS and tried to put up its hlog for others to 
grab and split. It found the folder 
/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/, and the first step is to 
rename it by adding the suffix "-splitting". However, that folder already 
exists. The rename function does not handle the case where the destination 
folder already exists; instead it puts the src folder under the dst folder, so 
the path structure looks like dst/src/file. In our case, it is 
/hbase/.logs.20120518.1204/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
 
From the master log, we can see that two folders for the same RS abcdn0590 at 
the same startcode exist:
{code}
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1329941607395-splitting
 doesn't belong to a known region server, splitting
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185 
doesn't belong to a known region server, splitting
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting
 doesn't belong to a known region server, splitting
 
2012-05-17 21:46:30,962 DEBUG org.apache.hadoop.hbase.master.MasterFileSystem: 
Renamed region directory: 

[jira] [Updated] (HBASE-6144) Master mistakenly splits live server's HLog file

2012-05-31 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6144:
--

Description: 
RS abcdn0590 is live, but the Master does not have it on its online-server 
list. So the Master put up the hlog for splitting, as shown in the Master log below:
{code}
2012-05-17 21:43:57,692 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
 acquired by abcdn0770.xyz.com,60020,1337315956278. 
{code}

After splitting succeeded, Master deleted the file:
{code}
2012-05-17 21:43:58,721 DEBUG 
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
/hbase/splitlog/hdfs%3A%2F%2Fnamenode.xyz.com%2Fhbase%2F.logs%2Fabcdn0590.xyz.com%2C60020%2C1337315957185-splitting%2Fabcdn0590.xyz.com%252C60020%252C1337315957185.1337315957711
{code}

RS abcdn0590 lost the lease to RS abcdn0770 and tried to roll its log, which 
closes the current hlog and creates a new one, as shown in the namenode log:
{code}
2012-05-17 21:43:58,422 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
commitBlockSynchronization(newblock=blk_2867982016684075739_12741027, 
file=/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711,
 newgenerationstamp=12911920, newlength=134, newtargets=[10.115.13.24:50010, 
10.115.15.46:50010, 10.115.15.23:50010]) successful
2012-05-17 21:43:59,883 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
 blk_3811725326431482476_12913541{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.115.13.24:50010|RBW], 
ReplicaUnderConstruction[10.115.17.18:50010|RBW], 
ReplicaUnderConstruction[10.115.17.15:50010|RBW]]}
{code}
 
When RS abcdn0590 tried to close the old hlog 1337315957711, it received the fatal 
error below because the original hlog had already been deleted. The fatal error 
later causes RS abcdn0590 to shut itself down.
{code}
2012-05-17 21:43:58,889 ERROR org.apache.hadoop.hbase.master.HMaster: Region 
server ^@^@abcdn0590.xyz.com,60020,1337315957185 reported a fatal error:
ABORTING region server abcdn0590.xyz.com,60020,1337315957185: IOE in log roller
Cause:
java.io.FileNotFoundException: File does not exist: 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337315957711
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:583)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
{code}
 
RS abcdn0590 shut down at around 21:44, but in the /hbase/.logs dir it left two 
subfolders for RS abcdn0590 with the same startcode 1337315957185:
· /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting/
· /hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/
 
Later, at around 21:46:30, the Master retries log splitting. This time, it still 
considers RS abcdn0590 a dead RS and tries to put up its hlog for others to grab 
and split. It finds the folder 
/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185/, and the first step it does 
is to rename it by adding the suffix -splitting. However, the same folder 
already exists. The rename function does not handle the case where the 
destination folder already exists; instead, it puts the src folder under the dst 
folder, so the path structure looks like dst/src/file. In our case, it is 
/hbase/.logs.20120518.1204/abcdn0590.xyz.com,60020,1337315957185-splitting/abcdn0590.xyz.com,60020,1337315957185/abcdn0590.xyz.com%2C60020%2C1337315957185.1337316238882.
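The missing guard can be sketched in plain Java. This is a hypothetical helper, not the actual MasterFileSystem code, and it uses java.nio.file where HDFS uses FileSystem.rename (whose nesting behavior is what the report describes); the point is only the destination-exists check the master could apply before renaming:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeRenameDemo {
    // Hypothetical guard: refuse to rename when the destination already
    // exists, so a pre-existing "-splitting" folder is detected instead of
    // the src folder silently ending up under the dst folder.
    static boolean safeRename(Path src, Path dst) throws IOException {
        if (Files.exists(dst)) {
            // Caller should reuse or merge into the existing -splitting folder.
            return false;
        }
        Files.move(src, dst);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("logs");
        Path src = Files.createDirectory(base.resolve("rs,60020,1337315957185"));
        Path dst = base.resolve("rs,60020,1337315957185-splitting");
        System.out.println(safeRename(src, dst));  // prints true: renamed
        Path src2 = Files.createDirectory(base.resolve("rs,60020,1337315957185"));
        System.out.println(safeRename(src2, dst)); // prints false: dst exists
    }
}
```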
 
From the master log, we can see that two folders for the same RS abcdn0590 with 
the same startcode exist:
{code}
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1329941607395-splitting
 doesn't belong to a known region server, splitting
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185 
doesn't belong to a known region server, splitting
2012-05-17 21:46:30,749 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://namenode.xyz.com/hbase/.logs/abcdn0590.xyz.com,60020,1337315957185-splitting
 doesn't belong to a known region server, splitting
 
2012-05-17 21:46:30,962 DEBUG org.apache.hadoop.hbase.master.MasterFileSystem: 
Renamed region directory: 

[jira] [Updated] (HBASE-6014) Support for block-granularity bitmap indexes

2012-05-31 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6014:
--

Attachment: 6014-bitmap-hacking.txt

I didn't rebase Todd's patch for trunk because the pom.xml structure changed.

I moved TestByteArrayCuckooMap from src/main to src/test

 Support for block-granularity bitmap indexes
 

 Key: HBASE-6014
 URL: https://issues.apache.org/jira/browse/HBASE-6014
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Todd Lipcon
 Attachments: 6014-bitmap-hacking.txt, bitmap-hacking.txt


 This came up in a discussion with Kannan today, so I promised to write 
 something brief on JIRA -- this was suggested as a potential summer intern 
 project. The idea is as follows:
 We have several customers who periodically run full table scan MR jobs 
 against large HBase tables while applying fairly restrictive predicates. The 
 predicates are often reasonably simple boolean expressions across known 
 columns, and those columns often are enum-typed or otherwise have a fairly 
 restricted range of values. For example, a real time process may mark rows as 
 dirty, and a background MR job may scan for dirty rows in order to perform 
 further processing like rebuilding inverted indexes.
 One way to speed up this type of query is to add bitmap indexes. In the 
 context of HBase, I would envision this as a new type of metadata block 
 included in the HFile which has a series of tuples: (qualifier, value range, 
 compressed bitmap). A 1 bit in the bitmap indicates that the corresponding 
 HFile block has at least one cell for which a column with the given qualifier 
 falls within the given range. Queries which have an equality or comparison 
 predicate against an indexed qualifier can then use the bitmap index to seek 
 directly to those blocks which may contain relevant data.
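As a sketch of how such an index would be consulted, the (qualifier, value range, compressed bitmap) tuples above can be modeled with java.util.BitSet standing in for the compressed bitmap. All names here are hypothetical and this is not HFile's actual metadata format:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BlockBitmapSketch {
    // One index entry: a 1 bit at position i means HFile block i may contain
    // at least one cell whose column "qualifier" falls within [lo, hi].
    static class Entry {
        final String qualifier;
        final long lo, hi;
        final BitSet blocks;
        Entry(String qualifier, long lo, long hi, BitSet blocks) {
            this.qualifier = qualifier;
            this.lo = lo;
            this.hi = hi;
            this.blocks = blocks;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String qualifier, long lo, long hi, int... blockIds) {
        BitSet b = new BitSet();
        for (int id : blockIds) b.set(id);
        entries.add(new Entry(qualifier, lo, hi, b));
    }

    // Blocks that may satisfy an equality predicate qualifier == value:
    // OR together the bitmaps of every entry whose range covers the value.
    // The scanner then seeks only to the set blocks instead of all blocks.
    BitSet blocksFor(String qualifier, long value) {
        BitSet result = new BitSet();
        for (Entry e : entries) {
            if (e.qualifier.equals(qualifier) && e.lo <= value && value <= e.hi) {
                result.or(e.blocks);
            }
        }
        return result;
    }
}
```

For the "dirty rows" example in the description, a predicate `dirty == 1` would map to the bitmap for that value range, and the MR scan could skip every block whose bit is clear.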





[jira] [Updated] (HBASE-6144) Master mistakenly splits live server's HLog file

2012-05-31 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6144:
--

Affects Version/s: 0.92.0
 Release Note: Underlying hadoop is 0.22

 Master mistakenly splits live server's HLog file
 

 Key: HBASE-6144
 URL: https://issues.apache.org/jira/browse/HBASE-6144
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu


[jira] [Commented] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285423#comment-13285423
 ] 

Zhihong Yu commented on HBASE-6088:
---

Addendum integrated to 0.94 branch.

  Region splitting not happened for long time due to ZK exception while 
 creating RS_ZK_SPLITTING node
 

 Key: HBASE-6088
 URL: https://issues.apache.org/jira/browse/HBASE-6088
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: rajeshbabu
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6088_92.patch, HBASE-6088_94.patch, 
 HBASE-6088_94_2.patch, HBASE-6088_94_3.patch, HBASE-6088_trunk.patch, 
 HBASE-6088_trunk_2.patch, HBASE-6088_trunk_3.patch, HBASE-6088_trunk_4.patch, 
 addendum_6088_94.patch


 Region splitting not happened for long time due to ZK exception while 
 creating RS_ZK_SPLITTING node
 {noformat}
 2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session 
 timed out, have not heard from server in 26668ms for sessionid 
 0x1377a75f41d0012, closing socket connection and attempting reconnect
 2012-05-24 01:45:41,464 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient 
 ZooKeeper exception: 
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
 {noformat}
 {noformat}
 2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
 cleanupCurrentWriter  waiting for transactions to get synced  total 189377 
 synced till here 189365
 2012-05-24 01:45:48,474 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
 setting SPLITTING znode on 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
 java.io.IOException: Failed setting SPLITTING znode on 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
   at 
 org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
 KeeperErrorCode = BadVersion for 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
   ... 5 more
 2012-05-24 01:45:48,476 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of 
 failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
 {noformat}
 {noformat}
 2012-05-24 01:47:28,141 ERROR 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is 
 not a retry
 2012-05-24 01:47:28,142 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
 create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
 java.io.IOException: Failed create of ephemeral 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
   at 
 

[jira] [Commented] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285434#comment-13285434
 ] 

Zhihong Yu commented on HBASE-6049:
---

@Maryann:
Please rebase the patch on the latest trunk.
Paths for source files would begin with hbase-server/

 Serializing List containing null elements will cause NullPointerException 
 in HbaseObjectWritable.writeObject()
 

 Key: HBASE-6049
 URL: https://issues.apache.org/jira/browse/HBASE-6049
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.94.0
Reporter: Maryann Xue
 Attachments: HBASE-6049-v2.patch, HBASE-6049-v3.patch, 
 HBASE-6049.patch


 An error case could occur in the coprocessor AggregationClient: the median() 
 function handles an empty region and returns a List<Object> with the first 
 element as a null value. An NPE occurs in the RPC response stage and the 
 response never gets sent.
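The failure mode is typical of type-tagged serialization: the writer asks each element for its runtime class, which throws NullPointerException on a null element. A simplified sketch of the bug and a null-guard fix (hypothetical methods, not the actual HbaseObjectWritable code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullElementDemo {
    // Sketch of a type-tagged writer: it asks each element for its class
    // so the reader can reconstruct the type; a null element NPEs here.
    static String writeUnsafe(List<Object> list) {
        StringBuilder sb = new StringBuilder();
        for (Object o : list) {
            sb.append(o.getClass().getSimpleName()).append(':').append(o).append(';');
        }
        return sb.toString();
    }

    // Guarded variant: emit an explicit null marker instead of dereferencing.
    static String writeSafe(List<Object> list) {
        StringBuilder sb = new StringBuilder();
        for (Object o : list) {
            sb.append(o == null ? "null" : o.getClass().getSimpleName() + ":" + o)
              .append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Mirrors the median() case: first element null, rest real values.
        List<Object> withNull = new ArrayList<>(Arrays.asList(null, 42L));
        try {
            writeUnsafe(withNull);
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the report");
        }
        System.out.println(writeSafe(withNull)); // prints null;Long:42;
    }
}
```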





[jira] [Commented] (HBASE-6134) Speed up distributed split log

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285442#comment-13285442
 ] 

Zhihong Yu commented on HBASE-6134:
---

So your suggestion is to estimate the log splitting duration based on the number 
of HLog files?
If the estimate is low, we stay with master log splitting.

 Speed up distributed split log
 -

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen

 First, we compared local-master-split and distributed-split-log.
 Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do the 
 splitting work)
 local-master-split: 60s+
 distributed-split-log: 165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (the regionservers may be under high load).
 We found that a split-worker took about 20s to split one log file.
 I think we should improve this.





[jira] [Updated] (HBASE-6134) Speed up distributed split log

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6134:
--

Fix Version/s: 0.96.0

 Speed up distributed split log
 -

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0


 First, we compared local-master-split and distributed-split-log.
 Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do the 
 splitting work)
 local-master-split: 60s+
 distributed-split-log: 165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (the regionservers may be under high load).
 We found that a split-worker took about 20s to split one log file.
 I think we should improve this.





[jira] [Updated] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6134:
--

Status: Patch Available  (was: Open)

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First, we compared local-master-split and distributed-split-log.
 Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do the 
 splitting work)
 local-master-split: 60s+
 distributed-split-log: 165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (the regionservers may be under high load).
 We found that a split-worker took about 20s to split one log file.
 I think we should improve this.





[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6109:
--

Fix Version/s: 0.96.0

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copies of the RIT list
  - lowering the amount of synchronization
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some test flakiness corrections, actually unrelated to this patch.





[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6109:
--

Attachment: 6049-v3.patch

Patch v3 rebased on trunk.

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copies of the RIT list
  - lowering the amount of synchronization
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some test flakiness corrections, actually unrelated to this patch.





[jira] [Updated] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6109:
--

Attachment: (was: 6049-v3.patch)

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copies of the RIT list
  - lowering the amount of synchronization
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some test flakiness corrections, actually unrelated to this patch.





[jira] [Updated] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6049:
--

Attachment: 6049-v3.patch

Patch v3 rebased on trunk.

 Serializing List containing null elements will cause NullPointerException 
 in HbaseObjectWritable.writeObject()
 

 Key: HBASE-6049
 URL: https://issues.apache.org/jira/browse/HBASE-6049
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.94.0
Reporter: Maryann Xue
 Attachments: 6049-v3.patch, HBASE-6049-v2.patch, HBASE-6049-v3.patch, 
 HBASE-6049.patch


 An error case could occur in the coprocessor AggregationClient: the median() 
 function handles an empty region and returns a List<Object> with the first 
 element as a null value. An NPE occurs in the RPC response stage and the 
 response never gets sent.





[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285809#comment-13285809
 ] 

Zhihong Yu commented on HBASE-6109:
---

Rename TestLocker class to TestKeyLocker ?
{code}
+// It has no reason to be a lock shares with the other operations.
{code}
'shares with' -> 'shared with'

Indentation in AssignmentManager.addToRITandCallClose() was off. It would be 
nice to correct the existing lines.
{code}
+// No lock concurrency: adding a share synchronized here would not prevent 
to have two
+//  entries as we don't check if the region is already there. This must be 
ensured by the
+//  method callers. todo nli: check
{code}
'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end.
{code}
-Map<String, RegionPlan> plans=new HashMap<String, RegionPlan>();
+Map<String, RegionPlan> plans=new HashMap<String, RegionPlan>(regions.size());
{code}
Insert spaces around = sign.
{code}
+   * @return True if none of the regions in the set are in transition
{code}
'are in' -> 'is in'
{code}
+  public NavigableMap<K, V> copyMap() {
+return delegatee.clone();
{code}
Why not call the method clone() ?
{code}
+  public void clear() {
+if (!delegatee.isEmpty()) {
+  synchronized (delegatee) {
{code}
Suppose delegatee is empty upon entry to the above method, what if an entry is 
added after the isEmpty() check ?
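The concern above is a check-then-act race: with isEmpty() evaluated outside the lock, the decision to skip clearing and the clear itself are not one atomic step. A minimal sketch contrasting the racy fast path with a variant that takes the lock first (a hypothetical class, not the patch's actual code):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class ClearRaceDemo<K extends Comparable<K>, V> {
    private final NavigableMap<K, V> delegatee = new TreeMap<>();

    // Racy fast path, as in the reviewed patch: isEmpty() runs outside the
    // lock, so an entry put() by another thread between the check and the
    // return can survive a clear() that was meant to empty the map.
    public void clearRacy() {
        if (!delegatee.isEmpty()) {
            synchronized (delegatee) {
                delegatee.clear();
            }
        }
    }

    // Safe variant: take the lock first, then decide; the check and the
    // clear are now atomic with respect to concurrent put() calls.
    public void clearSafe() {
        synchronized (delegatee) {
            if (!delegatee.isEmpty()) {
                delegatee.clear();
            }
        }
    }

    public void put(K k, V v) {
        synchronized (delegatee) {
            delegatee.put(k, v);
        }
    }

    public int size() {
        synchronized (delegatee) {
            return delegatee.size();
        }
    }
}
```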

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copy of the RIT list
  - lowering the number of synchronization
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some tests flakiness correction, actually unrelated to this patch.





[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285821#comment-13285821
 ] 

Zhihong Yu commented on HBASE-6109:
---

{code}
+  // A number of lock we want to easily support. It's not a maximum.
{code}
'A number' -> 'The number'
{code}
+  // We need an atomic counter to manage the number of people using the lock 
and free it when
+  //  it's equals to zero.
{code}
'number of people' -> 'number of users'
'it's equals to zero.' -> 'it's equal to zero.'
{code}
+  static class RegionLock<K> extends ReentrantLock {
{code}
The outer class is generic. The inner class shouldn't mention Region.
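The reference-counted per-key lock the quoted comments describe can be sketched as follows (a simplified, hypothetical KeyLocker; the actual patch differs): each key maps to a ReentrantLock plus an atomic user count, and the map entry is freed when the count returns to zero.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class KeyLockerSketch<K> {
    // Per-key lock plus an atomic counter of current users; the map entry
    // is removed once the counter drops back to zero.
    static class CountedLock extends ReentrantLock {
        final AtomicInteger users = new AtomicInteger(0);
    }

    private final Map<K, CountedLock> locks = new HashMap<>();

    public ReentrantLock acquire(K key) {
        CountedLock l;
        synchronized (locks) {
            l = locks.computeIfAbsent(key, k -> new CountedLock());
            l.users.incrementAndGet();
        }
        l.lock(); // block outside the map monitor so other keys don't wait
        return l;
    }

    public void release(K key) {
        synchronized (locks) {
            CountedLock l = locks.get(key);
            if (l == null) return;
            l.unlock();
            if (l.users.decrementAndGet() == 0) {
                locks.remove(key); // free the lock when nobody uses it
            }
        }
    }

    // Visible for the sketch: how many keys currently have a live lock.
    public int liveLocks() {
        synchronized (locks) {
            return locks.size();
        }
    }
}
```

A caller does acquire(regionName), mutates that region's state, then release(regionName); unrelated regions never contend on one global lock, which is the point of synchronizing on a region rather than on everything.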

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copies of the RIT list
  - lowering the amount of synchronization
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some test flakiness corrections, actually unrelated to this patch.





[jira] [Updated] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6049:
--

Fix Version/s: 0.94.1
   0.96.0
 Hadoop Flags: Reviewed

@Stack:
Do you think patch v3 can be integrated ?

 Serializing List containing null elements will cause NullPointerException 
 in HbaseObjectWritable.writeObject()
 

 Key: HBASE-6049
 URL: https://issues.apache.org/jira/browse/HBASE-6049
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.94.0
Reporter: Maryann Xue
 Fix For: 0.96.0, 0.94.1

 Attachments: 6049-v3.patch, HBASE-6049-v2.patch, HBASE-6049-v3.patch, 
 HBASE-6049.patch


 An error case could occur in the coprocessor AggregationClient: the median() 
 function handles an empty region and returns a List<Object> with the first 
 element as a null value. An NPE occurs in the RPC response stage and the 
 response never gets sent.





[jira] [Commented] (HBASE-6109) Improve RIT performances during assignment on large clusters

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285840#comment-13285840
 ] 

Zhihong Yu commented on HBASE-6109:
---

@N:
Thanks for the quick turn-around.

 Improve RIT performances during assignment on large clusters
 

 Key: HBASE-6109
 URL: https://issues.apache.org/jira/browse/HBASE-6109
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6109.v19.patch, 6109.v21.patch, 6109.v7.patch


 The main points in this patch are:
  - lowering the number of copy of the RIT list
  - lowering the number of synchronization
  - synchronizing on a region rather than on everything
 It also contains:
  - some fixes around the RIT notification: the list was sometimes modified 
 without a corresponding 'notify'.
  - some tests flakiness correction, actually unrelated to this patch.





[jira] [Commented] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285864#comment-13285864
 ] 

Zhihong Yu commented on HBASE-5892:
---

Minor comment:
{code}
+List<WorkItemRegion> work = new 
ArrayList<WorkItemRegion>(regionServerList.size());
{code}
variable work should be named works.
{code}
+if(hdfsEntry.hri != null) {
{code}
Insert space between if and (.


 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
Assignee: Andrew Wang
  Labels: noob
 Attachments: hbase-5892-1.patch, hbase-5892-2-0.90.patch, 
 hbase-5892-2.patch, hbase-5892-3.patch, hbase-5892.patch


 This would convert the WorkItem* logic (with low-level notifies and rough 
 exception handling) into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs and 
 for contacting regionservers for assignments), and soon a third -- for loading 
 hdfs .regioninfo files.
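As a hedged sketch of the Futures pattern proposed here (hypothetical names, not the actual hbck WorkItem* classes): each unit of work becomes a Callable submitted to an ExecutorService, and Future.get() replaces the hand-rolled wait/notify and exception plumbing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of the Futures pattern: submit each work item as a
// Callable and collect results via Future.get(), letting the executor do the
// waiting and exception propagation that low-level notifies handled before.
public class ParallelWork {
    public static List<Integer> runAll(List<Callable<Integer>> items)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (Callable<Integer> item : items) {
                futures.add(pool.submit(item));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks; rethrows worker exceptions
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A worker exception surfaces as an ExecutionException at the get() call, which is considerably easier to audit than notify-based signaling.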





[jira] [Assigned] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-6049:
-

Assignee: Maryann Xue

 Serializing List containing null elements will cause NullPointerException 
 in HbaseObjectWritable.writeObject()
 

 Key: HBASE-6049
 URL: https://issues.apache.org/jira/browse/HBASE-6049
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
 Fix For: 0.96.0, 0.94.1

 Attachments: 6049-v3.patch, HBASE-6049-v2.patch, HBASE-6049-v3.patch, 
 HBASE-6049.patch


 An example error case is in the coprocessor AggregationClient: the median() 
 function handles an empty region and returns a List<Object> whose first 
 element is a null value. An NPE occurs in the RPC response stage and the 
 response never gets sent.





[jira] [Commented] (HBASE-6134) Improvement for split-worker to speed up distributed-split-log

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285891#comment-13285891
 ] 

Zhihong Yu commented on HBASE-6134:
---

Test failed with NPE at:
{code}
for (FileStatus fileStatus : listStatus1) {
{code}
Checking for null wouldn't help either:
{code}
Failed tests:   
testSequentialEditLogSeqNum(org.apache.hadoop.hbase.regionserver.wal.TestWALReplay):
 The sequence number of the recoverd.edits and the current edit seq should be 
same expected:<17> but was:<0>
{code}

 Improvement for split-worker to speed up distributed-split-log
 --

 Key: HBASE-6134
 URL: https://issues.apache.org/jira/browse/HBASE-6134
 Project: HBase
  Issue Type: Improvement
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6134.patch


 First, we compared local-master-split with distributed-split-log.
 Environment: 34 hlog files, 5 regionservers (after killing one, only 4 RSs do 
 the splitting work)
 local-master-split: 60s+
 distributed-split-log: 165s+
 In fact, in our production environment, distributed-split-log also took 60s 
 with 30 regionservers for 34 hlog files (the regionservers may be under high load).
 We found that a split-worker took about 20s to split one log file.
 I think we should improve this.





[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285961#comment-13285961
 ] 

Zhihong Yu commented on HBASE-5959:
---

From https://builds.apache.org/job/PreCommit-HBASE-Build/2060/console :
{code}
/home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests 
-DHBasePatchProcess > 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/patchprocess/trunkJavacWarnings.txt 
2>&1
{code}
Trunk compilation is broken?

 Add other load balancers
 

 Key: HBASE-5959
 URL: https://issues.apache.org/jira/browse/HBASE-5959
 Project: HBase
  Issue Type: New Feature
  Components: master
Affects Versions: 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
 HBASE-5959-11.patch, HBASE-5959-12.patch, HBASE-5959-13.patch, 
 HBASE-5959-2.patch, HBASE-5959-3.patch, HBASE-5959-6.patch, 
 HBASE-5959-7.patch, HBASE-5959-8.patch, HBASE-5959-9.patch, 
 HBASE-5959.D3189.1.patch, HBASE-5959.D3189.2.patch, HBASE-5959.D3189.3.patch, 
 HBASE-5959.D3189.4.patch, HBASE-5959.D3189.5.patch, HBASE-5959.D3189.6.patch, 
 HBASE-5959.D3189.7.patch


 Now that balancers are pluggable, we should give some options.





[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285975#comment-13285975
 ] 

Zhihong Yu commented on HBASE-5936:
---

I tried one of the tests listed above and got:
{code}
testExceptionFromCoprocessorWhenCreatingTable(org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithRemove)
  Time elapsed: 0.247 sec   FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at 
org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithRemove.testExceptionFromCoprocessorWhenCreatingTable(TestMasterCoprocessorExceptionWithRemove.java:191)
{code}

 Add Column-level PB-based calls to HMasterInterface
 ---

 Key: HBASE-5936
 URL: https://issues.apache.org/jira/browse/HBASE-5936
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: HBASE-5936.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 This is for converting the column-level calls, i.e.:
 addColumn
 deleteColumn
 modifyColumn





[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286012#comment-13286012
 ] 

Zhihong Yu commented on HBASE-5959:
---

I got the following when running against patch v14:
{code}
testContendedLogRolling(org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster)
  Time elapsed: 0.201 sec   FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertFalse(Assert.java:68)
  at org.junit.Assert.assertFalse(Assert.java:79)
  at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster.testContendedLogRolling(TestLogRollingNoCluster.java:78)
{code}

 Add other load balancers
 

 Key: HBASE-5959
 URL: https://issues.apache.org/jira/browse/HBASE-5959
 Project: HBase
  Issue Type: New Feature
  Components: master
Affects Versions: 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
 HBASE-5959-11.patch, HBASE-5959-12.patch, HBASE-5959-13.patch, 
 HBASE-5959-14.patch, HBASE-5959-2.patch, HBASE-5959-3.patch, 
 HBASE-5959-6.patch, HBASE-5959-7.patch, HBASE-5959-8.patch, 
 HBASE-5959-9.patch, HBASE-5959.D3189.1.patch, HBASE-5959.D3189.2.patch, 
 HBASE-5959.D3189.3.patch, HBASE-5959.D3189.4.patch, HBASE-5959.D3189.5.patch, 
 HBASE-5959.D3189.6.patch, HBASE-5959.D3189.7.patch


 Now that balancers are pluggable, we should give some options.





[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286023#comment-13286023
 ] 

Zhihong Yu commented on HBASE-5959:
---

The test failure was due to HBASE-5936.
I updated my workspace and it passed.

Let's see what Hadoop QA tells us:
https://builds.apache.org/job/PreCommit-HBASE-Build/2062/console

 Add other load balancers
 

 Key: HBASE-5959
 URL: https://issues.apache.org/jira/browse/HBASE-5959
 Project: HBase
  Issue Type: New Feature
  Components: master
Affects Versions: 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
 HBASE-5959-11.patch, HBASE-5959-12.patch, HBASE-5959-13.patch, 
 HBASE-5959-14.patch, HBASE-5959-2.patch, HBASE-5959-3.patch, 
 HBASE-5959-6.patch, HBASE-5959-7.patch, HBASE-5959-8.patch, 
 HBASE-5959-9.patch, HBASE-5959.D3189.1.patch, HBASE-5959.D3189.2.patch, 
 HBASE-5959.D3189.3.patch, HBASE-5959.D3189.4.patch, HBASE-5959.D3189.5.patch, 
 HBASE-5959.D3189.6.patch, HBASE-5959.D3189.7.patch


 Now that balancers are pluggable, we should give some options.





[jira] [Commented] (HBASE-6087) Add hbase-common module

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286037#comment-13286037
 ] 

Zhihong Yu commented on HBASE-6087:
---

In AssignmentManager.java:
{code}
+  public static final ServerName HBCK_CODE_SERVERNAME = new 
ServerName(HConstants.HBCK_CODE_NAME,
+  -1, -1L);
{code}
Why cannot we use HConstants.HBCK_CODE_SERVERNAME ?

For the files in the hbase-common module, it would be nice to remove the year 
from their license headers.

 Add hbase-common module
 ---

 Key: HBASE-6087
 URL: https://issues.apache.org/jira/browse/HBASE-6087
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: hbase-6087-v0.patch, hbase-6087-v1.patch


 Add an hbase-common module so common/utility classes can be pulled up out of 
 hbase-server. This is _not_ the moving of classes, just the general project 
 setup.





[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6059:
--

Attachment: 6059v7.txt

Patch v7 is rebased on trunk.

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 6059v6.txt, 6059v7.txt, HBASE-6059-testcase.patch, 
 HBASE-6059.patch, HBASE-6059v2.patch, HBASE-6059v3.patch, HBASE-6059v4.patch, 
 HBASE-6059v5.patch


 When we replay recovered edits, we use the minSeqId of the Store; this may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose a region with two families (cf1, cf2):
 1.put one data to the region (put r1,cf1:q1,v1)
 2.move the region from server A to server B.
 3.delete the data put by step 1 (delete r1)
 4.flush this region.
 5.make a major compaction for this region
 6.move the region from server B to server A.
 7.abort server A
 8.after the region is online, we can get the deleted data (r1,cf1:q1,v1)
 (When we replay recovered edits, we use the minSeqId of the Store; because cf2 
 has no store files, its seqId is 0, so the edit log of the put will be 
 replayed to the region.)
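The scenario above can be sketched as a replay decision made per column family rather than against the region-wide minimum sequence id. This is an illustrative sketch of the fix direction under that assumption; the class and method names are hypothetical, not the actual HRegion code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: decide whether to replay a recovered edit by comparing
// its sequence id against the max flushed seqId of the family it touches,
// not the region-wide minimum. With the minimum, a family with no store
// files (seqId 0) forces replay of edits that other families already
// flushed and compacted away, resurrecting deleted data.
public class ReplayDecision {
    private final Map<String, Long> maxFlushedSeqIdByFamily = new HashMap<>();

    public void recordFlush(String family, long seqId) {
        maxFlushedSeqIdByFamily.put(family, seqId);
    }

    public boolean shouldReplay(String family, long editSeqId) {
        long flushed = maxFlushedSeqIdByFamily.getOrDefault(family, 0L);
        return editSeqId > flushed; // already-persisted edits are skipped
    }
}
```

In the scenario above, the put to cf1 carries a seqId below cf1's flushed maximum, so it is skipped even though cf2 has never flushed.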





[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6059:
--

Fix Version/s: 0.96.0

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 6059v6.txt, 6059v7.txt, HBASE-6059-testcase.patch, 
 HBASE-6059.patch, HBASE-6059v2.patch, HBASE-6059v3.patch, HBASE-6059v4.patch, 
 HBASE-6059v5.patch


 When we replay recovered edits, we use the minSeqId of the Store; this may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose a region with two families (cf1, cf2):
 1.put one data to the region (put r1,cf1:q1,v1)
 2.move the region from server A to server B.
 3.delete the data put by step 1 (delete r1)
 4.flush this region.
 5.make a major compaction for this region
 6.move the region from server B to server A.
 7.abort server A
 8.after the region is online, we can get the deleted data (r1,cf1:q1,v1)
 (When we replay recovered edits, we use the minSeqId of the Store; because cf2 
 has no store files, its seqId is 0, so the edit log of the put will be 
 replayed to the region.)





[jira] [Commented] (HBASE-6119) Region server logs its own address at the end of getMaster()

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286105#comment-13286105
 ] 

Zhihong Yu commented on HBASE-6119:
---

Integrated to trunk.

Thanks for the review.

 Region server logs its own address at the end of getMaster()
 

 Key: HBASE-6119
 URL: https://issues.apache.org/jira/browse/HBASE-6119
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6119-trunk.txt


 I saw the following in the region server log, where a.ebay.com is the region 
 server itself:
 {code}
 2012-05-28 08:56:35,315 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 a.ebay.com/10.115.13.20:60020
 {code}
 We should be logging the address of the master.





[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6059:
--

Attachment: (was: 6059v7.txt)

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 6059v6.txt, 6059v7.txt, 6059v7.txt, 
 HBASE-6059-testcase.patch, HBASE-6059.patch, HBASE-6059v2.patch, 
 HBASE-6059v3.patch, HBASE-6059v4.patch, HBASE-6059v5.patch


 When we replay recovered edits, we use the minSeqId of the Store; this may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose a region with two families (cf1, cf2):
 1.put one data to the region (put r1,cf1:q1,v1)
 2.move the region from server A to server B.
 3.delete the data put by step 1 (delete r1)
 4.flush this region.
 5.make a major compaction for this region
 6.move the region from server B to server A.
 7.abort server A
 8.after the region is online, we can get the deleted data (r1,cf1:q1,v1)
 (When we replay recovered edits, we use the minSeqId of the Store; because cf2 
 has no store files, its seqId is 0, so the edit log of the put will be 
 replayed to the region.)





[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6059:
--

Attachment: 6059v7.txt

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 6059v6.txt, 6059v7.txt, 6059v7.txt, 
 HBASE-6059-testcase.patch, HBASE-6059.patch, HBASE-6059v2.patch, 
 HBASE-6059v3.patch, HBASE-6059v4.patch, HBASE-6059v5.patch


 When we replay recovered edits, we use the minSeqId of the Store; this may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose a region with two families (cf1, cf2):
 1.put one data to the region (put r1,cf1:q1,v1)
 2.move the region from server A to server B.
 3.delete the data put by step 1 (delete r1)
 4.flush this region.
 5.make a major compaction for this region
 6.move the region from server B to server A.
 7.abort server A
 8.after the region is online, we can get the deleted data (r1,cf1:q1,v1)
 (When we replay recovered edits, we use the minSeqId of the Store; because cf2 
 has no store files, its seqId is 0, so the edit log of the put will be 
 replayed to the region.)





[jira] [Commented] (HBASE-6087) Add hbase-common module

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286257#comment-13286257
 ] 

Zhihong Yu commented on HBASE-6087:
---

Trunk build #2960 failed:
{code}
 testHomePath(org.apache.hadoop.hbase.util.TestProcessBasedCluster): Expected 
to find slash in jar path 
https://builds.apache.org/job/HBase-TRUNK/ws/trunk/hbase-common/target/hbase-common-0.95-SNAPSHOT.jar!
{code}

 Add hbase-common module
 ---

 Key: HBASE-6087
 URL: https://issues.apache.org/jira/browse/HBASE-6087
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: hbase-6087-v0.patch, hbase-6087-v1.patch, 
 hbase-6087-v2.patch


 Add an hbase-common module so common/utility classes can be pulled up out of 
 hbase-server. This is _not_ the moving of classes, just the general project 
 setup.





[jira] [Commented] (HBASE-6087) Add hbase-common module

2012-05-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286280#comment-13286280
 ] 

Zhihong Yu commented on HBASE-6087:
---

I was able to run TestBytes successfully under the hbase-common module.
I ran TestSplitLogManager, which failed due to:
{code}
2012-05-30 19:55:07,321 WARN  [Thread-15] master.TestSplitLogManager$4(526): 
splitLogDistributed failed
java.io.IOException: error or interrupt while splitting logs in 
[file:/home/hduser/trunk/hbase-server/ca60bcb1-ab93-4b5e-8a0c-9fa7bc2b3fab] 
Task = installed = 1 done = 0 error = 0
  at 
org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:267)
  at 
org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:227)
  at 
org.apache.hadoop.hbase.master.TestSplitLogManager$4.run(TestSplitLogManager.java:524)
{code}

 Add hbase-common module
 ---

 Key: HBASE-6087
 URL: https://issues.apache.org/jira/browse/HBASE-6087
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.96.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: hbase-6087-addendum.patch, hbase-6087-v0.patch, 
 hbase-6087-v1.patch, hbase-6087-v2.patch


 Add an hbase-common module so common/utility classes can be pulled up out of 
 hbase-server. This is _not_ the moving of classes, just the general project 
 setup.





[jira] [Commented] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284817#comment-13284817
 ] 

Zhihong Yu commented on HBASE-6088:
---

Minor comment:
{code}
+   * This test case to test the znode is deleted(if created) or not in roll 
back.
{code}
'case to test' -> 'case is to test'

  Region splitting not happened for long time due to ZK exception while 
 creating RS_ZK_SPLITTING node
 

 Key: HBASE-6088
 URL: https://issues.apache.org/jira/browse/HBASE-6088
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: rajeshbabu
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-6088_92.patch, HBASE-6088_94.patch, 
 HBASE-6088_94_2.patch, HBASE-6088_94_3.patch, HBASE-6088_trunk.patch, 
 HBASE-6088_trunk_2.patch, HBASE-6088_trunk_3.patch, HBASE-6088_trunk_4.patch


 Region splitting not happened for long time due to ZK exception while 
 creating RS_ZK_SPLITTING node
 {noformat}
 2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session 
 timed out, have not heard from server in 26668ms for sessionid 
 0x1377a75f41d0012, closing socket connection and attempting reconnect
 2012-05-24 01:45:41,464 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient 
 ZooKeeper exception: 
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
 {noformat}
 {noformat}
 2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
 cleanupCurrentWriter  waiting for transactions to get synced  total 189377 
 synced till here 189365
 2012-05-24 01:45:48,474 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
 setting SPLITTING znode on 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
 java.io.IOException: Failed setting SPLITTING znode on 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
   at 
 org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
 KeeperErrorCode = BadVersion for 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
   ... 5 more
 2012-05-24 01:45:48,476 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of 
 failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
 {noformat}
 {noformat}
 2012-05-24 01:47:28,141 ERROR 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is 
 not a retry
 2012-05-24 01:47:28,142 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
 create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
 java.io.IOException: Failed create of ephemeral 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
   at 
 

[jira] [Commented] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284895#comment-13284895
 ] 

Zhihong Yu commented on HBASE-6089:
---

Patch for trunk looks good.

 SSH and AM.joinCluster causes Concurrent Modification exception.
 

 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6089_92.patch, HBASE-6089_94.patch, 
 HBASE-6089_trunk.patch


 The AM.regions map is accessed concurrently by SSH and master initialization, 
 leading to ConcurrentModificationException.
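The hazard described above can be sketched as follows: iterating a plain HashMap while another thread mutates it throws ConcurrentModificationException, and one possible fix direction is a ConcurrentHashMap, whose iterators are weakly consistent and tolerate concurrent writes. The names here are hypothetical illustrations, not the actual AssignmentManager fields.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a region->server map that can be iterated while
// another thread assigns regions. ConcurrentHashMap's weakly consistent
// iterators never throw ConcurrentModificationException.
public class RegionsMap {
    private final Map<String, String> regions = new ConcurrentHashMap<>();

    public void assign(String region, String server) {
        regions.put(region, server);
    }

    public int countAssignedTo(String server) {
        int count = 0;
        for (Map.Entry<String, String> e : regions.entrySet()) {
            if (e.getValue().equals(server)) {
                count++;
            }
        }
        return count;
    }
}
```

The alternative fix direction, synchronizing every reader and writer on a shared lock, preserves a plain HashMap but serializes all access.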





[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284904#comment-13284904
 ] 

Zhihong Yu commented on HBASE-5974:
---

Should CallSequenceOutOfOrderException extend DoNotRetryIOException?
That way you don't need to create a new exception below:
{code}
+} else if (ioe instanceof CallSequenceOutOfOrderException) {
...
+  throw new DoNotRetryIOException("Reset scanner", ioe);
{code}
I think users haven't experienced this bug.
In solving the bug, some kludge is introduced.
We should think twice before integration.

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch


 I'm seeing the following behavior:
 - set RPC timeout to a short value
 - call next() for some batch of rows, big enough so the client times out 
 before the result is returned
 - the HConnectionManager stuff will retry the next() call to the same server. 
 At this point, one of two things can happen: 1) the previous next() call will 
 still be processing, in which case you get a LeaseException, because it was 
 removed from the map during the processing, or 2) the next() call will 
 succeed but skip the prior batch of rows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)
Zhihong Yu created HBASE-6123:
-

 Summary: dev-support/test-patch.sh should compile against hadoop 
2.0.0-alpha instead of hadoop 0.23
 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu


test-patch.sh currently does this:
{code}
  $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
{code}
we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284895#comment-13284895
 ] 

Zhihong Yu edited comment on HBASE-6089 at 5/29/12 6:03 PM:


Patch for trunk looks good.

  was (Author: zhi...@ebaysf.com):
Pstch for trunk looks good.
  
 SSH and AM.joinCluster causes Concurrent Modification exception.
 

 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6089_92.patch, HBASE-6089_94.patch, 
 HBASE-6089_trunk.patch


 AM.regions map is accessed concurrently by SSH and master initialization, 
 leading to ConcurrentModificationException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6124) Backport HBASE-6033 to 0.90, 0.92 and 0.94

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285008#comment-13285008
 ] 

Zhihong Yu commented on HBASE-6124:
---

In patch for 0.92:
{code}
+  return CompactionState.valueOf(
+rs.getCompactionState(pair.getFirst().getRegionName()));
{code}
What if the user only updates the jar on the client side and the cluster doesn't 
support getCompactionState()?
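The compatibility concern can be sketched as follows. The interface and method names are illustrative, not HBase's actual RPC plumbing: a client built against the new jar should degrade gracefully when an older server rejects the unknown method.

```java
// Illustrative stand-in for the region-server RPC interface.
interface RegionServer {
    String getCompactionState(byte[] regionName);
}

// An older server that predates the getCompactionState RPC.
class OldServer implements RegionServer {
    public String getCompactionState(byte[] regionName) {
        throw new UnsupportedOperationException("unknown method getCompactionState");
    }
}

public class CompatDemo {
    // Fall back to a sentinel value instead of failing the whole client call.
    static String stateOrDefault(RegionServer rs, byte[] region) {
        try {
            return rs.getCompactionState(region);
        } catch (UnsupportedOperationException e) {
            return "NONE"; // cluster predates the RPC; report no compaction info
        }
    }

    public static void main(String[] args) {
        System.out.println(stateOrDefault(new OldServer(), new byte[0])); // NONE
    }
}
```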

 Backport HBASE-6033 to 0.90, 0.92 and 0.94
 --

 Key: HBASE-6124
 URL: https://issues.apache.org/jira/browse/HBASE-6124
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.90.7, 0.92.1, 0.94.1

 Attachments: patch-0.90.txt, patch-0.92.txt, patch-0.94.txt


 HBASE-6033 is pushed into 0.96. It's better to have it for previous version 
 too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6123:
--

Attachment: 6123.txt

Patch performs compilation against hadoop 2.0

 dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead 
 of hadoop 0.23
 --

 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
 Attachments: 6123.txt


 test-patch.sh currently does this:
 {code}
   $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
 {code}
 we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285078#comment-13285078
 ] 

Zhihong Yu commented on HBASE-6123:
---

Will integrate the patch if there is no objection.

 dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead 
 of hadoop 0.23
 --

 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
 Attachments: 6123.txt


 test-patch.sh currently does this:
 {code}
   $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
 {code}
 we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285085#comment-13285085
 ] 

Zhihong Yu commented on HBASE-6123:
---

Thanks for the suggestion, Jesse.

Patch integrated to trunk.

 dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead 
 of hadoop 0.23
 --

 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
 Attachments: 6123.txt


 test-patch.sh currently does this:
 {code}
   $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
 {code}
 we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-6123:
-

Assignee: Zhihong Yu

 dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead 
 of hadoop 0.23
 --

 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Attachments: 6123.txt


 test-patch.sh currently does this:
 {code}
   $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
 {code}
 we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6107) Distributed log splitting hangs even there is no task under /hbase/splitlog

2012-05-29 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6107:
--

Attachment: 6107_v3-new.patch

Re-attaching patch v3.

 Distributed log splitting hangs even there is no task under /hbase/splitlog
 ---

 Key: HBASE-6107
 URL: https://issues.apache.org/jira/browse/HBASE-6107
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: 6107_v3-new.patch, hbase-6107.patch, 
 hbase-6107_v3-new.patch, hbase_6107_v2.patch, hbase_6107_v3.patch


 Sometimes, the master web UI shows that distributed log splitting is going on, 
 waiting for one last task to be done.  However, in ZK there is no task under 
 /hbase/splitlog at all.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6123:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: In Progress)

 dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead 
 of hadoop 0.23
 --

 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Attachments: 6123.txt


 test-patch.sh currently does this:
 {code}
   $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
 {code}
 we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6123:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Hadoop QA passed compilation against hadoop 2.0:
https://builds.apache.org/job/PreCommit-HBASE-Build/2032/console

 dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead 
 of hadoop 0.23
 --

 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Attachments: 6123.txt


 test-patch.sh currently does this:
 {code}
   $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
 {code}
 we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Work started] (HBASE-6123) dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead of hadoop 0.23

2012-05-29 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-6123 started by Zhihong Yu.

 dev-support/test-patch.sh should compile against hadoop 2.0.0-alpha instead 
 of hadoop 0.23
 --

 Key: HBASE-6123
 URL: https://issues.apache.org/jira/browse/HBASE-6123
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Attachments: 6123.txt


 test-patch.sh currently does this:
 {code}
   $MVN clean test -DskipTests -Dhadoop.profile=23 -D${PROJECT_NAME}PatchProcess > $PATCH_DIR/trunk23JavacWarnings.txt 2>&1
 {code}
 we should compile against hadoop 2.0.0-alpha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6127) TestAtomicOperation#testMultiRowMutationMultiThreads occasionally fails

2012-05-29 Thread Zhihong Yu (JIRA)
Zhihong Yu created HBASE-6127:
-

 Summary: TestAtomicOperation#testMultiRowMutationMultiThreads 
occasionally fails
 Key: HBASE-6127
 URL: https://issues.apache.org/jira/browse/HBASE-6127
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu


TestAtomicOperation occasionally fails.

Here is one instance:
https://builds.apache.org/job/HBase-TRUNK/2944/testReport/junit/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6127) TestAtomicOperation#testMultiRowMutationMultiThreads occasionally fails

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285124#comment-13285124
 ] 

Zhihong Yu commented on HBASE-6127:
---

This was another instance:
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2938/testReport/junit/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/

 TestAtomicOperation#testMultiRowMutationMultiThreads occasionally fails
 ---

 Key: HBASE-6127
 URL: https://issues.apache.org/jira/browse/HBASE-6127
 Project: HBase
  Issue Type: Bug
Reporter: Zhihong Yu

 TestAtomicOperation occasionally fails.
 Here is one instance:
 https://builds.apache.org/job/HBase-TRUNK/2944/testReport/junit/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5959) Add other load balancers

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285302#comment-13285302
 ] 

Zhihong Yu commented on HBASE-5959:
---

I was trying to see the difference between Diff 10 and Diff 11.
Due to rebasing, the diff is not easy to read.

 Add other load balancers
 

 Key: HBASE-5959
 URL: https://issues.apache.org/jira/browse/HBASE-5959
 Project: HBase
  Issue Type: New Feature
  Components: master
Affects Versions: 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-5959-0.patch, HBASE-5959-1.patch, 
 HBASE-5959-11.patch, HBASE-5959-12.patch, HBASE-5959-2.patch, 
 HBASE-5959-3.patch, HBASE-5959-6.patch, HBASE-5959-7.patch, 
 HBASE-5959-8.patch, HBASE-5959-9.patch, HBASE-5959.D3189.1.patch, 
 HBASE-5959.D3189.2.patch, HBASE-5959.D3189.3.patch, HBASE-5959.D3189.4.patch, 
 HBASE-5959.D3189.5.patch, HBASE-5959.D3189.6.patch, HBASE-5959.D3189.7.patch


 Now that balancers are pluggable we should give some options.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285303#comment-13285303
 ] 

Zhihong Yu commented on HBASE-5892:
---

The practice is to attach the trunk patch separately, after attaching the other 
patches.

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
Assignee: Andrew Wang
  Labels: noob
 Attachments: hbase-5892-1.patch, hbase-5892-2-0.90.patch, 
 hbase-5892-2.patch, hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285349#comment-13285349
 ] 

Zhihong Yu commented on HBASE-5892:
---

@Andrew:
The path for the source file should have the hbase-server/ prefix. This is due to 
the modularization change.

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
Assignee: Andrew Wang
  Labels: noob
 Attachments: hbase-5892-1.patch, hbase-5892-2-0.90.patch, 
 hbase-5892-2.patch, hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6114) CacheControl flags should be tunable per table schema per CF

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285350#comment-13285350
 ] 

Zhihong Yu commented on HBASE-6114:
---

{code}
+ * Copyright 2011 The Apache Software Foundation
{code}
Year is not needed.
Since TestCacheOnWrite is in io.hfile package, can TestCacheOnWriteInSchema be 
placed in the same package ?
That way getOnDiskSizeWithHeader() can be made package private.

Please submit patch for Hadoop QA.
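The visibility point above can be sketched in plain Java. The class and method names follow the comment but are illustrative stand-ins: a package-private method is callable from a test class only when both sit in the same package, which is why moving the test avoids widening the accessor to public.

```java
// Both classes compile in the same (default) package here, standing in for
// io.hfile. The accessor is package-private: no modifier keyword.
class HFileBlock {
    private final int onDiskSizeWithHeader = 4096;

    // Package-private: visible to classes in this package only.
    int getOnDiskSizeWithHeader() {
        return onDiskSizeWithHeader;
    }
}

class TestCacheOnWriteInSchema {
    // Legal call because the test shares the block's package;
    // from any other package this would not compile.
    static int readSize() {
        return new HFileBlock().getOnDiskSizeWithHeader();
    }
}
```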

 CacheControl flags should be tunable per table schema per CF
 

 Key: HBASE-6114
 URL: https://issues.apache.org/jira/browse/HBASE-6114
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Attachments: 6114-0.92-v2.patch, 6114-0.92.patch, 6114-0.94-v2.patch, 
 6114-0.94.patch, 6114-trunk-v2.patch, 6114-trunk.patch


 CacheControl flags should be tunable per table schema per CF, especially
 cacheDataOnWrite, also cacheIndexesOnWrite and cacheBloomsOnWrite.
 It looks like Store uses CacheConfig(Configuration conf, HColumnDescriptor 
 family) to construct the CacheConfig, so it's a simple change there to 
 override configuration properties with values of table schema attributes if 
 present.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6114) CacheControl flags should be tunable per table schema per CF

2012-05-29 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13285359#comment-13285359
 ] 

Zhihong Yu commented on HBASE-6114:
---

getOnDiskSizeWithoutHeader() is public. I am fine with making another method 
public.

Pass from Hadoop QA is desirable.

 CacheControl flags should be tunable per table schema per CF
 

 Key: HBASE-6114
 URL: https://issues.apache.org/jira/browse/HBASE-6114
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Attachments: 6114-0.92-v2.patch, 6114-0.92.patch, 6114-0.94-v2.patch, 
 6114-0.94.patch, 6114-trunk-v2.patch, 6114-trunk.patch


 CacheControl flags should be tunable per table schema per CF, especially
 cacheDataOnWrite, also cacheIndexesOnWrite and cacheBloomsOnWrite.
 It looks like Store uses CacheConfig(Configuration conf, HColumnDescriptor 
 family) to construct the CacheConfig, so it's a simple change there to 
 override configuration properties with values of table schema attributes if 
 present.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-28 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284416#comment-13284416
 ] 

Zhihong Yu commented on HBASE-6059:
---

I would listen to opinion from people who are more familiar with store files 
about the current solution.

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 6059v6.txt, HBASE-6059-testcase.patch, HBASE-6059.patch, 
 HBASE-6059v2.patch, HBASE-6059v3.patch, HBASE-6059v4.patch, HBASE-6059v5.patch


 When we replay recovered edits, we use the minSeqId of the Store. This may cause 
 deleted data to appear again.
 Let's see how it happens. Suppose the region with two families(cf1,cf2)
 1.put one data to the region (put r1,cf1:q1,v1)
 2.move the region from server A to server B.
 3.delete the data put by step 1(delete r1)
 4.flush this region.
 5.make major compaction for this region
 6.move the region from server B to server A.
 7.Abort server A
 8.After the region is online, we could get the deleted data(r1,cf1:q1,v1)
 (When we replay recovered edits, we use the minSeqId across Stores; because cf2 
 has no store files, its seqId is 0, so the put recorded in the edit log is 
 replayed into the region.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6118) Add a testcase for HBASE-6065

2012-05-28 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284440#comment-13284440
 ] 

Zhihong Yu commented on HBASE-6118:
---

I applied the patch manually to trunk and the new test passed.
Will attach patch for trunk soon.

I renamed the test case.

 Add a testcase for HBASE-6065
 -

 Key: HBASE-6118
 URL: https://issues.apache.org/jira/browse/HBASE-6118
 Project: HBase
  Issue Type: Test
Reporter: ramkrishna.s.vasudevan
Assignee: Ashutosh Jindal
 Attachments: 6118-trunk.txt, HBASE-6118_0.94.patch


 It would be nice to have a testcase for HBASE-6065.  Internally we have written 
 a test case to simulate the problem and thought it would be better to 
 contribute it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



