date:20130107

[jira] [Commented] (HBASE-7403) Online Merge

2013-01-07 Thread chunhui shen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545706#comment-13545706
 ] 

chunhui shen commented on HBASE-7403:
-

bq. When can we get the situation below?
{code}
if ((regionInfoA == null || regionInfoB == null)) {
{code}
This situation is expected, and harmless, after the table has been deleted or 
when the merge is redone from the COMPLETE_MERGING state.

bq.Should we not check this even before the merge has started
We receive the merge request in a ZooKeeper event handler, and I don't want to 
check for the above situation there because that handler is synchronized.

bq. Why should we not resubmit here?
It is a new merge request, but we can't get the region info of the merging 
regions, so it seems the user sent a wrong request.

bq.Do we need to wait for the main region to be assigned
It's not necessary.

bq.In EXECUTE_MERGING do we need to handle any failure saying it must move to 
CANCEL_MERGING.
Once we have entered the EXECUTE_MERGING state, we can no longer move to 
CANCEL_MERGING; we only ensure that the merge can be redone until it succeeds 
after an abort or failure.

bq.will it be reassigned by SSH?
That won't happen; the merging regions won't be assigned anywhere. See 
SSH#processDeadRegion and RegionMergeManager#mergingRegions.

bq.I think the Merger thread should read the znode and try to restart the merge 
right?
Yes, bingo! We record the merge state in ZK, so it is easy to redo.



 Online Merge
 

 Key: HBASE-7403
 URL: https://issues.apache.org/jira/browse/HBASE-7403
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 
 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch, 
 hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, merge region.pdf


 The features of this online merge:
 1. Online: there is no need to disable the table
 2. Few changes to the current code; could be applied to trunk, 0.94, 0.92 or 0.90
 3. Easy to issue a merge request: no need to input a long region name, the 
 encoded name is enough
 4. No restrictions during the operation: you don't need to take care of events 
 like Server Dead, Balance, Split, Disabling/Enabling table, or of whether you 
 sent a wrong merge request; this is already handled for you
 5. Only a short offline time for the two merging regions
 We need merge in the following cases:
 1. Region holes or region overlaps that can’t be fixed by hbck
 2. Regions become empty because of TTL or an unreasonable rowkey design
 3. Regions are always empty or very small because of presplitting at table creation
 4. Too many empty or small regions reduce system performance (e.g. 
 mslab)
 The current merge tools only work offline and cannot redo the merge if an 
 exception is thrown during merging, which leaves dirty data.
 For an online system, we need an online merge.
 The logic this patch implements for Online Merge is as follows.
 For example, to merge regionA and regionB into regionC:
 1. Offline the two regions A and B
 2. Merge the two regions in HDFS (create regionC’s directory, move 
 regionA’s and regionB’s files to regionC’s directory, delete regionA’s and 
 regionB’s directories)
 3. Add the merged regionC to .META.
 4. Assign the merged regionC
 By design, once the merge work in HDFS has been done, the merge can be redone 
 until it succeeds if an exception is thrown, the process aborts, or the server 
 restarts, but it cannot be rolled back.
 It depends on the following:
 Use ZooKeeper to record the transaction journal state, which makes redo easier
 Use ZooKeeper to send/receive merge requests
 The merge transaction is executed on the master
 Support issuing merge requests through the API or the shell tool
 For details of the merge process, please see the attachment and patch
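
 The redo-only design described above can be illustrated with a small state 
 journal. This is only a sketch under stated assumptions, not code from the 
 patch: an in-memory field stands in for the state recorded in ZooKeeper, the 
 class MergeJournalSketch and the states PREPARE_MERGING and DONE are 
 hypothetical, and only EXECUTE_MERGING and COMPLETE_MERGING are names taken 
 from the review discussion.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJournalSketch {
    // EXECUTE_MERGING and COMPLETE_MERGING are named in the discussion;
    // PREPARE_MERGING and DONE are hypothetical placeholders.
    public enum State { PREPARE_MERGING, EXECUTE_MERGING, COMPLETE_MERGING, DONE }

    // Stand-in for the state recorded in ZooKeeper; it survives a failed attempt.
    private State journal = State.PREPARE_MERGING;
    private int failuresToInject;
    public final List<String> completedSteps = new ArrayList<>();

    public MergeJournalSketch(int failuresToInject) {
        this.failuresToInject = failuresToInject;
    }

    public State state() {
        return journal;
    }

    // Performs one step; simulates a crash while merging files in HDFS.
    private void step(String what) {
        if (journal == State.EXECUTE_MERGING && failuresToInject > 0) {
            failuresToInject--;
            throw new IllegalStateException("simulated crash during: " + what);
        }
        completedSteps.add(what);
    }

    // One attempt: resume from the journaled state, never from the beginning.
    // Returns true once the merge has finished.
    public boolean runOnce() {
        try {
            while (journal != State.DONE) {
                switch (journal) {
                    case PREPARE_MERGING:
                        step("offline regionA and regionB");
                        journal = State.EXECUTE_MERGING;
                        break;
                    case EXECUTE_MERGING:
                        step("merge region directories in HDFS");
                        journal = State.COMPLETE_MERGING;
                        break;
                    default: // COMPLETE_MERGING
                        step("add regionC to .META. and assign it");
                        journal = State.DONE;
                        break;
                }
            }
            return true;
        } catch (IllegalStateException crash) {
            return false; // the journal still holds the last recorded state
        }
    }

    public static void main(String[] args) {
        MergeJournalSketch merge = new MergeJournalSketch(2);
        int attempts = 1;
        while (!merge.runOnce()) {
            attempts++;
        }
        System.out.println("attempts=" + attempts + " steps=" + merge.completedSteps);
    }
}
```

 In this sketch a retry never repeats the offline step, because each attempt 
 resumes from whatever state the journal last recorded; there is no path back 
 to an earlier state, which mirrors the "redo but no rollback" property.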

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-7403) Online Merge

2013-01-07 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7403:


Attachment: hbase-7403-trunkv7.patch

Uploading patch v7.
It is also on Review Board:
https://reviews.apache.org/r/8716/

 Online Merge
 

 Key: HBASE-7403
 URL: https://issues.apache.org/jira/browse/HBASE-7403
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 
 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch, 
 hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, 
 merge region.pdf



[jira] [Commented] (HBASE-7495) parallel scanner seek in StoreScanner's constructor

2013-01-07 Thread chunhui shen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545713#comment-13545713
 ] 

chunhui shen commented on HBASE-7495:
-

[~xieliang007]
We did parallel scanner seek testing long ago, but saw no improvement.

Waiting for your new result.

 parallel scanner seek in StoreScanner's constructor
 ---

 Key: HBASE-7495
 URL: https://issues.apache.org/jira/browse/HBASE-7495
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.94.3, 0.96.0
Reporter: liang xie
Assignee: liang xie

 It seems there is potential room for improvement before doing scanner.next:
 {code:title=StoreScanner.java|borderStyle=solid}
 if (explicitColumnQuery && lazySeekEnabledGlobally) {
   for (KeyValueScanner scanner : scanners) {
     scanner.requestSeek(matcher.getStartKey(), false, true);
   }
 } else {
   for (KeyValueScanner scanner : scanners) {
     scanner.seek(matcher.getStartKey());
   }
 }
 {code} 
 We could do scanner.requestSeek or scanner.seek in parallel, instead of the 
 current serial execution, to reduce latency in special cases.
 Any ideas on it? I'll have a try if the comments/suggestions are positive :)
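
 A minimal sketch of what seeking in parallel could look like, assuming a 
 thread pool is available to the scanner. The Seekable interface and the 
 seekAll and demo methods are hypothetical stand-ins for illustration; they are 
 not the KeyValueScanner API and not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelSeekSketch {
    // Hypothetical stand-in for a scanner; only the seek call matters here.
    public interface Seekable {
        void seek(String startKey);
    }

    // Submits one seek per scanner to the pool and returns after all have finished.
    public static void seekAll(List<? extends Seekable> scanners, String startKey,
                               ExecutorService pool) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Seekable scanner : scanners) {
            tasks.add(() -> {
                scanner.seek(startKey);
                return null;
            });
        }
        try {
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get(); // surfaces any exception thrown by a seek
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while seeking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("seek failed", e.getCause());
        }
    }

    // Runs seekAll over counting scanners and reports how many were seeked.
    public static int demo(int scannerCount) {
        AtomicInteger seeks = new AtomicInteger();
        List<Seekable> scanners = new ArrayList<>();
        for (int i = 0; i < scannerCount; i++) {
            scanners.add(key -> seeks.incrementAndGet());
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            seekAll(scanners, "row-0001", pool);
        } finally {
            pool.shutdown();
        }
        return seeks.get();
    }

    public static void main(String[] args) {
        System.out.println("scanners seeked: " + demo(8));
    }
}
```

 Whether this reduces latency in practice depends on how expensive each seek 
 is relative to the cost of handing work to the pool; the comment above on this 
 issue reports no improvement from an earlier experiment, so it would need to 
 be measured.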


[jira] [Commented] (HBASE-6031) RegionServer does not go down while aborting

2013-01-07 Thread liang xie (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545720#comment-13545720
 ] 

liang xie commented on HBASE-6031:
--

We hit this issue today as well :)

The attached rs_shutdown_hung20130107.jstack is a thread dump taken while the hang occurred.

I've filed HADOOP-9181 to resolve it.

 RegionServer does not go down while aborting
 

 Key: HBASE-6031
 URL: https://issues.apache.org/jira/browse/HBASE-6031
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: rs_shutdown_hung20130107.jstack, rsthread.txt


 Following is the thread dump.
 {code}
 1997531088@qtp-716941846-5 prio=10 tid=0x7f7c5820c800 nid=0xe1b in Object.wait() [0x7f7c56ae8000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   at org.mortbay.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:279)
   - locked 0x7f7cfe0616d0 (a org.mortbay.jetty.nio.SelectChannelConnector$ConnectorEndPoint)
   at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:545)
   at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:639)
   at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
   at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:109)
   - locked 0x7f7cfe74d758 (a org.mortbay.util.ByteArrayOutputStream2)
   at org.mortbay.jetty.AbstractGenerator$OutputWriter.write(AbstractGenerator.java:904)
   at java.io.Writer.write(Writer.java:96)
   - locked 0x7f7cfca02fc0 (a org.mortbay.jetty.HttpConnection$OutputWriter)
   at java.io.PrintWriter.write(PrintWriter.java:361)
   - locked 0x7f7cfca02fc0 (a org.mortbay.jetty.HttpConnection$OutputWriter)
   at org.jamon.escaping.HtmlEscaping.write(HtmlEscaping.java:43)
   at org.jamon.escaping.AbstractCharacterEscaping.write(AbstractCharacterEscaping.java:35)
   at org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmplImpl.renderNoFlush(RSStatusTmplImpl.java:222)
   at org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmpl.renderNoFlush(RSStatusTmpl.java:180)
   at org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmpl.render(RSStatusTmpl.java:171)
   at org.apache.hadoop.hbase.regionserver.RSStatusServlet.doGet(RSStatusServlet.java:48)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
   at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:932)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 1374615312@qtp-716941846-3 prio=10 tid=0x7f7c58214800 nid=0xc42 in Object.wait() [0x7f7c55bd9000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   at org.mortbay.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:279)

[jira] [Updated] (HBASE-6031) RegionServer does not go down while aborting

2013-01-07 Thread liang xie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liang xie updated HBASE-6031:
-

Attachment: rs_shutdown_hung20130107.jstack

 RegionServer does not go down while aborting
 

 Key: HBASE-6031
 URL: https://issues.apache.org/jira/browse/HBASE-6031
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: rs_shutdown_hung20130107.jstack, rsthread.txt



[jira] [Commented] (HBASE-6031) RegionServer does not go down while aborting

2013-01-07 Thread liang xie (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545724#comment-13545724
 ] 

liang xie commented on HBASE-6031:
--

Because the qtp-* threads were not daemon threads, our HBase shutdown hook was 
not triggered.
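
For background, the JVM starts its normal shutdown sequence (and runs shutdown 
hooks) when the last non-daemon thread exits or when System.exit is called. 
The sketch below only illustrates that distinction; the class 
DaemonThreadSketch, the daemonFactory helper and the "qtp-demo" name are 
hypothetical, and this is not the change proposed in HADOOP-9181.

```java
import java.util.concurrent.ThreadFactory;

public class DaemonThreadSketch {
    // Hypothetical factory: names threads with a prefix and marks them as daemon.
    public static ThreadFactory daemonFactory(final String prefix) {
        return new ThreadFactory() {
            private int next = 0;

            @Override
            public synchronized Thread newThread(Runnable task) {
                Thread thread = new Thread(task, prefix + "-" + next++);
                thread.setDaemon(true); // a daemon thread does not keep the JVM alive
                return thread;
            }
        };
    }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> System.out.println("shutdown hook ran")));

        // A worker that would block forever, like a pool thread waiting for work.
        Thread worker = daemonFactory("qtp-demo").newThread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                // exit quietly
            }
        });
        worker.start();
        System.out.println("main exiting; worker daemon=" + worker.isDaemon());
    }
}
```

Because the worker is a daemon thread, the JVM shuts down and runs the hook as 
soon as main returns. If setDaemon(true) were removed, the blocked worker 
would keep the process alive and the hook would not run until the worker 
exited or System.exit was called.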

 RegionServer does not go down while aborting
 

 Key: HBASE-6031
 URL: https://issues.apache.org/jira/browse/HBASE-6031
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: rs_shutdown_hung20130107.jstack, rsthread.txt



[jira] [Created] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)

chunhui shen created HBASE-7504:
---

 Summary: -ROOT- may be offline forever after FullGC of  RS
 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen


1. A full GC happens on the regionserver hosting -ROOT-.
2. The ZK session times out; the master expires the regionserver and submits it 
to ServerShutdownHandler.
3. The regionserver completes the full GC.
4. While ServerShutdownHandler is processing, verifyRootRegionLocation returns true.
5. ServerShutdownHandler skips assigning the -ROOT- region.
6. The regionserver aborts itself because it receives a YouAreDeadException after 
a regionserver report.
7. -ROOT- is now offline, and won't be assigned again unless we restart the master.
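
The ordering of the seven steps above can be replayed with a tiny deterministic 
model. This is only an illustration under stated assumptions: the class 
RootRaceSketch and its method are hypothetical, and verifyRootRegionLocation 
is modeled simply as "the regionserver answers at the moment of the check", 
which is a reading of the steps rather than the actual HBase logic.

```java
public class RootRaceSketch {
    // Replays the steps in order and reports whether -ROOT- ends up assigned.
    // gcEndsBeforeVerify selects whether step 3 happens before step 4.
    public static boolean rootOnlineAfter(boolean gcEndsBeforeVerify) {
        // 1. Full GC: the regionserver hosting -ROOT- stops responding.
        boolean regionServerResponsive = false;
        boolean rootReassigned = false;

        // 2. ZK session expires; the master submits ServerShutdownHandler.

        // 3. The full GC completes (possibly before the handler verifies -ROOT-).
        if (gcEndsBeforeVerify) {
            regionServerResponsive = true;
        }

        // 4. Verification, modeled as "does the old host answer right now?".
        boolean verified = regionServerResponsive;

        // 5. The handler assigns -ROOT- only when verification fails.
        if (!verified) {
            rootReassigned = true;
        }

        // 6. The regionserver aborts itself after receiving YouAreDeadException.
        regionServerResponsive = false;

        // 7. -ROOT- is online only if it was reassigned in step 5.
        return rootReassigned;
    }

    public static void main(String[] args) {
        System.out.println("GC ends before verify -> ROOT online: " + rootOnlineAfter(true));
        System.out.println("GC ends after verify  -> ROOT online: " + rootOnlineAfter(false));
    }
}
```

In this model the problem appears only when the regionserver wakes up between 
its expiry and the verification, which matches the window described in steps 3 
and 4.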



Master Log:
{code}
2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted shutdown 
handler to be executed, root=true, meta=false
2012-10-31 19:51:39,045 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for dw88.kgb.sqa.cm4,60020,1351671478752
2012-10-31 19:51:50,113 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server 
dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Server REPORT rejected; currently processing 
dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
2012-10-31 19:52:15,945 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log 
splitting for dw88.kgb.sqa.cm4,60020,1351671478752
{code}

There is no log entry for assigning -ROOT-.

Regionserver log:
{code}
2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
229128ms instead of 10ms, this is likely due to a long garbage collecting 
pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
{code}





[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: 7504-trunk v1.patch

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 7504-trunk v1.patch



[jira] [Updated] (HBASE-7504) -ROOT- may be offline forever after FullGC of RS

2013-01-07 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7504:


Attachment: (was: 7504-trunk v1.patch)

 -ROOT- may be offline forever after FullGC of  RS
 -

 Key: HBASE-7504
 URL: https://issues.apache.org/jira/browse/HBASE-7504
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 7504-trunk v1.patch


 1.FullGC happen on ROOT regionserver.
 2.ZK session timeout, master expire the regionserver and submit to 
 ServerShutdownHandler
 3.Regionserver complete the FullGC
 4.In the process of ServerShutdownHandler, verifyRootRegionLocation returns 
 true
 5.ServerShutdownHandler skip assigning -ROOT- region
 6.Regionserver abort itself because it reveive YouAreDeadException after a 
 regionserver report
 7.-ROO- is offline now, and won't be assigned any more unless we restart 
 master
 Master Log:
 {code}
 2012-10-31 19:51:39,043 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
 Added=dw88.kgb.sqa.cm4,60020,1351671478752 to dead servers, submitted 
 shutdown handler to be executed, root=true, meta=false
 2012-10-31 19:51:39,045 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
 for dw88.kgb.sqa.cm4,60020,1351671478752
 2012-10-31 19:51:50,113 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server 
 dw88.kgb.sqa.cm4,60020,1351671478752 was carrying ROOT. Trying to assign.
 2012-10-31 19:52:15,939 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
 Server REPORT rejected; currently processing 
 dw88.kgb.sqa.cm4,60020,1351671478752 as dead server
 2012-10-31 19:52:15,945 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skipping log 
 splitting for dw88.kgb.sqa.cm4,60020,1351671478752
 {code}
 No log of assigning -ROOT-
 Regionserver log:
 {code}
 2012-10-31 19:52:15,923 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
 229128ms instead of 10ms, this is likely due to a long garbage collecting 
 pause and it's usually bad, see 
 http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
 {code}
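The ordering in steps 1-7 can be replayed in a small model. This is only a sketch under assumptions: `RootRaceModel` and all of its fields and methods are hypothetical names, not HBase classes, and the master's decision is reduced to "skip the assignment if the old location still verifies", which is how steps 4 and 5 read.

```java
/** Toy model of the race in steps 1-7; all names are hypothetical, not HBase APIs. */
class RootRaceModel {
    boolean serverAnswersRpc = false;  // false while the regionserver is paused in full GC
    boolean serverAlive = true;        // false once the regionserver aborts itself
    boolean serverMarkedDead = false;  // master-side view after the ZK session expires
    boolean rootHosted = true;         // -ROOT- is hosted by this regionserver at the start

    /** Step 2: the ZK session times out and the master expires the server. */
    void expireServer() {
        serverMarkedDead = true;
    }

    /** Step 3: the full GC finishes, so the server answers RPCs again. */
    void finishFullGc() {
        serverAnswersRpc = true;
    }

    /** Steps 4-5: the handler reassigns only if the old location does not verify. */
    boolean shutdownHandlerReassignsRoot() {
        boolean locationVerified = serverAlive && serverAnswersRpc;
        return !locationVerified;
    }

    /** Step 6: a report from a server marked dead is rejected and the server aborts. */
    void regionServerReport() {
        if (serverMarkedDead) {
            serverAlive = false;
            rootHosted = false; // the only copy of -ROOT- goes away with the server
        }
    }

    /** Runs steps 2-6 in order and returns whether -ROOT- is online at the end. */
    static boolean replay() {
        RootRaceModel m = new RootRaceModel();
        m.expireServer();
        m.finishFullGc();
        boolean reassigned = m.shutdownHandlerReassignsRoot();
        m.regionServerReport();
        return m.rootHosted || reassigned;
    }
}
```

In this model the outcome depends only on whether the GC pause ends before the handler verifies the location: if the handler runs first, verification fails and -ROOT- is reassigned.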

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7403) Online Merge

2013-01-07 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13545731#comment-13545731
]

Hadoop QA commented on HBASE-7403:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment

http://issues.apache.org/jira/secure/attachment/12563531/hbase-7403-trunkv7.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new
or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestSplitTransaction

{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3889//console

This message is automatically generated.

Online Merge

Key: HBASE-7403
URL: https://issues.apache.org/jira/browse/HBASE-7403
Project: HBase
Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.96.0, 0.94.5

Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff,
7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch,
hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch,
merge region.pdf


[jira] [Created] (HBASE-7505) Server will hang when stopping cluster, caused by waiting for split threads

2013-01-07 Thread chunhui shen (JIRA)

chunhui shen created HBASE-7505:
---

 Summary: Server will hang when stopping cluster, caused by waiting 
for split threads
 Key: HBASE-7505
 URL: https://issues.apache.org/jira/browse/HBASE-7505
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0


We now retry 100 times (about 3200 minutes) in 
HRegionServer#postOpenDeployTasks; see 
HConnectionManager#setServerSideHConnectionRetries.

However, when we stop the cluster, we wait for the split threads in 
HRegionServer#join. If the META/ROOT server has already been stopped, the split 
thread won't exit because it is still retrying 
HRegionServer#postOpenDeployTasks.
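One way to picture the problem is a retry loop that never looks at a stop flag. The sketch below shows the opposite, a loop that gives up as soon as the flag is set. It is an illustration only: `StoppableRetry` and its parameters are hypothetical and are not taken from HBase or from the attached patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase: a retry loop that honours a stop flag. */
class StoppableRetry {
    /**
     * Calls attempt until it returns true, the retry budget is used up,
     * or stopped is set. Returns true only if an attempt succeeded.
     */
    static boolean run(BooleanSupplier attempt, int maxRetries, long pauseMillis,
                       AtomicBoolean stopped) {
        for (int i = 0; i < maxRetries; i++) {
            if (stopped.get()) {
                return false; // the cluster is stopping: do not keep the thread alive
            }
            if (attempt.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }
}
```

With a check like this, a thread waiting in a join would be released after at most one more pause instead of after the whole retry budget.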



[jira] [Updated] (HBASE-7505) Server will hang when stopping cluster, caused by waiting for split threads

2013-01-07 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7505:


Attachment: 7505-trunk v1.patch

 Server will hang when stopping cluster, caused by waiting for split threads
 ---

 Key: HBASE-7505
 URL: https://issues.apache.org/jira/browse/HBASE-7505
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 7505-trunk v1.patch




[jira] [Created] (HBASE-7506) Judgement of carrtying ROOT/META will become wrong when expiring server

2013-01-07 Thread chunhui shen (JIRA)

chunhui shen created HBASE-7506:
---

 Summary: Judgement of carrtying ROOT/META will become wrong when 
expiring server
 Key: HBASE-7506
 URL: https://issues.apache.org/jira/browse/HBASE-7506
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0


We check whether a server is carrying ROOT/META when expiring it.
See ServerManager#expireServer.

If the dead server was carrying META, we assign META directly in 
ServerShutdownHandler.
If the dead server was carrying ROOT, we offline ROOT and then call 
verifyAndAssignRootWithRetries().

How does the judgement of carrying ROOT/META become wrong?
If the region is in RIT, isCarryingRegion() returns true after reading the address from 
zk.
However, once the RIT times out (which could be caused by this.allRegionServersOffline  
!noRSAvailable, see AssignmentManager#TimeoutMonitor) and we assign the region 
elsewhere, this judgement becomes wrong.
See AssignmentManager#isCarryingRegion for details.

With the wrong judgement of carrying ROOT/META, we would assign ROOT/META 
twice.
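The double assignment can be shown with a toy model in which the "carrying" flag is captured when the server is expired and used later. Everything here is a hypothetical sketch: `CarryingCheckModel` and its methods are not HBase classes, and the re-check variant is just one possible remedy, not necessarily what the patch does.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model, hypothetical names: a carrying-ROOT flag captured early can go stale. */
class CarryingCheckModel {
    final Map<String, String> regionToServer = new HashMap<>();
    int rootAssignCount = 0;

    /** Returns true if the given server is the current location of the region. */
    boolean isCarrying(String server, String region) {
        return server.equals(regionToServer.get(region));
    }

    /** Records a new location and counts how often ROOT gets assigned. */
    void assign(String region, String server) {
        regionToServer.put(region, server);
        if (region.equals("ROOT")) {
            rootAssignCount++;
        }
    }

    /** Handler that trusts the flag computed when the server was expired. */
    void handleWithStaleFlag(boolean carryingRootAtExpiry, String spareServer) {
        if (carryingRootAtExpiry) {
            assign("ROOT", spareServer);
        }
    }

    /** Handler that asks again at processing time. */
    void handleWithRecheck(String deadServer, String spareServer) {
        if (isCarrying(deadServer, "ROOT")) {
            assign("ROOT", spareServer);
        }
    }
}
```

If ROOT is reassigned (for example after a RIT timeout) between the expiry and the handler, the stale-flag handler assigns it a second time, while the re-checking handler does nothing.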


[jira] [Commented] (HBASE-7403) Online Merge

2013-01-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13545749#comment-13545749
 ] 

Hadoop QA commented on HBASE-7403:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12563531/hbase-7403-trunkv7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3888//console

This message is automatically generated.

 Online Merge
 

 Key: HBASE-7403
 URL: https://issues.apache.org/jira/browse/HBASE-7403
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0, 0.94.5

 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 
 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv1.patch, 
 hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, 
 merge region.pdf


 The features of this online merge:
 1. Online; no need to disable the table
 2. Few changes to the current code; could be applied to trunk, 0.94, 0.92, or 0.90
 3. Easy to issue a merge request: no need to input a long region name, the 
 encoded name is enough
 4. No restrictions while operating: you don't need to take care of events like 
 Server Dead, Balance, Split, Disabling/Enabling table, or of 
 whether you sent a wrong merge request; that is already handled for you
 5. Only a short offline time for the two merging regions
 We need merge in the following cases:
 1. A region hole or region overlap that can’t be fixed by hbck
 2. A region becomes empty because of TTL or an unreasonable rowkey design
 3. A region is always empty or very small because of presplitting when the table was created
 4. Too many empty or small regions reduce system performance (e.g. 
 mslab)
 The current merge tools only work offline and cannot redo the merge if an 
 exception is thrown partway through, which leaves dirty data.
 For an online system, we need an online merge.
 The implementation logic of this patch for Online Merge is as follows.
 For example, to merge regionA and regionB into regionC:
 1. Offline the two regions A and B
 2. Merge the two regions in HDFS (create regionC’s directory, move 
 regionA’s and regionB’s files to regionC’s directory, delete regionA’s and 
 regionB’s directories)
 3. Add the merged regionC to .META.
 4. Assign the merged regionC
 By design, once we have done the merge work in HDFS, we can redo 
 it until it succeeds if it throws an exception, aborts, or the server restarts, but 
 it cannot be rolled back. 
 It depends on:
 Use zookeeper to record the transaction journal state, which makes redo easier
 Use zookeeper to send/receive merge requests
 The merge transaction is executed on the master
 Support
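The "record the journal state, then redo from it" idea in the description can be sketched as a tiny state machine. This is an assumption-laden illustration: `MergeJournalSketch` and its `Step` names are hypothetical and only mirror the four steps listed in the description, and a plain field stands in for the state that the patch records in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy redo journal for the four merge steps; names are hypothetical, not the patch's classes. */
class MergeJournalSketch {
    enum Step { OFFLINE_REGIONS, MERGE_IN_HDFS, ADD_TO_META, ASSIGN_MERGED, DONE }

    /** Stand-in for the journal state that would be kept in ZooKeeper. */
    Step recorded = Step.OFFLINE_REGIONS;

    /** Steps whose work has been carried out, in order. */
    final List<Step> executed = new ArrayList<>();

    /**
     * Runs from the recorded step onward. failAt simulates a crash before that
     * step completes; pass null to run to the end.
     */
    void runFrom(Step failAt) {
        while (recorded != Step.DONE) {
            if (recorded == failAt) {
                return; // crash: the journal still points at the unfinished step
            }
            executed.add(recorded);                            // do the work of this step
            recorded = Step.values()[recorded.ordinal() + 1];  // then advance the journal
        }
    }
}
```

A first call such as `runFrom(Step.ADD_TO_META)` stops with the journal at `ADD_TO_META`; a later `runFrom(null)` resumes from there and finishes. For this to be safe in a real system, each step has to be idempotent, since a step may be partially done when the crash happens.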

[jira] [Updated] (HBASE-7506) Judgement of carrtying ROOT/META will become wrong when expiring server

2013-01-07 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7506:


Attachment: 7506-trunk v1.patch

 Judgement of carrtying ROOT/META will become wrong when expiring server
 ---

 Key: HBASE-7506
 URL: https://issues.apache.org/jira/browse/HBASE-7506
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 7506-trunk v1.patch




[jira] [Created] (HBASE-7507) Make memstore flush be able to retry after exception

2013-01-07 Thread chunhui shen (JIRA)

chunhui shen created HBASE-7507:
---

 Summary: Make memstore flush be able to retry after exception
 Key: HBASE-7507
 URL: https://issues.apache.org/jira/browse/HBASE-7507
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0


We abort the regionserver if a memstore flush throws an exception.

I think we could retry to make the regionserver more stable, because the file system 
may be unavailable for a short time, e.g. while switching namenodes in a NamenodeHA 
environment.
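A bounded retry around the flush might look like the following. This is a minimal sketch, not the attached patch: `RetryingFlusher`, its `Flush` interface, and the attempt and pause parameters are all hypothetical, and it ignores what state a failed flush leaves behind, which decides whether a retry is actually safe.

```java
import java.io.IOException;

/** Hypothetical sketch, not HBase code: retry a flush a few times before giving up. */
class RetryingFlusher {
    /** Stand-in for the real flush call; may fail transiently. */
    interface Flush {
        void run() throws IOException;
    }

    /**
     * Runs flush up to maxAttempts times, sleeping pauseMillis between failures.
     * Returns true if a flush succeeded, false if every attempt failed, in which
     * case the caller can fall back to aborting the server.
     */
    static boolean flushWithRetry(Flush flush, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                flush.run();
                return true;
            } catch (IOException e) {
                // e.g. the file system is briefly unavailable during a namenode switch
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return false;
                }
            }
        }
        return false;
    }
}
```

Keeping the abort as the fallback after the last attempt preserves today's behaviour for failures that are not transient.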


[jira] [Updated] (HBASE-7507) Make memstore flush be able to retry after exception

2013-01-07 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7507:


Description: 
We abort the regionserver if a memstore flush throws an exception.

I think we could retry to make the regionserver more stable, because the file system 
may be unavailable for a short time, e.g. while switching namenodes in a NamenodeHA 
environment.



{code}
// Any failure during the flush is wrapped in a DroppedSnapshotException.
HRegion#internalFlushcache(){

...
try {
...
}catch(Throwable t){
DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
  Bytes.toStringBinary(getRegionName()));
dse.initCause(t);
throw dse;
}
...

}

// The flusher aborts the server when it sees that exception.
MemStoreFlusher#flushRegion(){
...
try {
region.flushcache();
...
}catch(DroppedSnapshotException ex){
server.abort("Replay of HLog required. Forcing server shutdown", ex);
}

...
}
{code}

  was:
We abort the regionserver if a memstore flush throws an exception.

I think we could retry to make the regionserver more stable, because the file system 
may be unavailable for a short time, e.g. while switching namenodes in a NamenodeHA 
environment.


 Make memstore flush be able to retry after exception
 

 Key: HBASE-7507
 URL: https://issues.apache.org/jira/browse/HBASE-7507
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0




[jira] [Updated] (HBASE-7507) Make memstore flush be able to retry after exception

2013-01-07 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-7507:


Attachment: 7507-trunk v1.patch

 Make memstore flush be able to retry after exception
 

 Key: HBASE-7507
 URL: https://issues.apache.org/jira/browse/HBASE-7507
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: 7507-trunk v1.patch




[jira] [Commented] (HBASE-7403) Online Merge

2013-01-07 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13545784#comment-13545784
]