[jira] [Commented] (HBASE-4326) Tests that use HBaseTestingUtility.startMiniCluster(n) should shutdown with HBaseTestingUtility.shutdownMiniCluster.

2011-09-26 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114892#comment-13114892
 ] 

Jonathan Hsieh commented on HBASE-4326:
---

Adding an @AfterClass shutdown to TestHLog seems to make the testLogCleaning 
tests hang when it attempts to shut down.

> Tests that use HBaseTestingUtility.startMiniCluster(n) should shutdown with 
> HBaseTestingUtility.shutdownMiniCluster.
> 
>
> Key: HBASE-4326
> URL: https://issues.apache.org/jira/browse/HBASE-4326
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>
> Most tests that use mini clusters use this pattern:
> {code}
>   private final static HBaseTestingUtility UTIL = new HBaseTestingUtility();
>
>   @BeforeClass
>   public static void beforeClass() throws Exception {
>     UTIL.startMiniCluster(1);
>   }
>
>   @AfterClass
>   public static void afterClass() throws IOException {
>     UTIL.shutdownMiniCluster();
>   }
> {code}
> Some tests (like hbase-4269) shut down differently:
> {code}
>   @BeforeClass
>   public static void beforeClass() throws Exception {
>     UTIL.startMiniCluster(1);
>   }
>
>   @AfterClass
>   public static void afterClass() throws IOException {
>     UTIL.getMiniCluster().shutdown();
>     // or UTIL.shutdownMiniHBaseCluster();
>     // and likely others.
>   }
> {code}
> There is a difference between the two shutdowns -- the former deletes files 
> created during the tests while the latter does not.  This funny state 
> persisting (zk or hbase/mr data) may be the cause of strange inter-testcase 
> problems when full suites are run.
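
To illustrate why the choice of shutdown call matters, here is a minimal, self-contained sketch that does not use HBase at all. ToyMiniCluster, partialShutdown, and fullShutdown are hypothetical names standing in for a test utility, a call such as getMiniCluster().shutdown(), and shutdownMiniCluster(); the only point is that one path removes the on-disk test data and the other leaves it behind for the next test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a mini-cluster test utility; this is not HBase code.
public class ToyMiniCluster {
    private Path dataDir;
    private boolean running;

    // Start the "cluster": create a scratch directory holding one data file.
    public void start() {
        try {
            dataDir = Files.createTempDirectory("toy-mini-cluster");
            Files.createFile(dataDir.resolve("region-data"));
            running = true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stops the "cluster" only; whatever the test wrote stays on disk.
    public void partialShutdown() {
        running = false;
    }

    // Stops the "cluster" and deletes everything the test created.
    public void fullShutdown() {
        running = false;
        try {
            Files.deleteIfExists(dataDir.resolve("region-data"));
            Files.deleteIfExists(dataDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean isRunning() {
        return running;
    }

    // True if test data from this run is still present on disk.
    public boolean leftoverDataExists() {
        return dataDir != null && Files.exists(dataDir);
    }
}
```

In this toy model, a test that ends with partialShutdown() leaves its directory in place, which is the kind of leftover state the description suspects of causing inter-testcase problems.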

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-27 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116021#comment-13116021
 ] 

Jonathan Hsieh commented on HBASE-4489:
---

A few thoughts:

I agree with jgray -- I think one fix should correct the MD5 string split so 
that it splits from 0x00.. 0xff.  I think there could be another separate patch 
that adds the UniformSplit.  

I'd be wary of changing the default, especially if this is meant to go into a 
0.90.x branch.  It looks like as a user you can add and use the UniformSplit by 
changing the conf option. 

Ideally patches with new functionality or changed semantics would also 
introduce corresponding tests.  There were no tests on the previous code, and 
there are no tests on the newly introduced code.  Adding tests, especially 
around edge cases, could address Ted's concerns, and it doesn't really hurt to 
be extra defensive when coding non-performance-sensitive code.



> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.
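
As a rough check of the boundaries listed above, the following sketch divides the single-byte range 0x00..0xFF into equal parts. It is not the patch's UniformSplit code; UniformSplitSketch, boundaries, and format are hypothetical names, and only the leading byte of each boundary is computed.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: this is not the RegionSplitter / UniformSplit implementation.
public class UniformSplitSketch {

    // Returns the leading byte (0..255) of each boundary when the range
    // 0x00..0xFF is divided into numRegions equal parts. The last value
    // is always 0xFF, the upper end of the range.
    public static List<Integer> boundaries(int numRegions) {
        if (numRegions < 1) {
            throw new IllegalArgumentException("numRegions must be at least 1");
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i <= numRegions; i++) {
            result.add(i * 0xFF / numRegions);
        }
        return result;
    }

    // Formats a byte value in the \xNN notation used in the description.
    public static String format(int value) {
        return String.format("\\x%02X", value);
    }
}
```

For five regions this yields 0x33, 0x66, 0x99, 0xCC, and 0xFF, matching the leading bytes quoted in the description.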





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116634#comment-13116634
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

I think my plan is to postpone the large refactor until after this gets 
through. 

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116712#comment-13116712
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

@stack: Not yet, I'm still cleaning this up and adding tests right now.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116713#comment-13116713
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

More detail -- I've done a large refactor of hbck, but found that making the 
offline rebuild changes on top of it would make the offline rebuild code more 
difficult to understand or review.  So, my plan is to add the offline rebuild 
code, and then potentially do a refactor afterwards.

Regardless of whether the refactor happens, I feel that I need to add tests and 
docs for this before it is ready for review. 






[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116774#comment-13116774
 ] 

Jonathan Hsieh commented on HBASE-4489:
---

@Dave

My main suggestion is fixing based on the original author's intent in one patch 
(fixing the ascii encoded hex 7f problem) and then potentially changing the 
semantics/default in a different patch.  I believe we agree that the intent of 
the original author's code looks to be for ascii hex ranges and that the 0x7f 
max is broken.

In the tables I've encountered, it seems more folks use just ascii rowkeys 
than use binary rowkeys.  Using the uniform byte range split keys for ascii 
character ranges would make the new alternate default just as "wrong" for 
many users.  The shell now provides a generic mechanism for generating splits 
for new tables (HBASE-4000), so that completely generic approach seems more 
useful given knowledge about your particular row keys.

From a code skim, it seems that rollingSplits is "smarter" -- it takes existing 
row key boundaries and splits them at region midpoints.  This is still 
vulnerable to skewed row key distributions but at least takes into account 
the existing rowkey ranges!
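
To make the midpoint idea concrete, here is a minimal sketch that computes the key halfway between two existing region boundaries. It is not taken from RegionSplitter; MidpointSketch and midpoint are hypothetical names, and the sketch assumes both keys have the same length and are compared as unsigned big-endian integers.

```java
import java.math.BigInteger;

// Sketch only: not the rolling-split code in RegionSplitter.
public class MidpointSketch {

    // Midpoint of two equal-length keys, treated as unsigned big-endian integers.
    public static byte[] midpoint(byte[] start, byte[] end) {
        if (start.length != end.length) {
            throw new IllegalArgumentException("keys must have equal length");
        }
        BigInteger lo = new BigInteger(1, start); // signum 1 = non-negative
        BigInteger hi = new BigInteger(1, end);
        BigInteger mid = lo.add(hi).shiftRight(1);

        // toByteArray may add a leading sign byte or drop leading zeros,
        // so copy the low-order bytes into a fixed-width result.
        byte[] raw = mid.toByteArray();
        byte[] out = new byte[start.length];
        int copy = Math.min(raw.length, out.length);
        System.arraycopy(raw, raw.length - copy, out, out.length - copy, copy);
        return out;
    }
}
```

A midpoint like this halves the key range, not the data, which is why skewed row key distributions can still produce uneven regions.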








[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116951#comment-13116951
 ] 

Jonathan Hsieh commented on HBASE-4489:
---

@Ted, sounds good to me.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117095#comment-13117095
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

I'm having a hard time with tests that restart the test hbase mini cluster.  I 
start cluster, modify meta/hdfs regions, shutdown cluster, rebuild meta, and 
then get an NPE when restarting. 

Specifically, the getCurrentUGI call in the User.HadoopUser constructor 
sometimes returns null, which later causes an NPE:

{code}
User.HadoopUser.
  ugi = (UserGroupInformation) callStatic("getCurrentUGI");
{code}

Tests were passing at one point, but I can't figure out a direct cause 
for why this would fail.  Any hints?  





[jira] [Commented] (HBASE-4515) User.getCurrent() can cause NPE.

2011-09-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117623#comment-13117623
 ] 

Jonathan Hsieh commented on HBASE-4515:
---

A stack trace of the error that I'd like to avoid.

{code}
2011-09-29 11:38:45,823 ERROR [Thread-341] hbase.MiniHBaseCluster(201): Error starting cluster
java.lang.NullPointerException
    at org.apache.hadoop.hbase.security.User.getName(User.java:71)
    at org.apache.hadoop.hbase.HBaseTestingUtility.getDifferentUser(HBaseTestingUtility.java:1421)
    at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:191)
    at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:76)
    at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:61)
    at org.apache.hadoop.hbase.HBaseTestingUtility.restartHBaseCluster(HBaseTestingUtility.java:505)
    at org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuild.testMetaRebuild(TestOfflineMetaRebuild.java:306)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
{code}

> User.getCurrent() can cause NPE.
> 
>
> Key: HBASE-4515
> URL: https://issues.apache.org/jira/browse/HBASE-4515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>
> When testing with miniclusters that shutdown and are restarted, sometimes a 
> call to User.getCurrent().getName() NPEs when attempting to restart hbase.  
> Oddly this happens consistently on particular branches and not on others. I 
> don't know or understand why this happens but it has something to do with the 
> getCurrentUGI call in  o.a.h.h.security.User.HadoopUser sometimes returning 
> null and sometimes returning data.
> {code}
>private HadoopUser() {
>   try {
> ugi = (UserGroupInformation) callStatic("getCurrentUGI");
> if (ugi == null) {
>   LOG.warn("Although successfully retrieved UserGroupInformation" 
>   + "  it was null!");
> }
>   } catch (RuntimeException re) {
> {code}
> This patch essentially is a workaround -- it propagates the null so that 
> clients can check and avoid the NPE.
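
The workaround described above, propagating the null so that callers can check it, can be sketched in isolation as follows. This is a hypothetical model, not HBase's User class: NullSafeUserSketch, ToyUser, getCurrent, and currentUserNameOr are invented names, and the String argument stands in for whatever the underlying identity lookup returns.

```java
// Hypothetical model of the null-propagation workaround; not HBase's User class.
public class NullSafeUserSketch {

    static final class ToyUser {
        private final String name;

        ToyUser(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Stands in for a getCurrent()-style lookup: returns null when no
    // identity is available instead of wrapping a null internally.
    static ToyUser getCurrent(String underlyingIdentity) {
        return underlyingIdentity == null ? null : new ToyUser(underlyingIdentity);
    }

    // Caller-side check: test for null before dereferencing, so a missing
    // identity yields a fallback name and not a NullPointerException.
    public static String currentUserNameOr(String underlyingIdentity, String fallback) {
        ToyUser user = getCurrent(underlyingIdentity);
        return user == null ? fallback : user.getName();
    }
}
```

The design choice here mirrors the description: the lookup does not hide the missing identity, and each caller decides how to handle it.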





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117641#comment-13117641
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

HBASE-4515 is required for tests to pass consistently





[jira] [Commented] (HBASE-4515) User.getCurrent() can cause NPE.

2011-09-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117735#comment-13117735
 ] 

Jonathan Hsieh commented on HBASE-4515:
---

@Gary I tested your fix and it works for me.  Can this be backported to 
0.92/0.90.x as well?  I need it for the unit tests of HBASE-4377, which I would 
like to implement and backport as well.  



> User.getCurrent() can cause NPE.
> 
>
> Key: HBASE-4515
> URL: https://issues.apache.org/jira/browse/HBASE-4515
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 0001-HBASE-4515-User.getCurrent-can-cause-NPE.patch, 
> HBASE-4515_trunk.patch
>
>




[jira] [Commented] (HBASE-4436) Remove methods deprecated in 0.90 from TRUNK and 0.92

2011-09-30 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118495#comment-13118495
 ] 

Jonathan Hsieh commented on HBASE-4436:
---

I'll take this.  I'm going to break it down to a few patches.  The first will 
be the completely trivial changes, to be followed by a series of more 
complicated patches.  

Is removing classes like HServerAddress in scope? (It is pretty pervasive)

> Remove methods deprecated in 0.90 from TRUNK and 0.92
> -
>
> Key: HBASE-4436
> URL: https://issues.apache.org/jira/browse/HBASE-4436
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Jonathan Hsieh
>Priority: Critical
>  Labels: noob
> Fix For: 0.92.0
>
>
> Remove methods deprecated in 0.90 from codebase.  I took a quick look.  The 
> messy bit is thrift referring to old stuff; will take a little work to do the 
> conversion over to the new methods.





[jira] [Commented] (HBASE-4509) [hbck] Improve region map output

2011-09-30 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118560#comment-13118560
 ] 

Jonathan Hsieh commented on HBASE-4509:
---

@Stack In my tree, it is applied after HBASE-4506.  I've added it as a 
dependency, and will give it a quick test without it.  

I wanted the trunk review before backporting this to 0.90 -- I'll get the 
backport done this weekend. 


> [hbck] Improve region map output
> 
>
> Key: HBASE-4509
> URL: https://issues.apache.org/jira/browse/HBASE-4509
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.0, 0.94.0, 0.90.5
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 0001-HBASE-4509-hbck-Improve-region-map-output.patch
>
>
> HBASE-4375 added a region coverage visualization to hbck in details mode.  
> When users have binary row keys the output is difficult to parse (awk/sed) or 
> pull into programs (numeric, excel) capable of handling tsv formatted data.
> This patch 
> * improves output by using Bytes.toStringBinary (which escapes binary) 
> instead of Bytes.toString when printing keys, 
> * suggests some repair actions, and 
> * collects "problem group" that groups regions that are overlapping.
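
To show why escaping matters for tab-separated output, here is a simplified sketch of binary-safe key printing. It is an approximation written for illustration, not the source of Bytes.toStringBinary; the exact set of characters that the real method leaves unescaped may differ. BinaryKeySketch and escape are hypothetical names.

```java
// Simplified illustration of escaping binary row keys; an approximation,
// not the HBase Bytes.toStringBinary implementation.
public class BinaryKeySketch {

    // Keeps printable ASCII (except backslash) as-is and renders every
    // other byte as \xNN, so tabs and control bytes cannot break TSV output.
    public static String escape(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 0x20 && v <= 0x7E && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }
}
```

With this scheme a key containing a raw tab byte prints as \x09, so it can no longer be mistaken for a column separator.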





[jira] [Commented] (HBASE-4509) [hbck] Improve region map output

2011-09-30 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118568#comment-13118568
 ] 

Jonathan Hsieh commented on HBASE-4509:
---

New patch cherry-picked onto 0.90 and tests pass there as well.

> [hbck] Improve region map output
> 
>
> Key: HBASE-4509
> URL: https://issues.apache.org/jira/browse/HBASE-4509
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.0, 0.94.0, 0.90.5
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 0001-HBASE-4509-hbck-Improve-region-map-output.patch, 
> hbase-4509-pre4506.patch
>
>




[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-03 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119466#comment-13119466
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

Since the review for trunk had relatively minor issues, I'm going to work on 
re-backporting this to the 0.90 branch.

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> hbase-4377-trunk.v2.patch
>
>




[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-03 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119727#comment-13119727
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

When backporting to 0.90, the TestOfflineMetaRebuild test case would fail 
with out-of-file-handles exceptions.  I dug for a while and found that the 
statically cached HConnections are not flushed between tests.  
Even after avoiding that, there are other resources (maybe pooling on hdfs 
client or zk client connections?) that cause the open file handle count to 
increase significantly after every test case.  

To avoid this problem, I'm going to split each of the rebuild tests into its 
own test case so that each can be executed in a new process and avoid the out 
of file handles problem.  I'll do this for trunk and for the 0.90 backport.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-04 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120561#comment-13120561
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

Although I've gotten this to work with live systems, it seems that there 
are some problems with the testing on the backports.  Different versions have 
different expected values, which does not seem to make sense.  HBASE-3777 
changed some of the semantics of the HBaseTestingUtility so I'll be 
investigating more.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-06 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122344#comment-13122344
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

In the 0.90 branch, after deleting meta and restarting the # of tables present 
is 0.
In trunk and 0.92 branch, after deleting meta and restart the # of tables 
present is 1.  

This actually does make sense because HBASE-451 changed the behavior of HMaster 
-- in 0.90 (pre-HBASE-451), HConnectionManager.listTables() loads table info on 
the client side via a meta scan.  Post-HBASE-451, table data from 
HConnectionManager.listTables() comes from the file system, is cached by the 
HMaster, and ignores the meta table.
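
The difference in table counts can be illustrated with a toy model. This is only a sketch: `ListTablesDemo`, `metaRows`, `fsTableDirs`, and the method names are hypothetical stand-ins, not HBase classes or APIs. It assumes one table whose directory survives on the file system while its meta rows are deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these names are HBase APIs.
class ListTablesDemo {
    // Table names recoverable from .META. rows (empty once meta is deleted).
    static Set<String> metaRows = new TreeSet<>();
    // Table names recoverable from table directories on the file system.
    static Set<String> fsTableDirs = new TreeSet<>();
    // What the (hypothetical) master cached when it last started.
    static List<String> masterCache = new ArrayList<>();

    // 0.90-style: the client scans meta and builds the table list itself.
    static List<String> listTablesViaMetaScan() {
        return new ArrayList<>(metaRows);
    }

    // Post-HBASE-451 style: the master reads the file system on startup...
    static void masterInit() {
        masterCache = new ArrayList<>(fsTableDirs);
    }

    // ...and later serves the cached list to clients, never consulting meta.
    static List<String> listTablesViaMaster() {
        return new ArrayList<>(masterCache);
    }

    public static void main(String[] args) {
        metaRows.add("t1");
        fsTableDirs.add("t1");
        metaRows.clear();   // "delete meta"
        masterInit();       // "restart"
        System.out.println("meta scan sees:   " + listTablesViaMetaScan().size());
        System.out.println("master cache has: " + listTablesViaMaster().size());
    }
}
```

In this model the meta-scan path reports no tables and the master-cache path reports one, which matches the 0 versus 1 observed above.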





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-06 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122367#comment-13122367
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

@Todd,

I think there is some confusion.  Clients do not directly access hdfs. Let me 
add more detail.

In trunk post-HBASE-451, the HMaster (not the client) reads and caches data 
from the file system.  It then serves the HTableDescriptors to the client: the 
client's HConnectionManager makes an RPC to the HMaster, which just ships the 
cached HTD data.

HMaster on initialization reads the file system for HTD data.
Client calls listTables() -> HMaster (serves cached data from the file system).

Pre-HBASE-451, the client's HConnectionManager does a meta scan and builds 
HTableDescriptors.

Client calls listTables(), which is actually a meta scan that builds the HTDs.





[jira] [Commented] (HBASE-4548) Client should not look on HDFS to list tables

2011-10-06 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122369#comment-13122369
 ] 

Jonathan Hsieh commented on HBASE-4548:
---

@Todd, (also posted in HBASE-4377).

I think there is some confusion. Clients do not directly access hdfs. Let me 
add more detail.

In trunk post-HBASE-451, the HMaster (not the client) reads and caches data 
from the file system.  It then serves the HTableDescriptors to the client: the 
client's HConnectionManager makes an RPC to the HMaster, which just ships the 
cached HTD data.

HMaster on initialization reads the file system for HTD data.
Client calls listTables() -> HMaster (serves cached data from the file system).

Pre-HBASE-451, the client's HConnectionManager does a meta scan and builds 
HTableDescriptors.

Client calls listTables(), which is actually a meta scan that builds the HTDs.



> Client should not look on HDFS to list tables
> -
>
> Key: HBASE-4548
> URL: https://issues.apache.org/jira/browse/HBASE-4548
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
>
> In HBASE-4377, Jon noticed that HConnectionManager.listTable now looks on 
> HDFS for the table list. This seems incorrect, since the client may not have 
> access to the hbase directory on HDFS (eg in a secure cluster). At the least, 
> it should RPC to the master to find a table list, and have the master do the 
> list on HDFS.





[jira] [Commented] (HBASE-4548) Client should not look on HDFS to list tables

2011-10-06 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122372#comment-13122372
 ] 

Jonathan Hsieh commented on HBASE-4548:
---

closed out as not a problem.





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-10-07 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122974#comment-13122974
 ] 

Jonathan Hsieh commented on HBASE-4335:
---

@Lars some nits / suggestions on v3.

TestEndToEndSplitTransaction needs license.

Maybe more descriptive function names for phaseI, phaseII, phaseIII?

Any reason for the (overly?) general Class... instead of just taking a single 
Class and checking for null when no exceptions expected?  Or maybe just make 
'test' return boolean and assertTrue/assertFalse?






> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335-v3.txt, 4335.txt
>
>
> When a SplitTransaction is performed, three updates are done to .META.:
> 1. The parent region is marked as splitting (and hence offline)
> 2. The first daughter region is added (same start key as parent)
> 3. The second daughter region is added (split key is start key)
> (later, the original parent region is deleted, but that's not important to 
> this discussion)
> Steps 2 and 3 are actually done concurrently by 
> SplitTransaction.DaughterOpener threads.  While the master is notified when a 
> split is complete, the only visibility that clients have is whether the 
> daughter regions have appeared in .META.
> If the second daughter is added to .META. first, then .META. will contain the 
> (offline) parent region followed by the second daughter region.  If the 
> client looks up a key that is greater than (or equal to) the split, the 
> client will find the second daughter region and use it.  If the key is less 
> than the split key, the client will find the parent region and see that it is 
> offline, triggering a retry.
> If the first daughter is added to .META. before the second daughter, there is 
> a window during which .META. has a hole: the first daughter effectively hides 
> the parent region (same start key), but there is no entry for the second 
> daughter.  A region lookup will find the first daughter for all keys in the 
> parent's range, but the first daughter does not include keys at or beyond the 
> split key.
> See HBASE-4333 and HBASE-4334 for details on how this causes problems and 
> suggestions for mitigating this in the client and regionserver.
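
The two orderings described in the issue can be sketched with a small in-memory model. This is an illustration under stated assumptions, not HBase code: `MetaHoleDemo`, `Region`, `locate`, and `lookup` are hypothetical names, and the lookup simply picks the entry with the greatest start key at or before the row, with the newest region winning ties on start key.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of region lookup during a split; not HBase code.
class MetaHoleDemo {
    static final class Region {
        final String name, startKey, endKey; // endKey is exclusive
        final long id;                       // daughters get a larger id than the parent
        final boolean offline;
        Region(String name, String startKey, String endKey, long id, boolean offline) {
            this.name = name; this.startKey = startKey; this.endKey = endKey;
            this.id = id; this.offline = offline;
        }
        boolean contains(String row) {
            return startKey.compareTo(row) <= 0 && row.compareTo(endKey) < 0;
        }
    }

    // Greatest start key <= row; on equal start keys the newer region hides the older.
    static Region locate(List<Region> meta, String row) {
        Region best = null;
        for (Region r : meta) {
            if (r.startKey.compareTo(row) > 0) continue;
            if (best == null) { best = r; continue; }
            int c = r.startKey.compareTo(best.startKey);
            if (c > 0 || (c == 0 && r.id > best.id)) best = r;
        }
        return best;
    }

    static String lookup(List<Region> meta, String row) {
        Region r = locate(meta, row);
        if (r == null) return "NOT_FOUND";
        if (r.offline) return "RETRY " + r.name;
        if (!r.contains(row)) return "HOLE " + r.name;
        return "OK " + r.name;
    }

    public static void main(String[] args) {
        Region parent = new Region("parent", "a", "z", 1, true);
        Region first = new Region("daughterA", "a", "m", 2, false);
        Region second = new Region("daughterB", "m", "z", 2, false);

        List<Region> secondAddedFirst = new ArrayList<>();
        secondAddedFirst.add(parent); secondAddedFirst.add(second);
        System.out.println(lookup(secondAddedFirst, "c")); // parent is offline: retry
        System.out.println(lookup(secondAddedFirst, "q")); // second daughter serves it

        List<Region> firstAddedFirst = new ArrayList<>();
        firstAddedFirst.add(parent); firstAddedFirst.add(first);
        System.out.println(lookup(firstAddedFirst, "c"));  // first daughter serves it
        System.out.println(lookup(firstAddedFirst, "q"));  // first daughter found, but it
                                                           // does not cover the key: hole
    }
}
```

Only the last lookup is problematic: the first daughter hides the offline parent, so the client never sees the signal that would make it retry.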





[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers

2011-10-11 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125037#comment-13125037
 ] 

Jonathan Hsieh commented on HBASE-4335:
---

lgtm

> Splits can create temporary holes in .META. that confuse clients and 
> regionservers
> --
>
> Key: HBASE-4335
> URL: https://issues.apache.org/jira/browse/HBASE-4335
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: Joe Pallas
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4335-v2.txt, 4335-v3.txt, 4335-v4.txt, 4335-v5.txt, 
> 4335.txt
>
>




[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-10-11 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125118#comment-13125118
 ] 

Jonathan Hsieh commented on HBASE-4489:
---

@Dave

Part of me would really prefer to decouple rollingSplit from the presplit 
min/max value selection -- maybe change this into two programs: a custom 
presplit table generator that handles key bounds, and a separate rollingSplit 
program that just splits based on given key ranges.

I thought that there was agreement that we would keep MD5StringSplit as the 
default for 0.90.  It looks like the default was changed from MD5StringSplit to 
UniformSplit in both patches.  While I generally agree with your point #3, it 
is in 0.90 and changing it would be a compatibility problem for anyone who 
depends on it.  Would it make sense to change the default in trunk/0.92 (I'm 
fine with that) but leave 0.90.x as is?

Nice functional test.  Did you consider just doing a unit test on the split 
algorithm along with the cluster spinning functional test?  I believe 
HBaseAdmin.create(HTableDescriptor htd,byte startKeys[][]) is well tested and 
would make the non @Ignored portions quicker.  I can see how you need this 
setup for testing rollingSplit.

Interesting div 0 bug.  More testing, fewer surprises!

Any reason why in testCreatePressplitTable you go to -0x71, 0x81 .. -0x11 
instead of just going to 0x8f, 0x9f .. 0xff?  Though more verbose,  I think it 
is easier to read and follow if you use "positive" hex and cast all of them 
with (byte), or write out single longs and convert?
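
To show what the "positive hex with a cast" style looks like, here is a minimal sketch. The class name `SignedByteDemo` and the specific byte values are chosen for illustration and are not taken from the patch; the point is only that both spellings produce the same bytes in Java.

```java
import java.util.Arrays;

// Illustrative values only; not the arrays from the patch under review.
class SignedByteDemo {
    // Negative-literal style: the reader has to convert in their head.
    static byte[] negativeStyle() {
        return new byte[] { -0x71, -0x61, -0x11 };
    }

    // Positive hex with an explicit cast: more verbose, reads as the raw byte.
    static byte[] castStyle() {
        return new byte[] { (byte) 0x8f, (byte) 0x9f, (byte) 0xef };
    }

    // Render a byte as unsigned hex.
    static String hex(byte b) {
        return String.format("0x%02x", b & 0xff);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(negativeStyle(), castStyle()));
        for (byte b : negativeStyle()) {
            System.out.println(b + " is " + hex(b));
        }
    }
}
```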



> Better key splitting in RegionSplitter
> --
>
> Key: HBASE-4489
> URL: https://issues.apache.org/jira/browse/HBASE-4489
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Dave Revell
>Assignee: Dave Revell
> Attachments: HBASE-4489-branch0.90-v1.patch, 
> HBASE-4489-branch0.90-v2.patch, HBASE-4489-trunk-v1.patch, 
> HBASE-4489-trunk-v2.patch
>
>
> The RegionSplitter utility allows users to create a pre-split table from the 
> command line or do a rolling split on an existing table. It supports 
> pluggable split algorithms that implement the SplitAlgorithm interface. The 
> only/default SplitAlgorithm is one that assumes keys fall in the range from 
> ASCII string "" to ASCII string "7FFF". This is not a sane 
> default, and seems useless to most users. Users are likely to be surprised by 
> the fact that all the region splits occur in in the byte range of ASCII 
> characters.
> A better default split algorithm would be one that evenly divides the space 
> of all bytes, which is what this patch does. Making a table with five regions 
> would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
> \xFF\xFF.
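
The even division of the byte space described above can be sketched as follows. This is not the RegionSplitter or UniformSplit implementation; `UniformSplitDemo`, `boundaries`, and the fixed key width are assumptions made for the example. It computes i * (2^(8*width) - 1) / regions for i = 1..regions and prints each boundary as hex.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch of evenly dividing a fixed-width byte space; not HBase code.
class UniformSplitDemo {
    static List<String> boundaries(int regions, int widthBytes) {
        // Largest key of the given width, e.g. 0xFFFF for two bytes.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * widthBytes).subtract(BigInteger.ONE);
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= regions; i++) {
            BigInteger b = max.multiply(BigInteger.valueOf(i))
                              .divide(BigInteger.valueOf(regions));
            StringBuilder hex = new StringBuilder(b.toString(16).toUpperCase());
            while (hex.length() < 2 * widthBytes) {
                hex.insert(0, '0'); // left-pad to the full key width
            }
            out.add(hex.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Five regions over two-byte keys.
        System.out.println(boundaries(5, 2));
    }
}
```

For five regions over two-byte keys this yields 3333, 6666, 9999, CCCC, and FFFF, in line with the split points listed in the description.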





[jira] [Commented] (HBASE-4570) Scan ACID problem with concurrent puts.

2011-10-12 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126001#comment-13126001
 ] 

Jonathan Hsieh commented on HBASE-4570:
---

rephrase: I have not been able to duplicate this in a unit test yet.

This test scenario seems similar to TestAcidGuarantees (HBASE-2856) but uses 
filters and seems a little more focused on this particular symptom.

> Scan ACID problem with concurrent puts.
> ---
>
> Key: HBASE-4570
> URL: https://issues.apache.org/jira/browse/HBASE-4570
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.90.1, 0.90.3
>Reporter: Jonathan Hsieh
> Attachments: hbase-4570.tgz
>
>
> When scanning a table sometimes rows that have multiple column families get 
> split into two rows if there are concurrent writes.  In this particular case 
> we are overwriting the contents of a Get directly back onto itself as a Put.
> For example, this is a two cf row (with "f1", "f2", .. "f9" cfs).  It is 
> actually returned as two rows (#55 and #56). Interestingly if the two were 
> merged we would have a single proper row.
> Row row024461 had time stamps: [55: 
> keyvalues={row024461/f0:data/1318200440867/Put/vlen=1000, 
> row024461/f0:qual/1318200440867/Put/vlen=10, 
> row024461/f1:data/1318200440867/Put/vlen=1000, 
> row024461/f1:qual/1318200440867/Put/vlen=10, 
> row024461/f2:data/1318200440867/Put/vlen=1000, 
> row024461/f2:qual/1318200440867/Put/vlen=10, 
> row024461/f3:data/1318200440867/Put/vlen=1000, 
> row024461/f3:qual/1318200440867/Put/vlen=10, 
> row024461/f4:data/1318200440867/Put/vlen=1000, 
> row024461/f4:qual/1318200440867/Put/vlen=10}, 
> 56: keyvalues={row024461/f5:data/1318200440867/Put/vlen=1000, 
> row024461/f5:qual/1318200440867/Put/vlen=10, 
> row024461/f6:data/1318200440867/Put/vlen=1000, 
> row024461/f6:qual/1318200440867/Put/vlen=10, 
> row024461/f7:data/1318200440867/Put/vlen=1000, 
> row024461/f7:qual/1318200440867/Put/vlen=10, 
> row024461/f8:data/1318200440867/Put/vlen=1000, 
> row024461/f8:qual/1318200440867/Put/vlen=10, 
> row024461/f9:data/1318200440867/Put/vlen=1000, 
> row024461/f9:qual/1318200440867/Put/vlen=10}]
> I've only tested this on 0.90.1+patches and 0.90.3+patches, but it is 
> consistent and duplicatable.





[jira] [Commented] (HBASE-4570) Scan ACID problem with concurrent puts.

2011-10-12 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126369#comment-13126369
 ] 

Jonathan Hsieh commented on HBASE-4570:
---

Ran the unit test version of this test and it did not fail as the separate 
programs did after 3-4 hours.







[jira] [Commented] (HBASE-4485) Eliminate window of missing Data

2011-10-13 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127164#comment-13127164
 ] 

Jonathan Hsieh commented on HBASE-4485:
---

@Amitanand

I've applied HBASE-2856's patch from 
(https://reviews.apache.org/r/2224/diff/#index_header) onto trunk (with a minor 
tweak) and then applied HBASE-4485, but I get a compile failure.  Specifically, 
matcher.ignoreNewerKVs() seems to be missing.  Is there another commit that I'm 
missing?

> Eliminate window of missing Data
> 
>
> Key: HBASE-4485
> URL: https://issues.apache.org/jira/browse/HBASE-4485
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.94.0
>
> Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 4485-v4.diff, 
> repro_bug-4485.diff
>
>
> After incorporating v11 of the 2856 fix, we discovered that we are still 
> having some ACID violations.
> This time, however, the problem is not about including "newer" updates, but 
> about missing older updates that should be included.
> Here is what seems to be happening.
> There is a race condition in StoreScanner.getScanners():
>   private List<KeyValueScanner> getScanners(Scan scan,
>       final NavigableSet<byte[]> columns) throws IOException {
>     // First the store file scanners
>     List<StoreFileScanner> sfScanners = StoreFileScanner
>       .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks,
>         isGet, false);
>     List<KeyValueScanner> scanners =
>       new ArrayList<KeyValueScanner>(sfScanners.size()+1);
>     // include only those scan files which pass all filters
>     for (StoreFileScanner sfs : sfScanners) {
>       if (sfs.shouldSeek(scan, columns)) {
>         scanners.add(sfs);
>       }
>     }
>     // Then the memstore scanners
>     if (this.store.memstore.shouldSeek(scan)) {
>       scanners.addAll(this.store.memstore.getScanners());
>     }
>     return scanners;
>   }
> If, for example, a call to Store.updateStorefiles() happens between 
> store.getStorefiles() and this.store.memstore.getScanners(), then it is 
> possible that a new HFile was created that is not seen by the StoreScanner, 
> and the data is not present in the Memstore.snapshot either.
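
The interleaving described above can be reproduced deterministically in a toy model. This is a sketch under assumptions, not the HBase Store or StoreScanner: `ScannerRaceDemo`, `Store`, `racyView`, and `consistentSnapshot` are hypothetical names, and the `Runnable` hook stands in for the concurrent flush thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "files first, memstore second" race; not HBase code.
class ScannerRaceDemo {
    static final class Store {
        private final List<String> storeFiles = new ArrayList<>();
        private final List<String> memstore = new ArrayList<>();

        synchronized void put(String kv) { memstore.add(kv); }

        // Flush: memstore contents become store-file contents in one step.
        synchronized void flush() {
            storeFiles.addAll(memstore);
            memstore.clear();
        }

        synchronized List<String> fileSnapshot() { return new ArrayList<>(storeFiles); }
        synchronized List<String> memstoreSnapshot() { return new ArrayList<>(memstore); }

        // One possible remedy in this model: read both under a single lock.
        synchronized List<String> consistentSnapshot() {
            List<String> all = new ArrayList<>(storeFiles);
            all.addAll(memstore);
            return all;
        }
    }

    // Reads files, lets "another thread" run, then reads the memstore.
    static List<String> racyView(Store s, Runnable between) {
        List<String> view = s.fileSnapshot();
        between.run();
        view.addAll(s.memstoreSnapshot());
        return view;
    }

    public static void main(String[] args) {
        Store racy = new Store();
        racy.put("kv1");
        // The flush lands between the two reads, so kv1 is in neither snapshot.
        System.out.println("racy view:       " + racyView(racy, () -> racy.flush()));

        Store safe = new Store();
        safe.put("kv1");
        safe.flush();
        System.out.println("consistent view: " + safe.consistentSnapshot());
    }
}
```

Each individual read is synchronized here, yet the combined view still loses `kv1`; the atomicity has to cover both reads together.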





[jira] [Commented] (HBASE-4570) Scan ACID problem with concurrent puts.

2011-10-13 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127162#comment-13127162
 ] 

Jonathan Hsieh commented on HBASE-4570:
---

I can still run these and see ACID failures on today's trunk with git hash 
b45dfec.  

I've also tried a build that applies HBASE-2856 v11 
(https://reviews.apache.org/r/2224/diff/#index_header); it still has the 
same problem.  








[jira] [Commented] (HBASE-4570) Scan ACID problem with concurrent puts.

2011-10-14 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127711#comment-13127711
 ] 

Jonathan Hsieh commented on HBASE-4570:
---

The current experiment seems to indicate that Bytes.equals, when it uses the 
UNSAFE_COMPARER class, doesn't always tell the truth, and causes scan rows to 
get chopped up into two rows.  I've modified the code to use the 
PureJavaComparer and the described problem hasn't appeared yet (running for 30 
mins or so).  
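
For readers unfamiliar with the distinction, a pure-Java comparer is essentially a byte-by-byte loop over unsigned values, with no use of sun.misc.Unsafe. The following is a minimal sketch of that idea and is not copied from the HBase Bytes class; `PureJavaBytesDemo` and its method names are hypothetical.

```java
// Sketch of a byte-by-byte unsigned lexicographic comparison; not HBase code.
class PureJavaBytesDemo {
    static int compareTo(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat bytes as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal on the common prefix: the shorter array sorts first.
        return a.length - b.length;
    }

    static boolean equals(byte[] a, byte[] b) {
        return a.length == b.length && compareTo(a, b) == 0;
    }

    public static void main(String[] args) {
        byte[] r1 = "row024461".getBytes();
        byte[] r2 = "row024461".getBytes();
        System.out.println(equals(r1, r2));
        System.out.println(compareTo(new byte[] { (byte) 0x80 }, new byte[] { 0x7f }) > 0);
    }
}
```

If a row-key equality check ever returned false for two identical keys, a scanner that groups KeyValues by row would start a new row in the middle of the old one, which would look like the split rows reported in this issue.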





[jira] [Commented] (HBASE-4570) Scan ACID problem with concurrent puts.

2011-10-14 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127774#comment-13127774
 ] 

Jonathan Hsieh commented on HBASE-4570:
---

The way this is set up, I can't tell if the problem will never happen, but I 
can detect if it ever does.

I'm still experimenting on trunk and will move to previous versions when I feel 
confident about this potential root cause.  I'm using a combo of HBASE-2856 on 
trunk and reverting to the Java comparator -- it might be the combo of the two 
that is required. 






[jira] [Commented] (HBASE-4570) Scan ACID problem with concurrent puts.

2011-10-14 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127887#comment-13127887
 ] 

Jonathan Hsieh commented on HBASE-4570:
---

@Ted

I have a strange situation where, with just the fixes (first two patches, no 
instrumentation), I still get a lot of the failures in my test setup.  However, 
with extra instrumentation the failures seem to go away (it runs a long time 
without encountering problems).  Note that in my table setup I have 10 cf's, 
each with 2 cols, so the instrumentation is written to always expect 20 KVs.  
I have two processes -- one that does a filtered scan and twiddle, and another 
that just does a filtered scan and count.
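
The "always expect 20 KVs" check can be sketched roughly as below.  This is 
plain Java with no HBase dependency, not the code in the attached 
instrumentation: ScannedRow and SplitRowDetector are hypothetical names, and a 
real check would read the cell count from each scanner result instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one scanner result: a row key plus its KV count. */
final class ScannedRow {
    final String rowKey;
    final int cellCount;

    ScannedRow(String rowKey, int cellCount) {
        this.rowKey = rowKey;
        this.cellCount = cellCount;
    }
}

final class SplitRowDetector {
    // Table layout described in the comment: 10 column families, 2 columns each.
    static final int FAMILIES = 10;
    static final int COLUMNS_PER_FAMILY = 2;
    static final int EXPECTED_CELLS = FAMILIES * COLUMNS_PER_FAMILY;

    /** Returns the distinct row keys whose result did not carry all 20 KVs. */
    static List<String> findIncompleteRows(List<ScannedRow> results) {
        List<String> incomplete = new ArrayList<>();
        for (ScannedRow row : results) {
            boolean wrongSize = row.cellCount != EXPECTED_CELLS;
            if (wrongSize && !incomplete.contains(row.rowKey)) {
                incomplete.add(row.rowKey);
            }
        }
        return incomplete;
    }
}
```

A row that the scanner hands back in two pieces (10 KVs each, as in the issue 
description) shows up once in the returned list.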

I ran TestAcidGuarantees in a loop on the instrumented version.  It eventually 
failed :(

{code}
Tests in error:
  testScanAtomicity(org.apache.hadoop.hbase.TestAcidGuarantees): Deferred
  testMixedAtomicity(org.apache.hadoop.hbase.TestAcidGuarantees): org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@54697123 closed
{code}

With the instrumented version TestAcidGuarantees still fails -- 
it failed on about the 10th iteration.

{code}
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 127.479 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 121.662 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.508 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 124.208 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 121.513 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 120.472 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 117.869 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 120.435 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 118.946 sec
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Tests run: 3, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 85.81 sec <<< FAILURE!
Tests run: 3, Failures: 0, Errors: 2, Skipped: 0
{code}


> Scan ACID problem with concurrent puts.
> ---
>
> Key: HBASE-4570
> URL: https://issues.apache.org/jira/browse/HBASE-4570
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.90.1, 0.90.3
>Reporter: Jonathan Hsieh
> Attachments: 4570-instrumentation.tgz, hbase-4570.tgz
>
>
> When scanning a table sometimes rows that have multiple column families get 
> split into two rows if there are concurrent writes.  In this particular case 
> we are overwriting the contents of a Get directly back onto itself as a Put.
> For example, this is a ten cf row (with "f0", "f1", .. "f9" cfs).  It is 
> actually returned as two rows (#55 and #56). Interestingly if the two were 
> merged we would have a single proper row.
> Row row024461 had time stamps: [55: 
> keyvalues={row024461/f0:data/1318200440867/Put/vlen=1000, 
> row024461/f0:qual/1318200440867/Put/vlen=10, 
> row024461/f1:data/1318200440867/Put/vlen=1000, 
> row024461/f1:qual/1318200440867/Put/vlen=10, 
> row024461/f2:data/1318200440867/Put/vlen=1000, 
> row024461/f2:qual/1318200440867/Put/vlen=10, 
> row024461/f3:data/1318200440867/Put/vlen=1000, 
> row024461/f3:qual/1318200440867/Put/vlen=10, 
> row024461/f4:data/1318200440867/Put/vlen=1000, 
> row024461/f4:qual/1318200440867/Put/vlen=10}, 
> 56: keyvalues={row024461/f5:data/1318200440867/Put/vlen=1000, 
> row024461/f5:qual/1318200440867/Put/vlen=10, 
> row024461/f6:data/1318200440867/Put/vlen=1000, 
> row024461/f6:qual/1318200440867/Put/vlen=10, 
> row024461/f7:data/1318200440867/Put/vlen=1000, 
> row024461/f7:qual/1318200440867/Put/vlen=10, 
> row024461/f8:data/1318200440867/Put/vlen=1000, 
> row024461/f8:qual/1318200440867/Put/vlen=10, 
> row024461/f9:data/1318200440867/Put/vlen=1000, 
> row024461/f9:qual/1318200440867/Put/vlen=10}]
> I've only tested this on 0.90.1+patches and 0.90.3+patches, but it is 
> consistent and duplicatable.





[jira] [Commented] (HBASE-4570) Scan ACID problem with concurrent puts.

2011-10-17 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128867#comment-13128867
 ] 

Jonathan Hsieh commented on HBASE-4570:
---

@stack I've done testing on trunk and a 0.90 branch, and the symptoms 
encountered with the testing programs are fixed.  It would be great to get this 
into 0.90, 0.92 and trunk.  Thanks!

> Scan ACID problem with concurrent puts.
> ---
>
> Key: HBASE-4570
> URL: https://issues.apache.org/jira/browse/HBASE-4570
> Project: HBase
>  Issue Type: Bug
>  Components: client, regionserver
>Affects Versions: 0.90.1, 0.90.3
>Reporter: Jonathan Hsieh
> Attachments: 4570-instrumentation.tgz, hbase-4570.tgz, hbase-4570.txt
>
>
> When scanning a table sometimes rows that have multiple column families get 
> split into two rows if there are concurrent writes.  In this particular case 
> we are overwriting the contents of a Get directly back onto itself as a Put.
> For example, this is a ten cf row (with "f0", "f1", .. "f9" cfs).  It is 
> actually returned as two rows (#55 and #56). Interestingly if the two were 
> merged we would have a single proper row.
> Row row024461 had time stamps: [55: 
> keyvalues={row024461/f0:data/1318200440867/Put/vlen=1000, 
> row024461/f0:qual/1318200440867/Put/vlen=10, 
> row024461/f1:data/1318200440867/Put/vlen=1000, 
> row024461/f1:qual/1318200440867/Put/vlen=10, 
> row024461/f2:data/1318200440867/Put/vlen=1000, 
> row024461/f2:qual/1318200440867/Put/vlen=10, 
> row024461/f3:data/1318200440867/Put/vlen=1000, 
> row024461/f3:qual/1318200440867/Put/vlen=10, 
> row024461/f4:data/1318200440867/Put/vlen=1000, 
> row024461/f4:qual/1318200440867/Put/vlen=10}, 
> 56: keyvalues={row024461/f5:data/1318200440867/Put/vlen=1000, 
> row024461/f5:qual/1318200440867/Put/vlen=10, 
> row024461/f6:data/1318200440867/Put/vlen=1000, 
> row024461/f6:qual/1318200440867/Put/vlen=10, 
> row024461/f7:data/1318200440867/Put/vlen=1000, 
> row024461/f7:qual/1318200440867/Put/vlen=10, 
> row024461/f8:data/1318200440867/Put/vlen=1000, 
> row024461/f8:qual/1318200440867/Put/vlen=10, 
> row024461/f9:data/1318200440867/Put/vlen=1000, 
> row024461/f9:qual/1318200440867/Put/vlen=10}]
> I've only tested this on 0.90.1+patches and 0.90.3+patches, but it is 
> consistent and duplicatable.





[jira] [Commented] (HBASE-4436) Remove methods deprecated in 0.90 from TRUNK and 0.92

2011-10-19 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130954#comment-13130954
 ] 

Jonathan Hsieh commented on HBASE-4436:
---

Already gone:
* HBaseClusterTestCase (HBASE-4503)
* HServerLoad.addRegionInfo (HBASE-1502)

Trivial removals:
* MultiPut*
* KeyValue.createFirstOnRow
* Get.addColumns(byte[][] columns)
* Put.add(byte[] column, long ts, byte[] value)
* Delete.deleteColumns(byte[] column)
* Delete.deleteColumns(byte[] column, long ts)
* HBaseAdmin.modifyColumn(.., columnName, ..)
* HColumnDescriptor.CompressionType enum
* HConnectionManager.processBatchOfPuts / HConnection.processBatchOfPuts
* Result.sorted()

Things that require a little work: (touches many places or requires some code, 
will make separate sub-jiras)
* RemoteExceptionHandler class (15 refs)
* Scan methods (4 refs - might have a bug)
* HBaseTestCase class (47 refs)

I didn't encounter any thrift-related problems.


> Remove methods deprecated in 0.90 from TRUNK and 0.92
> -
>
> Key: HBASE-4436
> URL: https://issues.apache.org/jira/browse/HBASE-4436
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Jonathan Hsieh
>Priority: Critical
>  Labels: noob
> Fix For: 0.92.0
>
>
> Remove methods deprecated in 0.90 from codebase.  I took a quick look.  The 
> messy bit is thrift referring to old stuff; will take a little work to do the 
> conversion over to the new methods.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-19 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131302#comment-13131302
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

0.90 version requires HBASE-4508

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> hbase-4377-trunk.v2.patch
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4622) Remove trivial 0.90 deprecated code from 0.92 and trunk.

2011-10-20 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132185#comment-13132185
 ] 

Jonathan Hsieh commented on HBASE-4622:
---

review here: https://reviews.apache.org/r/2520/

> Remove trivial 0.90 deprecated code from 0.92 and trunk.
> 
>
> Key: HBASE-4622
> URL: https://issues.apache.org/jira/browse/HBASE-4622
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4622.-Remove-trivial-0.90-deprecated-code-from.0.92.patch, 
> 0001-HBASE-4622.-Remove-trivial-0.90-deprecated-code-from.trunk.patch
>
>






[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families

2011-10-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133190#comment-13133190
 ] 

Jonathan Hsieh commented on HBASE-4552:
---

Plan

1) Test to show there is an atomicity problem.  It likely will not use 
LoadIncrementalHFiles.
2) Fix for the region server side.
3) Rewrite of LoadIncrementalHFiles so that it groups the proper HFiles into 
the new bulkLoadHFile calls.  This will likely have two parallel steps - the 
first gathers enough info to group HFiles and the second attempts to 
bulk load.

> multi-CF bulk load is not atomic across column families
> ---
>
> Key: HBASE-4552
> URL: https://issues.apache.org/jira/browse/HBASE-4552
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
>
> Currently the bulk load API simply imports one HFile at a time. With 
> multi-column-family support, this is inappropriate, since different CFs show 
> up separately. Instead, the IPC endpoint should take a of CF -> HFiles, so we 
> can online them all under a single region-wide lock.





[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families

2011-10-23 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133677#comment-13133677
 ] 

Jonathan Hsieh commented on HBASE-4552:
---

One more piece: a mechanism to atomically roll back if a partial failure is 
encountered when attempting to bulk load multiple families.  

For example, let's say I want to bulk load a region with cfs A, B, C.  I issue 
a call to an RS region to atomically bulk load the HFiles.  The RS loads A and 
B successfully but fails on C (hdfs failure, or rs goes down, etc).  We should 
roll back A and B -- if we don't, we would have A and B loaded but not C, and 
have an atomicity violation.  
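
A minimal sketch of the A, B, C example, assuming a hypothetical FamilyLoader 
interface in place of the real region server code (none of these names come 
from the patch).  It also assumes the rollback itself cannot fail, which is the 
hard part in practice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class AllOrNothingLoader {
    /** Hypothetical per-family load step; load() throws to simulate an hdfs or RS failure. */
    interface FamilyLoader {
        void load(String family, String hfile) throws Exception;

        void unload(String family, String hfile);
    }

    /**
     * Loads one HFile per family.  If any load fails, the families that were
     * already loaded are undone in reverse order and false is returned.
     */
    static boolean loadAll(Map<String, String> familyToHFile, FamilyLoader loader) {
        List<Map.Entry<String, String>> done = new ArrayList<>();
        for (Map.Entry<String, String> entry : familyToHFile.entrySet()) {
            try {
                loader.load(entry.getKey(), entry.getValue());
                done.add(entry);
            } catch (Exception failure) {
                // Partial failure: undo what was loaded so no family is left visible.
                for (int i = done.size() - 1; i >= 0; i--) {
                    loader.unload(done.get(i).getKey(), done.get(i).getValue());
                }
                return false;
            }
        }
        return true;
    }
}
```

With a loader that fails on C, loadAll returns false and neither A nor B 
remains loaded.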






> multi-CF bulk load is not atomic across column families
> ---
>
> Key: HBASE-4552
> URL: https://issues.apache.org/jira/browse/HBASE-4552
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
>
> Currently the bulk load API simply imports one HFile at a time. With 
> multi-column-family support, this is inappropriate, since different CFs show 
> up separately. Instead, the IPC endpoint should take a of CF -> HFiles, so we 
> can online them all under a single region-wide lock.





[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families

2011-10-23 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133696#comment-13133696
 ] 

Jonathan Hsieh commented on HBASE-4552:
---

If we have an hdfs failure, we won't be able to record or update information 
about what failed. 
 
This makes me think we need to journal/log the intended atomic actions.  Once 
we have the log, we can act depending on the situation:
* If we complete successfully, we remove/invalidate the log and carry on.  
* If we fail (can't write, rs goes down and restarts), we check to see if 
everything is in.  If it isn't, we roll back the subset of hfile loads that had 
happened.  If rollback fails, we still have the log, so we can try later or 
maybe we kill the RS?  
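
To make the journal idea concrete, here is a rough in-memory sketch.  
BulkLoadIntentLog is a hypothetical name, and a real journal would have to be 
written durably before the first load is attempted, and recovery would check 
the file system rather than trust an in-memory set:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BulkLoadIntentLog {
    private final Set<String> intended = new LinkedHashSet<>();
    private final Set<String> loaded = new LinkedHashSet<>();
    private boolean pending = false;

    /** Records the full set of hfiles that must go in together. */
    void begin(Collection<String> hfiles) {
        intended.clear();
        loaded.clear();
        intended.addAll(hfiles);
        pending = true;
    }

    /** Notes that one of the intended hfiles has been loaded. */
    void recordLoaded(String hfile) {
        if (!pending || !intended.contains(hfile)) {
            throw new IllegalStateException("not part of the pending load: " + hfile);
        }
        loaded.add(hfile);
    }

    /** Success path: invalidate the log and carry on. */
    void complete() {
        pending = false;
    }

    /**
     * Recovery path: if everything is in (or nothing is pending) there is
     * nothing to undo; otherwise the already-loaded subset must be rolled back.
     */
    List<String> filesToRollBack() {
        if (!pending || loaded.containsAll(intended)) {
            return Collections.emptyList();
        }
        return new ArrayList<>(loaded);
    }
}
```

Because the log is only invalidated by complete(), a failed rollback can be 
retried later from the same record.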

How about we make this a subtask/follow-on jira?  The first cut will just 
detect the situation and log error messages (similar to what currently 
happens).  A follow-on task will discuss and add/implement a recovery mechanism.


> multi-CF bulk load is not atomic across column families
> ---
>
> Key: HBASE-4552
> URL: https://issues.apache.org/jira/browse/HBASE-4552
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
>
> Currently the bulk load API simply imports one HFile at a time. With 
> multi-column-family support, this is inappropriate, since different CFs show 
> up separately. Instead, the IPC endpoint should take a of CF -> HFiles, so we 
> can online them all under a single region-wide lock.





[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families

2011-10-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134138#comment-13134138
 ] 

Jonathan Hsieh commented on HBASE-4552:
---

Created recovery mechanism jira at HBASE-4652

> multi-CF bulk load is not atomic across column families
> ---
>
> Key: HBASE-4552
> URL: https://issues.apache.org/jira/browse/HBASE-4552
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
>
> Currently the bulk load API simply imports one HFile at a time. With 
> multi-column-family support, this is inappropriate, since different CFs show 
> up separately. Instead, the IPC endpoint should take a of CF -> HFiles, so we 
> can online them all under a single region-wide lock.





[jira] [Commented] (HBASE-4649) Add atomic bulk load function to region server

2011-10-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134153#comment-13134153
 ] 

Jonathan Hsieh commented on HBASE-4649:
---

review here: https://reviews.apache.org/r/2545/

> Add atomic bulk load function to region server
> --
>
> Key: HBASE-4649
> URL: https://issues.apache.org/jira/browse/HBASE-4649
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4649-Add-atomic-bulk-load-function-to-region-s.patch
>
>
> Add a method that atomically bulk load multiple hfiles.  Row atomicity 
> guarantees for multi-column family rows require this functionality.





[jira] [Commented] (HBASE-4649) Add atomic bulk load function to region server

2011-10-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134243#comment-13134243
 ] 

Jonathan Hsieh commented on HBASE-4649:
---

@Ted

The existing LoadIncrementalHFiles emphasizes behavior when bulk imported 
HFiles have the wrong region boundaries and require a split.  The new tests in 
this patch focus on the atomicity properties.  createHFile was borrowed 
and modified so that the expected values would make it easy to demonstrate 
atomicity failures.  More code was actually borrowed from TestAcidGuarantees.

I'll comment about LoadIncrementalHFiles in HBASE-4649 -- I'm in the process of 
cleaning up a first cut patch.

> Add atomic bulk load function to region server
> --
>
> Key: HBASE-4649
> URL: https://issues.apache.org/jira/browse/HBASE-4649
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.90.4, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4649-Add-atomic-bulk-load-function-to-region-s.patch
>
>
> Add a method that atomically bulk load multiple hfiles.  Row atomicity 
> guarantees for multi-column family rows require this functionality.





[jira] [Commented] (HBASE-4649) Add atomic bulk load function to region server

2011-10-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134244#comment-13134244
 ] 

Jonathan Hsieh commented on HBASE-4649:
---

I meant comment about LoadIncrementalHFiles in HBASE-4650

> Add atomic bulk load function to region server
> --
>
> Key: HBASE-4649
> URL: https://issues.apache.org/jira/browse/HBASE-4649
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.90.4, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4649-Add-atomic-bulk-load-function-to-region-s.patch
>
>
> Add a method that atomically bulk load multiple hfiles.  Row atomicity 
> guarantees for multi-column family rows require this functionality.





[jira] [Commented] (HBASE-4650) Update LoadIncrementalHFiles to use atomic bulk load RS mechanism

2011-10-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134253#comment-13134253
 ] 

Jonathan Hsieh commented on HBASE-4650:
---

I'm in the process of cleaning up the modifications to LoadIncrementalHFiles 
and adding more tests before submitting HBASE-4650 for review.  This cut 
passes the two tests that use LoadIncrementalHFiles (TestLoadIncrementalHFiles 
and TestHFileOutputFormat).  I'll post a preliminary version for those 
interested.

In the code, there are significant logic changes due to grouping, so I've 
chosen to take out the concurrency on the first cut.  Gathering and splitting 
HFiles into proper groups introduces a synchronization point that prevents some 
of the concurrency that was there before: groups need to be fully gathered 
before bulk loads in a region are attempted.  I'll include comments where 
concurrency is ok.  
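
The gather step can be pictured with the sketch below.  It is a simplification 
with hypothetical names (HFileGrouper, groupByRegion), not the code in the 
patch: each HFile is represented only by the first row key it contains, regions 
only by their start keys, and every HFile is assumed to fit inside one region 
so no splitting is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;

final class HFileGrouper {
    /**
     * Groups HFiles by the region that contains their first row key.
     * The first region of a table has the empty string as its start key.
     */
    static Map<String, List<String>> groupByRegion(
            NavigableSet<String> regionStartKeys, Map<String, String> hfileToFirstRow) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> entry : hfileToFirstRow.entrySet()) {
            // The owning region is the one with the greatest start key <= the row key.
            String regionStart = regionStartKeys.floor(entry.getValue());
            if (regionStart == null) {
                throw new IllegalArgumentException("no region covers row " + entry.getValue());
            }
            groups.computeIfAbsent(regionStart, k -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }
}
```

Only once this map is complete does each region's list become a unit that can 
be loaded together, which is where the synchronization point comes from.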

Before I spend effort to parallelize this implementation more, I want to add 
another test to verify that this works while splits are going on.


> Update LoadIncrementalHFiles to use atomic bulk load RS mechanism
> -
>
> Key: HBASE-4650
> URL: https://issues.apache.org/jira/browse/HBASE-4650
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Jonathan Hsieh
> Fix For: 0.92.0
>
>
> MR jobs and command line bulk load execution runs use the 
> LoadIncrementalHFile.doBulkLoad.  This needs to be updated to group HFiles by 
> row/region so that rows can be atomically loaded multiple column families.  





[jira] [Commented] (HBASE-4649) Add atomic bulk load function to region server

2011-10-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134266#comment-13134266
 ] 

Jonathan Hsieh commented on HBASE-4649:
---

With respect to using a Queue vs List, I think the argument is moot -- because 
of the "gather" step we have a different instance of the region map, and a 
different list of things to bulk load per iteration.

> Add atomic bulk load function to region server
> --
>
> Key: HBASE-4649
> URL: https://issues.apache.org/jira/browse/HBASE-4649
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.90.4, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4649-Add-atomic-bulk-load-function-to-region-s.patch
>
>
> Add a method that atomically bulk load multiple hfiles.  Row atomicity 
> guarantees for multi-column family rows require this functionality.





[jira] [Commented] (HBASE-4650) Update LoadIncrementalHFiles to use atomic bulk load RS mechanism

2011-10-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134553#comment-13134553
 ] 

Jonathan Hsieh commented on HBASE-4650:
---

initial parallel implementation here: https://reviews.apache.org/r/2557/

> Update LoadIncrementalHFiles to use atomic bulk load RS mechanism
> -
>
> Key: HBASE-4650
> URL: https://issues.apache.org/jira/browse/HBASE-4650
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4650-Update-LoadIncrementalHFiles-to-use-atomi.patch, 
> 0001-HBASE-4650-Update-LoadIncrementalHFiles-to-use-atomi.prelim.patch
>
>
> MR jobs and command line bulk load execution runs use the 
> LoadIncrementalHFile.doBulkLoad.  This needs to be updated to group HFiles by 
> row/region so that rows can be atomically loaded multiple column families.  





[jira] [Commented] (HBASE-4436) Remove methods deprecated in 0.90 from TRUNK and 0.92

2011-10-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134810#comment-13134810
 ] 

Jonathan Hsieh commented on HBASE-4436:
---

Yeah, busy busy. Here's where I am now: 

The trivial parts (HBASE-4622) are up for review.

I have a first cut of scan related (HBASE-4623), but this broke tests -- still 
working on this.

I haven't started the other two sub parts.

> Remove methods deprecated in 0.90 from TRUNK and 0.92
> -
>
> Key: HBASE-4436
> URL: https://issues.apache.org/jira/browse/HBASE-4436
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Jonathan Hsieh
>Priority: Critical
>  Labels: noob
> Fix For: 0.92.0
>
>
> Remove methods deprecated in 0.90 from codebase.  I took a quick look.  The 
> messy bit is thrift referring to old stuff; will take a little work to do the 
> conversion over to the new methods.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135324#comment-13135324
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

Seb, glad to hear that this basically worked for you. 

Would it make sense to add Seb's change as a separate jira after the original 
patch gets committed?  IMO, it feels like it needs a test case as well.



> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
> hbase-4377-trunk.v2.patch, hbase-4377.trunk.v3.txt
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135478#comment-13135478
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

@Ted I'm basically ok with it.  

@Seb can you post some of the bad .regioninfo files?  I'm curious about what 
you did to need to use a full rebuild!

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
> hbase-4377-trunk.v2.patch, hbase-4377.trunk.v3.txt
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4379) [hbck] Does not complain about tables with no end region [Z,]

2011-10-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135512#comment-13135512
 ] 

Jonathan Hsieh commented on HBASE-4379:
---

This one is very similar to HBASE-4378, which was recently reviewed/committed. 
Any comments/complaints about this one?

> [hbck] Does not complain about tables with no end region [Z,]
> -
>
> Key: HBASE-4379
> URL: https://issues.apache.org/jira/browse/HBASE-4379
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Affects Versions: 0.92.0, 0.90.5
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4379-hbck-does-not-complain-about-tables-with-.patch
>
>
> hbck does not detect or have an error condition when the last region of a 
> table is missing (end key != '').
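
The missing check can be pictured with a minimal sketch. This is not hbck's actual code: the Range class and check method below are hypothetical stand-ins, keys are modeled as plain strings, and an empty string is assumed to mean "unbounded", as in the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionChainCheck {
    /** Hypothetical stand-in for a region's key range; "" means unbounded. */
    public static final class Range {
        final String startKey;
        final String endKey;

        public Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Returns the problems found in a table's regions, given sorted by start key. */
    public static List<String> check(List<Range> sorted) {
        List<String> problems = new ArrayList<String>();
        if (sorted.isEmpty()) {
            problems.add("table has no regions");
            return problems;
        }
        // The first region must start at the empty key.
        if (!sorted.get(0).startKey.isEmpty()) {
            problems.add("first region start key is not empty");
        }
        // Each region must begin exactly where the previous one ended.
        for (int i = 1; i < sorted.size(); i++) {
            String prevEnd = sorted.get(i - 1).endKey;
            String start = sorted.get(i).startKey;
            if (!prevEnd.equals(start)) {
                problems.add("hole or overlap between '" + prevEnd + "' and '" + start + "'");
            }
        }
        // The case this issue is about: the last region must end at the empty key.
        if (!sorted.get(sorted.size() - 1).endKey.isEmpty()) {
            problems.add("last region end key is not empty");
        }
        return problems;
    }
}
```

With regions ['', M) and [M, Z), only the last check fires, which is the [Z,] case named in the title.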





[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138729#comment-13138729
 ] 

Jonathan Hsieh commented on HBASE-4532:
---

This seems to be checked into trunk now, and there seems to be an extraneous 
System.out.println that is causing some of my tests to "fail" when run from 
maven (apparently maven buffers the output in memory instead of writing it out 
as a test is executing).

Here's the OOME that maven reports:

Exception in thread "ThreadedStreamConsumer" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    at org.apache.maven.surefire.report.ConsoleOutputFileReporter.writeMessage(ConsoleOutputFileReporter.java:115)
    at org.apache.maven.surefire.report.MulticastingReporter.writeMessage(MulticastingReporter.java:101)
    at org.apache.maven.surefire.report.TestSetRunListener.writeTestOutput(TestSetRunListener.java:99)
    at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:132)
    at org.apache.maven.plugin.surefire.booterclient.output.ThreadedStreamConsumer$Pumper.run(ThreadedStreamConsumer.java:67)
    at java.lang.Thread.run(Thread.java:662)

I've attached a patch that eliminates this issue.
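
As a rough illustration of the kind of change involved (the attached patch itself is not reproduced here), a per-seek System.out.println can be replaced with a level-guarded log call, so that nothing accumulates in the test runner's output buffer at the default log level. The sketch uses java.util.logging only to stay self-contained; the class name, method name, and message are hypothetical and are not taken from the HBase source.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class QuietSeekTrace {
    private static final Logger LOG = Logger.getLogger(QuietSeekTrace.class.getName());

    /** Counts the records that are actually published, so the effect is observable. */
    static final class CountingHandler extends Handler {
        int published = 0;

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                published++;
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    /** Simulates a hot loop and returns how many trace messages were emitted. */
    public static int traceSeeks(int seeks, Level loggerLevel) {
        CountingHandler handler = new CountingHandler();
        handler.setLevel(Level.ALL);
        LOG.setUseParentHandlers(false); // keep the console quiet in this sketch
        LOG.setLevel(loggerLevel);
        LOG.addHandler(handler);
        try {
            for (int i = 0; i < seeks; i++) {
                // Guarded debug-level call instead of System.out.println("seek " + i)
                if (LOG.isLoggable(Level.FINE)) {
                    LOG.fine("seek " + i);
                }
            }
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.published;
    }

    public static void main(String[] args) {
        System.out.println(traceSeeks(1000, Level.INFO)); // nothing emitted at INFO
        System.out.println(traceSeeks(1000, Level.FINE)); // every message emitted at FINE
    }
}
```

At the INFO level the loop emits nothing, so the messages only appear when someone explicitly turns on debug-level logging.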


> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---
>
> Key: HBASE-4532
> URL: https://issues.apache.org/jira/browse/HBASE-4532
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
> hbase-4532-89-fb.patch
>
>
> The previous jira, HBASE-4469, is to avoid the top row seek operation if 
> row-col bloom filter is enabled. 
> This jira tries to avoid top row seek for all the cases by creating a 
> dedicated bloom filter only for delete family
> The only subtle use case is when we are interested in the top row with empty 
> column.
> For example, 
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
> bloom filter will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last 
> kv for this row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are 
> trying to GET/SCAN a row with empty column.
> Evaluation from TestSeekOptimization:
> Previously:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1714 (68.40%), savings: 31.60%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
> enabled.[HBASE-4469]
> 
> After this change:
> For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
> optimization: 1458 (58.18%), savings: 41.82%
> So we can get about 10% more seek savings for ALL kinds of bloom filter.
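
The percentages quoted above follow directly from the seek counts. A small sketch of the arithmetic (the class and method names are made up for illustration):

```java
public class SeekSavings {
    /** Percentage of seeks avoided by the optimization. */
    public static double savingsPercent(int seeksWithout, int seeksWith) {
        return 100.0 * (seeksWithout - seeksWith) / seeksWithout;
    }

    public static void main(String[] args) {
        double before = savingsPercent(2506, 1714); // NONE/ROW bloom before this change
        double after = savingsPercent(2506, 1458);  // any bloom type after this change
        System.out.printf("before: %.2f%%, after: %.2f%%, gain: %.2f points%n",
            before, after, after - before);
    }
}
```

So the "about 10%" is the difference between 41.82% and 31.60%, a little over 10 percentage points of the unoptimized seek count.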





[jira] [Commented] (HBASE-4649) Add atomic bulk load function to region server

2011-10-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138813#comment-13138813
 ] 

Jonathan Hsieh commented on HBASE-4649:
---

There was an extra System.out.println introduced by HBASE-4532 that was causing 
the new unit tests to fail when run from mvn (they worked fine from eclipse or 
via junit's test runner: 'bin/hbase org.junit.runner.JUnitCore 
org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad').

I've attached a patch there.


> Add atomic bulk load function to region server
> --
>
> Key: HBASE-4649
> URL: https://issues.apache.org/jira/browse/HBASE-4649
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 0.90.4, 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4649-Add-atomic-bulk-load-function-to-region-s.patch, 
> hbase-4649.v2.patch
>
>
> Add a method that atomically bulk loads multiple hfiles.  Row atomicity 
> guarantees for multi-column family rows require this functionality.





[jira] [Commented] (HBASE-4677) Remove old single bulkLoadHFile method

2011-10-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138876#comment-13138876
 ] 

Jonathan Hsieh commented on HBASE-4677:
---

Updated patch that bumps RPC version.

> Remove old single bulkLoadHFile method
> --
>
> Key: HBASE-4677
> URL: https://issues.apache.org/jira/browse/HBASE-4677
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: hbase-4677.patch
>
>
> In review for HBASE-4649, there is some debate as to whether to remove, 
> deprecate, or leave the HRegionServer.bulkLoadHFile method. 
> https://reviews.apache.org/r/2545/ .   This jira will take care of that for 
> the 0.92 and trunk releases, and allow the same patch to remain for an 
> optional 0.90.x patch.





[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families

2011-10-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138977#comment-13138977
 ] 

Jonathan Hsieh commented on HBASE-4552:
---

This was due to HBASE-4634 which got committed two days ago.  The old 
getTestDir was a public method and apparently was just removed.  This will 
probably break on trunk as well.

https://github.com/apache/hbase/commit/ed21cd6c4c266f610352d76d3d4b6f5cff492a97#src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java

I think this should be replaced with getDataTestDir calls (that's what the old 
bulk load test calls to getTestDir were changed to).

> multi-CF bulk load is not atomic across column families
> ---
>
> Key: HBASE-4552
> URL: https://issues.apache.org/jira/browse/HBASE-4552
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: hbase-4552.consolidated.patch
>
>
> Currently the bulk load API simply imports one HFile at a time. With 
> multi-column-family support, this is inappropriate, since different CFs show 
> up separately. Instead, the IPC endpoint should take a of CF -> HFiles, so we 
> can online them all under a single region-wide lock.
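
The idea of bringing all families' files online under one region-wide lock can be sketched as follows. This is only an illustration under stated assumptions, not the HRegion implementation: the class, its fields, and the use of plain strings for family names and HFile paths are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AtomicBulkLoadSketch {
    // One lock for the whole region, shared by every column family.
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> storeFiles = new HashMap<String, List<String>>();

    public AtomicBulkLoadSketch(String... families) {
        for (String family : families) {
            storeFiles.put(family, new ArrayList<String>());
        }
    }

    /** Loads every file or none: validate all families first, then commit. */
    public boolean bulkLoad(Map<String, List<String>> familyToFiles) {
        regionLock.writeLock().lock();
        try {
            for (String family : familyToFiles.keySet()) {
                if (!storeFiles.containsKey(family)) {
                    return false; // reject the whole batch, nothing was changed
                }
            }
            for (Map.Entry<String, List<String>> entry : familyToFiles.entrySet()) {
                storeFiles.get(entry.getKey()).addAll(entry.getValue());
            }
            return true;
        } finally {
            regionLock.writeLock().unlock();
        }
    }

    /** Readers take the shared lock, so they never observe a half-applied batch. */
    public int fileCount(String family) {
        regionLock.readLock().lock();
        try {
            List<String> files = storeFiles.get(family);
            return files == null ? 0 : files.size();
        } finally {
            regionLock.readLock().unlock();
        }
    }
}
```

Because readers hold the read side of the same lock, a scan sees either all of a batch's files across families or none of them, which is the row atomicity property the description asks for.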





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139386#comment-13139386
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

@Stack

I have a patch written that optionally handles filling in holes, but haven't 
polished it for review yet.  I'll add it after this patch gets through.  IIRC 
it adds this functionality to hbck and to the offline meta rebuilder.

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
> EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, 
> hbase-4377-trunk.v2.patch, hbase-4377.trunk.v3.txt, hbase-4377.trunk.v4.txt, 
> hbase-4377.trunk.v5.txt
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139500#comment-13139500
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

I'll do an update tomorrow or Monday to address the nits and get the 0.90 
version caught up again.

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
> EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, 
> hbase-4377-trunk.v2.patch, hbase-4377.trunk.v3.txt, hbase-4377.trunk.v4.txt, 
> hbase-4377.trunk.v5.txt
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4677) Remove old single bulkLoadHFile method

2011-10-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140217#comment-13140217
 ] 

Jonathan Hsieh commented on HBASE-4677:
---

@Stack

I lean slightly towards removing instead of deprecating.  From those reviews, I 
was initially leaning towards deprecating until it became clear we'd need to 
bump the rpc version numbers in both cases.

The patch is broken out so it is easy to pick one path or the other.


> Remove old single bulkLoadHFile method
> --
>
> Key: HBASE-4677
> URL: https://issues.apache.org/jira/browse/HBASE-4677
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: hbase-4677.patch
>
>
> In review for HBASE-4649, there is some debate as to whether to remove, 
> deprecate, or leave the HRegionServer.bulkLoadHFile method. 
> https://reviews.apache.org/r/2545/ .   This jira will take care of that for 
> the 0.92 and trunk releases, and allow the same patch to remain for an 
> optional 0.90.x patch.





[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families

2011-10-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140317#comment-13140317
 ] 

Jonathan Hsieh commented on HBASE-4552:
---

@Ram, on trunk or 0.92 branches, HTableDescriptor(conf,tablename) doesn't seem 
to be in the api.  In patch v4, it seems like all the HTable constructors have 
been updated to explicitly take the configuration reference.

I'm assuming you meant HTable? 

> multi-CF bulk load is not atomic across column families
> ---
>
> Key: HBASE-4552
> URL: https://issues.apache.org/jira/browse/HBASE-4552
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: hbase-4552.consolidated.patch, 
> hbase-4552.consolidated.v2.patch, hbase-4552.consolidated.v3.patch, 
> hbase-4552.consolidated.v4.patch
>
>
> Currently the bulk load API simply imports one HFile at a time. With 
> multi-column-family support, this is inappropriate, since different CFs show 
> up separately. Instead, the IPC endpoint should take a of CF -> HFiles, so we 
> can online them all under a single region-wide lock.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-10-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140533#comment-13140533
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

Addressed most of stack's comments:

* Removed try-catch from deleteTable.
* Updated comment related issues.
* Renamed splits in populateTable to values (splits is for region splits, the 
latter is for creating values.)
* Have separate patch for filling in holes.
* Removed setTableName and added internal check code to getTableName().
* Refactored the sidelining function to check rename returns.

I'm going to punt on these two.

* HRegion creation was done manually because the version that existed attempted 
to open stores and I didn't want or need that.
* MetaReader was not used because at the time I was trying to figure out the 
different table existence semantics in 0.90 vs trunk.   


> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
> EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, 
> hbase-4377-trunk.v2.patch, hbase-4377.trunk.v3.txt, hbase-4377.trunk.v4.txt, 
> hbase-4377.trunk.v5.txt
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4677) Remove old single bulkLoadHFile method

2011-10-31 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140742#comment-13140742
 ] 

Jonathan Hsieh commented on HBASE-4677:
---

I think in this case the api was flawed and the only real way to
fix it is to extend it. If this gets backported to 0.90.5 we'll keep the old
API call to maintain compatibility.

0.90 and 0.92 are different major versions so the apis can change.

> Remove old single bulkLoadHFile method
> --
>
> Key: HBASE-4677
> URL: https://issues.apache.org/jira/browse/HBASE-4677
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: hbase-4677.patch
>
>
> In review for HBASE-4649, there is some debate as to whether to remove, 
> deprecate, or leave the HRegionServer.bulkLoadHFile method. 
> https://reviews.apache.org/r/2545/ .   This jira will take care of that for 
> the 0.92 and trunk releases, and allow the same patch to remain for an 
> optional 0.90.x patch.





[jira] [Commented] (HBASE-4718) Backport HBASE-4552 to 0.90 branch.

2011-11-01 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141449#comment-13141449
 ] 

Jonathan Hsieh commented on HBASE-4718:
---

An initial backport was done by apurtell.  I've taken it and made it work 
against 0.90.  It requires a backport of HBASE-3316.  Before I submit, I would 
like to test cross-version RPC to verify compatibility or reasonable warning 
messages.

If it is decided not to integrate, I will post the patch after testing.

> Backport HBASE-4552 to 0.90 branch.
> ---
>
> Key: HBASE-4718
> URL: https://issues.apache.org/jira/browse/HBASE-4718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>
> In discussion of HBASE-4552 / HBASE-4677 there has been some discussion about 
> whether and how to backport HBASE-4552 to the 0.90 branch.  This is 
> potentially compatibility breaking, so several approaches have been suggested.
> 1) provide patch but do not integrate
> 2) integrate patch that extends and deprecates old api without removing old 
> api.  It has been argued that clients are supposed to use the 
> LoadIncrementalHFiles api and not the internal HRegionServer RPC api.





[jira] [Commented] (HBASE-4718) Backport HBASE-4552 to 0.90 branch.

2011-11-02 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142658#comment-13142658
 ] 

Jonathan Hsieh commented on HBASE-4718:
---

I have verified that stock 0.90.4's non-atomic bulk import still works against 
a standalone cluster running 0.90.5-snapshot including the HBASE-4718 patch 
(combined backport of HBASE-4552/HBASE-4716) and the HBASE-3316 patch (separate 
but trivial backport)

> Backport HBASE-4552 to 0.90 branch.
> ---
>
> Key: HBASE-4718
> URL: https://issues.apache.org/jira/browse/HBASE-4718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-4718.0.90.patch
>
>
> In discussion of HBASE-4552 / HBASE-4677 there has been some discussion about 
> whether and how to backport HBASE-4552 to the 0.90 branch.  This is 
> potentially compatibility breaking, so several approaches have been suggested.
> 1) provide patch but do not integrate
> 2) integrate patch that extends and deprecates old api without removing old 
> api.  It has been argued that clients are supposed to use the 
> LoadIncrementalHFiles api and not the internal HRegionServer RPC api.





[jira] [Commented] (HBASE-4718) Backport HBASE-4552 to 0.90 branch.

2011-11-02 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142691#comment-13142691
 ] 

Jonathan Hsieh commented on HBASE-4718:
---

Here are the results of the relevant unit tests run on 0.90.x.  I'm fairly 
confident that only known flaky tests could fail on the full run; I will post 
any anomalies.

{code}

~/proj/hbase-0.90$ mvn test -Dtest=TestLoadIncrementalHFilesSplitRecovery,TestHRegionServerBulkLoad



---
 T E S T S
---
Running org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 47.143 sec
Running org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 72.985 sec

Results :

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
{code}

> Backport HBASE-4552 to 0.90 branch.
> ---
>
> Key: HBASE-4718
> URL: https://issues.apache.org/jira/browse/HBASE-4718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-4718.0.90.patch
>
>
> In discussion of HBASE-4552 / HBASE-4677 there has been some discussion about 
> whether and how to backport HBASE-4552 to the 0.90 branch.  This is 
> potentially compatibility breaking, so several approaches have been suggested.
> 1) provide patch but do not integrate
> 2) integrate patch that extends and deprecates old api without removing old 
> api.  It has been argued that clients are supposed to use the 
> LoadIncrementalHFiles api and not the internal HRegionServer RPC api.





[jira] [Commented] (HBASE-4740) [bulk load] the HBASE-4552 API can't tell if errors on region server is recoverable or unrecoverable error.

2011-11-03 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143642#comment-13143642
 ] 

Jonathan Hsieh commented on HBASE-4740:
---

While reworking the tests for recoverable and simulated unrecoverable failures 
with the updated api, I noticed that there are some problems in the test cases 
I previously wrote.  There will be some significant changes with the tests in 
this patch as well.

Oddly, I have a case where splitting does not happen in one particular test 
case but does in another.  

> [bulk load]  the HBASE-4552 API can't tell if errors on region server is 
> recoverable or unrecoverable error.
> 
>
> Key: HBASE-4740
> URL: https://issues.apache.org/jira/browse/HBASE-4740
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>Priority: Critical
> Fix For: 0.92.0
>
>
> Running TestHFileOutputFormat more frequently seems to show that it has 
> become flaky.   It is difficult to tell if this is because of an unrecoverable 
> failure or a recoverable failure.   To make this visible from tests and for 
> users, we need to make a change to the bulkload call's interface on 
> HRegionServer.  The change should make successful rpcs return true, 
> recoverable failures return false, and unrecoverable failures throw an 
> IOException.  This is an RPC change, so it would be really good to get this 
> api right before the final 0.92 goes out.
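
The proposed contract (true on success, false on a recoverable failure, IOException on an unrecoverable one) can be sketched from the caller's side. The interface and method names below are hypothetical stand-ins for the region server call, not the real RPC signature, and the retry policy is an assumption made for the example.

```java
import java.io.IOException;

public class BulkLoadRetrySketch {
    /** Hypothetical stand-in for the bulk load RPC on the region server. */
    public interface BulkLoadCall {
        /** true = loaded, false = recoverable failure; throws if unrecoverable. */
        boolean attempt() throws IOException;
    }

    /**
     * Retries recoverable failures up to maxAttempts times.
     * Returns the number of attempts used, or -1 if every attempt was a
     * recoverable failure. An IOException propagates to the caller at once.
     */
    public static int loadWithRetries(BulkLoadCall call, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (call.attempt()) {
                return attempt;
            }
            // false: something recoverable happened (for example, the region
            // split underneath us); a real client would regroup its files here.
        }
        return -1;
    }
}
```

With this shape a test can tell the two failure kinds apart: a false return is worth retrying, while an exception fails the test immediately.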





[jira] [Commented] (HBASE-4740) [bulk load] the HBASE-4552 API can't tell if errors on region server is recoverable or unrecoverable error.

2011-11-04 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144364#comment-13144364
 ] 

Jonathan Hsieh commented on HBASE-4740:
---

Review here: https://reviews.apache.org/r/2730/

> [bulk load]  the HBASE-4552 API can't tell if errors on region server is 
> recoverable or unrecoverable error.
> 
>
> Key: HBASE-4740
> URL: https://issues.apache.org/jira/browse/HBASE-4740
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4740-bulkload-HBASE-4552-API-can-t-tell-if-err.patch
>
>
> Running TestHFileOutputFormat more frequently seems to show that it has 
> become flaky.   It is difficult to tell if this is because of an unrecoverable 
> failure or a recoverable failure.   To make this visible from tests and for 
> users, we need to make a change to the bulkload call's interface on 
> HRegionServer.  The change should make successful rpcs return true, 
> recoverable failures return false, and unrecoverable failures throw an 
> IOException.  This is an RPC change, so it would be really good to get this 
> api right before the final 0.92 goes out.





[jira] [Commented] (HBASE-4740) [bulk load] the HBASE-4552 API can't tell if errors on region server is recoverable or unrecoverable error.

2011-11-04 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144362#comment-13144362
 ] 

Jonathan Hsieh commented on HBASE-4740:
---

It turns out that I was splitting in the wrong place, and splitting an empty 
region returns scary error messages when it should return an innocuous one.

> [bulk load]  the HBASE-4552 API can't tell if errors on region server is 
> recoverable or unrecoverable error.
> 
>
> Key: HBASE-4740
> URL: https://issues.apache.org/jira/browse/HBASE-4740
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4740-bulkload-HBASE-4552-API-can-t-tell-if-err.patch
>
>
> Running TestHFileOutputFormat more frequently seems to show that it has 
> become flaky.   It is difficult to tell if this is because of an unrecoverable 
> failure or a recoverable failure.   To make this visible from tests and for 
> users, we need to make a change to the bulkload call's interface on 
> HRegionServer.  The change should make successful rpcs return true, 
> recoverable failures return false, and unrecoverable failures throw an 
> IOException.  This is an RPC change, so it would be really good to get this 
> api right before the final 0.92 goes out.





[jira] [Commented] (HBASE-4740) [bulk load] the HBASE-4552 API can't tell if errors on region server is recoverable or unrecoverable error.

2011-11-04 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144485#comment-13144485
 ] 

Jonathan Hsieh commented on HBASE-4740:
---

@Stack

Yeah, 0 is actually the original behavior in the pre-HBASE-4552 version; I 
think it would just eat exceptions and bail out without completing.  It is more 
complicated now because of bulk atomicity.

Will update to boolean if it works --  there is some template checking in 
another place, so I assumed it needed the boxed type.

The difference is that the version uses a different LoadIncrementalHFiles 
instance.  I'll refactor to exclude that portion and require it in the test.

I tried the previous version with a small data set on a pseudo-dist cluster and 
a live cluster.  For this particular patch I looped the relevant unit tests 100 
times and saw that they passed every time.  I haven't tested this exact version 
on a real cluster. 



> [bulk load]  the HBASE-4552 API can't tell if errors on region server is 
> recoverable or unrecoverable error.
> 
>
> Key: HBASE-4740
> URL: https://issues.apache.org/jira/browse/HBASE-4740
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4740-bulkload-HBASE-4552-API-can-t-tell-if-err.patch
>
>
> Running TestHFileOutputFormat more frequently seems to show that it has 
> become flaky.   It is difficult to tell if this is because of an unrecoverable 
> failure or a recoverable failure.   To make this visible from tests and for 
> users, we need to make a change to the bulkload call's interface on 
> HRegionServer.  The change should make successful rpcs return true, 
> recoverable failures return false, and unrecoverable failures throw an 
> IOException.  This is an RPC change, so it would be really good to get this 
> api right before the final 0.92 goes out.





[jira] [Commented] (HBASE-4740) [bulk load] the HBASE-4552 API can't tell if errors on region server is recoverable or unrecoverable error.

2011-11-05 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144842#comment-13144842
 ] 

Jonathan Hsieh commented on HBASE-4740:
---

@Stack I've updated the patch -- if this is insufficient, I'm probably going to 
be spotty for a week or so.

> [bulk load]  the HBASE-4552 API can't tell if errors on region server is 
> recoverable or unrecoverable error.
> 
>
> Key: HBASE-4740
> URL: https://issues.apache.org/jira/browse/HBASE-4740
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
>Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-4740-bulkload-HBASE-4552-API-can-t-tell-if-err.patch, 
> hbase-4740.v2.patch
>
>
> Running TestHFileOutputFormat more frequently seems to show that it has 
> become flaky.   It is difficult to tell if this is because of an unrecoverable 
> failure or a recoverable failure.   To make this visible from tests and for 
> users, we need to make a change to the bulkload call's interface on 
> HRegionServer.  The change should make successful rpcs return true, 
> recoverable failures return false, and unrecoverable failures throw an 
> IOException.  This is an RPC change, so it would be really good to get this 
> api right before the final 0.92 goes out.





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-11-06 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145197#comment-13145197
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

@mingjian 

If there was a split that didn't complete cleanly, a parent region with 
daughters should look like an overlap.  The tool will tell you where these 
overlaps are.

One way to fix the problem is to keep the parent region and then move or remove 
the daughter regions from hdfs.  Since it is in the middle of a split, the 
parent should have all the data.  Alternately, you could copy the store files 
from the daughters into the dir of the parent and then run the offline 
rebuilder.

I plan on writing a blog post and hopefully adding to the book on how to fix 
these problems.
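
To illustrate why a half-finished split shows up as an overlap, here is a minimal sketch of overlap detection over region key ranges.  It is not the hbck implementation: the Region class and findOverlaps are hypothetical, and keys are compared as Java Strings for simplicity, whereas HBase compares row keys as byte arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RegionOverlapSketch {
    /** Hypothetical region covering [startKey, endKey); an empty endKey means "end of table". */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** True if region r extends past the given key. */
    static boolean endsAfter(Region r, String key) {
        return r.endKey.isEmpty() || r.endKey.compareTo(key) > 0;
    }

    /** Reports every pair of regions whose key ranges intersect. */
    static List<String> findOverlaps(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.startKey));
        List<String> overlaps = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Region a = sorted.get(i);
            for (int j = i + 1; j < sorted.size(); j++) {
                Region b = sorted.get(j);
                if (!endsAfter(a, b.startKey)) {
                    break; // sorted by start key, so later regions cannot overlap a either
                }
                overlaps.add(a.name + " overlaps " + b.name);
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        // A parent left behind by an incomplete split, plus its two daughters.
        List<Region> regions = Arrays.asList(
                new Region("parent", "a", "m"),
                new Region("daughterA", "a", "g"),
                new Region("daughterB", "g", "m"),
                new Region("next", "m", ""));
        for (String line : findOverlaps(regions)) {
            System.out.println(line);
        }
    }
}
```

In the example the parent overlaps both daughters, while the daughters and the following region tile the key space cleanly, which matches the advice above: keeping either the parent or the daughters removes the overlap.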

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
> EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, 
> hbase-4377-trunk.v2.patch, hbase-4377.0.90.v6.patch, hbase-4377.trunk.v3.txt, 
> hbase-4377.trunk.v4.txt, hbase-4377.trunk.v5.txt, hbase-4377.trunk.v6.patch
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4718) Backport HBASE-4552 to 0.90 branch.

2011-11-14 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150062#comment-13150062
 ] 

Jonathan Hsieh commented on HBASE-4718:
---

Patch *failed* to apply..

> Backport HBASE-4552 to 0.90 branch.
> ---
>
> Key: HBASE-4718
> URL: https://issues.apache.org/jira/browse/HBASE-4718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.90.5
>
> Attachments: 4718-v2.90, hbase-4718.0.90.patch, 
> hbase-4718.v3.includes-hbase-3316.patch, hbase-4718.v4.patch
>
>
> In HBASE-4552 / HBASE-4677 there has been some discussion about 
> whether and how to backport HBASE-4552 to the 0.90 branch.  This is 
> potentially compatibility breaking, so several approaches have been suggested.
> 1) provide patch but do not integrate
> 2) integrate patch that extends and deprecates old api without removing old 
> api.  It has been argued that clients are supposed to use the 
> LoadIncrementalHFiles api and not the internal HRegionServer RPC api.
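
Approach 2 above (extend and deprecate without removing) might look roughly like the sketch below.  All names here are hypothetical and do not reflect the real HRegionServer interface, and the Java 8 default method is used only to keep the example short; it shows the general pattern, not the contents of the attached patch.

```java
import java.io.IOException;

public class DeprecateOldApiSketch {
    /** Hypothetical RPC-style interface; method names are illustrative only. */
    interface RegionServerApi {
        /** New call: returns false on a recoverable failure. */
        boolean tryBulkLoad(String hfilePath) throws IOException;

        /**
         * Old call, kept so existing callers keep compiling and running.
         * It delegates to the new call and turns any failure into an exception.
         */
        @Deprecated
        default void bulkLoad(String hfilePath) throws IOException {
            if (!tryBulkLoad(hfilePath)) {
                throw new IOException("bulk load failed: " + hfilePath);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        RegionServerApi server = path -> true; // stand-in implementation that always succeeds
        server.bulkLoad("/tmp/example.hfile"); // old callers still work
        System.out.println(server.tryBulkLoad("/tmp/example.hfile")); // new callers get the boolean
    }
}
```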





[jira] [Commented] (HBASE-4718) Backport HBASE-4552 to 0.90 branch.

2011-11-14 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150061#comment-13150061
 ] 

Jonathan Hsieh commented on HBASE-4718:
---

Patch failed to apply since this was targeted to the 0.90 branch instead of the 
trunk/0.92 branch.  Output of a run of selected unit tests is attached below.


{code}
---
 T E S T S
---
Running org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 146.315 sec
Running org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.407 sec

Results :

Tests run: 10, Failures: 0, Errors: 0, Skipped: 0
{code}

> Backport HBASE-4552 to 0.90 branch.
> ---
>
> Key: HBASE-4718
> URL: https://issues.apache.org/jira/browse/HBASE-4718
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.90.5
>
> Attachments: 4718-v2.90, hbase-4718.0.90.patch, 
> hbase-4718.v3.includes-hbase-3316.patch, hbase-4718.v4.patch
>
>
> In HBASE-4552 / HBASE-4677 there has been some discussion about 
> whether and how to backport HBASE-4552 to the 0.90 branch.  This is 
> potentially compatibility breaking, so several approaches have been suggested.
> 1) provide patch but do not integrate
> 2) integrate patch that extends and deprecates old api without removing old 
> api.  It has been argued that clients are supposed to use the 
> LoadIncrementalHFiles api and not the internal HRegionServer RPC api.





[jira] [Commented] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting

2011-11-14 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150217#comment-13150217
 ] 

Jonathan Hsieh commented on HBASE-4506:
---

@Nicolas I don't see the revert on this particular patch -- the executor 
instantiation is just moved to a separate method and uses the same numThreads 
value which is the hard coded value or the one set in the hbase-site.xml file.

Which lines are we talking about?  

> [hbck] Allow HBaseFsck to be instantiated without connecting
> 
>
> Key: HBASE-4506
> URL: https://issues.apache.org/jira/browse/HBASE-4506
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Affects Versions: 0.90.5
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.90.5
>
> Attachments: 
> 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch, 
> hbase-4506-0.90.patch
>
>
> This is a semantics preserving patch that allows for offline meta rebuild 
> (HBASE-4377) to reuse code in the existing hbck code when hbase is down.





[jira] [Commented] (HBASE-4804) Minor Dyslexia in CHANGES.txt

2011-11-16 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151630#comment-13151630
 ] 

Jonathan Hsieh commented on HBASE-4804:
---

Haha..  I have a spelling problem and a tendency to omit words which may be 
incurable. :)

> Minor Dyslexia in CHANGES.txt
> -
>
> Key: HBASE-4804
> URL: https://issues.apache.org/jira/browse/HBASE-4804
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: hbase-4804.patch
>
>
> I was going through the 0.92 CHANGES and found a few entries in 
> CHANGES.txt where jira numbers don't match up with descriptions.  





[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-11-16 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151743#comment-13151743
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

Todd took a quick look and mentioned that "fs.defaultFS" is a Hadoop 0.21+'ism.  
On a 0.20.x release nothing really happens.  Any concerns about this on the 
0.90 backport?

{code}
+  public static void main(String[] args) throws Exception {
+
+// create a fsck object
+Configuration conf = HBaseConfiguration.create();
+conf.set("fs.defaultFS", conf.get(HConstants.HBASE_DIR));
+HBaseFsck fsck = new HBaseFsck(conf);
+
+
{code}
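
One possible way to address the concern, sketched here under assumptions, is to set both the older "fs.default.name" key (the one Hadoop 0.20.x reads) and "fs.defaultFS".  To stay self-contained the sketch uses a plain Map as a stand-in for Hadoop's Configuration; setDefaultFs and the namenode address are hypothetical, and this is not necessarily what the final patch does.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultFsKeysSketch {
    static final String OLD_KEY = "fs.default.name"; // key read by Hadoop 0.20.x
    static final String NEW_KEY = "fs.defaultFS";    // key introduced in Hadoop 0.21+

    /** Sets the default filesystem under both keys so either Hadoop line picks it up. */
    static void setDefaultFs(Map<String, String> conf, String rootDir) {
        conf.put(OLD_KEY, rootDir);
        conf.put(NEW_KEY, rootDir);
    }

    public static void main(String[] args) {
        // Map stands in for org.apache.hadoop.conf.Configuration in this sketch.
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.rootdir", "hdfs://namenode.example:8020/hbase"); // hypothetical address
        setDefaultFs(conf, conf.get("hbase.rootdir"));
        System.out.println(OLD_KEY + " = " + conf.get(OLD_KEY));
        System.out.println(NEW_KEY + " = " + conf.get(NEW_KEY));
    }
}
```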

> [hbck] Offline rebuild .META. from fs data only.
> 
>
> Key: HBASE-4377
> URL: https://issues.apache.org/jira/browse/HBASE-4377
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.92.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
> 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
> EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, 
> hbase-4377-trunk.v2.patch, hbase-4377.0.90.v6.patch, hbase-4377.trunk.v3.txt, 
> hbase-4377.trunk.v4.txt, hbase-4377.trunk.v5.txt, hbase-4377.trunk.v6.patch
>
>
> In a worst case situation, it may be helpful to have an offline .META. 
> rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
> from scratch.  Users could move bad regions out until there is a clean 
> rebuild.  
> It would likely fill in region split holes.  Follow-on work could give 
> options to merge or select regions that overlap, or do online rebuilds.





[jira] [Commented] (HBASE-4623) Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92

2011-11-17 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152040#comment-13152040
 ] 

Jonathan Hsieh commented on HBASE-4623:
---

TestShell is failing and I need some hints on where to find test output from 
TestShell.

I'm getting an error in TestShell, likely because some methods have been removed 
from Scan.  It is telling me:

{code}
...
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
Caused by: org.jruby.exceptions.RaiseException: (RuntimeError) Shell unit tests 
failed. Check output file for details.
{code}

Currently I'm trying to run via 'mvn test -Dtest=TestShell' and don't know 
where to get this output logging.  (looking in target/surefire-reports doesn't 
provide useful log data).



> Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92
> ---
>
> Key: HBASE-4623
> URL: https://issues.apache.org/jira/browse/HBASE-4623
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.94.0
>
>






[jira] [Commented] (HBASE-4623) Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92

2011-11-18 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153195#comment-13153195
 ] 

Jonathan Hsieh commented on HBASE-4623:
---

@stack Is that fix in reference to HBASE-4973 or specific to this patch?  (The 
failure I'm mentioning is specific to a patch on this issue.)

Is there a place to find this without jenkins?


> Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92
> ---
>
> Key: HBASE-4623
> URL: https://issues.apache.org/jira/browse/HBASE-4623
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.94.0
>
>






[jira] [Commented] (HBASE-4623) Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92

2011-11-18 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153208#comment-13153208
 ] 

Jonathan Hsieh commented on HBASE-4623:
---

Found the info -- it is in 
hbase/target/surefire-reports/TEST-org.apache.hadoop.hbase.client.TestShell.xml 

> Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92
> ---
>
> Key: HBASE-4623
> URL: https://issues.apache.org/jira/browse/HBASE-4623
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.94.0
>
>






[jira] [Commented] (HBASE-4623) Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92

2011-11-18 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153271#comment-13153271
 ] 

Jonathan Hsieh commented on HBASE-4623:
---

@stack.  hbase-4623-0.92.patch doesn't apply on trunk.  The robot tried the 
0.92 version on trunk.

I did the diff for trunk backwards.  Fixing.

> Remove @deprecated Scan methods in 0.90 from TRUNK and 0.92
> ---
>
> Key: HBASE-4623
> URL: https://issues.apache.org/jira/browse/HBASE-4623
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 0.92.0
>
> Attachments: hbase-4623-0.92.patch, hbase-4623.patch
>
>






[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-18 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153282#comment-13153282
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

I've been looping TestAcidGuarantees for about 6 hours now and it is still 
chugging along and has not failed.  I'm going to let it go overnight.  (I 
believe it used to fail within an hour.)  

What are thoughts on backporting this onto the 0.92 branch?   (as a separate 
issue..)

> TestAcidGuarantee broken on trunk 
> --
>
> Key: HBASE-2856
> URL: https://issues.apache.org/jira/browse/HBASE-2856
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: Amitanand Aiyer
>Priority: Blocker
> Fix For: 0.94.0
>
> Attachments: 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 
> 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt
>
>
> TestAcidGuarantee has a test whereby it attempts to read a number of columns 
> from a row, and every so often the first column of N is different, when it 
> should be the same.  This is a bug deep inside the scanner whereby the first 
> peek() of a row is done at time T then the rest of the read is done at T+1 
> after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
> data becomes committed and flushed to disk.
> One possible solution is to introduce the memstoreTS (or similarly equivalent 
> value) to the HFile thus allowing us to preserve read consistency past 
> flushes.  Another solution involves fixing the scanners so that peek() is not 
> destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153774#comment-13153774
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

On trunk, TestAcidGuarantees ran for a solid day and a half (33+ hours) without 
failing.  

larsh@ I'll loop the 0.92 version and let it run through today and report how 
it fared around midday Monday.

> TestAcidGuarantee broken on trunk 
> --
>
> Key: HBASE-2856
> URL: https://issues.apache.org/jira/browse/HBASE-2856
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: Amitanand Aiyer
>Priority: Blocker
> Fix For: 0.94.0
>
> Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
> 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
> 2856-v9-all-inclusive.txt, acid.txt
>
>
> TestAcidGuarantee has a test whereby it attempts to read a number of columns 
> from a row, and every so often the first column of N is different, when it 
> should be the same.  This is a bug deep inside the scanner whereby the first 
> peek() of a row is done at time T then the rest of the read is done at T+1 
> after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
> data becomes committed and flushed to disk.
> One possible solution is to introduce the memstoreTS (or similarly equivalent 
> value) to the HFile thus allowing us to preserve read consistency past 
> flushes.  Another solution involves fixing the scanners so that peek() is not 
> destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153775#comment-13153775
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

On trunk, TestAcidGuarantees ran for a solid day and a half (33+ hours) without 
failing.  

larsh@ I'll loop the 0.92 version and let it run through today and report how 
it fared around midday Monday.

> TestAcidGuarantee broken on trunk 
> --
>
> Key: HBASE-2856
> URL: https://issues.apache.org/jira/browse/HBASE-2856
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: Amitanand Aiyer
>Priority: Blocker
> Fix For: 0.94.0
>
> Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
> 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
> 2856-v9-all-inclusive.txt, acid.txt
>
>
> TestAcidGuarantee has a test whereby it attempts to read a number of columns 
> from a row, and every so often the first column of N is different, when it 
> should be the same.  This is a bug deep inside the scanner whereby the first 
> peek() of a row is done at time T then the rest of the read is done at T+1 
> after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
> data becomes committed and flushed to disk.
> One possible solution is to introduce the memstoreTS (or similarly equivalent 
> value) to the HFile thus allowing us to preserve read consistency past 
> flushes.  Another solution involves fixing the scanners so that peek() is not 
> destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153925#comment-13153925
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

@larsh I posted it for you here.  https://reviews.apache.org/r/2893/

I applied the patch, committed it and generated a git-patch via 'git 
format-patch HEAD^' which has enough info to find the right branch.

> TestAcidGuarantee broken on trunk 
> --
>
> Key: HBASE-2856
> URL: https://issues.apache.org/jira/browse/HBASE-2856
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: Amitanand Aiyer
>Priority: Blocker
> Fix For: 0.94.0
>
> Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
> 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
> 2856-v9-all-inclusive.txt, acid.txt
>
>
> TestAcidGuarantee has a test whereby it attempts to read a number of columns 
> from a row, and every so often the first column of N is different, when it 
> should be the same.  This is a bug deep inside the scanner whereby the first 
> peek() of a row is done at time T then the rest of the read is done at T+1 
> after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
> data becomes committed and flushed to disk.
> One possible solution is to introduce the memstoreTS (or similarly equivalent 
> value) to the HFile thus allowing us to preserve read consistency past 
> flushes.  Another solution involves fixing the scanners so that peek() is not 
> destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154249#comment-13154249
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

@lars the 0.92 version of TestAcidGuarantees ran for about 12 hours without 
problems. 


> TestAcidGuarantee broken on trunk 
> --
>
> Key: HBASE-2856
> URL: https://issues.apache.org/jira/browse/HBASE-2856
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: Amitanand Aiyer
>Priority: Blocker
> Fix For: 0.94.0
>
> Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
> 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
> 2856-v9-all-inclusive.txt, acid.txt
>
>
> TestAcidGuarantee has a test whereby it attempts to read a number of columns 
> from a row, and every so often the first column of N is different, when it 
> should be the same.  This is a bug deep inside the scanner whereby the first 
> peek() of a row is done at time T then the rest of the read is done at T+1 
> after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
> data becomes committed and flushed to disk.
> One possible solution is to introduce the memstoreTS (or similarly equivalent 
> value) to the HFile thus allowing us to preserve read consistency past 
> flushes.  Another solution involves fixing the scanners so that peek() is not 
> destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154443#comment-13154443
 ] 

Jonathan Hsieh commented on HBASE-4820:
---

@Kannan, I'm looking at this from the point of view of someone who recently 
spent many hours reviewing the dist log splitting patches in aggregate and 
may be responsible for fixing issues if it has problems.  I had a harder time 
than I'd prefer, and will likely have the same trouble again if there are 
problems in the future.  Making a few semantics-preserving changes, such as 
making var/method/class names more descriptive and encapsulating pieces, 
would go a long way toward making the code more easily and quickly 
understandable by more people.

Are you suggesting splitting these changes into smaller pieces such as:

* add better exception error messages.
* consolidate calls only used once. Ex: async callbacks submethods; inline 
finishInitailize into SLM's constructor
* rename vague methods. ex: installTask(String taskName) might be better as 
enqueueSplitLog(String logPath);  handleDeadWorker might be better as 
blacklistDeadWorker;  'exec(String name, Progressable)' might be better as  
'split(String logfilename, Progressable)'
* rename vague classes. ex: Task to SplitTask, TaskBatch to 
SplitTaskState/SplitTaskContext
* correct comments to be consistent with code (comments in SplitLogWorker talk 
about a SUCCESS state which actually is the DONE state).








> Distributed log splitting coding enhancement to make it easier to understand, 
> no semantics change
> -
>
> Key: HBASE-4820
> URL: https://issues.apache.org/jira/browse/HBASE-4820
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
>  Labels: newbie
> Fix For: 0.94.0
>
> Attachments: 
> 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
>  
> 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch
>
>
> In reviewing distributed log splitting feature, we found some cosmetic 
> issues.  They make the code hard to understand.
> It will be great to fix them.  For this issue, there should be no semantic 
> change.





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154453#comment-13154453
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

On the bulkload operation, the error has something to do with the split point 
-- in the test I force a split and the resulting error has something to do with 
the point where the second daughter starts.

@Lars -- since the original issue is resolved, and since this seems non-trivial, 
maybe this should get moved into a new issue?

> TestAcidGuarantee broken on trunk 
> --
>
> Key: HBASE-2856
> URL: https://issues.apache.org/jira/browse/HBASE-2856
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: Amitanand Aiyer
>Priority: Blocker
> Fix For: 0.94.0
>
> Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
> 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
> 2856-v9-all-inclusive.txt, acid.txt
>
>
> TestAcidGuarantee has a test whereby it attempts to read a number of columns 
> from a row, and every so often the first column of N is different, when it 
> should be the same.  This is a bug deep inside the scanner whereby the first 
> peek() of a row is done at time T then the rest of the read is done at T+1 
> after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
> data becomes committed and flushed to disk.
> One possible solution is to introduce the memstoreTS (or similarly equivalent 
> value) to the HFile thus allowing us to preserve read consistency past 
> flushes.  Another solution involves fixing the scanners so that peek() is not 
> destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154686#comment-13154686
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Output Examples:

Note that the ZK assignment and the META assignment did not change.
{code}
// hbck -fix call
ERROR: Region 
tableBadMetaAssign,,1321733234211.35120fc878802e3b6829e6d7b597b44c. listed in 
META on region server ubuntu64-build01.sf.cloudera.com,51134,1321733229687 but 
found on region server ubuntu64-build01.sf.cloudera.com,38112,1321733229583
Trying to fix assignment error...
...
// hbck after fix
ERROR: Region 
tableBadMetaAssign,,1321733234211.35120fc878802e3b6829e6d7b597b44c. listed in 
META on region server ubuntu64-build01.sf.cloudera.com,51134,1321733229687 but 
found on region server ubuntu64-build01.sf.cloudera.com,38112,1321733229583
{code}

Note that the ZK assignment changed but meta had not yet changed.
{code}
// hbck -fix
ERROR: Region 
tableBadMetaAssign,,1321719700727.af24fbbe3e1df676b8e31e3ff5765fb6. listed in 
META on region server p0123.sf.cloudera.com,36067,1321719696277 but found on 
region server p0123.sf.cloudera.com,54221,1321719696237
Trying to fix assignment error...
...
// hbck after fix
ERROR: Region 
tableBadMetaAssign,,1321719700727.af24fbbe3e1df676b8e31e3ff5765fb6. listed in 
META on region server p0123.sf.cloudera.com,36067,1321719696277 but found on 
region server p0123.sf.cloudera.com,59522,1321719696305
{code}

> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Reporter: Jonathan Hsieh
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154729#comment-13154729
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Hm.. this looks like a race, or is due to the lack of a rendezvous of some sort.  
Up to HBASE-4378, there was a 15000ms (yikes, 15 sec!) sleep between the 'hbck 
-fix' call and the subsequent 'hbck' call that is supposed to be clean.  
HBASE-4703 removed this.  

My hunch is that the update to META that 'hbck -fix' makes may not yet be 
visible to the second 'hbck' run.

https://github.com/apache/hbase/commit/6ca0e79a6ac92190238d5cda56f787ab9702d7fc#L61L138
TestHBaseFsck.java:138 


> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Reporter: Jonathan Hsieh
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154788#comment-13154788
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

The story behind this problem.

HBCK repairs a bad assignment by using the admin interface to reassign a 
particular region after offlining the region in ZK.  This calls master.assign -- 
eventually the master uses its serverManager and issues an 
HRegionServer.openRegion().

It looks like HRegionServer.openRegion being essentially asynchronous is what 
causes the failure.  The call submits an OpenRegionHandler (ORH) callback to 
the RS's ExecutorService and then immediately returns the RegionState OPENED.

The ORH thread calls ORH.process -> updateMeta, which creates a 
PostOpenDeployTaskThread and starts another thread that calls  
HRegionServer.postOpenDeployTasks -> MetaEditor.updateRegionLocation, which 
updates the meta table.  

The problem is that the RegionState OPENED is reported to the master even 
though the RS may not have written its new assignment to META yet.
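
The ordering problem described above can be reproduced in miniature.  The 
following is a toy model only: it uses plain java.util.concurrent classes, and 
every name in it (OpenRegionRaceSketch, openRegionAsync, "oldServer", 
"newServer") is hypothetical and not an HBase API.  A latch stands in for the 
delay before the meta write so that the outcome is deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the race described above. None of these names are HBase APIs. */
class OpenRegionRaceSketch {

  /**
   * Mimics an asynchronous open: the meta update is queued on the executor and
   * "OPENED" is returned right away, before that update has run.
   */
  static String openRegionAsync(ExecutorService pool, AtomicReference<String> metaLocation,
      CountDownLatch allowMetaWrite, CountDownLatch metaWritten) {
    pool.execute(() -> {
      try {
        allowMetaWrite.await();            // stands in for the delay before the meta write
        metaLocation.set("newServer");     // stands in for updating the meta table
        metaWritten.countDown();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    return "OPENED";
  }

  /** Returns {reported state, location read immediately, location read after the write}. */
  static String[] demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      AtomicReference<String> metaLocation = new AtomicReference<>("oldServer");
      CountDownLatch allowMetaWrite = new CountDownLatch(1);
      CountDownLatch metaWritten = new CountDownLatch(1);

      String state = openRegionAsync(pool, metaLocation, allowMetaWrite, metaWritten);
      String seenEarly = metaLocation.get();   // what an immediate second check would read

      allowMetaWrite.countDown();              // now let the queued meta write proceed
      if (!metaWritten.await(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("meta write did not complete");
      }
      String seenLate = metaLocation.get();
      return new String[] {state, seenEarly, seenLate};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

In this model demo() reports OPENED while the immediate read still sees the old 
location; only the later read sees the new one, which is the window the second 
'hbck' run can fall into.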




> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Reporter: Jonathan Hsieh
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154791#comment-13154791
 ] 

Jonathan Hsieh commented on HBASE-4842:
---


I've attached a patch that inserts a sleep into the RegionServer code right 
before writing to meta which causes the test to fail consistently.  There are 
some hanging threads if you run this using mvn.  I ran the change in eclipse as 
a unit test where it fails the test (but the unit test remains hung).


> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Reporter: Jonathan Hsieh
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154802#comment-13154802
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Stack.

For now, the fix is adding a sleep.  Longer term, we could add some 
synchronization options for the open region call, or update the region state 
handling to return something like an OPENING state first and an OPEN state only 
after meta and zk have been updated.
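
As a rough illustration of the second, longer-term idea, here is a minimal 
sketch of a state holder that reports OPENING until both updates have been 
recorded.  The class, enum, and method names are hypothetical and only model 
the idea; they are not taken from the HBase code base or from any patch here.

```java
import java.util.EnumSet;

/** Hypothetical model: a region is only OPEN once both meta and zk are updated. */
class RegionOpenStateSketch {
  enum Step { META_UPDATED, ZK_UPDATED }
  enum State { OPENING, OPEN }

  private final EnumSet<Step> done = EnumSet.noneOf(Step.class);

  /** Records that one of the post-open updates has finished. */
  synchronized void complete(Step step) {
    done.add(step);
  }

  /** Reports OPEN only after every step has been recorded, OPENING otherwise. */
  synchronized State state() {
    return done.containsAll(EnumSet.allOf(Step.class)) ? State.OPEN : State.OPENING;
  }
}
```

A caller that waits for OPEN instead of acting on the first reply would not 
observe a region whose meta entry is still stale.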

> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4, 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-4842-breaker.patch, hbase-4842.patch
>
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154834#comment-13154834
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Looks like it failed on iteration 39.. 

> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4, 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-4842-breaker.patch, hbase-4842.patch
>
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154835#comment-13154835
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Hm.. the minicluster failed to start properly in that one.  Seems likely due to 
a problem in there somewhere.

> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4, 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-4842-breaker.patch, hbase-4842.patch
>
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155543#comment-13155543
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

I'll file a new issue.

The main issue isn't what is returned, but when.  With the first 'hbck -fix', 
the master makes a call to the regionserver to issue a request to open the 
region (which adds data to meta).  This returns right away.  The next hbck call 
will cause the master to query meta again, which is used to check consistency.  
Sometimes the new meta entries are written before the second hbck call is done 
(the test passes), sometimes they are not (the test fails).  

The slight delay allows the open request to finish and the meta entry to be 
updated before the subsequent 'hbck' call.
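
A fixed delay only narrows the window.  One alternative, shown purely as an 
illustration and not as what the attached patches do, is to poll for the 
expected condition with an upper bound on the wait.  The helper below is a 
hypothetical, self-contained sketch; the name waitFor and its parameters are 
assumptions, not an existing HBase test utility.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition instead of sleeping a fixed time. */
class WaitForSketch {

  /**
   * Polls check until it returns true or timeoutMs has elapsed.
   * Returns true if the condition held before the deadline, false otherwise.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier check) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;                       // gave up: condition never became true
      }
      try {
        Thread.sleep(intervalMs);           // back off briefly between checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

In a test, the check would be whatever "meta has caught up" means for that 
test, and the second 'hbck' run would only start once waitFor returns true.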

> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4, 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch
>
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155544#comment-13155544
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Also, I don't think dist log splitting has anything to do with this failure.

> [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
> ---
>
> Key: HBASE-4842
> URL: https://issues.apache.org/jira/browse/HBASE-4842
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4, 0.92.0, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch
>
>
> It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
> is intermittently failing.
> In the test, a region's assignment is purposely changed in META but not in 
> ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
> clean comes up with a new ZK assignment but with META still being 
> inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
> sometimes it "moves" to another RS. 





[jira] [Commented] (HBASE-4866) Fix possible NPE in AssignmentManager#regionOnline

2011-11-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156779#comment-13156779
 ] 

Jonathan Hsieh commented on HBASE-4866:
---


Looks like it corresponds to this line which is AssignmentManager:724 on the 
0.90 branch

{code}
  HServerInfo hsiWithoutLoad = new HServerInfo(
      serverInfo.getServerAddress(), serverInfo.getStartCode(),
      serverInfo.getInfoPort(), serverInfo.getHostname());
{code}

> Fix possible NPE in AssignmentManager#regionOnline
> --
>
> Key: HBASE-4866
> URL: https://issues.apache.org/jira/browse/HBASE-4866
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>
> NPE encountered in a user's HMaster logs:
> {code}
> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting 
> shutdown.
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
>at 
> org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
>at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422)
>at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295)
> {code}
> From user list: 
> http://mail-archives.apache.org/mod_mbox/hbase-user/20.mbox/%3C4ECC9AFC.6030307%40qualtrics.com%3E





[jira] [Commented] (HBASE-4868) testMetaRebuild#TestOfflineMetaRebuildBase occasionally fails

2011-11-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157346#comment-13157346
 ] 

Jonathan Hsieh commented on HBASE-4868:
---

@Jinchao

Also, if this is a call that can block indefinitely, I'd add timeout values for 
the test.

So instead of just 

{code}
@Test
{code}

change it to 

{code}
@Test(timeout=180000)  // fail test after 180s (JUnit 4 timeouts are in milliseconds)
{code}

> testMetaRebuild#TestOfflineMetaRebuildBase occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4868_trial.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the approach makes sense. 
> If it does, I will check other cases.





[jira] [Commented] (HBASE-4868) TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails

2011-11-26 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157550#comment-13157550
 ] 

Jonathan Hsieh commented on HBASE-4868:
---

I misspoke -- the timeouts are already there.  Sorry about that.  

The check should be added to similar spots in the tests in 
TestOfflineMetaRebuildHole and TestOfflineMetaRebuildOverlap -- they would 
likely be vulnerable to the same kind of race.

> TestOfflineMetaRebuildBase#testMetaRebuild occasionally fails
> -
>
> Key: HBASE-4868
> URL: https://issues.apache.org/jira/browse/HBASE-4868
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.0
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4868_trial.patch, HBASE-4868_trunkv2.patch
>
>
> See: 
> https://builds.apache.org/job/HBase-TRUNK-security/7/testReport/org.apache.hadoop.hbase.util.hbck/TestOfflineMetaRebuildBase/testMetaRebuild/
> Please review and see whether the approach makes sense. 
> If it does, I will check other cases.





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-26 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157636#comment-13157636
 ] 

Jonathan Hsieh commented on HBASE-4862:
---

How feasible is it to add testing to this patch?  Maybe simulate the failure 
situation by aborting RS's and then starting them like in the 
TestSplitTransactionOnCluster tests?

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for 
> trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() will delete the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file; if skipError = true, it adds the file to the corrupt logs.  However, 
> data in other regions in this log file will be lost.
> 4. Or, if skipError = false, it will check the filesystem.  Of course, the 
> file system is ok, so it only prints an error log and continues assigning 
> regions.  Therefore, data in other log files will also be lost!
> This case may happen in the following scenario:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-26 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157653#comment-13157653
 ] 

Jonathan Hsieh commented on HBASE-4838:
---

@Lars:

+1 lgtm.

I didn't do a deep code review, but I applied v3, ran TestAcidGuarantees 20 
times, and also ran the failing cases enumerated in HBASE-2856; they all pass.  
(Wow, the diff between v1 and v3 is pretty subtle.)





> Port 2856 (TestAcidGuarantee is failing) to 0.92
> 
>
> Key: HBASE-4838
> URL: https://issues.apache.org/jira/browse/HBASE-4838
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0
>
> Attachments: 4838-v1.txt, 4838-v3.txt
>
>
> Moving the backport into a separate issue (as suggested by JonH), because 
> this is not trivial.





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-26 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157671#comment-13157671
 ] 

Jonathan Hsieh commented on HBASE-4862:
---

@chunhui

I have a question and a few nits. 

What happens if the .temp gets left behind without being renamed?

You might want to mention that hlog files in progress (those with the .temp 
suffix) are excluded here.
{code}
+// After creating writer, simulate partial region's
+// replayRecoveredEditsIfAny() which gets SplitEditFiles of this
+// region,and delete them.
{code}

Also, probably want to update javadoc of getSplitEditFilesSorted.
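
To make the intended contract concrete, here is a minimal sketch of the 
behavior such javadoc could describe: in-progress files carrying the .temp 
suffix are skipped and the remaining names come back sorted.  This is not the 
patch's code; the class and method names are hypothetical, and it works on 
plain file-name strings rather than on the real filesystem API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of excluding in-progress split edit files. */
class SplitEditFileFilterSketch {
  /** Suffix assumed to mark a file the split thread is still writing. */
  static final String IN_PROGRESS_SUFFIX = ".temp";

  /** Returns the names that are not in progress, in sorted order. */
  static List<String> completedEditFilesSorted(List<String> names) {
    List<String> result = new ArrayList<>();
    for (String name : names) {
      if (!name.endsWith(IN_PROGRESS_SUFFIX)) {
        result.add(name);                  // keep only fully written files
      }
    }
    Collections.sort(result);
    return result;
  }
}
```

With this shape, a concurrent replay never sees, and so never deletes, a file 
that is still being appended to.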

Comment should probably be "most likely" instead of "mostly"
{code}
+try{
+  logSplitter.splitLog();
+} catch (IOException e) {
+  LOG.info(e);
+  Assert.fail("Throws IOException when spliting "
+  + "log, it is mostly because writing file does not "
+  + "exist which is caused by concurrent replayRecoveredEditsIfAny()");
+}
+if (fs.exists(corruptDir)) {
+  if (fs.listStatus(corruptDir).length > 0) {
+Assert.fail("There are some corrupt logs, "
++ "it is mostly caused by concurrent replayRecoveredEditsIfAny()");
+  }
+}
+  }
{code}


> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, 
> hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for 
> trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, 
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, 
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process 
> replayRecoveredEditsIfAny() will delete the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log 
> file; if skipError = true, it adds the file to the corrupt logs.  However, 
> data in other regions in this log file will be lost.
> 4. Or, if skipError = false, it will check the filesystem.  Of course, the 
> file system is ok, so it only prints an error log and continues assigning 
> regions.  Therefore, data in other log files will also be lost!
> This case may happen in the following scenario:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that is being appended to by the split hlog thread.





[jira] [Commented] (HBASE-4912) HDFS API Changes

2011-12-01 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160713#comment-13160713
 ] 

Jonathan Hsieh commented on HBASE-4912:
---

I think this might be one that is related to this bucket: HADOOP-7873.

> HDFS API Changes
> 
>
> Key: HBASE-4912
> URL: https://issues.apache.org/jira/browse/HBASE-4912
> Project: HBase
>  Issue Type: Sub-task
>  Components: client, regionserver
>Reporter: Nicolas Spiegelberg
>Assignee: Pritam Damania
> Fix For: 0.94.0
>
>





