[jira] [Commented] (HBASE-5367) [book] small formatting changes to compaction description in Arch/Regions/Store

2012-02-09 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204802#comment-13204802
 ] 

stack commented on HBASE-5367:
--

{code}
long flushSize = this.htableDescriptor.getMemStoreFlushSize();
if (flushSize == HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE) {
  flushSize = conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
 HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
}
{code}

So, it looks like DEFAULT_MEMSTORE_FLUSH_SIZE is 64M, which is confusing, and we'll 
use what's in the HTD IFF it's different from this default.

Yeah, easy to get confused.
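
To make the precedence concrete, here is a minimal standalone sketch of the same check. It does not use the HBase classes: the helper name effectiveFlushSize and the plain Map standing in for the configuration are assumptions made for illustration. Only the 64M default comes from the comment above, and the key name is the one quoted in HBASE-2375.

```java
import java.util.Map;

final class FlushSizeSketch {
    // 64M, the value of DEFAULT_MEMSTORE_FLUSH_SIZE discussed above.
    static final long DEFAULT_MEMSTORE_FLUSH_SIZE = 64L * 1024 * 1024;
    // Key name as quoted in HBASE-2375; the Map is a stand-in for the real Configuration.
    static final String FLUSH_SIZE_KEY = "hbase.hregion.memstore.flush.size";

    /** Hypothetical helper: the table-level value wins only if it differs from the default. */
    static long effectiveFlushSize(long tableFlushSize, Map<String, Long> conf) {
        if (tableFlushSize == DEFAULT_MEMSTORE_FLUSH_SIZE) {
            // A table value equal to the default looks like "not set", so the site config is used.
            return conf.getOrDefault(FLUSH_SIZE_KEY, DEFAULT_MEMSTORE_FLUSH_SIZE);
        }
        return tableFlushSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        Map<String, Long> conf = Map.of(FLUSH_SIZE_KEY, 128 * mb);
        System.out.println(effectiveFlushSize(64 * mb, conf) / mb); // table left at default: config wins
        System.out.println(effectiveFlushSize(32 * mb, conf) / mb); // table set explicitly: table wins
    }
}
```

One consequence of this shape, if the sketch matches the real behavior, is that a table explicitly set to 64M cannot override a different site-wide setting, which is probably part of why this is easy to get confused about.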

 [book] small formatting changes to compaction description in 
 Arch/Regions/Store
 ---

 Key: HBASE-5367
 URL: https://issues.apache.org/jira/browse/HBASE-5367
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_hbase_5367.xml.patch, book_hbase_5367_2.xml.patch


 Fixing a few small-but-important things that came out of a post-commit 
 comment in HBASE-5365
 book.xml
 * corrected default region flush size (it's actually 64mb)
 * removed trailing 'F' in a ratio discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params

2012-02-09 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204820#comment-13204820
 ] 

stack commented on HBASE-2375:
--

bq. Upping compactionThreshold from 3 to 5... Sounds like a change that can 
have a bigger impact but that mostly helps this specific use case...

Dunno.  3 strikes me as one of those decisions that made sense a long time ago 
but a bunch has changed since...  We should test it, I suppose.

On changing flush/regionsize, you'd rather have us split faster at first, then 
slow down as the count of regions goes up.  Ok.

 Make decision to split based on aggregate size of all StoreFiles and revisit 
 related config params
 --

 Key: HBASE-2375
 URL: https://issues.apache.org/jira/browse/HBASE-2375
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.20.3
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
  Labels: moved_from_0_20_5
 Attachments: HBASE-2375-v8.patch


 Currently we will make the decision to split a region when a single StoreFile 
 in a single family exceeds the maximum region size.  This issue is about 
 changing the decision to split to be based on the aggregate size of all 
 StoreFiles in a single family (but still not aggregating across families).  
 This would move a check to split after flushes rather than after compactions. 
  This issue should also deal with revisiting our default values for some 
 related configuration parameters.
 The motivating factor for this change comes from watching the behavior of 
 RegionServers during heavy write scenarios.
 Today the default behavior goes like this:
 - We fill up regions, and as long as you are not under global RS heap 
 pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) 
 StoreFiles.
 - After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a 
 compaction on this region.
 - Compaction queues notwithstanding, this will create a 192MB file, not 
 triggering a split based on max region size (hbase.hregion.max.filesize).
 - You'll then flush two more 64MB MemStores and hit the compactionThreshold 
 and trigger a compaction.
 - You end up with 192 + 64 + 64 in a single compaction.  This will create a 
 single 320MB file and will trigger a split.
 - While you are performing the compaction (which now writes out 64MB more 
 than the split size, so is about 5X slower than the time it takes to do a 
 single flush), you are still taking on additional writes into MemStore.
 - Compaction finishes, decision to split is made, region is closed.  The 
 region now has to flush whichever edits made it to MemStore while the 
 compaction ran.  This flushing, in our tests, is by far the dominating factor 
 in how long data is unavailable during a split.  We measured about 1 second 
 to do the region closing, master assignment, reopening.  Flushing could take 
 5-6 seconds, during which time the region is unavailable.
 - The daughter regions re-open on the same RS.  Immediately when the 
 StoreFiles are opened, a compaction is triggered across all of their 
 StoreFiles because they contain references.  Since we cannot currently split 
 a split, we need to not hang on to these references for long.
 This described behavior is really bad because of how often we have to rewrite 
 data onto HDFS.  Imports are usually just IO bound as the RS waits to flush 
 and compact.  In the above example, the first cell to be inserted into this 
 region ends up being written to HDFS 4 times (initial flush, first compaction 
 w/ no split decision, second compaction w/ split decision, third compaction 
 on daughter region).  In addition, we leave a large window where we take on 
 edits (during the second compaction of 320MB) and then must make the region 
 unavailable as we flush it.
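
The arithmetic in the walkthrough above can be replayed with a small simulation. This is only a sketch under stated assumptions: the class and method names are made up, every flush is taken to be exactly 64MB, each compaction merges all files in the store, and the 256MB split size is inferred from the remark that the 320MB file is 64MB more than the split size.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitAfterCompactionSketch {
    /**
     * Replays flushes until a compacted file exceeds maxFileMb.
     * Returns the size in MB of every compacted file written along the way.
     */
    static List<Long> compactionsUntilSplit(long flushMb, int compactionThreshold, long maxFileMb) {
        List<Long> storeFiles = new ArrayList<>();
        List<Long> compactedSizes = new ArrayList<>();
        while (true) {
            storeFiles.add(flushMb); // one MemStore flush writes one StoreFile
            if (storeFiles.size() >= compactionThreshold) {
                long total = 0;
                for (long size : storeFiles) {
                    total += size;
                }
                storeFiles.clear();
                storeFiles.add(total); // the compaction rewrites everything into one file
                compactedSizes.add(total);
                if (total > maxFileMb) {
                    return compactedSizes; // split check happens only after a compaction
                }
            }
        }
    }

    public static void main(String[] args) {
        // 64MB flushes, compactionThreshold 3, 256MB max region size
        System.out.println(compactionsUntilSplit(64, 3, 256)); // prints [192, 320]
    }
}
```

With these inputs the first flushed cell is written once by its flush and once by each of the two compactions; the daughter-region compaction described above accounts for the fourth write.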
 If we increased the compactionThreshold to be 5 and determined splits based 
 on aggregate size, the behavior becomes:
 - We fill up regions, and as long as you are not under global RS heap 
 pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) 
 StoreFiles.
 - After each MemStore flush, we calculate the aggregate size of all 
 StoreFiles.  We can also check the compactionThreshold.  For the first three 
 flushes, both would not hit the limit.  On the fourth flush, we would see 
 total aggregate size = 256MB and determine to make a split.
 - Decision to split is made, region is closed.  This time, the region just 
 has to flush out whichever edits made it to the MemStore during the 
 snapshot/flush of the previous MemStore.  So this time window has shrunk by 
 more than 75% as it was the time to write 64MB from memory not 320MB from 
 aggregating 5 hdfs files.  This 

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-09 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204852#comment-13204852
 ] 

stack commented on HBASE-5270:
--

bq. So when assigning regions, some regionplans whose destination is the dead 
server will be failed.

True.  They'll fail and get retried elsewhere, which shouldn't be too bad.  But 
we need to keep it in mind.  None of this works (I think) if we need to scan 
meta and it's on a server that has just gone down.

 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
 Fix For: 0.94.0, 0.92.1


 This JIRA continues the effort from HBASE-5179. Starting with Stack's 
 comments about patches for 0.92 and TRUNK:
 Reviewing 0.92v17
 isDeadServerInProgress is a new public method in ServerManager but it does 
 not seem to be used anywhere.
 Does isDeadRootServerInProgress need to be public? Ditto for meta version.
 The param names of this method are not right, e.g. 'definitiveRootServer'; what is 
 meant by definitive? Do they need this qualifier?
 Is there anything in place to stop us expiring a server twice if it's carrying 
 root and meta?
 What is difference between asking assignment manager isCarryingRoot and this 
 variable that is passed in? Should be doc'd at least. Ditto for meta.
 I think I've asked for this a few times - onlineServers needs to be 
 explained... either in javadoc or in comment. This is the param passed into 
 joinCluster. How does it arise? I think I know but am unsure. God love the 
 poor noob that comes awandering this code trying to make sense of it all.
 It looks like we get the list by trawling zk for regionserver znodes that 
 have not checked in. Don't we do this operation earlier in master setup? Are 
 we doing it again here?
 Though distributed split log is configured, we will do in master single 
 process splitting under some conditions with this patch. It's not explained in 
 code why we would do this. Why do we think master log splitting is 'high 
 priority' when it could very well be slower? Should we only go this route if 
 distributed splitting is not going on? Do we know if concurrent distributed 
 log splitting and master splitting works?
 Why would we have dead servers in progress here in master startup? Because a 
 servershutdownhandler fired?
 This patch is different to the patch for 0.90. Should go into trunk first 
 with tests, then 0.92. Should it be in this issue? This issue is really hard 
 to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
 this trunk patch?
 This patch needs to have the v18 differences applied.





[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-09 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205139#comment-13205139
 ] 

stack commented on HBASE-5325:
--

Ok.  I can go along w/ cutting the scope of this issue back to its original 
intent (smile -- sorry for the hairbraining).

I was going to make a comment that we have the jmx 'model' not be a completely 
new one -- that it instead pick the existing model (though admittedly it's a bit 
messy) -- but it looks like your hand has been forced some already.   I was 
going to suggest that we name the mbean for metrics MasterMetrics rather than 
MasterStatistics but I see this is an existing MBean.  (Who named our mbeans 
'Master' and 'MasterStatistics'?  Our history doesn't say... they don't seem 
like good names... why not an org.apache.hbase prefix... etc.)

Why is HBaseMasterMXBean in hbase.metrics and not in hbase.master.metrics?  
Ditto HbaseRegionServerMXBean

You don't need these in your new files:

+ * Copyright 2012 The Apache Software Foundation

Why not name the bean for the master servername especially as there can be 
multiple masters running in a cluster -- getServerName.

Why would you have this info in regionserver jmx attributes:

+  public String getHBaseMaster();

This is pretty useless:

+  public long getStartCode();

Better return the regionserver ServerName.  Or name the mbean for the 
ServerName.

More to follow...



 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.





[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-09 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205215#comment-13205215
 ] 

stack commented on HBASE-5325:
--

I suppose you can't name the bean for the ServerName because you need to be 
able to locate the bean -- i.e. there'll be N instances of RegionServer beans 
and you'd then figure which belonged to which by looking at the ServerName 
attribute (I remember now that this is how it works).

Is this in branch-0.20-append branch:

+import org.apache.hadoop.metrics2.util.MBeans;

If not, this breaks our ability to run on that branch (up to now we presumed we 
could run there).  Is it in 1.0.0 hadoop?  (Would need to check it's in CDH.)

Do classes have to have an HBase (or Hbase) prefix? Seems redundant (and we 
should be consistent).

Can we have a better name than HBaseRegionServerInfo.  It says nothing.  If it 
was RegionServerMBean or MXBean, it'd say more about what this class is about.

You can call it 'instance'.  'theInstance' is too much?

And again, publishing master, ensemble, and startcode seems like a bunch of 
info you'd never act on.  Master maybe -- you'd know which master it was 
talking to -- and perhaps ensemble because then you know who it's registered 
with (though having both seems unnecessary... the ensemble with its rootdir will 
tell you which cluster we belong to... perhaps you should get the cluster id 
out here?) but startcode is no good to anyone really.  Should be ServerName 
coming out here.  That's how we uniquely identify regionservers in fs, when they 
report into the master and up in ensemble.  Might as well continue the 
identifier here.

Does this class need to take a RegionServer implementation?  Can it take a 
o.a.h.h.Server and/or a o.a.h.h.regionserver.RegionServerServices publishing 
jmx attributes?  These are Interfaces.  Might make this all easier to write 
tests on.


HbaseRegionServerMXBean should be in the regionserver/metrics package then you 
could call it MXBean or ServerMXBean.


HBaseMasterMXBean should be in master/metrics, should be HBaseMasterMXBean.  
The RegionServerInfo in it should be showing more than this small set of 
metrics if you are going to the bother of putting this stuff out on jmx (we do 
requests per region now -- should there be one of these classes per region?)  

I think that you don't need startcode if getRegionServer is returning the 
ServerName as a String.

Doesn't RegionState up in RegionsInTransition have a ServerName associated too?  
You're not publishing this?

Again, ain't these bad names for beans up in jmx?

+mxBean = MBeans.register("HBase", "MasterInfo",
+mxBeanInfo);

Shouldn't it be org.apache.hbase... or hbase if that's the parent for all of the 
hbase beans (ain't there a convention on bean naming -- IIRC)?  MasterInfo says 
nothing.  Could it be just Master?  Then you do getServerName or whatever it 
is that returns ServerName to distinguish this master from the backup Master?
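
For illustration only, here is roughly what that naming could look like using nothing but the JDK's own JMX API rather than the Hadoop MBeans helper. The interface, the class, and the object name org.apache.hbase:type=Master are hypothetical choices that follow the suggestion above; they are not taken from the patch.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class MasterBeanNamingSketch {
    /** Hypothetical management interface; the "MXBean" suffix makes JMX treat it as an MXBean. */
    public interface MasterMXBean {
        String getServerName();
    }

    /** Hypothetical implementation that only carries the ServerName string. */
    static final class Master implements MasterMXBean {
        private final String serverName;

        Master(String serverName) {
            this.serverName = serverName;
        }

        @Override
        public String getServerName() {
            return serverName;
        }
    }

    /** Registers the bean under an org.apache.hbase domain and reads the ServerName attribute back. */
    static String registerAndReadBack(String serverName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("org.apache.hbase:type=Master");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // keeps the sketch re-runnable
            }
            server.registerMBean(new Master(serverName), name);
            // The getter getServerName() is exposed as the attribute "ServerName".
            return (String) server.getAttribute(name, "ServerName");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The server name below is a made-up example value.
        System.out.println(registerAndReadBack("master1.example.org,60000,1328832000000"));
    }
}
```

A monitoring client would then look the bean up by its fixed name and tell the active master from a backup by the ServerName attribute, which is the arrangement the comment above is asking about.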

Sorry for so many comments for such a small patch.  I just feel that this stuff 
can be really useful if it's done right.  Else it's just more stuff for us to 
maintain.  Thanks for doing this stuff lads.

 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.





[jira] [Commented] (HBASE-5364) Fix source files missing licenses in 0.92 and trunk

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205685#comment-13205685
 ] 

stack commented on HBASE-5364:
--

You fellas going to commit?

 Fix source files missing licenses in 0.92 and trunk
 ---

 Key: HBASE-5364
 URL: https://issues.apache.org/jira/browse/HBASE-5364
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Elliott Clark
Priority: Blocker
 Attachments: HBASE-5364-1.patch, hbase-5364-0.92.patch


 running 'mvn rat:check' shows that a few files have snuck in that do not have 
 proper apache licenses.  Ideally we should fix these before we cut another 
 release/release candidate.
 This is a blocker for 0.94, and probably should be for the other branches as 
 well.





[jira] [Commented] (HBASE-5368) Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible in HBase installs

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205686#comment-13205686
 ] 

stack commented on HBASE-5368:
--

+1

 Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible 
 in HBase installs
 -

 Key: HBASE-5368
 URL: https://issues.apache.org/jira/browse/HBASE-5368
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5368.txt


 Very simple change to make PrefixSplitKeyPolicy accessible in HBase installs 
 (user still needs to setup the table(s) accordingly).
 Right now it is in src/test/org.apache.hadoop.hbase.regionserver, I propose 
 moving it to src/org.apache.hadoop.hbase.regionserver (alongside 
 ConstantSizeRegionSplitPolicy), and maybe renaming it too.





[jira] [Commented] (HBASE-5377) Fix licenses on the 0.90 branch.

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205784#comment-13205784
 ] 

stack commented on HBASE-5377:
--

Why this:

{code}
 </plugin>
 
+<plugin>
+  <artifactId>maven-surefire-report-plugin</artifactId>
+  <version>2.9</version>
+</plugin>
+<plugin>
+  <groupId>org.apache.avro</groupId>
+  <artifactId>avro-maven-plugin</artifactId>
+  <version>${avro.version}</version>
+</plugin>
+<plugin>
+  <groupId>org.codehaus.mojo</groupId>
+  <artifactId>build-helper-maven-plugin</artifactId>
+  <version>1.5</version>
+</plugin>
{code}

Else patch looks good to me.  If you can build site and the webapps work, 
commit I'd say.

 Fix licenses on the 0.90 branch.
 

 Key: HBASE-5377
 URL: https://issues.apache.org/jira/browse/HBASE-5377
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5377.patch


 There are a handful of empty files and several files missing apache licenses 
 on the 0.90 branch.  This patch fixes all of them and, in conjunction 
 with HBASE-5363, will allow it to pass RAT tests.





[jira] [Commented] (HBASE-5363) Automatically run rat check on mvn release builds

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205887#comment-13205887
 ] 

stack commented on HBASE-5363:
--

What's your hadoop wikiid, Jon?

 Automatically run rat check on mvn release builds
 -

 Key: HBASE-5363
 URL: https://issues.apache.org/jira/browse/HBASE-5363
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, 
 hbase-5363.patch


 Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
 should add checks, likely in the mvn package phase, so that this becomes a 
 non-issue in the future.
 Here's an example from Whirr:
 https://github.com/apache/whirr/blob/trunk/pom.xml (line 388).





[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205895#comment-13205895
 ] 

stack commented on HBASE-5313:
--

How do I read the above?  Is it the same number of kvs in each of the files?

 Restructure hfiles layout for better compression
 

 Key: HBASE-5313
 URL: https://issues.apache.org/jira/browse/HBASE-5313
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 An HFile block contains a stream of key-values. Can we organize these kvs 
 on the disk in a better way so that we get much greater compression ratios?
 One option (thanks Prakash) is to store all the keys in the beginning of the 
 block (let's call this the key-section) and then store all their 
 corresponding values towards the end of the block. This will allow us to 
 not-even decompress the values when we are scanning and skipping over rows in 
 the block.
 Any other ideas? 
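
As a rough picture of the key-section idea in the description above, the sketch below writes all keys first and all values after them, and then reads the keys back without touching the value bytes. Everything here is an assumption for illustration: the class name, the use of strings for keys and values, and the length-prefixed encoding are not the HFile format, and compression is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class KeySectionBlockSketch {
    /** Encodes a block as: entry count, then every key, then every value. Each entry is {key, value}. */
    static byte[] encode(List<String[]> keyValues) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(keyValues.size());
            for (String[] kv : keyValues) {
                out.writeUTF(kv[0]); // key-section at the beginning of the block
            }
            for (String[] kv : keyValues) {
                out.writeUTF(kv[1]); // corresponding values towards the end of the block
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads only the key-section; the value bytes that follow are never read. */
    static List<String> readKeys(byte[] block) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            int count = in.readInt();
            List<String> keys = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                keys.add(in.readUTF());
            }
            return keys;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real block the two sections would have to be independently readable for the scan-and-skip benefit to appear; how that is achieved is not specified in the issue text.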





[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205936#comment-13205936
 ] 

stack commented on HBASE-5209:
--

bq. I have a multi-master HBase set up, and I'm trying to programmatically 
determine which of the masters is currently active. But the API does not allow 
me to do this. There is a getMaster() method in the HConnection class, but it 
returns an HMasterInterface, whose methods do not allow me to find out which 
master won the last race. The API should have a getActiveMasterHostname() or 
something to that effect.

If you do a getMaster, I'd think that you should get the active master, only, 
in HConnection.  Are you saying that it'll give you an Interface on the 
non-active Master?  That's broke, I'd say.  For the name of the Master, yeah, 
getServerName should be part of HMasterInterface.

On the patch:

{code}
+  private boolean isMasterRunning, isActiveMaster;
{code}

The above are the names of methods, not data members.  Should be masterRunning 
and activeMaster.

What's going on here:

{code}
+this.master = master;
+this.isMasterRunning = isMasterRunning;
+this.isActiveMaster = isActiveMaster;
{code}

So, we could be reporting a master that is not running and not the active 
master?  Why would we even care about it in that case?

getMasterInfo as method name returning master ServerName seems off.  Is this 
the 'active' master or non-running master?

I think we need to be clear that ClusterStatus reports on the active master 
only (unless you want to add a list of all running masters, which I don't think 
is yet possible since they do not register until they assume mastership --- 
hmmm... looking further down in your patch, it looks like you are adding this 
facility to zk).

Is this of any use?

+  public boolean isMasterRunning() {

I mean, if master is not running, can you even get a ClusterStatus from the 
cluster?

Ditto for +  public boolean isActiveMaster() {

Won't this just be true anytime you get a ClusterStatus?

You up the ClusterStatus version number but you don't act on it (what if you 
are asked to deserialize an earlier version of ClusterStatus?)
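
"Acting on" the version could look something like the sketch below. It is a generic illustration built on java.io only, not the ClusterStatus code: the class, the two version constants, and the field layout are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VersionedStatusSketch {
    static final byte VERSION_1 = 1; // hypothetical: no master name on the wire
    static final byte VERSION_2 = 2; // hypothetical: master name appended at the end

    /** Writes a payload in the requested version; masterName is ignored for version 1. */
    static byte[] write(byte version, String hbaseVersion, String masterName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(version);
            out.writeUTF(hbaseVersion);
            if (version >= VERSION_2) {
                out.writeUTF(masterName);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads either version; returns {hbaseVersion, masterName}, with masterName null for version 1. */
    static String[] read(byte[] payload) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
            byte version = in.readByte();
            if (version < VERSION_1 || version > VERSION_2) {
                throw new IllegalArgumentException("Unsupported version: " + version);
            }
            String hbaseVersion = in.readUTF();
            // Act on the version: only newer payloads carry the extra field.
            String masterName = version >= VERSION_2 ? in.readUTF() : null;
            return new String[] {hbaseVersion, masterName};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is only that a reader which bumps its version should still branch on the version it finds, so an older payload is read with the old layout instead of failing or misreading bytes.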

On MasterInterface, I'd suggest don't bother upping the version number -- just 
add the new method on the end.  That's usually ok.  Also, is isActiveMaster of 
any use even?  (You could ask zk directly?  Have hbaseadmin go ask zk rather than 
go via the master at all?  Isn't the master znode name its ServerName?  Isn't 
that what you need?)

I like your registering backup masters... and adding the list to the zk report.



 HConnection/HMasterInterface should allow for way to get hostname of 
 currently active master in multi-master HBase setup
 

 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Aditya Acharya
Assignee: David S. Wang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff


 I have a multi-master HBase set up, and I'm trying to programmatically 
 determine which of the masters is currently active. But the API does not 
 allow me to do this. There is a getMaster() method in the HConnection class, 
 but it returns an HMasterInterface, whose methods do not allow me to find out 
 which master won the last race. The API should have a 
 getActiveMasterHostname() or something to that effect.





[jira] [Commented] (HBASE-5363) Automatically run rat check on mvn release builds

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205937#comment-13205937
 ] 

stack commented on HBASE-5363:
--

Try it now boss

 Automatically run rat check on mvn release builds
 -

 Key: HBASE-5363
 URL: https://issues.apache.org/jira/browse/HBASE-5363
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5363-0.90.patch, hbase-5363.2.patch, 
 hbase-5363.patch


 Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
 should add checks, likely in the mvn package phase, so that this becomes a 
 non-issue in the future.
 Here's an example from Whirr:
 https://github.com/apache/whirr/blob/trunk/pom.xml (line 388).





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206026#comment-13206026
 ] 

stack commented on HBASE-5387:
--

Any reason for hardcoding 32K as the buffer size?

+  ((Configurable)codec).getConf().setInt("io.file.buffer.size", 32 * 1024);

Give this an initial reasonable size?

+compressedByteStream = new ByteArrayOutputStream();

So, we'll keep around the largest thing we ever wrote into this 
ByteArrayOutputStream?  Should we resize it or something from time to time?  Or 
I suppose we can just wait till it's a problem?

Is the gzip stuff brittle?  The header can be bigger than 10 bytes I suppose 
(spec allows extensions IIRC) but I suppose it's safe because we presume java or 
underlying native compression.

Good stuff Mikhail.  +1 on patch.

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch


 We need to reuse compression streams in HFileBlock.Writer instead of 
 allocating them every time. The motivation is that when using Java's built-in 
 implementation of Gzip, we allocate a new GZIPOutputStream object and an 
 associated native data structure every time we create a compression stream. 
 The native data structure is only deallocated in the finalizer. This is one 
 suspected cause of recent TestHFileBlock failures on Hadoop QA: 
 https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
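A minimal sketch of the reuse idea, using only java.util.zip rather than the HBase or Hadoop codec classes. ReusableCompressor and its method names are made up for illustration and are not what the attached patch does. It only shows that one Deflater can be reset between blocks and ended explicitly, instead of allocating a fresh compression stream (and its native state) for every block.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper (not HBase code): one Deflater shared by every block written. */
final class ReusableCompressor {
  // The Deflater holds native zlib state, so allocate it once and reset it per block.
  private final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
  private final ByteArrayOutputStream compressed = new ByteArrayOutputStream(4096);

  /** Compresses one block, reusing the deflater and the output buffer. */
  byte[] compress(byte[] block) {
    deflater.reset();
    compressed.reset();
    // A stream built around a caller-supplied Deflater does not end() it on close.
    try (DeflaterOutputStream out = new DeflaterOutputStream(compressed, deflater)) {
      out.write(block);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return compressed.toByteArray();
  }

  /** Frees the native memory explicitly instead of waiting for a finalizer. */
  void close() {
    deflater.end();
  }

  /** Inflates data produced by compress(); used only to check the round trip. */
  static byte[] decompress(byte[] data) {
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
      byte[] chunk = new byte[1024];
      int n;
      while ((n = in.read(chunk)) != -1) {
        plain.write(chunk, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return plain.toByteArray();
  }
}
```

Calling compress() repeatedly on one instance touches the native allocator only once; close() releases it when the writer is done.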





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206156#comment-13206156
 ] 

stack commented on HBASE-5387:
--

IIRC, the BAOS will keep the outline of the largest allocation that went 
through it -- reset doesn't put the BAOS backing buffer back to its original 
size... I haven't looked at the src... maybe it's better now (I wrote that 
crawler gzipping thing you cite above).
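A quick way to check this, as a sketch: subclass ByteArrayOutputStream so the protected backing array is visible, write something large, then reset. CapacityProbe is a made-up name for illustration only.

```java
import java.io.ByteArrayOutputStream;

/** Illustration only: exposes the length of the protected backing array of a BAOS. */
final class CapacityProbe extends ByteArrayOutputStream {
  CapacityProbe(int initialSize) {
    super(initialSize);
  }

  /** buf is a protected field of ByteArrayOutputStream. */
  int capacity() {
    return buf.length;
  }

  public static void main(String[] args) {
    CapacityProbe out = new CapacityProbe(32);
    out.write(new byte[1 << 20], 0, 1 << 20);  // grow the backing array to at least 1 MB
    int grown = out.capacity();
    out.reset();                               // only the valid-byte count goes back to zero
    System.out.println("size=" + out.size() + " capacity=" + out.capacity()
        + " capacityBeforeReset=" + grown);
  }
}
```

After reset() the stream reports zero bytes but still holds the grown array, so a long-lived BAOS keeps its high-water mark.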

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
 Fix For: 0.94.0

 Attachments: D1719.1.patch, 
 Fix-deflater-leak-2012-02-10_18_48_45.patch






[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206269#comment-13206269
 ] 

stack commented on HBASE-3584:
--

Looks like HBASE-3967 RowMutation is a common denominator of Put and Delete 
whereas this RowMutation is a container that holds a swathe of Puts and 
Deletes to apply to a single row; they are different.

I'm up for a rename. This RowMutation becomes RowMutations?
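To make the "container" reading concrete, here is a rough sketch in plain Java. It does not use the HBase client classes; RowMutationsSketch, its methods, and the String-keyed column map are all stand-ins invented for illustration. The only point is that the container collects several puts and deletes for one row and applies them together under one lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical container (not the HBase class): several puts and deletes for one row. */
final class RowMutationsSketch {
  /** A single column change; a null value stands for a delete. */
  private static final class Op {
    final String column;
    final String value;

    Op(String column, String value) {
      this.column = column;
      this.value = value;
    }
  }

  private final String row;
  private final List<Op> ops = new ArrayList<>();

  RowMutationsSketch(String row) {
    this.row = row;
  }

  String getRow() {
    return row;
  }

  RowMutationsSketch put(String column, String value) {
    ops.add(new Op(column, Objects.requireNonNull(value)));
    return this;
  }

  RowMutationsSketch delete(String column) {
    ops.add(new Op(column, null));
    return this;
  }

  /** Applies every queued change while holding one lock on the row's columns. */
  void applyTo(Map<String, String> rowColumns) {
    synchronized (rowColumns) {
      for (Op op : ops) {
        if (op.value == null) {
          rowColumns.remove(op.column);
        } else {
          rowColumns.put(op.column, op.value);
        }
      }
    }
  }
}
```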

 Allow atomic put/delete in one call
 ---

 Key: HBASE-3584
 URL: https://issues.apache.org/jira/browse/HBASE-3584
 Project: HBase
  Issue Type: New Feature
  Components: client, coprocessors, regionserver
Reporter: ryan rawson
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt


 Right now we have the following calls:
 put(Put)
 delete(Delete)
 increment(Increments)
 But we cannot combine all of the above in a single call, complete with a 
 single row lock.  It would be nice to do that.
 It would also allow us to do a CAS where we could do a put/increment if the 
 check succeeded.
 -
 Amendment:
 Since Increment does not currently support MVCC it cannot be included in an 
 atomic operation.
 So this for Put and Delete only.





[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206271#comment-13206271
 ] 

stack commented on HBASE-5387:
--

@Ted Yeah, I took a look at BAOS.  Reset just sets count to zero as you see.  
If a BAOS goes big and stays big, maybe it's not so bad.  If there are a bunch 
of these though, it could become a problem.  We could do as you suggest as long as 
we do it before we call resize?  So we'd have a conditional where if the size > 
N times the buffer, then instead of reusing, we go get a new BAOS?
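Roughly what that conditional could look like, as a sketch only. BoundedReuseBuffer, maxGrowthFactor (the "N" above) and the nested Probe class are invented names, not anything in the patch.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical wrapper: reuse a BAOS unless its backing array has grown too large. */
final class BoundedReuseBuffer {
  /** BAOS subclass that exposes the length of its protected backing array. */
  private static final class Probe extends ByteArrayOutputStream {
    Probe(int size) {
      super(size);
    }

    int capacity() {
      return buf.length;
    }
  }

  private final int initialSize;
  private final int maxGrowthFactor;  // the "N" in the discussion; an assumed tuning knob
  private Probe stream;

  BoundedReuseBuffer(int initialSize, int maxGrowthFactor) {
    this.initialSize = initialSize;
    this.maxGrowthFactor = maxGrowthFactor;
    this.stream = new Probe(initialSize);
  }

  /** Returns an empty stream for the next block, dropping an oversized backing array. */
  ByteArrayOutputStream next() {
    if (stream.capacity() > (long) initialSize * maxGrowthFactor) {
      stream = new Probe(initialSize);  // let the big array be garbage collected
    } else {
      stream.reset();                   // cheap path: keep the array, zero the count
    }
    return stream;
  }

  int capacity() {
    return stream.capacity();
  }
}
```

The check runs at the point where the stream would otherwise be reset, so one unusually large block costs a single reallocation rather than pinning memory for the life of the writer.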

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
 Fix For: 0.94.0

 Attachments: D1719.1.patch, 
 Fix-deflater-leak-2012-02-10_18_48_45.patch






[jira] [Commented] (HBASE-4920) We need a mascot, a totem

2012-02-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206548#comment-13206548
 ] 

stack commented on HBASE-4920:
--

The red one is not bad (purple too cartoony and the other too complicated).

On Marcy's latest, I think the first set, while graphically clean, has lost 
too much detail and is now hard to identify as an orca in particular.  Option 5 is 
not bad but I'm not sure this is an orca either w/ the blunt nose.  I'd say option 6 
is the best of the lot.

Looking around more it looks like we could 'buy' these as is:

http://www.shutterstock.com/pic-58643317/stock-vector-vector-killer-whale-represented-in-the-form-of-a-tattoo.html

http://depositphotos.com/2900573/stock-illustration-Killer-whale-tattoo.html

http://www.canstockphoto.com/whale-tattoo-3058609.html

http://www.canstockphoto.com/orca-killer-whale-2297275.html

http://4.bp.blogspot.com/_HBjr0PcdZW4/TIge5Uok9eI/A8Q/0AdMNtuhRaY/s400/pm006-orca7.png

http://www.bigstockphoto.com/image-27015218/stock-vector-orca-killer-whale-cartoon-vector-illustratio

http://www.graphicsfactory.com/Clip_Art/Animals/Fish/funny_water_animals_053_377466.html

This one, the second, might work if inverted, 
http://depositphotos.com/2900573/stock-illustration-Killer-whale-tattoo.html, 
so it's rising over the hbase logo from the right.

 We need a mascot, a totem
 -

 Key: HBASE-4920
 URL: https://issues.apache.org/jira/browse/HBASE-4920
 Project: HBase
  Issue Type: Task
Reporter: stack
 Attachments: HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 
 2011-11-30 at 4.06.17 PM.png, apache hbase orca logo_Proof 3.pdf, apache 
 logo_Proof 8.pdf, krake.zip, photo (2).JPG


 We need a totem for our t-shirt that is yet to be printed.  O'Reilly owns the 
 Clydesdale.  We need something else.
 We could have a fluffy little duck that quacks 'hbase!' when you squeeze it 
 and we could order boxes of them from some off-shore sweatshop that 
 subcontracts to a contractor who employs child labor only.
 Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from 
 Salesforce showed me, that was a bit too spiritual for me to be seen quoting 
 here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in 
 translation, bigdata).





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207017#comment-13207017
 ] 

stack commented on HBASE-5200:
--

Can we do a test for this before it goes into 0.92 and trunk?

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch


 This is the scenario:
 Consider a case where the balancer is running and thus trying to close regions 
 in a RS.
 Before the close completes, a master switch happens.  
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the data and start watching the node.
 After that, the node data is added into RIT.
 Now, in this window (before adding to RIT), if the RS on which close was called 
 does a transition, then in AM.handleRegion() we miss the handling, saying the RIT 
 state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch the CLOSING node is created by RS thus leading to more 
 inconsistency.





[jira] [Commented] (HBASE-5388) Tuning HConnectionManager#getCachedLocation method

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207031#comment-13207031
 ] 

stack commented on HBASE-5388:
--

I agree w/ Zhihong, that '+if (!(this.internalMap instanceof NavigableMap)) 
{' block seems unnecessary.

Is the 'greatest' in this right?

{code}
+   * retrieves the value associated with the greatest key strictly less than
+   *  the given key, or null if there is no such key
{code}

It's unfortunate that SoftValueSortedMap sneaks back into HCM but I can live 
with it if there is no alternative (Lars' suggestion of having SVSM implement NM 
didn't seem too bad but I understand if it adds a bunch of boilerplate)
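For reference, a small sketch of what the quoted javadoc wording means in practice, using a plain TreeMap with Integer keys in place of the byte[] row keys (an assumption made to keep the example short). lowerEntry returns the entry with the largest key that is still strictly smaller than the argument.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustration of NavigableMap.lowerEntry semantics; the keys and values are invented. */
final class LowerEntryDemo {
  /** Returns the value under the greatest key strictly less than {@code key}, or null. */
  static String lookup(TreeMap<Integer, String> map, int key) {
    Map.Entry<Integer, String> entry = map.lowerEntry(key);
    return entry == null ? null : entry.getValue();
  }

  public static void main(String[] args) {
    TreeMap<Integer, String> regions = new TreeMap<>();
    regions.put(10, "region-10");
    regions.put(20, "region-20");
    regions.put(30, "region-30");
    System.out.println(lookup(regions, 25));  // key 20 is the greatest key below 25
    System.out.println(lookup(regions, 20));  // strictly less, so key 10 is chosen
    System.out.println(lookup(regions, 10));  // nothing below 10
  }
}
```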



 Tuning HConnectionManager#getCachedLocation method
 --

 Key: HBASE-5388
 URL: https://issues.apache.org/jira/browse/HBASE-5388
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: ronghai.ma
Assignee: ronghai.ma
  Labels: patch
 Fix For: 0.94.0

 Attachments: 5388-v2.txt, HConnectionManager.java, 
 SoftValueSortedMap.java, SoftValueSortedMap.java


 About 75% improvement in execution time.
 1. Add the following method in SoftValueSortedMap:
 {code}
 public synchronized <K, V> Entry<K, V> lowerEntry(K key) {
   return ((TreeMap) this.internalMap).lowerEntry(key);
 }
 {code}
 2. Modify getCachedLocation:
 {code} 
 Map.Entry<byte[], HRegionLocation> tEntry = tableLocations.lowerEntry(row);
 if (tEntry != null) {
   HRegionLocation possibleRegion = tEntry.getValue();
   //other code
 }
 {code}





[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207045#comment-13207045
 ] 

stack commented on HBASE-3584:
--

This is an unfortunate situation -- e.g. Mutation is in the 0.92 release and it's 
what RowMutation is in 0.89fb IIUC -- but let's patch up the model as best we 
can with deprecations of the ugly so we're clear on what we want folks to use 
going forward (I believe I reviewed both patches -- I should have caught the 
name clash). 

 Allow atomic put/delete in one call
 ---

 Key: HBASE-3584
 URL: https://issues.apache.org/jira/browse/HBASE-3584
 Project: HBase
  Issue Type: New Feature
  Components: client, coprocessors, regionserver
Reporter: ryan rawson
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt






[jira] [Commented] (HBASE-5388) Tuning HConnectionManager#getCachedLocation method

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207051#comment-13207051
 ] 

stack commented on HBASE-5388:
--

bq. The javadoc involving 'greatest' comes from javadoc for lowerEntry().

Ok

v3 is good by me except for the HCM pollution w/ SoftValueSortedMap.  I looked 
at making it implement NavigableMap and it's 18 extra methods.  They could all 
throw unsupported as per Lars.  I'm fine w/ this patch though.

 Tuning HConnectionManager#getCachedLocation method
 --

 Key: HBASE-5388
 URL: https://issues.apache.org/jira/browse/HBASE-5388
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: ronghai.ma
Assignee: ronghai.ma
  Labels: patch
 Fix For: 0.94.0

 Attachments: 5388-v2.txt, 5388-v3.txt, HConnectionManager.java, 
 SoftValueSortedMap.java, SoftValueSortedMap.java






[jira] [Commented] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207063#comment-13207063
 ] 

stack commented on HBASE-2375:
--

+1 on patch

Make a new issue to change compactionThreshold to at least 4, I'd say, and then 
you can close out this one?

 Make decision to split based on aggregate size of all StoreFiles and revisit 
 related config params
 --

 Key: HBASE-2375
 URL: https://issues.apache.org/jira/browse/HBASE-2375
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.20.3
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
  Labels: moved_from_0_20_5
 Attachments: HBASE-2375-flush-split.patch, HBASE-2375-v8.patch


 Currently we will make the decision to split a region when a single StoreFile 
 in a single family exceeds the maximum region size.  This issue is about 
 changing the decision to split to be based on the aggregate size of all 
 StoreFiles in a single family (but still not aggregating across families).  
 This would move a check to split after flushes rather than after compactions. 
  This issue should also deal with revisiting our default values for some 
 related configuration parameters.
 The motivating factor for this change comes from watching the behavior of 
 RegionServers during heavy write scenarios.
 Today the default behavior goes like this:
 - We fill up regions, and as long as you are not under global RS heap 
 pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) 
 StoreFiles.
 - After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a 
 compaction on this region.
 - Compaction queues notwithstanding, this will create a 192MB file, not 
 triggering a split based on max region size (hbase.hregion.max.filesize).
 - You'll then flush two more 64MB MemStores and hit the compactionThreshold 
 and trigger a compaction.
 - You end up with 192 + 64 + 64 in a single compaction.  This will create a 
 single 320MB file and will trigger a split.
 - While you are performing the compaction (which now writes out 64MB more 
 than the split size, so is about 5X slower than the time it takes to do a 
 single flush), you are still taking on additional writes into MemStore.
 - Compaction finishes, decision to split is made, region is closed.  The 
 region now has to flush whichever edits made it to MemStore while the 
 compaction ran.  This flushing, in our tests, is by far the dominating factor 
 in how long data is unavailable during a split.  We measured about 1 second 
 to do the region closing, master assignment, reopening.  Flushing could take 
 5-6 seconds, during which time the region is unavailable.
 - The daughter regions re-open on the same RS.  Immediately when the 
 StoreFiles are opened, a compaction is triggered across all of their 
 StoreFiles because they contain references.  Since we cannot currently split 
 a split, we need to not hang on to these references for long.
 This described behavior is really bad because of how often we have to rewrite 
 data onto HDFS.  Imports are usually just IO bound as the RS waits to flush 
 and compact.  In the above example, the first cell to be inserted into this 
 region ends up being written to HDFS 4 times (initial flush, first compaction 
 w/ no split decision, second compaction w/ split decision, third compaction 
 on daughter region).  In addition, we leave a large window where we take on 
 edits (during the second compaction of 320MB) and then must make the region 
 unavailable as we flush it.
 If we increased the compactionThreshold to be 5 and determined splits based 
 on aggregate size, the behavior becomes:
 - We fill up regions, and as long as you are not under global RS heap 
 pressure, you will write out 64MB (hbase.hregion.memstore.flush.size) 
 StoreFiles.
 - After each MemStore flush, we calculate the aggregate size of all 
 StoreFiles.  We can also check the compactionThreshold.  For the first three 
 flushes, both would not hit the limit.  On the fourth flush, we would see 
 total aggregate size = 256MB and determine to make a split.
 - Decision to split is made, region is closed.  This time, the region just 
 has to flush out whichever edits made it to the MemStore during the 
 snapshot/flush of the previous MemStore.  So this time window has shrunk by 
 more than 75% as it was the time to write 64MB from memory not 320MB from 
 aggregating 5 hdfs files.  This will greatly reduce the time data is 
 unavailable during splits.
 - The daughter regions re-open on the same RS.  Immediately when the 
 StoreFiles are opened, a compaction is triggered across all of their 
 StoreFiles because they contain references. 
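The arithmetic in the two walkthroughs above can be replayed with a toy simulation. This is a sketch under simplifying assumptions, not regionserver code: SplitSimulation and its method names are invented, sizes are whole MB, a compaction always merges every file, the split size is taken as 256MB (the 320MB file is described above as 64MB over it), and the count stops at the split decision, so the daughter-region compaction is not included.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: MB written to HDFS by flushes and compactions before a split is decided. */
final class SplitSimulation {
  /** Split is checked after a compaction, when the single merged file exceeds the max. */
  static long writtenWithCompactionCheck(long flushMb, int compactionThreshold, long maxFileMb) {
    List<Long> files = new ArrayList<>();
    long written = 0;
    while (true) {
      files.add(flushMb);            // a MemStore flush writes one new StoreFile
      written += flushMb;
      if (files.size() >= compactionThreshold) {
        long merged = 0;
        for (long f : files) {
          merged += f;
        }
        written += merged;           // the compaction rewrites every byte it merges
        files.clear();
        files.add(merged);
        if (merged > maxFileMb) {
          return written;
        }
      }
    }
  }

  /** Split is checked after each flush, on the aggregate size of all StoreFiles. */
  static long writtenWithFlushCheck(long flushMb, int compactionThreshold, long maxFileMb) {
    int fileCount = 0;
    long aggregate = 0;
    long written = 0;
    while (true) {
      fileCount++;
      aggregate += flushMb;
      written += flushMb;
      if (aggregate >= maxFileMb) {
        return written;              // split before any compaction is needed
      }
      if (fileCount >= compactionThreshold) {
        written += aggregate;
        fileCount = 1;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(writtenWithCompactionCheck(64, 3, 256));  // 832
    System.out.println(writtenWithFlushCheck(64, 5, 256));       // 256
  }
}
```

With 64MB flushes, the first scheme writes 832MB (five flushes plus compactions of 192MB and 320MB) before it decides to split, while the second writes only the four flushes, 256MB.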

[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207080#comment-13207080
 ] 

stack commented on HBASE-5325:
--

Does this need to be public?  Can it be protected in AM?

{code}
+public ServerName getServerName() {
{code}

Is the above even used?

Please file an issue to move +import org.apache.hadoop.metrics.util.MBeanUtil; 
to o.a.h.h.u after this patch goes in (Seems like a silly place for this class 
-- not your fault I know)

I think this is better for a name:

{code}
+MBeanUtil.registerMBean("org.apache.hadoop.hbase", "Master", mxBeanInfo);
{code}

... maybe o.a.hbase instead of o.a.h.h (the o.a.hadoop prefix is legacy we'll 
undo one day).

This bean is new, right?  So it's ok giving it a name like this?

Needs a class comment: '+public class MasterMXBeanImpl implements MasterMXBean 
{'

Do you think we need to add an isActiveMaster attribute?  What if the master is 
not active?  Will it show in jmx?  If not, then no need of such an attribute.

Do you think this Interface belongs in Metrics?  
src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMXBean.java

It's more than just metrics, right?  Maybe it should be in the master package 
(Sorry, should have said that in previous review)

Why not call it RegionServerMXBeanImpl.java or just MXBeanImpl (here and in 
master package)?

I see now why no class comment on the Impl; it's because you have class comment 
on the Interface.  That's fine.

Do you have to do this getHBaseMaster with all of the messing to construct a 
master name (and it's not right anyways since master has its service port, not 
its UI port, when its name is made)?  Leave it out I'd say... users can get the 
info in zk.

Good stuff
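For anyone following along, a bare-bones sketch of an interface-plus-Impl MXBean pair using only the standard javax.management API. It does not go through Hadoop's MBeanUtil, and MasterStatusMXBean, MasterStatus, the attribute names and the ObjectName string are all placeholders rather than what the patch uses.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Sketch only: registers a tiny MXBean and reads one attribute back over JMX. */
final class MasterMXBeanSketch {
  /** Hypothetical management interface; MXBean interfaces must be public. */
  public interface MasterStatusMXBean {
    String getServerName();

    boolean isActiveMaster();
  }

  /** Hypothetical implementation holding a fixed server name. */
  static final class MasterStatus implements MasterStatusMXBean {
    private final String serverName;

    MasterStatus(String serverName) {
      this.serverName = serverName;
    }

    @Override
    public String getServerName() {
      return serverName;
    }

    @Override
    public boolean isActiveMaster() {
      return true;
    }
  }

  /** Registers the bean, reads the ServerName attribute, then unregisters again. */
  static String registerAndRead(String serverName) {
    try {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      // Placeholder name; the domain and key properties are assumptions.
      ObjectName name = new ObjectName("org.apache.hadoop.hbase:service=Master,name=MasterSketch");
      server.registerMBean(new MasterStatus(serverName), name);
      try {
        return (String) server.getAttribute(name, "ServerName");
      } finally {
        server.unregisterMBean(name);
      }
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```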



 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.





[jira] [Commented] (HBASE-5393) Consider splitting after flushing

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207081#comment-13207081
 ] 

stack commented on HBASE-5393:
--

+1 on putting in 0.92 too...

 Consider splitting after flushing
 -

 Key: HBASE-5393
 URL: https://issues.apache.org/jira/browse/HBASE-5393
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.0

 Attachments: HBASE-2375-flush-split.patch


 Spawning this from HBASE-2375, I saw that it was much more efficient 
 compaction-wise to check if we can split right after flushing. Much like the 
 ideas that Jon spelled out in the description of that jira, the window is 
 smaller because you don't have to compact and then split right away to only 
 compact again when the daughters open.
 Another thing it improves is while we're normally waiting for the compaction 
 to happen, data that's still coming in will make us go way past the 
 MAX_FILESIZE to a point where for the first region I was seeing a store size 
 3-4x bigger before it was able to split.
 I targeted this for 0.94, but I'd like to get this into 0.92.1 or .2 too.





[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207086#comment-13207086
 ] 

stack commented on HBASE-3584:
--

bq. Let's just rename trunk's RowMutation to RowMutations.

Sure, but don't we want to also deprecate RowMutation when it goes in and point 
people to instead use Mutation?

 Allow atomic put/delete in one call
 ---

 Key: HBASE-3584
 URL: https://issues.apache.org/jira/browse/HBASE-3584
 Project: HBase
  Issue Type: New Feature
  Components: client, coprocessors, regionserver
Reporter: ryan rawson
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt






[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207340#comment-13207340
 ] 

stack commented on HBASE-5200:
--

Attached unit test stands up an AssignmentManager and then manufactures the 
condition that Ram describes.  The test gets stuck and times out after five 
seconds because the znode is not cleared on master failover (as per Ram's 
description).

Ram, your patch no longer applies to TRUNK seemingly.

Why do you make a hash with a preset size of 1?

{code}
+  private Set<String> regionsProcessed = new HashSet<String>(1);
{code}

Is this the right name for this hash?  Should it be 
regionsProcessedJoiningCluster or some such?

The regionsProcessed hash is of a String.  I see in 
handleRegionWhileFailOverInProgress that we always get the regioninfo from 
meta.  Isn't it possible that in processRegionInTransition we may have done 
this already?  That it may be non-null?  If so, shouldn't we keep it around so 
we don't have to go to the .META. every time but only for those cases where 
regioninfo is indeed null?  Would that mean changing regionsProcessed to be a 
Map of String to HRI?

Isn't getHRegionInfo repeating code from earlier up in 
processRegionInTransition?

If so, change it so that there is only one place where we go to meta... have 
both places call your new getRegionInfo method.
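
The two suggestions above (keep the regioninfo around, and have one place that goes to meta) could look roughly like the following.  This is a sketch only: RegionInfoCacheSketch is a hypothetical stand-in, the region info is modelled as a plain String rather than an HRI, and metaLookup stands for whatever actually reads .META.; none of this is taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of the review suggestion; not AssignmentManager code. */
class RegionInfoCacheSketch {
    // Encoded region name -> region info already seen (a String stands in for HRI).
    private final Map<String, String> regionsProcessed = new HashMap<>();
    // Assumed to be the single place that reads from meta.
    private final Function<String, String> metaLookup;

    RegionInfoCacheSketch(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    /** Returns the cached info if present; otherwise goes to meta once and remembers it. */
    synchronized String getRegionInfo(String encodedRegionName) {
        String hri = regionsProcessed.get(encodedRegionName);
        if (hri == null) {
            hri = metaLookup.apply(encodedRegionName);
            if (hri != null) {
                regionsProcessed.put(encodedRegionName, hri);
            }
        }
        return hri;
    }
}
```

Both call sites would then call getRegionInfo, and only a miss in the map costs a trip to meta.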

Why do this:

{code}
+  hri = p.getFirst();
+  return hri;
{code}

Why not just do return p.getFirst();?

Is everything shifted right because of this test?

{code}
+  if (regionState == null &&
+   !regionsProcessed.contains(encodedRegionName)) {
{code}

If so, shouldn't we just take the opposite of the above and return immediately 
if regionState is non-null and in regionsProcessed as in:

{code}
if (regionState != null && regionsProcessed.contains(encodedRegionName)) 
  return;
{code}

This would make your change less substantial.

It seems wrong that we are putting stuff into RIT in two places; in 
processRegionsInTransition and in handleRegion if we happen to be fielding a 
call back before failover has had a chance to run.

Would the fb trick of NOT processing callbacks during master failover help 
here?  At least for the scope of the AM.joinCluster?

Is this a good name for this  method?  handleRegionWhileFailOverInProgress  
Should it be checkFailover or some such?

The test I attached only checks the CLOSING state.  We should extend it to do 
the other states OPENING, etc.?

I can help with this.

Also, how did you figure out this bug?  It must have taken a bunch of head 
banging to figure that this was indeed what was going on.  Good stuff Ram.





 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, 
 HBASE-5200_1.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch


 This is the scenario
 Consider a case where the balancer is going on thus trying to close regions 
 in a RS.
 Before we could close a master switch happens.  
 On Master switch the set of nodes that are in RIT is collected and we first 
 get Data and start watching the node
 After that the node data is added into RIT.
 Now by this time (before adding to RIT) if the RS to which close was called 
 does a transition in AM.handleRegion() we miss the handling saying RIT state 
 was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 

[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207386#comment-13207386
 ] 

stack commented on HBASE-5325:
--

bq. This was added to RegionState to expose the private field ServerName for 
display with regions in transition.

Ok.  I thought it was on AM.  My bad.

bq. Will replace with the zk quorum info if that seems appropriate.

Yeah, that might be more appropriate

Good stuff.

 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.





[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207402#comment-13207402
 ] 

stack commented on HBASE-5325:
--

+1

Will let it stew some in case others want to take a looksee.

 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.3.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207403#comment-13207403
 ] 

stack commented on HBASE-5200:
--

bq. However, TestAssignmentManager#testBalanceOnMasterFailover fails with or 
without the patch.

Then the patch doesn't fix the issue?

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, 
 HBASE-5200_1.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch


 This is the scenario
 Consider a case where the balancer is going on thus trying to close regions 
 in a RS.
 Before we could close a master switch happens.  
 On Master switch the set of nodes that are in RIT is collected and we first 
 get Data and start watching the node
 After that the node data is added into RIT.
 Now by this time (before adding to RIT) if the RS to which close was called 
 does a transition in AM.handleRegion() we miss the handling saying RIT state 
 was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch the CLOSING node is created by RS thus leading to more 
 inconsistency.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207842#comment-13207842
 ] 

stack commented on HBASE-5200:
--

bq. - The region was not transitioned after the CLOSED transition got a call 
back for assigning it. So there was no RS to process the assign.

I did not think this was needed since I'd reproduced your scenario.

Let me look at your changes.  Thanks.

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, 
 HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch







[jira] [Commented] (HBASE-5398) HBase shell disable_all/enable_all/drop_all promp wrong tables for confirmation

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207899#comment-13207899
 ] 

stack commented on HBASE-5398:
--

Patch looks good.  How is disable_all supposed to work?  Is it supposed to 
take a regex pattern?

 HBase shell disable_all/enable_all/drop_all promp wrong tables for 
 confirmation
 ---

 Key: HBASE-5398
 URL: https://issues.apache.org/jira/browse/HBASE-5398
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.92.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.94.0, 0.92.0

 Attachments: hbase-5398.patch


 When using hbase shell to disable_all/enable_all/drop_all tables, the tables 
 prompted for confirmation are wrong.
 For example, disable_all 'test*'
 will ask for confirmation to disable tables like:
 mytest1
 test123
 Fortunately, these tables will not be disabled actually since Java pattern 
 doesn't match this way.
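
The mismatch described above is consistent with two different matching modes being applied to the same pattern.  The report does not say how the prompt list was built, so the following is only a sketch under that assumption; TablePatternSketch and its two methods are hypothetical names.  It contrasts java.util.regex full matching (Matcher.matches) with substring search (Matcher.find) for the pattern 'test*'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper contrasting full-match and substring-search semantics. */
class TablePatternSketch {
    /** Tables whose whole name matches the regex (Matcher.matches). */
    static List<String> fullMatches(String regex, List<String> tables) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String t : tables) {
            if (p.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Tables that merely contain a match somewhere in the name (Matcher.find). */
    static List<String> substringMatches(String regex, List<String> tables) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String t : tables) {
            if (p.matcher(t).find()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

As a regex, 'test*' means "tes" followed by zero or more "t", so a substring search hits both mytest1 and test123 while a full match accepts neither.  Someone who wants every table starting with "test" would write 'test.*' and use full matching.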





[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207905#comment-13207905
 ] 

stack commented on HBASE-3584:
--

@Kannan Yes.  Sounds good.  I was saying that the forward-ported RowMutation 
should be deprecated on forward-port since it does what Mutation does in 
TRUNK, but we can do that in another, subsequent patch.






[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208032#comment-13208032
 ] 

stack commented on HBASE-5399:
--

Hurray on below:

{code}
+  @Deprecated
   public ZooKeeperWatcher getZooKeeperWatcher() throws IOException;

...

+   * @deprecated Removed because it was a mistake exposing zookeeper in this
+   * interface (ZooKeeper is an implementation detail).
*/
+  @Deprecated
   public HMasterInterface getMaster()
{code}

Why are these deprecated? (Add why to the deprecated note -- add pointer to 
where user can get functionality elsewise):

{code}
+  @Deprecated
   public HRegionInterface getHRegionConnection(HServerAddress regionServer)
{code}

Fix your comments in HCM.  Missing 'e' on 'not' and 'd' on 'use'

Do we get the clusterid on connection setup?  Do we have to?  Can we just get 
that when someone asks for it?
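
Fetching the cluster id only when someone asks for it could be as small as the sketch below.  Everything here is an assumption made for illustration: LazyClusterIdSketch is a hypothetical class, and the Supplier stands in for whatever actually reads the id (for example over a temporary ZooKeeper connection).

```java
import java.util.function.Supplier;

/** Hypothetical holder that defers the cluster id read until first use. */
class LazyClusterIdSketch {
    // Assumed to perform the real (possibly expensive) read.
    private final Supplier<String> fetcher;
    private String clusterId;

    LazyClusterIdSketch(Supplier<String> fetcher) {
        // Nothing is fetched at construction, i.e. at connection setup.
        this.fetcher = fetcher;
    }

    /** Fetches on the first call and returns the remembered value afterwards. */
    synchronized String getClusterId() {
        if (clusterId == null) {
            clusterId = fetcher.get();
        }
        return clusterId;
    }
}
```

Callers that never ask for the id then never pay for the read, and those that do pay only once.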

Fix...

{code}
  // We will ope/close a ZooKeeper 
{code}

What about isTableEnabled, etc., should they be deprecated, moved out of 
HConnection?  Or that is for a different issue?

Should we be using straight ZooKeeper instead of ZooKeeperWatcher?  We don't 
need watch facility?

So far so good...




 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5399_inprogress.patch


 The link is often considered as an issue, for various reasons. One of them 
 being that there is a limit on the number of connection that ZK can manage. 
 Stack was suggesting as well to remove the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever they want, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read get master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of threads => now done with a temporary zookeeper connection
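
The "temporary zookeeper connection" pattern in the list above amounts to open, read, close for each call.  A minimal sketch, assuming a hypothetical Connection interface and helper (neither is the ZooKeeperWatcher API nor code from the patch):

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper showing a connection that lives for one operation only. */
class TemporaryConnectionSketch {
    /** Minimal stand-in for a connection; close() is narrowed to throw nothing. */
    interface Connection extends AutoCloseable {
        String read(String path);

        @Override
        void close();
    }

    /** Opens a connection, runs the work, and always closes the connection afterwards. */
    static <T> T withTemporaryConnection(Supplier<Connection> opener,
                                         Function<Connection, T> work) {
        try (Connection conn = opener.get()) {
            return work.apply(conn);
        }
    }
}
```

The trade-off is the one the description names: no long-lived link per client, at the cost of establishing a connection on every such call.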
 -
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master.  => 
 we could make them use a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, non connected client will always be 
 really slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the client seems to make sense for some 
 Use Cases. However, it won't scale if a TCP connection is required for every 
 client
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue in HBaseAdmin (for both ZK & Master); maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic, however.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208040#comment-13208040
 ] 

stack commented on HBASE-5200:
--

bq. How should HBASE-5270 be solved using the above approach ?

There would be no concurrent servershutdownhandler running.  For the 
specialization, hbase-4748, in the current version hbase-5344 there'd be no 
need to get .META. on line to complete failover (but it looks like Mikhail is 
revisiting this aspect in his last comments up on hbase-5344).

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch







[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208061#comment-13208061
 ] 

stack commented on HBASE-5399:
--

bq. We can do that, it requires some work to do it well (ZooKeeperWatcher has 
spread too much, and has a lot of responsibilities (for example, it's the owner 
of the znode names, created from the config parameters)). It would be much 
cleaner, and a little bit faster. We would still pay for the tcp connection 
however.

Ok.  For another patch then I'd say (Agree ZKW is like a dumping ground for zk 
ops)

bq. isTableEnabled, etc., should they be deprecated, moved out of HConnnection 
 I was thinking putting them in HBaseAdmin, does it makes sense?

I think it makes sense... deprecate them in HCM and move to HBaseAdmin 

Sweet.






[jira] [Commented] (HBASE-5400) Some tests does not have annotations for (Small|Medium|Large)Tests

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208140#comment-13208140
 ] 

stack commented on HBASE-5400:
--

+1

 Some tests does not have annotations for (Small|Medium|Large)Tests 
 ---

 Key: HBASE-5400
 URL: https://issues.apache.org/jira/browse/HBASE-5400
 Project: HBase
  Issue Type: Bug
  Components: security, test
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-5400_v1.patch


 These tests do not have annotations, and are not picked up by -PrunAllTests
 {code}
 security/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessControlFilter.java
 security/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
 security/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java
 security/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionsWatcher.java
 security/src/test/java/org/apache/hadoop/hbase/security/token/TestTokenAuthentication.java
 security/src/test/java/org/apache/hadoop/hbase/security/token/TestZKSecretWatcher.java
 src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
 {code}
 We can also backport this to 0.92.1, since development will continue on 0.92 
 branch. 





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208173#comment-13208173
 ] 

stack commented on HBASE-5200:
--

You are right Ted.

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch


 This is the scenario:
 Consider a case where the balancer is running and trying to close regions 
 on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; for each one 
 we first get the data and start watching the node.
 After that, the node data is added into RIT.
 If, in the window before the node is added to RIT, the RS on which close was 
 called does a transition, AM.handleRegion() misses the handling, saying the 
 RIT state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch the CLOSING node is created by RS thus leading to more 
 inconsistency.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208183#comment-13208183
 ] 

stack commented on HBASE-5200:
--

bq. I think the solution provided by Ramkrishna should be integrated.

Into TRUNK?

It's an improvement, but I feel that there are loads of holes in here trying to 
process callbacks at the same time as trying to bring the new master online w/ 
a coherent picture of cluster state; it strikes me as a task w/o end -- hard to 
test too (witness the test added here).  We need a refactor of master failover. 
Holding up all callback processing strikes me as a basic simplification that 
we should take on.
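
To make the "hold up all callback processing" idea concrete, here is a rough sketch only. Everything in it is hypothetical (DeferredCallbacks, finishFailover, the string states); it is not AssignmentManager code, just the shape of queueing callbacks until the new master has rebuilt its regions-in-transition view and then replaying them.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical stand-in for the master-side bookkeeping; not HBase code. */
class DeferredCallbacks {
  // region name -> last known transition state
  private final Map<String, String> regionsInTransition = new HashMap<String, String>();
  // callbacks that arrived while failover was still rebuilding state
  private final Queue<String[]> pending = new ArrayDeque<String[]>();
  private boolean failoverInProgress = true;

  /** Callback entry point: queue the event during failover, apply it otherwise. */
  synchronized void handleRegion(String region, String newState) {
    if (failoverInProgress) {
      pending.add(new String[] {region, newState});
      return;
    }
    apply(region, newState);
  }

  /** Used by the failover code while it rebuilds the RIT view. */
  synchronized void addToRegionsInTransition(String region, String state) {
    regionsInTransition.put(region, state);
  }

  /** Ends failover and replays queued callbacks in arrival order. */
  synchronized void finishFailover() {
    failoverInProgress = false;
    String[] event;
    while ((event = pending.poll()) != null) {
      apply(event[0], event[1]);
    }
  }

  private void apply(String region, String newState) {
    if (regionsInTransition.get(region) == null) {
      // The case from the logs: a transition for a region we know nothing about.
      System.out.println("WARN " + newState + " for " + region + " but state was null");
      return;
    }
    regionsInTransition.put(region, newState);
  }

  synchronized String stateOf(String region) {
    return regionsInTransition.get(region);
  }

  synchronized int pendingCount() {
    return pending.size();
  }
}
```

With this shape a CLOSED that arrives before the region is added to RIT is not dropped; it is applied once finishFailover() runs. The cost is that nothing is processed until failover completes.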

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch







[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208230#comment-13208230
 ] 

stack commented on HBASE-5200:
--

Make a new issue Ram since we've already committed a patch to 0.90 on this 
issue?

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch







[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208238#comment-13208238
 ] 

stack commented on HBASE-5200:
--

Sorry.  I misread hadoopqa output.

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch







[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-14 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208239#comment-13208239
 ] 

stack commented on HBASE-5396:
--

On the below:

{code}
+  public boolean isRegionOnline(HRegionInfo hri) {
+    HServerInfo hsi = this.regions.get(hri);
+    if (hsi != null && this.isServerOnline(hsi.getServerName())) {
+      return true;
+    }
+    return false;
+  }
{code}

Don't you have to take out a lock on this.regions before you access it?  See 
the comment in this.servers.

Also, could write the end of the method so:

{code}
return hsi != null && this.isServerOnline(hsi.getServerName());
{code}
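
Putting the two points on isRegionOnline together (lock before reading the map, one-line return), a minimal sketch might look like the below. The RegionBook class and its plain String keys are stand-ins I made up so the example compiles on its own; the real code uses HRegionInfo/HServerInfo and whatever lock convention this.regions already follows.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the assignment bookkeeping; not HBase code. */
class RegionBook {
  // region name -> hosting server name; always accessed under synchronized (regions)
  private final Map<String, String> regions = new HashMap<String, String>();
  // always accessed under synchronized (onlineServers)
  private final Set<String> onlineServers = new HashSet<String>();

  void assign(String region, String server) {
    synchronized (regions) {
      regions.put(region, server);
    }
  }

  void serverOnline(String server) {
    synchronized (onlineServers) {
      onlineServers.add(server);
    }
  }

  boolean isServerOnline(String server) {
    synchronized (onlineServers) {
      return onlineServers.contains(server);
    }
  }

  boolean isRegionOnline(String region) {
    String server;
    // Take the lock only for the map read, then release it before the server check.
    synchronized (regions) {
      server = regions.get(region);
    }
    return server != null && isServerOnline(server);
  }
}
```

The lock is dropped before calling isServerOnline so the two monitors are never held together; whether that matches the locking order used elsewhere in the class would need checking.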

What's a RegionsWithDeadServer?  Is it RegionsOnDeadServers?

The below is called regionplan but it's storing HRegionInfos?

{code}
+    Set<HRegionInfo> regionPlanOnThisServer = new HashSet<HRegionInfo>();
{code}

And then here, we are storing a Set of HRIs but the method name talks of 
RegionPlans.  It's a little hard to follow?

Ditto here:

{code}
+    private Set<HRegionInfo> regionPlanOnThisServer = null;
{code}

and this...

{code}
+    public Set<HRegionInfo> getRegionPlanOnThisServer() {
{code}

This comment doesn't seem right?

{code}
+   * Process result used by processServerShutdown.
{code}

There is no processing done in this data structure.


... and save a few lines?

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.7

 Attachments: HBASE-5396-90.patch


 Regions that are planned to open on this server while ServerShutdownHandler 
 is running are just removed from AM.regionPlans and left for TimeoutMonitor 
 to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-15 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208655#comment-13208655
 ] 

stack commented on HBASE-5200:
--

It would make sense that it get committed to 0.92 also.

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch







[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-15 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208803#comment-13208803
 ] 

stack commented on HBASE-5270:
--

@Chunhui Excellent.

I am all good w/ modifying core to make it more testable.  Would suggest 
instead though that you make more generic changes up in Master, etc.

For example, the attached patch makes it so a subclass of HMaster can observe 
log splitting and insert pauses and override the RegionServerTracker to pause 
on deletes until a gate is cleared.  This might make more sense than the custom 
changes made to RegionServerTracker and HMaster in this patch.
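
For illustration, the "pause on deletes until a gate is cleared" hook could be as small as the sketch below. Tracker and GatedTracker are hypothetical stand-ins written so the example is self-contained; they are not the RegionServerTracker API, and the attached patch may do it differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a tracker with an overridable delete hook. */
class Tracker {
  private final AtomicInteger deletesProcessed = new AtomicInteger();

  /** Invoked when a (hypothetical) region server znode goes away. */
  void nodeDeleted(String path) {
    deletesProcessed.incrementAndGet();
  }

  int deletesProcessed() {
    return deletesProcessed.get();
  }
}

/** Test-only subclass: holds every delete until the test opens the gate. */
class GatedTracker extends Tracker {
  private final CountDownLatch gate = new CountDownLatch(1);

  @Override
  void nodeDeleted(String path) {
    try {
      gate.await(); // block the calling thread until openGate() is called
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return; // give up on this delete if the thread was interrupted
    }
    super.nodeDeleted(path);
  }

  void openGate() {
    gate.countDown();
  }
}
```

The point of doing it this way is that the core class only needs to be subclassable at the hook; the test decides when processing resumes, and nothing test-specific lives in the core class.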

On the patch, I think the HMaster and RegionServerTracker changes are too 
specific to this test case.  Would suggest making more generic changes to these 
core classes because other tests will be able to make use of a more generic 
change (I think it's good to mod core classes to make them more testable).

On the test, can we have a better name than TestHRegionserverKilled?  It 
doesn't say what this test does (testDataCorrectnessWhenMasterFailOver might be 
better as class name).

Why do this:

+Configuration conf = HBaseConfiguration.create();

Why not use what's in your HBaseTestingUtility (do a getConfiguration -- see the 
attached test)?

Also, you might use the junit primitives for test setup and teardown as per the 
attached test.





 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
 Fix For: 0.94.0, 0.92.1

 Attachments: 5270-90-testcase.patch, 5270-90.patch, 
 5270-testcase.patch, hbase-5270.patch


 This JIRA continues the effort from HBASE-5179. Starting with Stack's 
 comments about patches for 0.92 and TRUNK:
 Reviewing 0.92v17
 isDeadServerInProgress is a new public method in ServerManager but it does 
 not seem to be used anywhere.
 Does isDeadRootServerInProgress need to be public? Ditto for meta version.
 These method param names are not right, e.g. 'definitiveRootServer'; what is 
 meant by definitive? Do they need this qualifier?
 Is there anything in place to stop us expiring a server twice if it's carrying 
 root and meta?
 What is difference between asking assignment manager isCarryingRoot and this 
 variable that is passed in? Should be doc'd at least. Ditto for meta.
 I think I've asked for this a few times - onlineServers needs to be 
 explained... either in javadoc or in comment. This is the param passed into 
 joinCluster. How does it arise? I think I know but am unsure. God love the 
 poor noob that comes awandering this code trying to make sense of it all.
 It looks like we get the list by trawling zk for regionserver znodes that 
 have not checked in. Don't we do this operation earlier in master setup? Are 
 we doing it again here?
 Though distributed split log is configured, we will do in master single 
 process splitting under some conditions with this patch. Its not explained in 
 code why we would do this. Why do we think master log splitting 'high 
 priority' when it could very well be slower. Should we only go this route if 
 distributed splitting is not going on. Do we know if concurrent distributed 
 log splitting and master splitting works?
 Why would we have dead servers in progress here in master startup? Because a 
 servershutdownhandler fired?
 This patch is different to the patch for 0.90. Should go into trunk first 
 with tests, then 0.92. Should it be in this issue? This issue is really hard 
 to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
 this trunk patch?
 This patch needs to have the v18 differences applied.





[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209478#comment-13209478
 ] 

stack commented on HBASE-4365:
--

I can add the above changes (will fix the superclass from where I copied this 
stuff too) but I'm more interested in feedback along the lines of whether folks 
think we should put this in as default split policy.  If so, will then spend 
time on it trying it on cluster, otherwise not.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Todd Lipcon
 Attachments: 4365.txt


 A few of us were brainstorming this morning about what the default region 
 size should be. There were a few general points made:
 - in some ways it's better to be too-large than too-small, since you can 
 always split a table further, but you can't merge regions currently
 - with HFile v2 and multithreaded compactions there are fewer reasons to 
 avoid very-large regions (10GB+)
 - for small tables you may want a small region size just so you can 
 distribute load better across a cluster
 - for big tables, multi-GB is probably best





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209561#comment-13209561
 ] 

stack commented on HBASE-5200:
--

Ram, you should add it to trunk too, going by Ted's reasoning above.  You 
reviewed my last version?  It's OK w/ you?

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch


 This is the scenario:
 Consider a case where the balancer is running and is trying to close regions 
 on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the data and start watching the node.
 After that, the node data is added into RIT.
 If, in this window (before the node is added to RIT), the RS on which close 
 was called does a transition, we miss handling it in AM.handleRegion(), which 
 reports that the RIT state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch, the CLOSING node is created by the RS, leading to more 
 inconsistency.





[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209621#comment-13209621
 ] 

stack commented on HBASE-4365:
--

Chatting w/ J-D, it is probably less disruptive if we use the square of the count of 
regions on a regionserver so we get to max size faster (then there'll be fewer 
regions created overall by this phenomenon).
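
A minimal sketch of the sizing rule discussed in this comment, assuming the split threshold is the memstore flush size multiplied by the square of the number of regions on the regionserver, capped at the configured maximum region size. The class name, method name, and the sizes used in the note after the block are illustrative assumptions, not the code in the attached patch.

```java
/** Hypothetical helper; names are illustrative and not part of the HBase API. */
final class SquaredRegionCountSplitSize {
  private SquaredRegionCountSplitSize() {}

  /**
   * Returns the size at which a region would split:
   * flushSize * regionCount^2, never exceeding maxFileSize.
   */
  static long sizeToCheck(int regionCount, long flushSize, long maxFileSize) {
    if (regionCount <= 0 || flushSize <= 0) {
      // Nothing sensible to scale by, so fall back to the configured maximum.
      return maxFileSize;
    }
    long squared = (long) regionCount * regionCount;
    // Guard against overflow before multiplying.
    if (squared > maxFileSize / flushSize) {
      return maxFileSize;
    }
    return Math.min(maxFileSize, flushSize * squared);
  }
}
```

With an assumed 64 MB flush size and a 10 GB maximum, the threshold grows as 64 MB, 256 MB, 576 MB, 1 GB and so on, and reaches the cap once a regionserver hosts 13 regions of the table. A linear rule would need 160 regions to get there.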







[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209650#comment-13209650
 ] 

stack commented on HBASE-5075:
--

@Ronhai.ma Put up a patch so we can see what you are thinking; are you talking 
about a supervisor-like process that will remove the regionserver ephemeral 
node if the pid goes missing?  Thanks.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: 代志远
 Fix For: 0.90.5


 When a regionserver crashes, it takes too long for the HMaster to be notified. Once the 
 HMaster knows the regionserver is down, it also takes a long time to recover the HLog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If the pid does not exist, I consider the RS down, delete its znode, and 
 force-close the HLog file.
 The check period could be about 100 ms.
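
 A rough sketch of the monitor idea, assuming a watchdog that polls the regionserver's pid and runs a cleanup action once when the process is gone. The class is hypothetical; the cleanup callback only stands in for deleting the znode and force-closing the HLog, neither of which is implemented here. The `forLocalProcess` factory uses `ProcessHandle`, so it assumes Java 9 or later.

```java
import java.util.function.LongPredicate;

/** Hypothetical watchdog; the cleanup action stands in for znode deletion and HLog close. */
final class RegionServerPidWatchdog {
  private final long pid;
  private final LongPredicate isAlive;
  private final Runnable onDeath;
  private boolean fired;

  RegionServerPidWatchdog(long pid, LongPredicate isAlive, Runnable onDeath) {
    this.pid = pid;
    this.isAlive = isAlive;
    this.onDeath = onDeath;
  }

  /** Builds a watchdog that asks the operating system whether the pid is alive. */
  static RegionServerPidWatchdog forLocalProcess(long pid, Runnable onDeath) {
    return new RegionServerPidWatchdog(
        pid,
        p -> ProcessHandle.of(p).map(ProcessHandle::isAlive).orElse(false),
        onDeath);
  }

  /** Runs one check; returns true only on the call that detects death and runs cleanup. */
  synchronized boolean checkOnce() {
    if (fired || isAlive.test(pid)) {
      return false;
    }
    fired = true;
    onDeath.run();
    return true;
  }
}
```

 To get the proposed period, `checkOnce()` could be scheduled every 100 ms with a `ScheduledExecutorService`. The liveness test is injected so the policy can be exercised without a real process.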





[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209679#comment-13209679
 ] 

stack commented on HBASE-4403:
--

What kind of feedback do you need Jimmy?

These are deprecated:

org/apache/hadoop/hbase/HServerAddress.java
org/apache/hadoop/hbase/HServerInfo.java

I suppose you want to mark them anyways.

It's ugly that the below is Audience=public and Stability=Evolving but I suppose 
it's the truth:

org/apache/hadoop/hbase/KeyValue.java

I see this is still evolving:

org/apache/hadoop/hbase/ClusterStatus.java

(It's having backup masters added as we speak)

Is this public?

org/apache/hadoop/hbase/ClockOutOfSyncException.java

I mean client can get it?

Maybe this should be too...

org/apache/hadoop/hbase/LocalHBaseCluster.java

It's a useful tool I'd think.

Some of these audience=private classes are used in tests... does that make them 
audience=public?  Or are test building blocks for devs and so audience=public?

Do we have defines for these classifications?  If so, should go into dev 
section of manual... or into index or something.

Is org/apache/hadoop/hbase/client/HConnectionManager.java public?  Should be 
private?  Would be good if connection implementation was 'hidden'

org/apache/hadoop/hbase/io/TimeRange.java is in io?  Doesn't that come through 
in client api?  I'd think it stable too.  Seems like it's in the wrong package.

Master and Regionserver have stuff that is used in tests too...

Some stuff in Util could be public... the Keying utility class, the Strings 
class...

Otherwise +1.  Good stuff.


 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-4403-interface.txt, 
 hbase-4403-nowhere-near-done.txt


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.
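
 A small sketch of what such a classification could look like in code. The annotation types here are local, hypothetical copies that mirror the audience and stability names listed above (Hadoop ships its own in `org.apache.hadoop.classification`); the two example classes are made up for illustration.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Hypothetical local copies of the audience annotations. */
final class InterfaceAudience {
  private InterfaceAudience() {}

  @Documented @Retention(RetentionPolicy.RUNTIME) @interface Public {}

  @Documented @Retention(RetentionPolicy.RUNTIME) @interface LimitedPrivate {
    String[] value();
  }

  @Documented @Retention(RetentionPolicy.RUNTIME) @interface Private {}
}

/** Hypothetical local copies of the stability annotations. */
final class InterfaceStability {
  private InterfaceStability() {}

  @Documented @Retention(RetentionPolicy.RUNTIME) @interface Stable {}

  @Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}

  @Documented @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}
}

/** Example of a client-facing type whose API may still change between releases. */
@InterfaceAudience.Public
@InterfaceStability.Evolving
class ExampleClientFacingType {}

/** Example of a type internal to the RS or Master, with no compatibility promise. */
@InterfaceAudience.Private
@InterfaceStability.Unstable
class ExampleInternalType {}
```

 Because the annotations are retained at runtime in this sketch, a tool could check them reflectively, for example to fail a build when a Public class exposes a Private type in its signature.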





[jira] [Commented] (HBASE-5414) Assign different memstoreTS to different KV's in the same WALEdit during replay

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209684#comment-13209684
 ] 

stack commented on HBASE-5414:
--

bq. can guarantee that the Put wins over the Delete even though both 
happened at the same NOW timestamp

Trying to understand... I don't get it Kannan.  If they both went in together 
at same TS, doesn't the Delete sort before the Put so we'd see it first... so 
it would overshadow the contemporaneous Put?
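
To make the ordering in this question concrete, here is a simplified sketch. It assumes cells for the same row and column sort by timestamp, newest first, and then by type code, highest first, with the codes taken from KeyValue.Type (Put is 4, Delete is 8). The `SimpleCell` class is hypothetical and deliberately ignores memstoreTS, which is the very thing the issue is about.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified stand-in for a KeyValue; row and column are assumed equal. */
final class SimpleCell {
  // Type codes mirror KeyValue.Type: Put is 4, Delete is 8.
  static final int PUT = 4;
  static final int DELETE = 8;

  final long timestamp;
  final int type;

  SimpleCell(long timestamp, int type) {
    this.timestamp = timestamp;
    this.type = type;
  }

  /** Newer timestamps first; at equal timestamps the higher type code (Delete) first. */
  static final Comparator<SimpleCell> ORDER =
      Comparator.comparingLong((SimpleCell c) -> c.timestamp).reversed()
          .thenComparing(Comparator.comparingInt((SimpleCell c) -> c.type).reversed());

  /** Returns a sorted copy, leaving the input untouched. */
  static List<SimpleCell> sorted(List<SimpleCell> cells) {
    List<SimpleCell> copy = new ArrayList<>(cells);
    copy.sort(ORDER);
    return copy;
  }
}
```

Under these assumptions a Delete and a Put written at the same timestamp always come back Delete first, whatever order they were inserted in, so a reader that stops at the first marker would see the Put masked.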

 Assign different memstoreTS to different KV's in the same WALEdit during 
 replay
 ---

 Key: HBASE-5414
 URL: https://issues.apache.org/jira/browse/HBASE-5414
 Project: HBase
  Issue Type: Sub-task
  Components: client, coprocessors, regionserver
Reporter: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: HBASE-5414.D1749.1.patch


 HBASE-5203 combines all the different Puts/Deletes into one WALEdit. This is
 required to ensure that we persist the atomic mutation in its entirety and not
 in parts.
 When combined into a single WALEdit, we create one big familyMap that is a 
 combination of all the family maps in the mutations. The KVs in this familyMap 
 have no information about memstoreTS (it is not yet assigned).
 However, when we apply the mutations to the Memstore (if there are no 
 failures) we end up incrementing the memstoreTS for each operation. 
 This can lead to the client seeing a different order of operations, depending 
 on whether or not there was a RS crash/restart.





[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209760#comment-13209760
 ] 

stack commented on HBASE-5416:
--

Interesting idea.  Patch looks pretty non-invasive to add some nice 
functionality.  Would appreciate some better doc in the filters package doc or 
over in the manual to accompany this change.  Nice one Max.

Comments on patch:

Please follow the convention you see in the surrounding file and put braces 
around code blocks.  E.g. in the below:

{code}
+
+  @Override
+  public boolean isFamilyEssential(byte[] name) {
+    for (Filter filter : filters)
+      if (filter.isFamilyEssential(name))
+        return true;
+    return false;
+  }
{code}
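
For illustration, the same method with every block braced might look like the following. The `EssentialFamilyFilter` interface and `BracedFilterList` class are hypothetical stand-ins so the snippet compiles on its own; only the body of `isFamilyEssential` corresponds to the patch.

```java
import java.util.List;

/** Hypothetical minimal interface carrying only the method the patch adds. */
interface EssentialFamilyFilter {
  boolean isFamilyEssential(byte[] name);
}

/** Sketch of a filter list that delegates to its members, with every block braced. */
final class BracedFilterList implements EssentialFamilyFilter {
  private final List<EssentialFamilyFilter> filters;

  BracedFilterList(List<EssentialFamilyFilter> filters) {
    this.filters = filters;
  }

  @Override
  public boolean isFamilyEssential(byte[] name) {
    // A family is essential as soon as any member filter needs it.
    for (EssentialFamilyFilter filter : filters) {
      if (filter.isFamilyEssential(name)) {
        return true;
      }
    }
    return false;
  }
}
```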

What is a 'joinedScanner' in the below:

{code}
+  List<KeyValueScanner> joinedScanners = new ArrayList<KeyValueScanner>();

{code}

It needs a bit of a comment I'd say.

Why drop the check for empty results in below?

{code}
-  if (results.isEmpty() || filterRow()) {
+  boolean filtered = filterRow();
{code}

Please submit a patch with a --no-prefix so we can see how your patch does 
against hadoopqa.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered-scans_0.90.4.patch, Filtered-scans_trunk.patch


 When a scan is performed, the whole row is loaded into the result list, and then 
 the filter (if one exists) is applied to decide whether the row is needed.
 But when a scan covers several CFs and the filter checks data from only 
 a subset of them, data from the CFs not checked by the filter is not needed 
 at the filter stage; it is needed only once we have decided to include the 
 current row. In such a case we can significantly reduce the amount of IO 
 performed by a scan by loading only the values actually checked by the filter.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and is quite costly to scan. If we need only rows with 
 some flag specified, we use SingleColumnValueFilter to limit the result to only 
 a small subset of the region. But the current implementation loads both CFs to 
 perform the scan, when only a small subset is needed.
 The attached patch adds one routine to the Filter interface to allow a filter to 
 specify which CFs are needed for its operation. In HRegion, we separate all 
 scanners into two groups: those needed for the filter and the rest (joined). 
 When a new row is considered, only the needed data is loaded and the filter 
 applied, and only if the filter accepts the row is the rest of the data loaded. 
 On our data, this speeds up such scans 30-50 times. Also, this gives us a way to 
 better normalize the data into separate columns by optimizing the scans performed.
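
 The two-phase approach described above can be sketched with plain maps. This is an in-memory model under stated assumptions, not the HRegion scanner code: each column family is a map from row key to value, `essential` plays the role of flags, `joined` plays the role of snap, and the class name and counter are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory model of a scan that loads the large family lazily. */
final class TwoPhaseScan {
  /** Counts how many values were read from the non-essential ("joined") family. */
  int joinedReads = 0;

  /** Returns {rowKey, essentialValue, joinedValue} for every row the filter accepts. */
  List<String[]> scan(
      Map<String, String> essential, Map<String, String> joined, Predicate<String> filter) {
    List<String[]> results = new ArrayList<>();
    for (Map.Entry<String, String> row : essential.entrySet()) {
      // Phase 1: only the family the filter needs is loaded and checked.
      if (!filter.test(row.getValue())) {
        continue;
      }
      // Phase 2: the row passed, so now pay for the large family.
      joinedReads++;
      results.add(new String[] {row.getKey(), row.getValue(), joined.get(row.getKey())});
    }
    return results;
  }
}
```

 In this model the number of reads against the large family equals the number of accepted rows rather than the number of scanned rows, which is where the reported saving would come from.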





[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209762#comment-13209762
 ] 

stack commented on HBASE-4403:
--

bq. How about those coprocessor and rest related classes?

CPs we are calling dev facing APIs which would make them private I suppose in 
your classification scheme.

REST classes similar since 'users' make REST calls -- the interface is the 
/table/row/, etc. scheme.

Let's add Todd's definitions to the book as part of this patch or in a new one 
(I can do it if you want -- just say so).






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209771#comment-13209771
 ] 

stack commented on HBASE-5416:
--

@Max You need a test too.






[jira] [Commented] (HBASE-3584) Allow atomic put/delete in one call

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209908#comment-13209908
 ] 

stack commented on HBASE-3584:
--

Ugh, commented over in different issue:

{code}
stack has commented on the revision HBASE-5413 [jira] Rename RowMutation to 
RowMutations.

 We have the notion of Operation already.  It would seem to cut across rows in 
that a Scan and a RowMutations is ugly, agree, but it does convey notions of 
'many' and 'in a row'.

 Regarding RowOperation, Get and Put subclass Operation (actually, its subclass 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/OperationWithAttributes.html).
RowOperation therefore could work in that it narrows the set of Operations 
only; the odd thing is that Delete does not inherit from 
OperationWithAttributes.  I'd think we should fix this hole in our 'model' if we 
went with RowOperation.

 If I was looking into the client package and saw both RowMutation and 
RowOperation sitting beside each other, I'd think I'd be confused what to use 
whereas with RowMutation and RowMutations... I'd think I'd have some notion 
how they might be used.

 AtomicRowMutation I feel is w/o meaning since all Row mutations are 'atomic' 
-- or it's a bug -- so the Atomic qualifier would seem to add nothing.

 As Thomas Pan says, just my 2c.
{code}

 Allow atomic put/delete in one call
 ---

 Key: HBASE-3584
 URL: https://issues.apache.org/jira/browse/HBASE-3584
 Project: HBase
  Issue Type: New Feature
  Components: client, coprocessors, regionserver
Reporter: ryan rawson
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 3584-final.txt, 3584-v1.txt, 3584-v3.txt


 Right now we have the following calls:
 put(Put)
 delete(Delete)
 increment(Increments)
 But we cannot combine all of the above in a single call, complete with a 
 single row lock.  It would be nice to do that.
 It would also allow us to do a CAS where we could do a put/increment if the 
 check succeeded.
 -
 Amendment:
 Since Increment does not currently support MVCC it cannot be included in an 
 atomic operation.
 So this is for Put and Delete only.
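
 A toy sketch of the requested behaviour, assuming a single row guarded by one lock so that a batch of puts and deletes becomes visible all at once. `AtomicRow` and `Mutation` are hypothetical names for this example; they are not the HBase client or region classes, and MVCC is not modelled.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical single-row store; not the HBase client or region API. */
final class AtomicRow {
  /** One mutation: a null value means delete the column, otherwise put. */
  static final class Mutation {
    final String column;
    final String value;

    private Mutation(String column, String value) {
      this.column = column;
      this.value = value;
    }

    static Mutation put(String column, String value) {
      return new Mutation(column, Objects.requireNonNull(value));
    }

    static Mutation delete(String column) {
      return new Mutation(column, null);
    }
  }

  private final ReentrantLock rowLock = new ReentrantLock();
  private final Map<String, String> columns = new TreeMap<>();

  /** Applies all mutations under one row lock so readers never see a partial batch. */
  void mutate(List<Mutation> mutations) {
    rowLock.lock();
    try {
      for (Mutation m : mutations) {
        if (m.value == null) {
          columns.remove(m.column);
        } else {
          columns.put(m.column, m.value);
        }
      }
    } finally {
      rowLock.unlock();
    }
  }

  /** Returns a copy of the row taken under the same lock. */
  Map<String, String> snapshot() {
    rowLock.lock();
    try {
      return new TreeMap<>(columns);
    } finally {
      rowLock.unlock();
    }
  }
}
```

 A caller would pass a list such as a put on one column and a delete on another in a single `mutate` call; any `snapshot()` then reflects either none or all of that batch.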





[jira] [Commented] (HBASE-5420) TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 hadoop)

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209913#comment-13209913
 ] 

stack commented on HBASE-5420:
--

+1 on patch.  Mind resubmitting w/ --no-prefix please Gregory and then doing 
'submit patch' to try it against hadoopqa?

 TestImportTsv does not shut down MR Cluster correctly (fails against 0.23 
 hadoop)
 -

 Key: HBASE-5420
 URL: https://issues.apache.org/jira/browse/HBASE-5420
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5420.patch


 Test calls startMiniMapReduceCluster() but never calls 
 shutdownMiniMapReduceCluster().
 This causes failures with -Dhadoop.profile=23 when both testMROnTable and 
 testMROnTableWithCustomMapper are run, because the cluster cannot start up 
 properly for the second test.
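
 The failure mode above comes down to an unpaired start and shutdown. The sketch below models it with a hypothetical `FakeMiniCluster` that refuses to start twice; it is a stand-in for illustration only and does not use the real test utility, and the exception is an assumed simplification of "cannot start up properly".

```java
/** Hypothetical stand-in for a mini cluster that can run only once at a time. */
final class FakeMiniCluster {
  private static boolean running = false;

  private FakeMiniCluster() {}

  static synchronized void start() {
    if (running) {
      throw new IllegalStateException("cluster already running");
    }
    running = true;
  }

  static synchronized void shutdown() {
    running = false;
  }

  static synchronized boolean isRunning() {
    return running;
  }
}

/** Pairs every start with a shutdown, as a test's setup and teardown would. */
final class PairedLifecycle {
  private PairedLifecycle() {}

  static void runWithCluster(Runnable testBody) {
    FakeMiniCluster.start();
    try {
      testBody.run();
    } finally {
      // Always shut down, even if the test body throws.
      FakeMiniCluster.shutdown();
    }
  }
}
```

 Running two test bodies back to back through `runWithCluster` succeeds, whereas calling `start()` twice without a shutdown in between fails on the second call.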





[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209925#comment-13209925
 ] 

stack commented on HBASE-5421:
--

I'm being lazy, Shaneal.  This patch affects the 0.23 profile section of the pom 
only?

 use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build
 

 Key: HBASE-5421
 URL: https://issues.apache.org/jira/browse/HBASE-5421
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
  Labels: build
 Fix For: 0.92.1

 Attachments: hbase-5421.patch


 Hadoop recently added hadoop-client and hadoop-minicluster artifacts for 
 Hadoop 0.23+ that don't export all the internal dependencies (HADOOP-8009).
 Let's use them instead of manually specifying transitive dependency exclusion 
 lists (which is error prone and annoying).





[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209927#comment-13209927
 ] 

stack commented on HBASE-5421:
--

Oh, you did it.






[jira] [Commented] (HBASE-5421) use hadoop-client/hadoop-minicluster artifacts for Hadoop 0.23 build

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209933#comment-13209933
 ] 

stack commented on HBASE-5421:
--

@Shaneal Yeah, submit patch will trigger hadoopqa.  Let's see what it says.






[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210093#comment-13210093
 ] 

stack commented on HBASE-5200:
--

OK.  Go ahead and commit, Ram?






[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210367#comment-13210367
 ] 

stack commented on HBASE-5075:
--

Thanks for doing this.  It looks very interesting.

Please do not reformat existing code.  It bloats your patch and makes reviews 
take longer; reviewer attention span is short (at least in this case) and it's 
a shame to spend it going over code reformats.

On the patch, is this necessary: +  public String getRSPidAndRsZknode();

Can't you get the pid from a process listing?  Or do you want us to publish it 
via jmx?  Or it looks like it is already published via jmx.  Can your tool pick 
it up there?  On the znode, can't you get the regionserver servername and then 
do a lookup in zk directly?

Can't you have supervisor do this?  Are there not existing utilities that watch 
a pid and allow you to do stuff when it's gone?  Or is it that you'd kill the 
server if there is a long GC pause?

Do you have a bit of documentation on how this new utility works?

Thanks.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: 5075.patch


 When a regionserver crashes, it takes too long to notify the hmaster. Once the 
 hmaster knows about the regionserver's shutdown, it takes a long time to 
 recover the hlog's lease.
 HBase is an online db, so availability is very important.
 I have an idea to improve availability: a monitor node checks the 
 regionserver's pid. If this pid does not exist, I treat the rs as down, 
 delete the znode, and force close the hlog file.
 The check period could be about 100ms.
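
The polling idea described above can be sketched roughly as follows. This is a minimal illustration only, not the attached patch: the class name PidWatcherSketch, the watch() helper and the onDeath callback are hypothetical, and the callback merely stands in for "delete the znode and force close the hlog file". The liveness check uses the JDK's ProcessHandle API (Java 9+), which is an assumption about the runtime and not something the issue specifies.

```java
import java.util.function.LongPredicate;

/** Hypothetical sketch of a pid monitor; not part of HBase. */
class PidWatcherSketch {

  /** Returns true if the JDK reports a live process with this pid. */
  static boolean isAlive(long pid) {
    return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
  }

  /**
   * Polls the liveness check every periodMs milliseconds, at most maxPolls
   * times. Runs onDeath once and returns true as soon as the process is seen
   * to be gone; returns false if it was still alive after the last poll or
   * if the watcher thread was interrupted.
   */
  static boolean watch(long pid, LongPredicate alive, long periodMs,
      int maxPolls, Runnable onDeath) {
    for (int i = 0; i < maxPolls; i++) {
      if (!alive.test(pid)) {
        onDeath.run(); // stand-in for: delete the znode, force close the hlog
        return true;
      }
      try {
        Thread.sleep(periodMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long self = ProcessHandle.current().pid();
    // Watch our own pid for three polls at the 100ms period from the proposal.
    boolean died = watch(self, PidWatcherSketch::isAlive, 100L, 3,
        () -> System.out.println("process gone, running cleanup"));
    System.out.println("death detected: " + died);
  }
}
```

Note that a check like this only observes whether the pid exists; it cannot tell a healthy process from one stuck in a long GC pause.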





[jira] [Commented] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210393#comment-13210393
 ] 

stack commented on HBASE-5425:
--

Committed to 0.92 branch too.

  Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's 
 EnableTableHandler)
 

 Key: HBASE-5425
 URL: https://issues.apache.org/jira/browse/HBASE-5425
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5, 0.92.0
Reporter: terry zhang
 Fix For: 0.94.0

 Attachments: HBASE-5425.patch


 please take a look at the code below in EnableTableHandler(hbase master):
 {code:title=EnableTableHandler.java|borderStyle=solid}
 protected boolean waitUntilDone(long timeout)
 throws InterruptedException {
 
   .
   int lastNumberOfRegions = this.countOfRegionsInTable;
   while (!server.isStopped() && remaining > 0) {
     Thread.sleep(waitingTimeForEvents);
     regions = assignmentManager.getRegionsOfTable(tableName);
     if (isDone(regions)) break;
     // Punt on the timeout as long we make progress
     if (regions.size() > lastNumberOfRegions) {
       lastNumberOfRegions = regions.size();
       timeout += waitingTimeForEvents;
     }
     remaining = timeout - (System.currentTimeMillis() - startTime);
 
 }
 private boolean isDone(final List<HRegionInfo> regions) {
   return regions != null && regions.size() >= this.countOfRegionsInTable;
 }
 {code} 
 It is easy to see that if lastNumberOfRegions is initialized to 
 this.countOfRegionsInTable, the punt-on-timeout code will never be executed. 
 I think initializing lastNumberOfRegions to 0 would make it work.
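
To make the argument concrete, here is a small standalone sketch that replays the bookkeeping of the loop above on a made-up series of polls. It is not the HBase code: the class PuntOnTimeoutSketch, the totalExtension() helper and the poll counts are hypothetical, and only the two comparisons are taken from the snippet in this issue. While the table is not done, the online count is below countOfRegionsInTable, so it can never exceed a lastNumberOfRegions that starts at that value.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical replay of the "punt on the timeout" bookkeeping. */
class PuntOnTimeoutSketch {

  /**
   * Walks through the online-region counts seen at each poll and returns the
   * total number of milliseconds by which the timeout would be extended.
   */
  static long totalExtension(List<Integer> observedCounts, int expectedRegions,
      int initialLastNumberOfRegions, long waitingTimeForEvents) {
    long extension = 0;
    int lastNumberOfRegions = initialLastNumberOfRegions;
    for (int online : observedCounts) {
      if (online >= expectedRegions) {
        break; // same condition as isDone(): every region is online
      }
      if (online > lastNumberOfRegions) { // progress since the last poll
        lastNumberOfRegions = online;
        extension += waitingTimeForEvents;
      }
    }
    return extension;
  }

  public static void main(String[] args) {
    List<Integer> polls = Arrays.asList(2, 5, 5, 8, 10); // made-up progress
    // Initialized to the expected region count, as in the current code.
    System.out.println(totalExtension(polls, 10, 10, 1000L)); // prints 0
    // Initialized to 0, as proposed in this issue.
    System.out.println(totalExtension(polls, 10, 0, 1000L)); // prints 3000
  }
}
```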





[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210450#comment-13210450
 ] 

stack commented on HBASE-5407:
--

Liyin.  Is this a backport for 0.89fb?   If so, is there something you've added 
to your backport that we should have in trunk?  Thanks boss.

 Show the per-region level request/sec count in the web ui
 -

 Key: HBASE-5407
 URL: https://issues.apache.org/jira/browse/HBASE-5407
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch


 It would be nice to show the per-region level request/sec count in the web 
 ui, especially when debugging the hot region problem.





[jira] [Commented] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210460#comment-13210460
 ] 

stack commented on HBASE-5200:
--

Ram wants me to apply the patches here but the version for 0.90 is very 
different to the version for 0.92.  This is removed:

-RS_ZK_REGION_CLOSING  (1),   // RS is in process of closing a region


And this is added:

+M_ZK_REGION_CLOSING   (51),  // Master adds this region as closing in ZK

This looks like a port from 0.92?



 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, 5200-v3.txt, 5200-v4.txt, 
 HBASE-5200.patch, HBASE-5200_1.patch, 
 HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch, hbase-5200_90_latest_new.patch


 This is the scenario:
 Consider a case where the balancer is running and is trying to close regions 
 on an RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the data and start watching the node.
 After that, the node data is added into RIT.
 If, in the window before the data is added to RIT, the RS on which close was 
 called does a transition, AM.handleRegion() misses the handling, saying the 
 RIT state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In branch the CLOSING node is created by RS thus leading to more 
 inconsistency.





[jira] [Commented] (HBASE-5423) Regionserver may block forever on waitOnAllRegionsToClose when aborting

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210466#comment-13210466
 ] 

stack commented on HBASE-5423:
--

Patch looks good Chunhui.

Change name of Set from addedRegionsToCallClose to closed.

Why do we have this Set?  Are we calling close multiple times on same region?

So, we'd break even though online regions is not yet empty?

{code}
+  if (this.regionsInTransitionInRS.isEmpty()) {
+break;
+  }
{code}

Thanks.

 Regionserver may block forever on waitOnAllRegionsToClose when aborting
 ---

 Key: HBASE-5423
 URL: https://issues.apache.org/jira/browse/HBASE-5423
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5423.patch


 If closeRegion throws any exception (it would be caused by the FS) while the 
 RS is aborting, the RS will block forever on waitOnAllRegionsToClose().





[jira] [Commented] (HBASE-4640) Catch ClosedChannelException and document it

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210483#comment-13210483
 ] 

stack commented on HBASE-4640:
--

+1

On commit, add the CCE.getMessage to the LOG.WARN just in case it's got info of 
use (I'm fine on skipping stack trace)

 Catch ClosedChannelException and document it
 

 Key: HBASE-4640
 URL: https://issues.apache.org/jira/browse/HBASE-4640
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4640.patch


 ClosedChannelException is a pretty obscure exception for the non-expert and 
 doesn't tell you why you get it. We should instead catch it, print a WARN, 
 don't print a stack trace, and add a line in the book about this.





[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210491#comment-13210491
 ] 

stack commented on HBASE-5120:
--

I committed to trunk.  Will not commit to 0.92.  Not important enough of a bug 
I'd say.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210492#comment-13210492
 ] 

stack commented on HBASE-5120:
--

I did not commit to 0.90 either.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 

[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210565#comment-13210565
 ] 

stack commented on HBASE-5195:
--

This looks like a pretty important fix.  Should it be more than major priority? 
 Should it go into 0.92.1?

 [Coprocessors] preGet hook does not allow overriding or wrapping filter on 
 incoming Get
 ---

 Key: HBASE-5195
 URL: https://issues.apache.org/jira/browse/HBASE-5195
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5195.patch


 Without the ability to wrap the internal Scan on the Get, we can't override 
 (or protect, in the case of access control) Gets as we can Scans. The result 
 is inconsistent behavior.





[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210567#comment-13210567
 ] 

stack commented on HBASE-5209:
--

Patch looks excellent.

One issue is the upping of the HMasterInterface version.  It's the 'right' 
thing to do, but then it means I can't apply to 0.92.1 and it breaks a 0.92 
talking to a 0.94, which currently is possible.  Can you try adding 
isActiveMaster to the end of the Interface and NOT update the version?  See if 
you can connect to a 0.92.1 server from a 0.92.0 client and see if you can do 
basic HMasterInterface operations such as isLoadBalancer running.



 HConnection/HMasterInterface should allow for way to get hostname of 
 currently active master in multi-master HBase setup
 

 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Aditya Acharya
Assignee: David S. Wang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: HBASE-5209-v0.diff, HBASE-5209-v1.diff


 I have a multi-master HBase set up, and I'm trying to programmatically 
 determine which of the masters is currently active. But the API does not 
 allow me to do this. There is a getMaster() method in the HConnection class, 
 but it returns an HMasterInterface, whose methods do not allow me to find out 
 which master won the last race. The API should have a 
 getActiveMasterHostname() or something to that effect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5265) Fix 'revoke' shell command

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210570#comment-13210570
 ] 

stack commented on HBASE-5265:
--

Is this important for 0.92.1 lads?

 Fix 'revoke' shell command
 --

 Key: HBASE-5265
 URL: https://issues.apache.org/jira/browse/HBASE-5265
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Andrew Purtell
Assignee: Eugene Koontz
 Fix For: 0.94.0, 0.92.1


 The 'revoke' shell command needs to be reworked for the AccessControlProtocol 
 implementation that was finalized for 0.92. The permissions being removed 
 must exactly match what was previously granted. No wildcard matching is done 
 server side.
 Allow two forms of the command in the shell for convenience:
 Revocation of a specific grant:
 {code}
 revoke user, table, column family [ , column_qualifier ]
 {code}
 Have the shell automatically do so for all permissions on a table for a given 
 user:
 {code}
 revoke user, table
 {code}





[jira] [Commented] (HBASE-5428) Allow for custom filters to be registered within the Thrift interface

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210576#comment-13210576
 ] 

stack commented on HBASE-5428:
--

@Robert We need this code because the filter 'language' only recognizes some 
base set of filters?  This code extends the set of filters known to the filter 
language?

 Allow for custom filters to be registered within the Thrift interface
 -

 Key: HBASE-5428
 URL: https://issues.apache.org/jira/browse/HBASE-5428
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Affects Versions: 0.92.0
Reporter: Robert Roland
  Labels: patch
 Fix For: 0.94.0

 Attachments: ThriftCustomFilters.patch


 Custom filters work within the Java client API, but are not accessible within 
 the Thrift API.  Attempting to use one will generate a Filter Name x not 
 supported
 Attached patch allows a user to specify a list of custom filters that are 
 registered at Thrift server startup time within the HBase configuration files:
 <property>
   <name>hbase.thrift.filters</name>
   <value>MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter</value>
 </property>
 Patch created off SVN r1245727
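
For illustration, a value in the comma-separated name:class format shown above could be parsed along these lines. This is only a sketch under stated assumptions, not the code in the attached patch: the class ThriftFilterConfigSketch and its parseFilters() method are hypothetical, and rejecting malformed entries with an exception is one possible choice that the issue does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for an hbase.thrift.filters style value. */
class ThriftFilterConfigSketch {

  /** Splits "Name:class,Name2:class2" into an ordered name-to-class map. */
  static Map<String, String> parseFilters(String value) {
    Map<String, String> filters = new LinkedHashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return filters; // nothing configured
    }
    for (String entry : value.split(",")) {
      // Split on the first ':' only; the left side is the filter name.
      String[] parts = entry.trim().split(":", 2);
      if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
        throw new IllegalArgumentException(
            "Expected name:class but got '" + entry + "'");
      }
      filters.put(parts[0], parts[1]);
    }
    return filters;
  }

  public static void main(String[] args) {
    Map<String, String> filters =
        parseFilters("MyFilter:com.foo.Filter,OtherFilter:com.foo.OtherFilter");
    // Prints one "name -> class" line per configured filter.
    filters.forEach((name, clazz) -> System.out.println(name + " -> " + clazz));
  }
}
```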





[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210582#comment-13210582
 ] 

stack commented on HBASE-5279:
--

Committed to 0.92 and to TRUNK.  Thanks for the patch Tobias.

 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch


 I have upgraded my environment from 0.90.4 to 0.92.0
 after the table migration I get the following error in the master (permanent)
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting 
 shutdown.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and I 
 have seen the following WARN since then:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 My patch simply works around the NPE (as the other code around those lines 
 does), but I don't know if that's correct.
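
 To illustrate the kind of guard being described, here is a minimal sketch
 that skips catalog rows whose region info is missing instead of dereferencing
 null. It is an assumption about the shape of the fix, not the attached patch;
 the class name and the use of plain strings for region info are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for rebuilding region state from catalog rows.
final class RebuildRegionsSketch {
  // rows maps a catalog row key to its parsed region info; a null value
  // models a row whose REGIONINFO_QUALIFIER cell is empty.
  static List<String> rebuildUserRegions(Map<String, String> rows) {
    List<String> regions = new ArrayList<String>();
    for (Map.Entry<String, String> row : rows.entrySet()) {
      String regionInfo = row.getValue();
      if (regionInfo == null) {
        // Skip the damaged row (real code would also log it) rather than
        // letting a NullPointerException abort master startup.
        continue;
      }
      regions.add(regionInfo.trim());
    }
    return regions;
  }
}
```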





[jira] [Commented] (HBASE-5195) [Coprocessors] preGet hook does not allow overriding or wrapping filter on incoming Get

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210590#comment-13210590
 ] 

stack commented on HBASE-5195:
--

Oops.  It's not already in trunk.  But my commit to get it into trunk was messy. 
 I did it in three commits: first a bungled commit that added half, which I then 
reverted, then added back.  Here is where I made my mess:

{code}

r1245773 | stack | 2012-02-17 13:34:46 -0800 (Fri, 17 Feb 2012) | 1 line

HBASE-5195 [Coprocessors] preGet hook does not allow overriding or wrapping 
filter on incoming Get -- SECOND HALF OF THIS COMMIT

r1245768 | stack | 2012-02-17 13:26:23 -0800 (Fri, 17 Feb 2012) | 1 line

HBASE-5279 NPE in Master after upgrading to 0.92.0 -- REVERT OVERCOMMIT TO 
HREGION

r1245767 | stack | 2012-02-17 13:24:21 -0800 (Fri, 17 Feb 2012) | 1 line

HBASE-5279 NPE in Master after upgrading to 0.92.0
{code}

r1245767 adds

{code}
@@ -3684,6 +3682,8 @@
}
 }
 
+Scan scan = new Scan(get);
+
 RegionScanner scanner = null;
 try {
   scanner = getScanner(scan);
{code}

... but not...


{code}
@@ -3673,8 +3673,6 @@
*/
   private List<KeyValue> get(Get get, boolean withCoprocessor)
   throws IOException {
-Scan scan = new Scan(get);
-
 List<KeyValue> results = new ArrayList<KeyValue>();
 
 // pre-get CP hook
{code}

... then r1245768 reverts it because above failed to go in.

Then r1245773 is fixup.

 [Coprocessors] preGet hook does not allow overriding or wrapping filter on 
 incoming Get
 ---

 Key: HBASE-5195
 URL: https://issues.apache.org/jira/browse/HBASE-5195
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5195.patch


 Without the ability to wrap the internal Scan on the Get, we can't override 
 (or protect, in the case of access control) Gets as we can Scans. The result 
 is inconsistent behavior.
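
 A minimal sketch of why the ordering matters: if the Scan is built from the
 Get before the pre-get hook runs, a filter wrapped by the hook is not seen.
 The Get and Scan classes below are simplified, hypothetical stand-ins, not the
 HBase classes, and the hook is modelled as a plain Consumer.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical, simplified model of the Get -> Scan hand-off.
final class PreGetOrderSketch {
  static final class Get {
    Predicate<String> filter = value -> true; // accept everything by default
  }

  static final class Scan {
    final Predicate<String> filter;
    Scan(Get get) { this.filter = get.filter; } // copies the Get's filter once
  }

  // Scan created first: changes the hook makes to the Get are lost.
  static Scan scanBuiltBeforeHook(Get get, Consumer<Get> preGetHook) {
    Scan scan = new Scan(get);
    preGetHook.accept(get);
    return scan;
  }

  // Scan created after the hook: the wrapped filter is honoured.
  static Scan scanBuiltAfterHook(Get get, Consumer<Get> preGetHook) {
    preGetHook.accept(get);
    return new Scan(get);
  }
}
```

 With a hook that wraps the filter to hide some rows, only the second variant
 hides them, which is the consistency with Scans that the issue asks for.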





[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210616#comment-13210616
 ] 

stack commented on HBASE-5399:
--

I don't follow:

{code}
+// todo stack nkeywal
+// We used to check in a loop that the master was running.
+// 1) There were imbricated tries loop. One here and one in getMaster
+// 2) As the master can disappear, it may not be necessary to check it.
+// We don't need the connection immediately, but we used to check the
+//  connection at the beginning in the past, and it's more user friendly
+//  to have the error immediately.
+// connection.isMasterRunning();
{code}

I'd say let's not check master is there till we need it.  Seems like a PITA 
going ahead and checking master on construction.  This changes the behavior but 
I think it's one that's OK to change.

You are doing your own Callables.  You've seen the Callables that go on in 
HTable.  Any reason you avoid them?  I suppose this is different in that you 
want to let go of the shared master.  Looks fine.

Are we getting retrieveClusterId on startup?  Can we not do that?  Can we do 
that when someone asks for it? Or is it happening after we've set up the zk 
connection anyways?

Patch makes sense so far.  Good stuff N.
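
To make the "don't check master till we need it" suggestion concrete, a
minimal sketch of lazy lookup follows. The class and its Supplier-based lookup
are hypothetical and are not the HConnection code in the patch.

```java
import java.util.function.Supplier;

// Hypothetical sketch: resolve the master on first use, not on construction.
final class LazyMasterSketch {
  private final Supplier<String> masterLookup; // e.g. a short-lived ZK read
  private String master;                       // cached after first lookup
  int lookups = 0;                             // exposed only for illustration

  LazyMasterSketch(Supplier<String> masterLookup) {
    this.masterLookup = masterLookup;          // no master check here
  }

  synchronized String getMaster() {
    if (master == null) {
      lookups++;
      master = masterLookup.get();
      if (master == null) {
        throw new IllegalStateException("master is not running");
      }
    }
    return master;
  }

  // Drop the cached reference so a shared master connection can be let go.
  synchronized void releaseMaster() {
    master = null;
  }
}
```

The trade-off noted in the patch comment still applies: a client that never
needs the master pays nothing, but an unreachable master is reported later.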

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch


 The link is often considered an issue, for various reasons, one of them 
 being that there is a limit on the number of connections that ZK can manage. 
 Stack also suggested removing the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read get master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of threads => now done with a temporary zookeeper connection
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master.  => 
 we could make them using a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, a non-connected client will always be 
 much slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the clients seems to make sense for some 
 use cases. However, it won't scale if a TCP connection is required for every 
 client.
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue with HBaseAdmin (for both ZK & Master); maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic, however.





[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210629#comment-13210629
 ] 

stack commented on HBASE-5422:
--

I think I understand what is going on.  Help me out Chunhui.  So, on region 
open, we do update of the RIT timers.  When bulk opening, we are not adding 
plans to this.regionPlans so that when an open comes in from a bulk assign, 
since we go against this.regionPlans, we'll not update timers of other 
outstanding RITs?  This seems like a nice bug fix.

I see that BulkReOpen adds to this.regionPlans but it does it one at a time.  
Should it use your putAll?  Maybe we should make an addPlan method that takes a 
Map of plans and have it used by BulkReOpen and by BulkOpen?
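
A rough sketch of the shared method I have in mind. The names are
hypothetical, strings stand in for RegionPlan, and the ConcurrentHashMap is an
assumption for the sketch rather than what AssignmentManager actually uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a plan registry shared by both bulk-assign paths.
final class RegionPlansSketch {
  // encoded region name -> destination server (stand-in for RegionPlan)
  private final Map<String, String> regionPlans =
      new ConcurrentHashMap<String, String>();

  // Single-plan variant, as used when assigning one region at a time.
  void addPlan(String encodedRegionName, String destination) {
    regionPlans.put(encodedRegionName, destination);
  }

  // Bulk variant: BulkReOpen and the startup bulk assigner could both call
  // this so every region in the batch has a plan before opens come in.
  void addPlans(Map<String, String> plans) {
    regionPlans.putAll(plans);
  }

  boolean hasPlan(String encodedRegionName) {
    return regionPlans.containsKey(encodedRegionName);
  }
}
```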

 StartupBulkAssigner would cause a lot of timeout on RIT when assigning large 
 numbers of regions (timeout = 3 mins)
 --

 Key: HBASE-5422
 URL: https://issues.apache.org/jira/browse/HBASE-5422
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
 Attachments: 5422-90.patch, hbase-5422.patch


 In our production environment we see a lot of RIT timeouts when the cluster 
 starts up; there are about 70,000 regions in the cluster (25 regionservers).
 First, we could see the following log (see the region 
 33cf229845b1009aa8a3f7b0f85c9bd0).
 Master's log:
 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x348f4a94723da5 Async create of unassigned node for 
 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 
 2012-02-13 18:07:42,560 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
  rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=OFFLINE, ts=1329127661409, 
 server=r03f11025.yh.aliyun.com,60020,1329127549907 
 2012-02-13 18:07:42,996 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
  rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=OFFLINE, ts=1329127661409 
 2012-02-13 18:10:48,072 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=PENDING_OPEN, ts=1329127662996
 2012-02-13 18:10:48,072 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 2012-02-13 18:11:16,744 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_OPENED, 
 server=r03f11025.yh.aliyun.com,60020,1329127549907, 
 region=33cf229845b1009aa8a3f7b0f85c9bd0 
 2012-02-13 18:38:07,310 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 
 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x348f4a94723da5 Deleting existing unassigned node for 
 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state 
 RS_ZK_REGION_OPENED 
 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 
 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 
 2012-02-13 18:38:07,573 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
 item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on 
 r03f11025.yh.aliyun.com,60020,1329127549907 
 2012-02-13 18:50:54,428 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so 
 generated a random one; 
 hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., 
 src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, 
 exclude=null) available servers 
 2012-02-13 18:50:54,428 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to 
 r01b05043.yh.aliyun.com,60020,1329127549041 
 2012-02-13 19:31:50,514 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 state=PENDING_OPEN, ts=1329132528086 
 2012-02-13 19:31:50,514 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
 Regionserver's log
 2012-02-13 18:07:43,537 INFO 
 

[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210643#comment-13210643
 ] 

stack commented on HBASE-4403:
--

@Enis Agree

 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-4403-interface.txt, 
 hbase-4403-nowhere-near-done.txt


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.
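
 As an illustration of the scheme, a minimal sketch with locally defined
 annotations. The annotation and class names are hypothetical stand-ins
 modelled on the audience/stability categories listed above; they are not the
 Hadoop annotation classes themselves.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sketch of audience/stability markers and their use.
final class ClassificationSketch {
  @Documented @Retention(RetentionPolicy.RUNTIME) @interface AudiencePublic {}
  @Documented @Retention(RetentionPolicy.RUNTIME) @interface AudiencePrivate {}
  @Documented @Retention(RetentionPolicy.RUNTIME) @interface StabilityStable {}
  @Documented @Retention(RetentionPolicy.RUNTIME) @interface StabilityEvolving {}

  // A class users are expected to program against.
  @AudiencePublic @StabilityStable
  static final class ClientFacingApi {}

  // A class that is public in Java terms but internal to the RS or Master.
  @AudiencePrivate @StabilityEvolving
  static final class RegionServerInternal {}

  // Tooling (docs, compatibility checks) can read the markers reflectively.
  static boolean isPublicApi(Class<?> type) {
    return type.isAnnotationPresent(AudiencePublic.class);
  }
}
```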





[jira] [Commented] (HBASE-5371) Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210663#comment-13210663
 ] 

stack commented on HBASE-5371:
--

bq. What about re-changing the version to 1, since we just added a new method, 
but not changed anything on the wire, it should be compatible. The only catch 
is that if you invoke the new API from a new client, but the server is using 
the old version, you would get a NoSuchMethod or smt. Is that asking for too 
much, wdyt?

I think it's a good idea.  Test an old client talking to a new server w/o 
changing the version.  See what happens.  If it works, let's get it into 0.92.

 Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) 
 API
 

 Key: HBASE-5371
 URL: https://issues.apache.org/jira/browse/HBASE-5371
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.0

 Attachments: HBASE-5371_v2.patch, HBASE-5371_v3-noprefix.patch, 
 HBASE-5371_v3.patch


 We need to introduce something like 
 AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
 that clients can check access rights before carrying out the operations. We 
 need this kind of operation for HCATALOG-245, which introduces authorization 
 providers for hbase over hcat. We cannot use getUserPermissions() since it 
 requires ADMIN permissions on the global/table level.
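
 A minimal sketch of what such a check could look like from the caller's
 side. Everything here (the Action enum, the Permission and exception types,
 the in-memory grants map) is a hypothetical stand-in, not the HBase security
 classes or the attached patch.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a "check before you act" permission API.
final class CheckPermissionsSketch {
  enum Action { READ, WRITE, ADMIN }

  static final class Permission {
    final String table;
    final Action action;
    Permission(String table, Action action) {
      this.table = table;
      this.action = action;
    }
  }

  static final class AccessDeniedException extends Exception {
    AccessDeniedException(String message) { super(message); }
  }

  // table name -> actions granted to the calling user
  private final Map<String, Set<Action>> grants =
      new HashMap<String, Set<Action>>();

  void grant(String table, Action first, Action... rest) {
    grants.put(table, EnumSet.of(first, rest));
  }

  // Throws if any requested permission is missing. The caller only learns
  // about its own rights, so no ADMIN permission is needed to ask.
  void checkPermissions(Permission[] permissions) throws AccessDeniedException {
    for (Permission p : permissions) {
      Set<Action> granted = grants.get(p.table);
      if (granted == null || !granted.contains(p.action)) {
        throw new AccessDeniedException("Missing " + p.action + " on " + p.table);
      }
    }
  }
}
```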





[jira] [Commented] (HBASE-5371) Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210668#comment-13210668
 ] 

stack commented on HBASE-5371:
--

@Enis It has to work w/ 0.92?  You can't wait on 0.94?  Which should be 
soonish... Month or two?

 Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) 
 API
 

 Key: HBASE-5371
 URL: https://issues.apache.org/jira/browse/HBASE-5371
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.0

 Attachments: HBASE-5371_v2.patch, HBASE-5371_v3-noprefix.patch, 
 HBASE-5371_v3.patch


 We need to introduce something like 
 AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
 that clients can check access rights before carrying out the operations. We 
 need this kind of operation for HCATALOG-245, which introduces authorization 
 providers for hbase over hcat. We cannot use getUserPermissions() since it 
 requires ADMIN permissions on the global/table level.





[jira] [Commented] (HBASE-4925) Collect test cases for hadoop/hbase cluster

2012-02-17 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210670#comment-13210670
 ] 

stack commented on HBASE-4925:
--

@Gaojinchao What did you finish?  You have an automation that runs some of 
Thomas's tests?

 Collect test cases for hadoop/hbase cluster
 ---

 Key: HBASE-4925
 URL: https://issues.apache.org/jira/browse/HBASE-4925
 Project: HBase
  Issue Type: Brainstorming
  Components: test
Reporter: Thomas Pan

 This entry is used to collect all the useful test cases to verify a 
 hadoop/hbase cluster. This is to follow up on yesterday's hack day at 
 Salesforce. Hopefully the information will be very useful for the whole 
 community.





[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251300#comment-13251300
 ] 

stack commented on HBASE-5756:
--

You will have to convince Lars to apply hbase-5655 to trunk.  It's too late to 
do it for 0.92 I'd say.

 we can change defalult File Appender to RFA instead of DRFA.
 

 Key: HBASE-5756
 URL: https://issues.apache.org/jira/browse/HBASE-5756
 Project: HBase
  Issue Type: Bug
Reporter: rohithsharma
Priority: Minor

 This can be a concern when, on a given day, logging increases because of 
 heavier activity. In that case the log file for that day can grow huge, and 
 such logs cannot be opened for analysis because of their size.





[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251341#comment-13251341
 ] 

stack commented on HBASE-5604:
--

None +1 on commit.

 M/R tool to replay WAL files
 

 Key: HBASE-5604
 URL: https://issues.apache.org/jira/browse/HBASE-5604
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 
 5604-v9.txt, HLog-5604-v3.txt


 Just an idea I had. Might be useful for restore of a backup using the HLogs.
 This could be an M/R job (with a mapper per HLog file).
 The tool would get a timerange and a (set of) table(s). We'd pick the right 
 HLogs based on time before the M/R job is started and then have a mapper per 
 HLog file.
 The mapper would then go through the HLog, filter out all WALEdits that don't 
 fit into the time range or don't belong to any of the tables, and then use 
 HFileOutputFormat to generate HFiles.
 Would need to indicate the splits we want, probably from a live table.
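
 The per-mapper filtering step could be sketched roughly as follows. WalEntry
 is a hypothetical stand-in for a WAL edit, and treating the time range as
 start-inclusive and end-exclusive is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the mapper-side filter; not the HBase WAL classes.
final class WalReplayFilterSketch {
  static final class WalEntry {
    final String table;
    final long timestamp;
    WalEntry(String table, long timestamp) {
      this.table = table;
      this.timestamp = timestamp;
    }
  }

  // Keeps entries for the requested tables whose timestamp lies in
  // [startTime, endTime); everything else is dropped before HFile output.
  static List<WalEntry> select(List<WalEntry> entries, Set<String> tables,
      long startTime, long endTime) {
    List<WalEntry> kept = new ArrayList<WalEntry>();
    for (WalEntry e : entries) {
      boolean inRange = e.timestamp >= startTime && e.timestamp < endTime;
      if (inRange && tables.contains(e.table)) {
        kept.add(e);
      }
    }
    return kept;
  }
}
```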





[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251345#comment-13251345
 ] 

stack commented on HBASE-5756:
--

I would not hold up a release for this.   Rohit, if you want to make a patch 
for the other branches, that would be welcome.  Sounds like you might make an 
entry for the reference guide too: a note on keeping a cap on growing logs?

 we can change defalult File Appender to RFA instead of DRFA.
 

 Key: HBASE-5756
 URL: https://issues.apache.org/jira/browse/HBASE-5756
 Project: HBase
  Issue Type: Bug
Reporter: rohithsharma
Priority: Minor

 This can be a concern when, on a given day, logging increases because of 
 heavier activity. In that case the log file for that day can grow huge, and 
 such logs cannot be opened for analysis because of their size.





[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251351#comment-13251351
 ] 

stack commented on HBASE-5666:
--

Yes.  That looks good.  Let me retry against hadoopqa.  A test hung above.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, 
 HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, 
 HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, 
 hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, 
 hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once if the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}
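
 A minimal sketch of the retry behaviour the title asks for: poll for the base
 node a bounded number of times instead of checking once. The method name, the
 BooleanSupplier used for the existence check, and the attempt/sleep parameters
 are all hypothetical; this is not the code in the attached patches.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a bounded wait for the base znode.
final class BaseNodeWaitSketch {
  // Returns true as soon as the check succeeds, false if it never does
  // within maxAttempts or if the waiting thread is interrupted.
  static boolean waitForBaseNode(BooleanSupplier baseNodeExists,
      int maxAttempts, long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (baseNodeExists.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMillis); // give the master time to create it
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status
          return false;
        }
      }
    }
    return false;
  }
}
```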





[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251352#comment-13251352
 ] 

stack commented on HBASE-5666:
--

I see that you don't have the above change in v7.  You want to add it in a v8 
and retry hadoopqa?   Thanks Matteo.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, 
 HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, 
 HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, 
 hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, 
 hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the znode is still not available, and 
 HRegionServer.initializeZooKeeper() checks just once if the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}





[jira] [Commented] (HBASE-4336) Convert source tree into maven modules

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251355#comment-13251355
 ] 

stack commented on HBASE-4336:
--

bq. I'm for having it now since we know we are going to need common and 
security isn't yet in (and could still be required if we want to compile 
against older versions).

One suggestion would be to have this issue predicated on security being rolled 
back into core (hbase-5372) -- then we could do w/o having to mess with a 
security module.

Or not.  Just do hbase-common and hbase-security?  Can roll hbase-security back 
into common when time comes.

I'm excited about this one.  I liked the notion by Mat Corgan that we'd have an 
hbase-hregion module and then there'd be an hbase-wal so Li Pi can experiment 
standalone w/ multiple WALs at the one time, etc., etc.  The new client would 
be one.

 Convert source tree into maven modules
 --

 Key: HBASE-4336
 URL: https://issues.apache.org/jira/browse/HBASE-4336
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Gary Helmling
Priority: Critical
 Fix For: 0.96.0


 When we originally converted the build to maven we had a single core module 
 defined, but later reverted this to a module-less build for the sake of 
 simplicity.
 It now looks like it's time to re-address this, as we have an actual need for 
 modules to:
 * provide a trimmed down client library that applications can make use of
 * more cleanly support building against different versions of Hadoop, in 
 place of some of the reflection machinations currently required
 * incorporate the secure RPC engine that depends on some secure Hadoop classes
 I propose we start simply by refactoring into two initial modules:
 * core - common classes and utilities, and client-side code and interfaces
 * server - master and region server implementations and supporting code
 This would also lay the groundwork for incorporating the HBase security 
 features that have been developed.  Once the module structure is in place, 
 security-related features could then be incorporated into a third module -- 
 security -- after normal review and approval.  The security module could 
 then depend on secure Hadoop, without modifying the dependencies of the rest 
 of the HBase code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5756) we can change defalult File Appender to RFA instead of DRFA.

2012-04-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251950#comment-13251950
 ] 

stack commented on HBASE-5756:
--

Default is DRFA in 0.94 and before.  RFA after (0.96)

 we can change defalult File Appender to RFA instead of DRFA.
 

 Key: HBASE-5756
 URL: https://issues.apache.org/jira/browse/HBASE-5756
 Project: HBase
  Issue Type: Bug
Reporter: rohithsharma
Priority: Minor

 This can be a point of concern when, on a certain day, heavy activity causes 
 more logging than usual. In that case the log file for that day can 
 grow huge. Such logs cannot be opened for analysis because they are too large.





[jira] [Commented] (HBASE-5737) Minor Improvements related to balancer.

2012-04-11 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252207#comment-13252207
 ] 

stack commented on HBASE-5737:
--

Ram, the AM#setBalancer is not right?  Doesn't AM make a balancer instance of 
its own up in its constructor?  We should at least remove that.  Could we pass 
in the load balancer to use into the AM's constructor rather than call a 
setBalancer method?

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently 
 a HashMap.  It would be better to use a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap so that the naming order is 
 maintained. I felt this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.
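
 The ordering difference described above can be illustrated with plain JDK 
 collections. This is only a sketch: the class name, the method names and the 
 table/region strings are made up for illustration and are not taken from the 
 HBase source or from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not code from the AssignmentManager.
class AssignmentOrderSketch {

    // Copies a table -> regions map into a TreeMap so that iteration
    // follows the natural (lexicographic) order of the table names.
    static Map<String, List<String>> sortedByTable(Map<String, List<String>> assignments) {
        return new TreeMap<>(assignments);
    }

    // Returns the table names in the order a caller would iterate them.
    static List<String> tableOrder(Map<String, List<String>> assignments) {
        return new ArrayList<>(sortedByTable(assignments).keySet());
    }

    public static void main(String[] args) {
        // A HashMap gives no ordering guarantee for its keys.
        Map<String, List<String>> result = new HashMap<>();
        result.put("gamma", Arrays.asList("region-3"));
        result.put("alpha", Arrays.asList("region-1"));
        result.put("beta", Arrays.asList("region-2"));

        // prints [alpha, beta, gamma]
        System.out.println(tableOrder(result));
    }
}
```

 A balancer that extends the default one could then rely on a stable, 
 name-sorted iteration order instead of whatever order the hash buckets yield.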





[jira] [Commented] (HBASE-5747) Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252228#comment-13252228
 ] 

stack commented on HBASE-5747:
--

Not sure why tests are not completing.  Running on a mac I see problem in this 
test:

{code}
Running org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.415 sec  
FAILURE!

Results :

Failed tests:   
testLeaderSelection(org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager): New 
leader should exist
{code}

 Forward port hbase-5708 [89-fb] Make MiniMapRedCluster directory a 
 subdirectory of target/test
 

 Key: HBASE-5747
 URL: https://issues.apache.org/jira/browse/HBASE-5747
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Attachments: 5474.txt, 5474v2.txt, 5474v3 (1).txt, 5474v3.txt


 Forward port as much as we can of Mikhail's hard-won test cleanups over on 
 0.89 branch.  Will improve our being able to run unit tests in //.  He also 
 found a few bugs.





[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252504#comment-13252504
 ] 

stack commented on HBASE-5754:
--

Let me do the same.  I did not match generator map tasks to verify reducers.  
Then let me recreate the split issue Eric describes above.  Thanks lads.

 data lost with gora continuous ingest test (goraci)
 ---

 Key: HBASE-5754
 URL: https://issues.apache.org/jira/browse/HBASE-5754
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
 Environment: 10 node test cluster
Reporter: Eric Newton
Assignee: stack

 Keith Turner re-wrote the accumulo continuous ingest test using gora, which 
 has both hbase and accumulo back-ends.
 I put a billion entries into HBase, and ran the Verify map/reduce job.  The 
 verification failed because about 21K entries were missing.  The goraci 
 [README|https://github.com/keith-turner/goraci] explains the test, and how it 
 detects missing data.
 I re-ran the test with 100 million entries, and it verified successfully.  
 Both of the times I tested using a billion entries, the verification failed.
 If I run the verification step twice, the results are consistent, so the 
 problem is probably not on the verify step.
 Here's the versions of the various packages:
 ||package||version||
 |hadoop|0.20.205.0|
 |hbase|0.92.1|
 |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
 |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
 The change I made to goraci was to configure it for hbase and to allow it to 
 build properly.





[jira] [Commented] (HBASE-5773) HtablePool constructor not reading config files in certain cases

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252586#comment-13252586
 ] 

stack commented on HBASE-5773:
--

It doesn't apply to 0.90 branch.

 HtablePool constructor not reading config files in certain cases
 

 Key: HBASE-5773
 URL: https://issues.apache.org/jira/browse/HBASE-5773
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.6, 0.92.1, 0.94.1
Reporter: Ioan Eugen Stan
Priority: Minor
 Fix For: 0.92.2, 0.94.0

 Attachments: different-config-behaviour.patch


 Creating an HTablePool can result in two different behaviours, depending on 
 the constructor called.
 Case 1: loads the configs from hbase-site
   public HTablePool() {
     this(HBaseConfiguration.create(), Integer.MAX_VALUE);
   }
 Case 2: calling this constructor with a null Configuration:
   public HTablePool(final Configuration config, final int maxSize) {
     this(config, maxSize, null, null);
   }
 will invoke:
   public HTablePool(final Configuration config, final int maxSize,
       final HTableInterfaceFactory tableFactory, PoolType poolType) {
     // Make a new configuration instance so I can safely cleanup when
     // done with the pool.
     this.config = config == null ? new Configuration() : config;
 which does not read the hbase-site config files as 
 HBaseConfiguration.create() does.
 I've tracked this problem to all versions of hbase.
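
 The difference between the two fallbacks can be sketched without any HBase 
 or Hadoop classes. Everything below is a stand-in: Conf plays the role of 
 Configuration, createWithSiteFile() plays the role of 
 HBaseConfiguration.create(), and the property value is invented. It shows one 
 possible way to make the null case site-aware, and is not necessarily what 
 the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these types exist in HBase.
class PoolConfigSketch {

    // Stand-in for Hadoop's Configuration: a bare key/value store.
    static class Conf {
        final Map<String, String> values = new HashMap<>();

        String get(String key) {
            return values.get(key);
        }
    }

    // Stand-in for HBaseConfiguration.create(): a Conf preloaded with
    // settings that would normally come from hbase-site.xml.
    static Conf createWithSiteFile() {
        Conf conf = new Conf();
        conf.values.put("hbase.zookeeper.quorum", "zk.example.org"); // made-up value
        return conf;
    }

    // Behaviour described in the report: null falls back to an empty Conf,
    // so site settings are silently missing.
    static Conf reportedFallback(Conf config) {
        return config == null ? new Conf() : config;
    }

    // One possible fix: null falls back to the site-aware factory.
    static Conf siteAwareFallback(Conf config) {
        return config == null ? createWithSiteFile() : config;
    }

    public static void main(String[] args) {
        // prints null
        System.out.println(reportedFallback(null).get("hbase.zookeeper.quorum"));
        // prints zk.example.org
        System.out.println(siteAwareFallback(null).get("hbase.zookeeper.quorum"));
    }
}
```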





[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252709#comment-13252709
 ] 

stack commented on HBASE-3443:
--

0.94?

 ICV optimization to look in memstore first and then store files (HBASE-3082) 
 does not work when deletes are in the mix
 --

 Key: HBASE-3443
 URL: https://issues.apache.org/jira/browse/HBASE-3443
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3, 0.90.4, 0.90.5, 0.90.6, 
 0.92.0, 0.92.1
Reporter: Kannan Muthukkaruppan
Assignee: Lars Hofhansl
Priority: Critical
  Labels: corruption
 Fix For: 0.96.0

 Attachments: 3443.txt


 For incrementColumnValue() HBASE-3082 adds an optimization to check memstores 
 first, and only if not present in the memstore then check the store files. In 
 the presence of deletes, the above optimization is not reliable.
 If the column is marked as deleted in the memstore, one should not look 
 further into the store files. But currently, the code does so.
 Sample test code outline:
 {code}
 admin.createTable(desc)
 table = HTable.new(conf, tableName)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 admin.flush(tableName)
 sleep(2)
 del = Delete.new(Bytes.toBytes(row))
 table.delete(del)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 get = Get.new(Bytes.toBytes(row))
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
   puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}"
 end
 {code}
 The above prints:
 {code}
 Expect 5; Got Value=10
 {code}
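
 The failure mode can be reproduced with a toy model that uses two maps in 
 place of the memstore and the store files. This is a sketch under loud 
 assumptions: the class, its methods, and the use of a null value as a delete 
 marker are all invented for illustration and do not mirror the real 
 regionserver code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; a null value in the memstore map stands for a delete marker.
class IcvDeleteSketch {
    private final Map<String, Long> memstore = new HashMap<>();
    private final Map<String, Long> storeFiles = new HashMap<>();

    // Moves memstore contents to the store files, applying delete markers.
    void flush() {
        for (Map.Entry<String, Long> e : memstore.entrySet()) {
            if (e.getValue() == null) {
                storeFiles.remove(e.getKey());
            } else {
                storeFiles.put(e.getKey(), e.getValue());
            }
        }
        memstore.clear();
    }

    // Records a delete marker for the column in the memstore.
    void delete(String column) {
        memstore.put(column, null);
    }

    // Memstore-first increment. With honorDeleteMarker=false a delete marker
    // is treated like "not present" and the lookup falls through to the
    // store files, which is the behaviour the report describes.
    long increment(String column, long amount, boolean honorDeleteMarker) {
        long current;
        Long inMemstore = memstore.get(column);
        if (inMemstore != null) {
            current = inMemstore;
        } else if (honorDeleteMarker && memstore.containsKey(column)) {
            current = 0L; // deleted in memstore: do not look any further
        } else {
            Long onDisk = storeFiles.get(column);
            current = onDisk == null ? 0L : onDisk;
        }
        long updated = current + amount;
        memstore.put(column, updated);
        return updated;
    }

    // Same sequence as the outline above: increment, flush, delete, increment.
    static long runScenario(boolean honorDeleteMarker) {
        IcvDeleteSketch store = new IcvDeleteSketch();
        store.increment("column", 5, honorDeleteMarker);
        store.flush();
        store.delete("column");
        return store.increment("column", 5, honorDeleteMarker);
    }

    public static void main(String[] args) {
        System.out.println(runScenario(false)); // prints 10
        System.out.println(runScenario(true));  // prints 5
    }
}
```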





[jira] [Commented] (HBASE-3443) ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252744#comment-13252744
 ] 

stack commented on HBASE-3443:
--

Oh, and if you don't fix it, you'll have to explain why you didn't to Benoît.

 ICV optimization to look in memstore first and then store files (HBASE-3082) 
 does not work when deletes are in the mix
 --

 Key: HBASE-3443
 URL: https://issues.apache.org/jira/browse/HBASE-3443
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3, 0.90.4, 0.90.5, 0.90.6, 
 0.92.0, 0.92.1
Reporter: Kannan Muthukkaruppan
Assignee: Lars Hofhansl
Priority: Critical
  Labels: corruption
 Fix For: 0.96.0

 Attachments: 3443.txt


 For incrementColumnValue() HBASE-3082 adds an optimization to check memstores 
 first, and only if not present in the memstore then check the store files. In 
 the presence of deletes, the above optimization is not reliable.
 If the column is marked as deleted in the memstore, one should not look 
 further into the store files. But currently, the code does so.
 Sample test code outline:
 {code}
 admin.createTable(desc)
 table = HTable.new(conf, tableName)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 admin.flush(tableName)
 sleep(2)
 del = Delete.new(Bytes.toBytes(row))
 table.delete(del)
 table.incrementColumnValue(Bytes.toBytes(row), cf1name, 
 Bytes.toBytes(column), 5);
 get = Get.new(Bytes.toBytes(row))
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
   puts "Expect 5; Got Value=#{Bytes.toLong(keyValue.getValue())}"
 end
 {code}
 The above prints:
 {code}
 Expect 5; Got Value=10
 {code}





[jira] [Commented] (HBASE-5777) MiniHBaseCluster cannot start multiple region servers

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252881#comment-13252881
 ] 

stack commented on HBASE-5777:
--

We have an hbase-site.xml at src/test that is used when we run tests.  It 
disables the UI.  You think we should apply this patch too Jimmy?

 MiniHBaseCluster cannot start multiple region servers
 -

 Key: HBASE-5777
 URL: https://issues.apache.org/jira/browse/HBASE-5777
 Project: HBase
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-5777.patch


 MiniHBaseCluster can try to start multiple region servers.  But all of them 
 except one will die in putting up the web UI
 because of BindException since HConstants.REGIONSERVER_INFO_PORT_AUTO is set 
 to false by default.
 This issue will make many unit tests depending on multiple region servers 
 flaky, such as TestAdmin.





[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252947#comment-13252947
 ] 

stack commented on HBASE-5778:
--

+1  Add release note w/ how to turn it off

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5778.patch


 I ran some tests to verify if WAL compression should be turned on by default.
 For a use case where it's not very useful (values two orders of magnitude 
 bigger than the keys), the insert time wasn't different and the CPU usage was 15% 
 higher (150% CPU usage VS 130% when not compressing the WAL).
 When values are smaller than the keys, I saw a 38% improvement for the insert 
 run time and CPU usage was 33% higher (600% CPU usage VS 450%). I'm not sure 
 WAL compression accounts for all the additional CPU usage, it might just be 
 that we're able to insert faster and we spend more time in the MemStore per 
 second (because our MemStores are bad when they contain tens of thousands of 
 values).
 Those are two extremes, but it shows that for the price of some CPU we can 
 save a lot. My machines have 2 quads with HT, so I still had a lot of idle 
 CPUs.





[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

2012-04-12 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253135#comment-13253135
 ] 

stack commented on HBASE-5754:
--

I ran w/ 10 generators and 10 slots for the verify step and got the below, which 
prints out only a REFERENCED count.

Running these recent tests I let it do its natural splitting so it grew from 
zero to 260-odd regions, so maybe the issue you see, Eric, comes from manual splits 
triggered from the UI.  Let me try that next.

Thanks lads.

{code}
12/04/13 05:16:23 INFO mapred.JobClient:  map 100% reduce 99%
12/04/13 05:16:54 INFO mapred.JobClient:  map 100% reduce 100%
12/04/13 05:16:59 INFO mapred.JobClient: Job complete: job_201204092039_0046
12/04/13 05:16:59 INFO mapred.JobClient: Counters: 30
12/04/13 05:16:59 INFO mapred.JobClient:   Job Counters
12/04/13 05:16:59 INFO mapred.JobClient: Launched reduce tasks=10
12/04/13 05:16:59 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=30125694
12/04/13 05:16:59 INFO mapred.JobClient: Total time spent by all reduces 
waiting after reserving slots (ms)=0
12/04/13 05:16:59 INFO mapred.JobClient: Total time spent by all maps 
waiting after reserving slots (ms)=0
12/04/13 05:16:59 INFO mapred.JobClient: Rack-local map tasks=6
12/04/13 05:16:59 INFO mapred.JobClient: Launched map tasks=256
12/04/13 05:16:59 INFO mapred.JobClient: Data-local map tasks=250
12/04/13 05:16:59 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=5832198
12/04/13 05:16:59 INFO mapred.JobClient:   goraci.Verify$Counts
12/04/13 05:16:59 INFO mapred.JobClient: REFERENCED=10
12/04/13 05:16:59 INFO mapred.JobClient:   File Output Format Counters
12/04/13 05:16:59 INFO mapred.JobClient: Bytes Written=0
12/04/13 05:16:59 INFO mapred.JobClient:   FileSystemCounters
12/04/13 05:16:59 INFO mapred.JobClient: FILE_BYTES_READ=83022967343
12/04/13 05:16:59 INFO mapred.JobClient: HDFS_BYTES_READ=156414
12/04/13 05:16:59 INFO mapred.JobClient: FILE_BYTES_WRITTEN=112881560332
12/04/13 05:16:59 INFO mapred.JobClient:   File Input Format Counters
12/04/13 05:16:59 INFO mapred.JobClient: Bytes Read=0
12/04/13 05:16:59 INFO mapred.JobClient:   Map-Reduce Framework
12/04/13 05:16:59 INFO mapred.JobClient: Map output materialized 
bytes=29992170602
12/04/13 05:16:59 INFO mapred.JobClient: Map input records=10
12/04/13 05:16:59 INFO mapred.JobClient: Reduce shuffle bytes=29874879887
12/04/13 05:16:59 INFO mapred.JobClient: Spilled Records=7527086436
12/04/13 05:16:59 INFO mapred.JobClient: Map output bytes=25992155242
12/04/13 05:16:59 INFO mapred.JobClient: CPU time spent (ms)=20182570
12/04/13 05:16:59 INFO mapred.JobClient: Total committed heap usage 
(bytes)=99953082368
12/04/13 05:16:59 INFO mapred.JobClient: Combine input records=0
12/04/13 05:16:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=156414
12/04/13 05:16:59 INFO mapred.JobClient: Reduce input records=20
12/04/13 05:16:59 INFO mapred.JobClient: Reduce input groups=10
12/04/13 05:16:59 INFO mapred.JobClient: Combine output records=0
12/04/13 05:16:59 INFO mapred.JobClient: Physical memory (bytes) 
snapshot=91762372608
12/04/13 05:16:59 INFO mapred.JobClient: Reduce output records=0
12/04/13 05:16:59 INFO mapred.JobClient: Virtual memory (bytes) 
snapshot=391126540288
12/04/13 05:16:59 INFO mapred.JobClient: Map output records=20
{code}

 data lost with gora continuous ingest test (goraci)
 ---

 Key: HBASE-5754
 URL: https://issues.apache.org/jira/browse/HBASE-5754
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
 Environment: 10 node test cluster
Reporter: Eric Newton
Assignee: stack

 Keith Turner re-wrote the accumulo continuous ingest test using gora, which 
 has both hbase and accumulo back-ends.
 I put a billion entries into HBase, and ran the Verify map/reduce job.  The 
 verification failed because about 21K entries were missing.  The goraci 
 [README|https://github.com/keith-turner/goraci] explains the test, and how it 
 detects missing data.
 I re-ran the test with 100 million entries, and it verified successfully.  
 Both of the times I tested using a billion entries, the verification failed.
 If I run the verification step twice, the results are consistent, so the 
 problem is probably not on the verify step.
 Here's the versions of the various packages:
 ||package||version||
 |hadoop|0.20.205.0|
 |hbase|0.92.1|
 |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
 |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
 The change I made to goraci was to configure it for hbase and to allow it to 
 build properly.


[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253454#comment-13253454
 ] 

stack commented on HBASE-5778:
--

I backed it out of 0.94 and trunk.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch


 I ran some tests to verify if WAL compression should be turned on by default.
 For a use case where it's not very useful (values two orders of magnitude 
 bigger than the keys), the insert time wasn't different and the CPU usage was 15% 
 higher (150% CPU usage VS 130% when not compressing the WAL).
 When values are smaller than the keys, I saw a 38% improvement for the insert 
 run time and CPU usage was 33% higher (600% CPU usage VS 450%). I'm not sure 
 WAL compression accounts for all the additional CPU usage, it might just be 
 that we're able to insert faster and we spend more time in the MemStore per 
 second (because our MemStores are bad when they contain tens of thousands of 
 values).
 Those are two extremes, but it shows that for the price of some CPU we can 
 save a lot. My machines have 2 quads with HT, so I still had a lot of idle 
 CPUs.





[jira] [Commented] (HBASE-5784) Enable mvn deploy of website

2012-04-13 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253691#comment-13253691
 ] 

stack commented on HBASE-5784:
--

Committed to trunk

 Enable mvn deploy of website
 

 Key: HBASE-5784
 URL: https://issues.apache.org/jira/browse/HBASE-5784
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 5784.txt


 Up to now, deploying the website has meant building locally and then copying up to apache 
 and putting it into place under /www/hbase.apache.org.  Change it so we can have 
 maven deploy the site.  The good thing about having the latter do it is that 
 it's regular; permissions will always be the same so Doug and I won't be 
 fighting each other when we stick stuff up there.  Also, it's a one-step 
 process rather than multiple.




