
[jira] [Commented] (HBASE-5244) Add a test for a column with 1M (10M? 100M) items and see how we do with it

2012-01-21 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190542#comment-13190542
 ] 

stack commented on HBASE-5244:
--

Let's add a test that fills a row with a million columns and see how we do 
returning items out of it.  How big do we think we should be able to go in a 
single row?  1M columns?  10M?  100M?  Whatever we think we should be able to 
do, we should have a test for it.

Such a test would not be part of our general test suite but instead would be 
done up with the new 'mvn verify' failsafe facility.

 Add a test for a column with 1M (10M? 100M) items and see how we do with it
 ---

 Key: HBASE-5244
 URL: https://issues.apache.org/jira/browse/HBASE-5244
 Project: HBase
  Issue Type: Task
Reporter: stack



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4920) We need a mascot, a totem

2012-01-22 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190777#comment-13190777
]

stack commented on HBASE-4920:
--

I like the circle. I like the monochromatic take. I wonder if black and white
might not be better since then it's more identifiably an orca. It does make me
think of a hockey team for some reason. It doesn't seem to go well w/ the
'apache hbase'. Maybe inverted and on the right side would help, and perhaps
smaller. Tell your buddy thanks, Marcy, for helping move this along.

We need a mascot, a totem
-

Key: HBASE-4920
URL: https://issues.apache.org/jira/browse/HBASE-4920
Project: HBase
Issue Type: Task
Reporter: stack
Attachments: HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot
2011-11-30 at 4.06.17 PM.png, photo (2).JPG

We need a totem for our t-shirt that is yet to be printed. O'Reilly owns the
Clydesdale. We need something else.
We could have a fluffy little duck that quacks 'hbase!' when you squeeze it
and we could order boxes of them from some off-shore sweatshop that
subcontracts to a contractor who employs child labor only.
Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from
Salesforce showed me, that was a bit too spiritual for me to be seen quoting
here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in
translation, bigdata).


[jira] [Commented] (HBASE-5237) Addendum for HBASE-5160 and HBASE-4397

2012-01-22 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190788#comment-13190788
 ] 

stack commented on HBASE-5237:
--

+1 on commit to 0.92 branch so we are consistent with 0.90 and with TRUNK.

The patch is a little odd though in that if getRegionPlan returns null, it 
means no servers online supposedly.  And then in this case we set a flag up in 
TM.  But TM only runs every 30 minutes so the setting of this flag doesn't do 
much?  This patch is setting all servers offline in the middle of an assign.  
It doesn't seem like we should be doing this here.

Anyways, +1 so we are consistent with other branches.

 Addendum for HBASE-5160 and HBASE-4397
 --

 Key: HBASE-5237
 URL: https://issues.apache.org/jira/browse/HBASE-5237
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.6

 Attachments: HBASE-5237_0.90.patch, HBASE-5237_trunk.patch


 As part of HBASE-4397 there is one more scenario where the patch has to be 
 applied.
 {code}
 RegionPlan plan = getRegionPlan(state, forceNewPlan);
 if (plan == null) {
   debugLog(state.getRegion(),
       "Unable to determine a plan to assign " + state);
   return; // Should get reassigned later when RIT times out.
 }
 {code}
 I think in this scenario also 
 {code}
 this.timeoutMonitor.setAllRegionServersOffline(true);
 {code}
 this should be done.


[jira] [Commented] (HBASE-5243) LogSyncerThread not getting shutdown waiting for the interrupted flag

2012-01-22 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190793#comment-13190793
 ] 

stack commented on HBASE-5243:
--

+1 for 0.92 branch

 LogSyncerThread not getting shutdown waiting for the interrupted flag
 -

 Key: HBASE-5243
 URL: https://issues.apache.org/jira/browse/HBASE-5243
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: HBASE-5243_0.90.patch, HBASE-5243_0.90_1.patch, 
 HBASE-5243_trunk.patch


 In the LogSyncer run() we keep looping until the isInterrupted flag is set.
 But in some cases the DFSClient consumes the InterruptedException, so we run
 into an infinite loop in some shutdown cases.
 Since we are the ones who try to close down the LogSyncerThread, I would
 suggest introducing a variable like close or shutdown; based on the state of
 this flag along with isInterrupted(), we can make the thread stop.
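
 A minimal sketch of the suggestion above, assuming a hypothetical
 SyncerSketch class rather than the real LogSyncer. The loop checks an
 explicit closed flag as well as isInterrupted(), so it still exits when a
 callee has swallowed the interrupt. All names here are illustrative.

```java
// Hypothetical stand-in for the LogSyncer thread; not the HBase class.
class SyncerSketch extends Thread {
  // Explicit shutdown flag, set by whoever closes the syncer.
  private volatile boolean closed = false;
  private final long intervalMs;

  SyncerSketch(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    // Stop on either signal, so a swallowed interrupt cannot keep us looping.
    while (!closed && !isInterrupted()) {
      try {
        Thread.sleep(intervalMs);
        // A real syncer would sync the log here.
      } catch (InterruptedException e) {
        // Simulates a callee that consumes the interrupt: the interrupted
        // status is cleared and deliberately not restored.
      }
    }
  }

  void close() {
    closed = true;
    interrupt(); // wake the thread if it is sleeping
  }

  // Starts a syncer, closes it, and reports whether it exited in time.
  static boolean stopsAfterClose(long waitMs) {
    SyncerSketch t = new SyncerSketch(10);
    t.start();
    t.close();
    try {
      t.join(waitMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t.isAlive();
  }
}
```

 With only the isInterrupted() check, the catch block above would leave the
 loop running forever, which is the shutdown hang described in this issue.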


[jira] [Commented] (HBASE-5245) HBase shell should use alternate jruby if JRUBY_HOME is set, should pass along JRUBY_OPTS

2012-01-22 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190795#comment-13190795
 ] 

stack commented on HBASE-5245:
--

I tried to apply this to TRUNK and it fails.  Mind fixing, Philip?  And I'm 
interested as to why you need this?  Thanks.

 HBase shell should use alternate jruby if JRUBY_HOME is set, should pass 
 along JRUBY_OPTS
 -

 Key: HBASE-5245
 URL: https://issues.apache.org/jira/browse/HBASE-5245
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.90.4
Reporter: Philip (flip) Kromer
Priority: Minor
 Attachments: hbase-jruby_home-and-jruby_opts.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 Invoking {{hbase shell}}, the hbase runner launches the jruby jar directly, 
 and so behaves differently than the traditional jruby runner. Specifically, it
 * does not respect the {{JRUBY_OPTS}} environment variable (among other 
 things, I cannot launch the shell to use ruby-1.9 mode)
 * does not respect the {{JRUBY_HOME}} environment variable (placing things in 
 an inconsistent state if my classpath holds the system jruby).
 This patch allows you to use an alternative jruby and to specify options to 
 the jruby jar.
 * When the command is 'shell', adds {{$JRUBY_OPTS}} to the CLASS
 * When the command is 'shell' and {{$JRUBY_HOME}} is set, adds 
 {{$JRUBY_HOME/lib/jruby.jar}} to the classpath, and sets {{-Djruby.home}} 
 and {{-Djruby.job}} config variables.


[jira] [Commented] (HBASE-5245) HBase shell should use alternate jruby if JRUBY_HOME is set, should pass along JRUBY_OPTS

2012-01-22 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190797#comment-13190797
 ] 

stack commented on HBASE-5245:
--

Oh, nvm.  I see you say why you need this above.


[jira] [Commented] (HBASE-5237) Addendum for HBASE-5160 and HBASE-4397

2012-01-22 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190873#comment-13190873
 ] 

stack commented on HBASE-5237:
--

Thanks for the explanation, Ram.  This code is hard to follow.  That a call to 
getRegionPlan returning null means there are no regionservers online, and 
then setting a flag on the timeout monitor to indicate that, seems like extreme 
indirection.  We can fix that in another issue.  Please commit this patch to 
0.92.


[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-01-22 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190894#comment-13190894
 ] 

stack commented on HBASE-5179:
--

Here is some review of v17.

logDirExists should be called getLogDir or getLogDirIfExists?

Why is the below done up in ServerManager rather than down in 
ServerShutdownHandler?  I'd think it'd make more sense therein?  Perhaps even 
inside in the MetaServerShutdownHandler or whatever it's called?   Not a biggie. 
 Just asking.

Can we talk more about what these are: +   * @param onlineServers onlined 
servers when master starts

Are these servers that have checked in between master start and the call to 
processFailover?  Could other servers come between the making of the list we 
pass into processFailover and the running of the processFailover code?  Should 
this be a 'live' list?   Or, I see that we are actually getting rid of the 
'live' list of online servers to replace it w/ this static one in these lines:

{code}
-  } else if (!serverManager.isServerOnline(regionLocation.getServerName())) {
+  } else if (!onlineServers.contains(regionLocation.getServerName())) {
{code}

Why do we do this?

Could this block of code be done out in a nice coherent method rather than 
inline here in HMaster:

{code}
+    // Check zk for regionservers that are up but didn't register
+    for (String sn : this.regionServerTracker.getOnlineServerNames()) {
+      if (!this.serverManager.isServerOnline(sn)) {
+        HServerInfo serverInfo = HServerInfo.getServerInfo(sn);
+        if (serverInfo != null) {
+          HServerInfo existingServer = serverManager
+              .haveServerWithSameHostAndPortAlready(serverInfo
+                  .getHostnamePort());
+          if (existingServer == null) {
+            // Not registered; add it.
+            LOG.info("Registering server found up in zk but who has not yet "
+                + "reported in: " + sn);
+            // We set serverLoad with one region, it could differentiate with
+            // regionserver which is started just now
+            HServerLoad serverLoad = new HServerLoad();
+            serverLoad.setNumberOfRegions(1);
+            serverInfo.setLoad(serverLoad);
+            this.serverManager.recordNewServer(serverInfo, true, null);
+          }
+        } else {
+          LOG.warn("Server " + sn
+              + " found up in zk, but is not a correct server name");
+        }
+      }
+    }
{code}

Can we say more about why we are doing this?  And how do we know that it did 
not just start?  Because if it has more than 0 regions, it must have been 
already running?

{code}
+    if (rootServerLoad != null && rootServerLoad.getNumberOfRegions() > 0) {
+      // If rootServer is online && not start just now, we expire it
+      this.serverManager.expireServer(rootServerInfo);
+    }
{code}

It looks like the processing of the meta server duplicates code from the 
processing of the root server.  Can we have instead the duplicated code out in 
a method? Is that possible?  Then pass in args for whether root or meta to 
process?

Regards the hard coded wait of 1000ms when looking for dir to go away, see 
'13.4.1.2.2. Sleeps in tests' in 
http://hbase.apache.org/book.html#hbase.tests... it seems like we should wait 
less than 1000 going by the reference guide.
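
One way to avoid a fixed 1000ms sleep is to poll for the condition at a short
interval with an overall timeout. The following is only a sketch: WaitSketch
and waitFor are hypothetical names, not part of HBase or of the patch under
review, and the interval values are placeholders.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not part of HBase.
class WaitSketch {
  // Polls the condition at a short interval until it holds or the
  // timeout elapses; returns whether the condition was met.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier done) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!done.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

A test waiting for the dir to go away could then call something like
waitFor(1000, 10, () -> !dirExists()), where dirExists() is whatever check the
test already performs, and return as soon as the dir is gone.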

DeadServer class looks much better.

I can't write a test for a patch on 0.90 and this v17 won't work for trunk at 
all.  The trunk patch will be very different which is a problem because then I 
have no confidence my trunk unit test applies to this patch that is to be 
applied on 0.90 branch.

 Concurrent processing of processFaileOver and ServerShutdownHandler may cause 
 region to be assigned before log splitting is completed, causing data loss
 

 Key: HBASE-5179
 URL: https://issues.apache.org/jira/browse/HBASE-5179
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.92.0, 0.94.0, 0.90.6

 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 
 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 
 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 
 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 
 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 
 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, 
 hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, 
 hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, 
 hbase-5179v7.patch, hbase-5179v8.patch,

[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-01-22 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190898#comment-13190898
]

stack commented on HBASE-5229:
--

In the above when you talk of column prefix, couldn't that be column family?

For sure we need to talk up ColumnRangeFilter and exercise it more (e.g. your
observation on filters and versions needs looking into...)

Good stuff.

Explore building blocks for multi-row local transactions.
---

Key: HBASE-5229
URL: https://issues.apache.org/jira/browse/HBASE-5229
Project: HBase
Issue Type: New Feature
Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.0

Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt

HBase should provide basic building blocks for multi-row local transactions.
Local means that we do this by co-locating the data. Global (cross region)
transactions are not discussed here.
After a bit of discussion two solutions have emerged:
1. Keep the row-key for determining grouping and location and allow efficient
intra-row scanning. A client application would then model tables as
HBase-rows.
2. Define a prefix-length in HTableDescriptor that defines a grouping of
rows. Regions will then never be split inside a grouping prefix.
#1 is true to the current storage paradigm of HBase.
#2 is true to the current client side API.
I will explore these two with sample patches here.
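
To illustrate option 2 only: if rows are grouped by a fixed-length key prefix,
a candidate split key can be truncated to that prefix so the split never lands
inside a group. This is a sketch under that assumption; PrefixSplitSketch and
alignSplitKey are hypothetical names and are not taken from the attached
patches.

```java
import java.util.Arrays;

// Hypothetical helper; not the HTableDescriptor or region-split API.
class PrefixSplitSketch {
  // Truncates a candidate split key to the grouping prefix. Every row that
  // shares the prefix sorts at or after the truncated key, so the whole
  // group stays on the same side of the split.
  static byte[] alignSplitKey(byte[] candidate, int prefixLength) {
    if (candidate.length <= prefixLength) {
      return candidate.clone(); // already no longer than the prefix
    }
    return Arrays.copyOf(candidate, prefixLength);
  }
}
```

For example, with a prefix length of 6, a candidate key of "user42-row9" would
be aligned to "user42", keeping all "user42" rows in one region.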

Was:
As discussed (at length) on the dev mailing list with the HBASE-3584 and
HBASE-5203 committed, supporting atomic cross row transactions within a
region becomes simple.
I am aware of the hesitation about the usefulness of this feature, but we
have to start somewhere.
Let's use this jira for discussion, I'll attach a patch (with tests)
momentarily to make this concrete.

[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-01-22 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190897#comment-13190897
]

stack commented on HBASE-5229:
--

In the above when you talk of column prefix, couldn't that be column family?

For sure we need to talk up ColumnRangeFilter and exercise it more (e.g. your
observation on filters and versions needs looking into...)

Good stuff.

Explore building blocks for multi-row local transactions.
---

Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt

[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92

2012-01-23 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191377#comment-13191377
 ] 

stack commented on HBASE-5231:
--

It looks like a method named getAssignmentsByTable will only do this if a 
particular configuration is set; else it will do assignments the old way.  
Seems like an odd name for this method.  I'd have thought it would have 
remained getAssignments and then in getAssignments we'd switch on whether to do 
by table or not.
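
A rough sketch of the shape suggested above, with a single getAssignments
that switches on a flag. Everything here is hypothetical: AssignmentSketch,
the ALL_TABLES key and the {table, server, region} string triples are
stand-ins for illustration, not the classes touched by the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: each region is a {table, server, regionName} triple.
class AssignmentSketch {
  // Placeholder key used when regions are not grouped per table (an assumption).
  static final String ALL_TABLES = "ensemble";

  // Single entry point; the flag (read from configuration in practice)
  // decides whether regions are grouped per table or lumped together.
  static Map<String, Map<String, List<String>>> getAssignments(
      List<String[]> regions, boolean byTable) {
    Map<String, Map<String, List<String>>> result = new TreeMap<>();
    for (String[] r : regions) {
      String key = byTable ? r[0] : ALL_TABLES;
      result.computeIfAbsent(key, k -> new TreeMap<>())
          .computeIfAbsent(r[1], k -> new ArrayList<>())
          .add(r[2]);
    }
    return result;
  }
}
```

Callers keep one method name either way, and the by-table behaviour is a
detail of the implementation rather than of the method name.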

Does this change default? I can't tell.


 Backport HBASE-3373 (per-table load balancing) to 0.92
 --

 Key: HBASE-5231
 URL: https://issues.apache.org/jira/browse/HBASE-5231
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu
 Fix For: 0.92.1

 Attachments: 5231-v2.txt, 5231.txt


 This JIRA backports per-table load balancing to 0.92


[jira] [Commented] (HBASE-5255) Use singletons for OperationStatus to save memory

2012-01-23 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191919#comment-13191919
 ] 

stack commented on HBASE-5255:
--

+1 on patch

 Use singletons for OperationStatus to save memory
 -

 Key: HBASE-5255
 URL: https://issues.apache.org/jira/browse/HBASE-5255
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.90.5, 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor
  Labels: performance
 Fix For: 0.94.0, 0.92.1

 Attachments: 5255-92.txt, 5255-v2.txt, 
 HBASE-5255-0.92-Use-singletons-to-remove-unnecessary-memory-allocati.patch, 
 HBASE-5255-trunk-Use-singletons-to-remove-unnecessary-memory-allocati.patch


 Every single {{Put}} causes the allocation of at least one 
 {{OperationStatus}}, yet {{OperationStatus}} is almost always stateless, so 
 these allocations are unnecessary and could be avoided.  Attached patch adds 
 a few singletons and uses them, with no public API change.  I didn't test the 
 patches, but you get the idea.
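
 The idea can be sketched as follows, assuming a hypothetical StatusSketch
 class. The field names, constants and factory method are illustrative and
 are not the actual OperationStatus API or the attached patch. Stateless
 results share one immutable instance; an allocation happens only when a
 message has to be carried.

```java
// Hypothetical stand-in; names are assumptions, not the HBase API.
final class StatusSketch {
  enum Code { SUCCESS, FAILURE, NOT_RUN }

  // Shared immutable instances for the stateless cases.
  static final StatusSketch SUCCESS_STATUS = new StatusSketch(Code.SUCCESS, "");
  static final StatusSketch FAILURE_STATUS = new StatusSketch(Code.FAILURE, "");
  static final StatusSketch NOT_RUN_STATUS = new StatusSketch(Code.NOT_RUN, "");

  private final Code code;
  private final String message;

  private StatusSketch(Code code, String message) {
    this.code = code;
    this.message = message;
  }

  // Allocates only when there is per-operation state (a message) to carry.
  static StatusSketch of(Code code, String message) {
    if (message == null || message.isEmpty()) {
      switch (code) {
        case SUCCESS: return SUCCESS_STATUS;
        case FAILURE: return FAILURE_STATUS;
        default: return NOT_RUN_STATUS;
      }
    }
    return new StatusSketch(code, message);
  }

  Code code() { return code; }
  String message() { return message; }
}
```

 Because the shared instances are immutable, handing the same object to many
 callers is safe.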


[jira] [Commented] (HBASE-5189) Add metrics to keep track of region-splits in RS

2012-01-23 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191920#comment-13191920
 ] 

stack commented on HBASE-5189:
--

+1

 Add metrics to keep track of region-splits in RS
 

 Key: HBASE-5189
 URL: https://issues.apache.org/jira/browse/HBASE-5189
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Affects Versions: 0.90.5, 0.92.0
Reporter: Mubarak Seyed
Assignee: Mubarak Seyed
Priority: Minor
  Labels: noob
 Fix For: 0.94.0

 Attachments: HBASE-5189.trunk.v1.patch, HBASE-5189.trunk.v2.patch


 For write-heavy workload with region-size 1 GB, region-split is considerably 
 high. We do normally grep the NN log (grep mkdir*.split NN.log | sort | 
 uniq -c) to get the count.
 I would like to have a counter incremented each time region-split execution 
 succeeds and this counter exposed via the metrics stuff in HBase.
 - regionSplitSuccessCount
 - regionSplitFailureCount (will help us to correlate the timestamp range in 
 RS logs across all RS)
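
 A minimal sketch of the two counters, assuming plain AtomicLong fields. The
 class and method names are hypothetical, and a real patch would expose the
 values through HBase's existing metrics classes rather than through getters.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters; not the HBase metrics API.
class SplitMetricsSketch {
  private final AtomicLong regionSplitSuccessCount = new AtomicLong();
  private final AtomicLong regionSplitFailureCount = new AtomicLong();

  // Called once per split attempt with its outcome.
  void recordSplit(boolean succeeded) {
    if (succeeded) {
      regionSplitSuccessCount.incrementAndGet();
    } else {
      regionSplitFailureCount.incrementAndGet();
    }
  }

  long getRegionSplitSuccessCount() {
    return regionSplitSuccessCount.get();
  }

  long getRegionSplitFailureCount() {
    return regionSplitFailureCount.get();
  }
}
```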


[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-01-23 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191927#comment-13191927
]

stack commented on HBASE-5229:
--

I see now what the other delete issue is about.

Regards CF as prefix, yeah, it's a prob. now where the logical cf and physical
cf are the same thing. If we had locality groups, as per the BT paper, we could
have lots of cfs.

Explore building blocks for multi-row local transactions.
---

Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt

[jira] [Commented] (HBASE-5262) Structured event log for HBase for monitoring and auto-tuning performance

2012-01-23 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191933#comment-13191933
 ] 

stack commented on HBASE-5262:
--

@Mikhail Sounds good.  We'd write to hbase because we can't have many writers 
write same file in hdfs?  This system table would be a new one, not .META.?  
We'd fill it to the end of time or the TTL on the table would get rid of old 
records (that'd work).

 Structured event log for HBase for monitoring and auto-tuning performance
 -

 Key: HBASE-5262
 URL: https://issues.apache.org/jira/browse/HBASE-5262
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin

 Creating this JIRA to open a discussion about a structured (machine-readable) 
 log that will record events such as compaction start/end times, compaction 
 input/output files, their sizes, the same for flushes, etc. This can be 
 stored e.g. in a new system table in HBase itself. The data from this log can 
 then be analyzed and used to optimize compactions at run time, or otherwise 
 auto-tune HBase configuration to reduce the number of knobs the user has to 
 configure.


[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-01-23 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191954#comment-13191954
]

stack commented on HBASE-5179:
--

Reviewing 0.92v17

isDeadServerInProgress is a new public method in ServerManager but it does not
seem to be used anywhere.

Does isDeadRootServerInProgress need to be public? Ditto for meta version.

These method param names are not right, e.g. 'definitiveRootServer'; what is
meant by definitive? Do they need this qualifier?

Is there anything in place to stop us expiring a server twice if it's carrying
root and meta?

What is the difference between asking assignment manager isCarryingRoot and this
variable that is passed in? Should be doc'd at least. Ditto for meta.

I think I've asked for this a few times -- onlineServers needs to be
explained... either in javadoc or in comment. This is the param passed into
joinCluster. How does it arise? I think I know but am unsure. God love the
poor noob that comes awandering this code trying to make sense of it all.

It looks like we get the list by trawling zk for regionserver znodes that have
not checked in. Don't we do this operation earlier in master setup? Are we
doing it again here?

Though distributed split log is configured, we will do in-master, single-process
splitting under some conditions with this patch. It's not explained in code why
we would do this. Why do we think master log splitting is 'high priority' when it
could very well be slower? Should we only go this route if distributed
splitting is not going on? Do we know if concurrent distributed log splitting
and master splitting works?

Why would we have dead servers in progress here in master startup? Because a
servershutdownhandler fired?

This patch is different to the patch for 0.90. Should go into trunk first with
tests, then 0.92. Should it be in this issue? This issue is really hard to
follow now. Maybe this issue is for 0.90.x and new issue for more work on this
trunk patch?

This patch needs to have the v18 differences applied.

Concurrent processing of processFaileOver and ServerShutdownHandler may cause
region to be assigned before log splitting is completed, causing data loss

Key: HBASE-5179
URL: https://issues.apache.org/jira/browse/HBASE-5179
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
Fix For: 0.94.0, 0.90.6, 0.92.1

Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch,
5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch,
5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch,
5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch,
5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch,
5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt,
Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch,
hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch,
hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch

If the master's failover processing and ServerShutdownHandler's processing
happen concurrently, the following case may occur:
1. master completes splitLogAfterStartup()
2. RegionserverA restarts, and ServerShutdownHandler is processing.
3. master starts to rebuildUserRegions, and RegionserverA is considered a
dead server.
4. master starts to assign regions of RegionserverA because it is a dead
server by step 3.
However, while step 4 (assigning regions) is under way, ServerShutdownHandler
may still be splitting the log. Therefore, it may cause data loss.

[jira] [Commented] (HBASE-5230) Unit test to ensure compactions don't cache data on write

2012-01-23 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191959#comment-13191959
 ] 

stack commented on HBASE-5230:
--

lgtm

 Unit test to ensure compactions don't cache data on write
 -

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally 
 enabled). This is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.


[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92

2012-01-23 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191971#comment-13191971
 ] 

stack commented on HBASE-5231:
--

That the methods are package private is not relevant.  My comment still stands 
(The fact that balancer is switched via conf also is not pertinent).

 Backport HBASE-3373 (per-table load balancing) to 0.92
 --

 Key: HBASE-5231
 URL: https://issues.apache.org/jira/browse/HBASE-5231
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu
 Fix For: 0.92.1

 Attachments: 5231-v2.txt, 5231.txt


 This JIRA backports per-table load balancing to 0.92

[jira] [Commented] (HBASE-5269) IllegalMonitorStateException while retryin HLog split in 0.90 branch.

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192248#comment-13192248
 ] 

stack commented on HBASE-5269:
--

+1

 IllegalMonitorStateException while retryin HLog split in 0.90 branch.
 -

 Key: HBASE-5269
 URL: https://issues.apache.org/jira/browse/HBASE-5269
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5269.patch


 This bug was introduced as part of the HBASE-5137 fix.  The splitLogLock is 
 released in the finally block inside the do-while loop, so when the loop 
 executes a second time, the unlock of the splitLogLock throws an 
 IllegalMonitorStateException. 
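
 The failure mode described above can be reproduced with a plain 
 ReentrantLock. The following is only a sketch, not the HBase code: 
 SplitLockDemo, buggyRetry and fixedRetry are hypothetical names, the loop 
 merely simulates a split that is retried, and the "fixed" shape is one 
 possible arrangement rather than the change made in the attached patch.

```java
import java.util.concurrent.locks.ReentrantLock;

final class SplitLockDemo {
    /** Buggy shape: the lock is taken once but released on every iteration. */
    static boolean buggyRetry(ReentrantLock lock, int attempts) {
        lock.lock();
        int attempt = 0;
        do {
            try {
                attempt++;      // stand-in for one split attempt
            } finally {
                lock.unlock();  // on the second pass the thread no longer holds the lock
            }
        } while (attempt < attempts);
        return true;
    }

    /** One possible fix: acquire and release are paired around the whole loop. */
    static int fixedRetry(ReentrantLock lock, int attempts) {
        lock.lock();
        try {
            int attempt = 0;
            do {
                attempt++;      // stand-in for one split attempt
            } while (attempt < attempts);
            return attempt;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        try {
            buggyRetry(new ReentrantLock(), 2);
        } catch (IllegalMonitorStateException e) {
            System.out.println("second unlock failed: " + e);
        }
        System.out.println("fixed attempts: " + fixedRetry(new ReentrantLock(), 2));
    }
}
```

 With a single attempt the buggy version appears to work, which is why the 
 problem only shows up when the split is retried.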

[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192251#comment-13192251
 ] 

stack commented on HBASE-5231:
--

It is not necessarily by table; it could be by server.

What's this:

{code}
+result.put(ensemble, getAssignments());
{code}

What's 'ensemble'?

 Backport HBASE-3373 (per-table load balancing) to 0.92
 --

 Key: HBASE-5231
 URL: https://issues.apache.org/jira/browse/HBASE-5231
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu
 Fix For: 0.92.1

 Attachments: 5231-v2.txt, 5231.txt


 This JIRA backports per-table load balancing to 0.92

[jira] [Commented] (HBASE-5262) Structured event log for HBase for monitoring and auto-tuning performance

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192253#comment-13192253
 ] 

stack commented on HBASE-5262:
--

@Mikhail Sounds good to me

 Structured event log for HBase for monitoring and auto-tuning performance
 -

 Key: HBASE-5262
 URL: https://issues.apache.org/jira/browse/HBASE-5262
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin

 Creating this JIRA to open a discussion about a structured (machine-readable) 
 log that will record events such as compaction start/end times, compaction 
 input/output files, their sizes, the same for flushes, etc. This can be 
 stored e.g. in a new system table in HBase itself. The data from this log can 
 then be analyzed and used to optimize compactions at run time, or otherwise 
 auto-tune HBase configuration to reduce the number of knobs the user has to 
 configure.
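
 To make "structured (machine-readable)" concrete, here is a minimal sketch 
 of what one compaction event record could look like. Everything in it is an 
 assumption for discussion: the class name, the field names and the 
 tab-separated key=value rendering are hypothetical and not part of any 
 HBase API or proposed schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CompactionEventDemo {
    /** One structured event; all field names here are illustrative assumptions. */
    static final class CompactionEvent {
        final String region;
        final long startMillis;
        final long endMillis;
        final long inputBytes;
        final long outputBytes;

        CompactionEvent(String region, long startMillis, long endMillis,
                        long inputBytes, long outputBytes) {
            this.region = region;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
            this.inputBytes = inputBytes;
            this.outputBytes = outputBytes;
        }

        /** Derived value an analysis job might compute from the raw fields. */
        long durationMillis() {
            return endMillis - startMillis;
        }

        /** Renders the event as ordered key=value pairs separated by tabs. */
        String toLine() {
            Map<String, Object> fields = new LinkedHashMap<String, Object>();
            fields.put("event", "compaction");
            fields.put("region", region);
            fields.put("startMillis", startMillis);
            fields.put("endMillis", endMillis);
            fields.put("inputBytes", inputBytes);
            fields.put("outputBytes", outputBytes);
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('\t');
                }
                sb.append(e.getKey()).append('=').append(e.getValue());
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        CompactionEvent ev = new CompactionEvent("r1", 1000L, 4500L, 2048L, 1024L);
        System.out.println(ev.toLine());
        System.out.println("durationMillis=" + ev.durationMillis());
    }
}
```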

[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192257#comment-13192257
 ] 

stack commented on HBASE-5267:
--

lgtm

 Add a configuration to disable the slab cache by default
 

 Key: HBASE-5267
 URL: https://issues.apache.org/jira/browse/HBASE-5267
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: 5267.txt


 From what I commented at the tail of HBASE-4027:
 {quote}
 I changed the release note, the patch doesn't have a hbase.offheapcachesize 
 configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize 
 (which is actually a big problem when you consider this: 
 http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). 
 {quote}
 We need to add hbase.offheapcachesize and set it to false by default.
 Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.

[jira] [Commented] (HBASE-5204) Backward compatibility fixes for 0.92

2012-01-24 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192267#comment-13192267
]

stack commented on HBASE-5204:
--

Yes, agree, adding patch to trunk would be silly. Let's drop it. The codes
should never have been changed. Hopefully your warning...

{code}
// WARNING: Please do not insert, remove or swap any line in this static  //
// block.  Doing so would change or shift all the codes used to serialize //
// objects, which makes backwards compatibility very hard for clients.    //
// New codes should always be added at the end. Code removal is           //
// discouraged because code is a short now.                               //
{code}

will help w/ that.

Can we check in an asynchbase unit test that exercises all the APIs you need so
we fail fast in case we mess up again? (HBase has 16 committers now and it is
hard to keep them all on message.)

Backward compatibility fixes for 0.92
-

Key: HBASE-5204
URL: https://issues.apache.org/jira/browse/HBASE-5204
Project: HBase
Issue Type: Bug
Components: ipc
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Blocker
Labels: backwards-compatibility
Fix For: 0.92.0

Attachments:
0001-Add-some-backward-compatible-support-for-reading-old.patch,
0002-Make-sure-that-a-connection-always-uses-a-protocol.patch,
0003-Change-the-code-used-when-serializing-HTableDescript.patch, 5204-92.txt,
5204-trunk.txt, 5204.addendum

Attached are 3 patches that are necessary to allow compatibility between
HBase 0.90.x (and previous releases) and HBase 0.92.0.
First of all, I'm well aware that 0.92.0 RC4 has been thumbed up by a lot of
people and would probably wind up being released as 0.92.0 tomorrow, so I
sincerely apologize for creating this issue so late in the process. I spent
a lot of time trying to work around the quirks of 0.92 but once I realized
that with a few very quasi-trivial changes compatibility would be made
significantly easier, I immediately sent these 3 patches to Stack, who
suggested I create this issue.
The first patch is required as without it clients sending a 0.90-style RPC to
a 0.92-style server causes the server to die uncleanly. It seems that 0.92
ships with {{\-XX:OnOutOfMemoryError=kill \-9 %p}}, and when a 0.92 server
fails to deserialize a 0.90-style RPC, it attempts to allocate a large buffer
because it doesn't read fields of 0.90-style RPCs properly. This allocation
attempt immediately triggers an OOME, which causes the JVM to die abruptly of
a {{SIGKILL}}. So whenever a 0.90.x client attempts to connect to HBase, it
kills whichever RS is hosting the {{\-ROOT-}} region.
The second patch fixes a bug introduced by HBASE-2002, which added support
for letting clients specify what protocol they want to speak. If a client
doesn't properly specify what protocol to use, the connection's {{protocol}}
field will be left {{null}}, which causes any subsequent RPC on that
connection to trigger an NPE in the server, even though the connection was
successfully established from the client's point of view. The fix is to
simply give the connection a default protocol, by assuming the client meant
to speak to a RegionServer.
The third patch fixes an oversight that slipped in HBASE-451, where a change
to {{HbaseObjectWritable}} caused all the codes used to serialize
{{Writables}} to shift by one. This was carefully avoided in other changes
such as HBASE-1502, which cleanly removed entries for {{HMsg}} and
{{HMsg[]}}, so I don't think this breakage in HBASE-451 was intended.
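
The shift described for the third patch is easy to see with a toy code table.
The sketch below is not {{HbaseObjectWritable}}: CodeTableDemo and assignCodes
are hypothetical names, the type names are placeholders, and the only
assumption is that codes are handed out by a running counter in registration
order, which is what makes insertion in the middle dangerous.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CodeTableDemo {
    /** Assigns codes 1, 2, 3, ... in registration order, like a running counter in a static block. */
    static Map<String, Integer> assignCodes(List<String> typeNames) {
        Map<String, Integer> codes = new LinkedHashMap<String, Integer>();
        int code = 1;
        for (String name : typeNames) {
            codes.put(name, code++);
        }
        return codes;
    }

    public static void main(String[] args) {
        // Placeholder type names; the real table is much longer.
        List<String> oldOrder = Arrays.asList("Boolean", "Put", "Get", "Delete");
        List<String> inserted = Arrays.asList("Boolean", "NewType", "Put", "Get", "Delete");
        List<String> appended = Arrays.asList("Boolean", "Put", "Get", "Delete", "NewType");

        System.out.println("old:      " + assignCodes(oldOrder));
        System.out.println("inserted: " + assignCodes(inserted)); // every code after the insert moves
        System.out.println("appended: " + assignCodes(appended)); // existing codes are unchanged
    }
}
```

An old client that serializes a type with its old code would be decoded as a
different type by a server built from the "inserted" ordering, whereas the
"appended" ordering keeps every existing code stable.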

[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192372#comment-13192372
 ] 

stack commented on HBASE-5231:
--

It's unexplained in the code; it's just introduced, and it's very odd.  Wrong, even.

 Backport HBASE-3373 (per-table load balancing) to 0.92
 --

 Key: HBASE-5231
 URL: https://issues.apache.org/jira/browse/HBASE-5231
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu
 Fix For: 0.92.1

 Attachments: 5231-v2.txt, 5231.txt


 This JIRA backports per-table load balancing to 0.92

[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192440#comment-13192440
 ] 

stack commented on HBASE-5231:
--

In a method named getAssignmentsByTable we will return regions by table IFF a 
configuration hbase.master.loadbalance.bytable is true.  Otherwise we return 
all regions belonging to a 'table' named 'ensemble'.  Nowhere is 'ensemble' 
explained.  How is a noob who follows along after us, trying to make sense of 
this, supposed to figure out what's going on?  A method named 
getAssignmentsByTable should return assignments by table... not assignments by 
table and then, if some flag in config is set, assignments by some arbitrary 
pseudo table.  At a minimum it needs to be explained by comments and in 
javadoc.  But really it says to me that this new feature is not well thought 
through.  Why do we worry about regions by table outside of the balancer 
invocation?  Shouldn't the by-table balancer be asking about a region's table 
down in the balancer guts rather than up here high in the master?
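
One way to read the naming objection above is that a by-table method should do 
exactly one thing. The sketch below is not the patch under review: 
AssignmentGrouping, tableOf and byTable are hypothetical names, plain strings 
stand in for servers and regions, and the "table,startKey" region name format 
is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AssignmentGrouping {
    /** Region names are assumed to look like "table,startKey" (illustrative only). */
    static String tableOf(String regionName) {
        int comma = regionName.indexOf(',');
        return comma < 0 ? regionName : regionName.substring(0, comma);
    }

    /** Always groups by real table name: table -> server -> regions. No pseudo table. */
    static Map<String, Map<String, List<String>>> byTable(Map<String, List<String>> regionsByServer) {
        Map<String, Map<String, List<String>>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : regionsByServer.entrySet()) {
            for (String region : e.getValue()) {
                result.computeIfAbsent(tableOf(region), t -> new TreeMap<>())
                      .computeIfAbsent(e.getKey(), s -> new ArrayList<>())
                      .add(region);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> regionsByServer = new TreeMap<>();
        regionsByServer.put("rs1", Arrays.asList("users,a", "orders,a"));
        regionsByServer.put("rs2", Arrays.asList("users,m"));
        System.out.println(byTable(regionsByServer));
    }
}
```

Under this reading, a caller that wants the whole-cluster view would use the 
ungrouped server-to-regions map directly (or a separately named method) 
instead of receiving it under an unexplained table key.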

Looking more at what is going on, when the balancer finishes, do we have a 
balanced cluster?  There is no test to prove it and, thinking on it, given that 
we invoke the balancer per table, if there are lots of tables with different 
region count skew, I'd think it could throw off the basic cluster balance 
(regions per regionserver).
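
The concern about overall balance can be illustrated with a small worked 
example. It is a simplification, not the HBase balancer: BalanceSkewDemo, 
spread and totals are hypothetical names, and the example assumes a balancer 
that treats a table as balanced once its most and least loaded servers differ 
by at most one region.

```java
final class BalanceSkewDemo {
    /** Difference between the most and least loaded server. */
    static int spread(int[] regionsPerServer) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int n : regionsPerServer) {
            min = Math.min(min, n);
            max = Math.max(max, n);
        }
        return max - min;
    }

    /** Sums per-table region counts into cluster-wide counts per server. */
    static int[] totals(int[][] regionsPerTablePerServer) {
        int[] sum = new int[regionsPerTablePerServer[0].length];
        for (int[] table : regionsPerTablePerServer) {
            for (int server = 0; server < table.length; server++) {
                sum[server] += table[server];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Rows are tables, columns are servers A, B, C.
        // Four single-region tables that all happen to live on server A.
        int[][] perTable = { {1, 0, 0}, {1, 0, 0}, {1, 0, 0}, {1, 0, 0} };
        for (int[] table : perTable) {
            System.out.println("per-table spread: " + spread(table));
        }
        System.out.println("cluster spread: " + spread(totals(perTable)));
    }
}
```

Each table on its own cannot be spread any more evenly, so a per-table pass 
has nothing to move, yet server A carries every region while B and C carry 
none.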



 Backport HBASE-3373 (per-table load balancing) to 0.92
 --

 Key: HBASE-5231
 URL: https://issues.apache.org/jira/browse/HBASE-5231
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu
 Fix For: 0.92.1

 Attachments: 5231-v2.txt, 5231.txt


 This JIRA backports per-table load balancing to 0.92

[jira] [Commented] (HBASE-5262) Structured event log for HBase for monitoring and auto-tuning performance

2012-01-24 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192603#comment-13192603
]

stack commented on HBASE-5262:
--

@Mikhail We used to have an extra column family in .META. called history. Into
it we'd write 'events'. It was stripped because, though a nice idea, it caused
more trouble than it was worth (x-cluster deadlocking, interfering w/
shutdowns, growing at a different rate to the info family in .META.)... so
yeah, it should fail fast if it can't put events, etc.

Structured event log for HBase for monitoring and auto-tuning performance
-

Key: HBASE-5262
URL: https://issues.apache.org/jira/browse/HBASE-5262
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin

Creating this JIRA to open a discussion about a structured (machine-readable)
log that will record events such as compaction start/end times, compaction
input/output files, their sizes, the same for flushes, etc. This can be
stored e.g. in a new system table in HBase itself. The data from this log can
then be analyzed and used to optimize compactions at run time, or otherwise
auto-tune HBase configuration to reduce the number of knobs the user has to
configure.

[jira] [Commented] (HBASE-5231) Backport HBASE-3373 (per-table load balancing) to 0.92

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192612#comment-13192612
 ] 

stack commented on HBASE-5231:
--

bq. The relationship between this feature and balancer sloppiness is definitely 
within the scope of discussion.

I don't understand how.

bq. Since the existing tests for load balancer pass, there is no regression 
introduced by this feature.

The existing test uses table balancing?  (There is no test for table balancer 
that I can see)

-1 on this patch going into 0.92. It's incoherent.

 Backport HBASE-3373 (per-table load balancing) to 0.92
 --

 Key: HBASE-5231
 URL: https://issues.apache.org/jira/browse/HBASE-5231
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu
 Fix For: 0.92.1

 Attachments: 5231-v2.txt, 5231.addendum, 5231.txt


 This JIRA backports per-table load balancing to 0.92

[jira] [Commented] (HBASE-5249) Using a stale explicit row lock in a Put triggers an NPE

2012-01-24 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192635#comment-13192635
 ] 

stack commented on HBASE-5249:
--

Copying over Ted's comment that was in HBASE-5171 (resolved as a duplicate):

{code}
Zhihong Yu added a comment - 10/Jan/12 18:03
One workaround is to increase the value for hbase.rowlock.wait.duration
But a new exception should be introduced anyways.
{code}

Yves says up on the list:

{code}
After checking the source code I've noticed that the value which is going to be 
put into the HashMap can be null in the case where the waitForLock flag is true 
or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 
2111ff). The latter I think happens in our case as we have heavy load hitting 
the server.
{code}
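
The mechanism Yves describes comes down to ConcurrentHashMap rejecting null 
values. The sketch below shows that behavior and one way to turn it into a 
meaningful error, as Ted suggests. It is not the HRegionServer code: 
RowLockRegistry, addUnchecked, addChecked and RowLockUnavailableException are 
hypothetical names chosen for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

final class RowLockRegistry {
    /** Hypothetical stand-in for a dedicated exception type; not an HBase class. */
    static final class RowLockUnavailableException extends Exception {
        private static final long serialVersionUID = 1L;

        RowLockUnavailableException(String message) {
            super(message);
        }
    }

    private final ConcurrentHashMap<String, Integer> rowLocks =
        new ConcurrentHashMap<String, Integer>();

    /** Unguarded: a null lock id reaches ConcurrentHashMap.put, which throws an NPE. */
    void addUnchecked(String lockName, Integer lockId) {
        rowLocks.put(lockName, lockId);
    }

    /** Guarded: a missing lock id becomes an explicit, descriptive exception. */
    void addChecked(String lockName, Integer lockId) throws RowLockUnavailableException {
        if (lockId == null) {
            throw new RowLockUnavailableException("could not obtain row lock for " + lockName);
        }
        rowLocks.put(lockName, lockId);
    }

    int size() {
        return rowLocks.size();
    }

    public static void main(String[] args) {
        RowLockRegistry registry = new RowLockRegistry();
        try {
            registry.addUnchecked("row1", null);
        } catch (NullPointerException e) {
            System.out.println("unguarded put failed with NPE");
        }
        try {
            registry.addChecked("row1", null);
        } catch (RowLockUnavailableException e) {
            System.out.println("guarded put failed with: " + e.getMessage());
        }
    }
}
```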

 Using a stale explicit row lock in a Put triggers an NPE
 

 Key: HBASE-5249
 URL: https://issues.apache.org/jira/browse/HBASE-5249
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Minor
 Fix For: 0.92.1


 After acquiring an explicit row lock, if one attempts to send a {{Put}} after 
 the row lock has expired, an NPE is triggered in the RegionServer, instead of 
 throwing an {{UnknownRowLockException}} back to the client.
 {code}
 2012-01-20 17:09:54,074 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
 java.lang.NullPointerException
     at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2313)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2299)
     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1327)
 It happened only once out of thousands of RPCs that grabbed and
 released a row lock.
 {code}

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-01-25 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193187#comment-13193187
]

stack commented on HBASE-5270:
--

bq. So, the param 'definitiveRootServer' is used in this case to ensure the
dead root server is carryingRoot when it is being expired.

What's 'definitive' about it? Is it that we know for sure the server was
carrying root or meta? How?

bq. Is there any possible to expire a server if its carrying root and meta now?
I don't think so.

You are saying that this patch does nothing new here? We COULD expire the
server that was carrying root, wait on its log split, then expire the server
carrying meta (though it may have been the same server)... it might be ok but
we might kill a server that has just started. I'm ok if fixing this is outside
scope of this patch.

bq. I don't find this operation earlier in master setup, and this operation is
not introduced by this issue. And I only introduce this logic for 90 from trunk.

So, you copied this to 0.90 from TRUNK (so my notion that we already had this
is my remembering how things work on TRUNK.. that would make sense).

bq. I think we need explain it, But whether we shouldn't use distributed split
log, I'm not very sure.

If we are not sure, we shouldn't do it.

bq. When master is initializing, if one RS is killed and restart, then dead
server is in progress while master startup

This seems like a small window. Or do you think it could happen frequently?
Could we hold up ServerShutdownHandler until master is up?

Handle potential data loss due to concurrent processing of processFaileOver
and ServerShutdownHandler
-

Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Fix For: 0.94.0, 0.92.1

This JIRA continues the effort from HBASE-5179. Starting with Stack's
comments about patches for 0.92 and TRUNK:
Reviewing 0.92v17
isDeadServerInProgress is a new public method in ServerManager but it does
not seem to be used anywhere.
Does isDeadRootServerInProgress need to be public? Ditto for meta version.
These method param names are not right: 'definitiveRootServer'; what is meant
by definitive? Do they need this qualifier?
Is there anything in place to stop us expiring a server twice if it's carrying
root and meta?
What is difference between asking assignment manager isCarryingRoot and this
variable that is passed in? Should be doc'd at least. Ditto for meta.
I think I've asked for this a few times - onlineServers needs to be
explained... either in javadoc or in comment. This is the param passed into
joinCluster. How does it arise? I think I know but am unsure. God love the
poor noob that comes awandering this code trying to make sense of it all.
It looks like we get the list by trawling zk for regionserver znodes that
have not checked in. Don't we do this operation earlier in master setup? Are
we doing it again here?
Though distributed split log is configured, we will do in-master,
single-process splitting under some conditions with this patch. It's not
explained in code why we would do this. Why do we think master log splitting
is 'high priority' when it could very well be slower? Should we only go this
route if distributed splitting is not going on? Do we know if concurrent
distributed log splitting and master splitting works?
Why would we have dead servers in progress here in master startup? Because a
servershutdownhandler fired?
This patch is different to the patch for 0.90. Should go into trunk first
with tests, then 0.92. Should it be in this issue? This issue is really hard
to follow now. Maybe this issue is for 0.90.x and new issue for more work on
this trunk patch?
This patch needs to have the v18 differences applied.

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-01-25 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193209#comment-13193209
]

stack commented on HBASE-5270:
--

bq. What if the region server hosting .META. went down ?

Yes... was just thinking about that. In this case we'd run the splitter
in-line, in SSH, not via executor; let me look at the code. I'm trying to write
tests and catch up on all the stuff that was done over on the previous issue.

Handle potential data loss due to concurrent processing of processFaileOver
and ServerShutdownHandler
-

Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Fix For: 0.94.0, 0.92.1

[jira] [Commented] (HBASE-5276) PerformanceEvaluation does not set the correct classpath for MR because it lives in the test jar

2012-01-25 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193255#comment-13193255
 ] 

stack commented on HBASE-5276:
--

@Tim Maybe open issue against CDH and close this one?

 PerformanceEvaluation does not set the correct classpath for MR because it 
 lives in the test jar
 

 Key: HBASE-5276
 URL: https://issues.apache.org/jira/browse/HBASE-5276
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.90.4
Reporter: Tim Robertson
Priority: Minor

 Note: This was discovered running the CDH version hbase-0.90.4-cdh3u2
 Running the PerformanceEvaluation as follows:
   $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5
 fails because the MR tasks do not get the HBase jar on the CP, and thus hit 
 ClassNotFoundExceptions.
 The job gets the following only:
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2-tests.jar
   file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 The RowCounter etc all work because they live in the HBase jar, not the test 
 jar, and they get the following 
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/guava-r06.jar
   file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2.jar
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 Presumably this relates to 
   job.setJarByClass(PerformanceEvaluation.class);
   ...
   TableMapReduceUtil.addDependencyJars(job);
 A (cowboy) workaround to run PE is to unpack the jars, and copy the 
 PerformanceEvaluation* classes building a patched jar.

[jira] [Commented] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193365#comment-13193365
 ] 

stack commented on HBASE-5278:
--

@Jon I've a bit of practise

 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread main java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Migrate
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.

[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-01-25 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193363#comment-13193363
 ] 

stack commented on HBASE-5279:
--

Skipping should be fine.  Do you have a scan of .META. from before the upgrade?

Are you up now?

 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Attachments: HBASE-5279.patch


 I have upgraded my environment from 0.90.4 to 0.92.0.
 After the table migration I get the following error in the master (permanently):
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
 java.lang.NullPointerException
     at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
     at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
     at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and I 
 have been seeing the following WARN since then:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 My patch simply goes around the NPE (like the other code around those lines), 
 but I don't know if that's correct.

[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-01-27 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195136#comment-13195136
 ] 

stack commented on HBASE-4991:
--

bq. For successive regions R1, R2 and R3, if we delete R2, we can change the 
end key of R1 to be the original end key of R2 and drop region R2 directly.

Changing the end key of R1 to be the end key of R2 will require creating a new 
region R1'; we'll have to close R1 and R2 and delete both (after moving the R1 
data to R1').
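
The point about needing a new region R1' can be sketched with immutable key 
ranges. This is only an illustration under the assumption that a region's key 
range is fixed when the region is created; RegionRangeDemo, Region and 
spanning are hypothetical names, not HBase classes.

```java
final class RegionRangeDemo {
    /** A region covering [startKey, endKey); the range is fixed at construction. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Builds the replacement region R1' that spans two adjacent regions. */
    static Region spanning(Region left, Region right) {
        if (!left.endKey.equals(right.startKey)) {
            throw new IllegalArgumentException("regions are not adjacent: " + left + " " + right);
        }
        return new Region(left.startKey, right.endKey);
    }

    public static void main(String[] args) {
        Region r1 = new Region("a", "h");
        Region r2 = new Region("h", "p");
        Region r3 = new Region("p", "z");
        // R1 cannot be stretched in place; a new region replaces both R1 and R2.
        Region r1Prime = spanning(r1, r2);
        System.out.println("R1' = " + r1Prime + ", followed by R3 = " + r3);
    }
}
```

In this picture R1 and R2 are both retired once R1' exists, which matches the 
close-and-delete steps described in the comment.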

 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu

 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-27 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195163#comment-13195163
 ] 

stack commented on HBASE-5120:
--

So, setting down the TM period from 30 minutes to 2 minutes and doing a bunch 
of online enable/disable works?  If so, I'm +1 on this patch (We'd actually set 
the timeout down from 30 minutes to 5 minutes over in HBASE-5119?)

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, 
 HBASE-5120_5.patch, HBASE-5120_5.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.

[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-01-27 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195187#comment-13195187
]

stack commented on HBASE-5229:
--

@Lars I'm 'intellectually' interested. I have no practical need. I'm more
interested in our being able to support large rows (intra-row scanning, etc.);
i.e. one row in a region and that region is huge and it works.

Explore building blocks for multi-row local transactions.
---

Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt

[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams

2012-01-27 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195189#comment-13195189
 ] 

stack commented on HBASE-3134:
--

bq. Should I do this in this ticket or file another issue?

Up to you.

 [replication] Add the ability to enable/disable streams
 ---

 Key: HBASE-3134
 URL: https://issues.apache.org/jira/browse/HBASE-3134
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Teruyoshi Zenmyo
Priority: Minor
  Labels: replication
 Fix For: 0.94.0

 Attachments: HBASE-3134.patch


 This jira was initially in the scope of HBASE-2201, but was pushed out since 
 it has low value compared to the required effort (and we want to ship 
 0.90.0 rather soonish).
 We need to design a way to enable/disable replication streams in a 
 determinate fashion.


[jira] [Commented] (HBASE-4810) Clean up IPC logging configuration

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197029#comment-13197029
 ] 

stack commented on HBASE-4810:
--

+1

 Clean up IPC logging configuration
 --

 Key: HBASE-4810
 URL: https://issues.apache.org/jira/browse/HBASE-4810
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Reporter: Gary Helmling

 The current IPC classes -- HBaseClient, HBaseServer, HBaseRPC, 
 WritableRpcEngine, etc. -- use mangled package names (org.apache.hadoop.ipc) 
 when obtaining loggers so that we can enable debug logging on the full 
 org.apache.hadoop.hbase base package and not have the noise from per-request 
 log messages drown out other important information.  This has the desired 
 effect, but is extremely hacky and counter-intuitive.
 I think it would be better to fix the package name used by these classes (use 
 org.apache.hadoop.hbase.ipc), and change the noisy, per-request messages to 
 be logged at trace level.  This will keep those messages out of standard 
 debug output, while allowing the log4j configuration when you actually wish 
 to see them to be a little more intuitive.


[jira] [Commented] (HBASE-5285) runtime exception -- cached an already cached block -- during compaction

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197311#comment-13197311
 ] 

stack commented on HBASE-5285:
--

Can you make this happen reliably Simon?

 runtime exception -- cached an already cached block -- during compaction
 

 Key: HBASE-5285
 URL: https://issues.apache.org/jira/browse/HBASE-5285
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
 Environment: hadoop-1.0 and hbase-0.92
 18 node cluster, dedicated namenode, zookeeper, hbasemaster, and YCSB client 
 machine. 
 latest YCSB
Reporter: Simon Dircks
Priority: Trivial

 #On YCSB client machine:
 /usr/local/bin/java -cp build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/ 
 com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P 
 workloads/workloada -p columnfamily=family1 -p recordcount=500 -s > 
 load.dat
 loaded 5mil records, that created 8 regions. (balanced all onto the same RS)
 /usr/local/bin/java -cp build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/ 
 com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P 
 workloads/workloada -p columnfamily=family1 -p operationcount=500 
 -threads 10 -s > transaction.dat
 #On RS that was holding the 8 regions above. 
 2012-01-25 23:23:51,556 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x134f70a343101a0 Successfully transitioned node 
 162702503c650e551130e5fb588b3ec2 from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
 2012-01-25 23:23:51,616 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.lang.RuntimeException: Cached an already cached block
   at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:268)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:276)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:487)
   at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:168)
   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:181)
   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:111)
   at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:83)
   at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1721)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:2861)
   at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1432)
   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1424)
   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1400)
   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:3688)
   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:3581)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1771)
   at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
 2012-01-25 23:23:51,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x134f70a343101a0 Attempting to transition node 
 162702503c650e551130e5fb588b3ec2 from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT


[jira] [Commented] (HBASE-5286) bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when presented with split packaged Hadoop 0.23 installation

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197316#comment-13197316
 ] 

stack commented on HBASE-5286:
--

Thanks for filing the issue Roman.

 bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when 
 presented with split packaged Hadoop 0.23 installation
 

 Key: HBASE-5286
 URL: https://issues.apache.org/jira/browse/HBASE-5286
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik

 Here's the bit from bin/hbase that might need TLC now that Hadoop can be 
 spotted in the wild in split-package configuration:
 {noformat}
 #If avail, add Hadoop to the CLASSPATH and to the JAVA_LIBRARY_PATH
 if [ ! -z "$HADOOP_HOME" ]; then
   HADOOPCPPATH=""
   if [ -z "$HADOOP_CONF_DIR" ]; then
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" "${HADOOP_HOME}/conf")
   else
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" "${HADOOP_CONF_DIR}")
   fi
   if [ "`echo ${HADOOP_HOME}/hadoop-core*.jar`" != "${HADOOP_HOME}/hadoop-core*.jar" ] ; then
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-core*.jar | head -1`)
   else
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-common*.jar | head -1`)
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-hdfs*.jar | head -1`)
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-mapred*.jar | head -1`)
   fi
 {noformat}
 There's a couple of issues with the above code:
   0. HADOOP_HOME is now deprecated in Hadoop 0.23
   1. the list of jar files added to the class-path should be revised
   2. we need to figure out a more robust way to get the jar files that are 
      needed to the classpath (things like hadoop-mapred*.jar tend to match 
      src/test jars as well)
 Better yet, it would be useful to look into whether we can transition HBase's 
 bin/hbase onto using bin/hadoop as a launcher script instead of direct JAVA 
 invocations (Pig, Hive, Sqoop and Mahout already do that)


[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-01-31 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197328#comment-13197328
]

stack commented on HBASE-5229:
--

bq. I hear you. Means that the client and data modeling would have to change
completely if it needs transactions.

Could this be done as a facade atop our current api?

Explore building blocks for multi-row local transactions.
---

Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197340#comment-13197340
 ] 

stack commented on HBASE-5268:
--

I added a footnote to book in delete section pointing at your blog Lars.

 Add delete column prefix delete marker
 --

 Key: HBASE-5268
 URL: https://issues.apache.org/jira/browse/HBASE-5268
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: Lars Hofhansl
 Attachments: 5268-proof.txt, 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 
 5268-v5.txt, 5268.txt


 This is another part missing in the wide row challenge.
 Currently entire families of a row can be deleted or individual columns or 
 versions.
 There is no facility to mark multiple columns for deletion by column prefix.
 Turns out that can be achieved with very little code (it's possible that I missed 
 some of the new delete bloom filter code, so please review this thoroughly). 
 I'll attach a patch soon, just working on some tests now.


[jira] [Commented] (HBASE-5266) Add documentation for ColumnRangeFilter

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197347#comment-13197347
 ] 

stack commented on HBASE-5266:
--

Nothing wrong w/ your english.

The below needs a bit more english, namely a 'be'...

can used

Can be fixed on commit.

Otherwise doc is excellent.

+1


 Add documentation for ColumnRangeFilter
 ---

 Key: HBASE-5266
 URL: https://issues.apache.org/jira/browse/HBASE-5266
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5266-v2.txt, 5266-v3.txt, 5266.txt


 There are only a few lines of documentation for ColumnRangeFilter.
 Given the usefulness of this filter for efficient intra-row scanning (see 
 HBASE-5229 and HBASE-4256), we should make this filter more prominent in the 
 documentation.


[jira] [Commented] (HBASE-5281) Should a failure in creating an unassigned node abort the master?

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197359#comment-13197359
 ] 

stack commented on HBASE-5281:
--

I'm not sure removing the abort only is enough, Harsh.  If we can't write zk, we 
can't assign a region, so there would be holes in the table.  That's radical.

We should be consistent regarding our policy when a zk write fails.  We are not 
currently, as you've found here.

 Should a failure in creating an unassigned node abort the master?
 -

 Key: HBASE-5281
 URL: https://issues.apache.org/jira/browse/HBASE-5281
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: Harsh J
Assignee: Harsh J
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5281.patch


 In {{AssignmentManager}}'s {{CreateUnassignedAsyncCallback}}, we have the 
 following condition:
 {code}
 if (rc != 0) {
   // This is result code.  If non-zero, need to resubmit.
   LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
     "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
   this.zkw.abort("Connectionloss writing unassigned at " + path +
     ", rc=" + rc, null);
   return;
 }
 {code}
 While a similar structure inside {{ExistsUnassignedAsyncCallback}} (which the 
 above is linked to), does not have such a force abort.
 Do we really require the abort statement here, or can we make do without?


[jira] [Commented] (HBASE-5299) CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite loop in region assignment flow.

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197362#comment-13197362
 ] 

stack commented on HBASE-5299:
--

I could take this one, Ram.  I'm trying to hack up a testing harness for Master.  
Making a test for this condition would be a good exercise.

 CatalogTracker.getMetaServerConnection() checks for root server connection 
 and makes waitForMeta to go into infinite loop in region assignment flow.
 

 Key: HBASE-5299
 URL: https://issues.apache.org/jira/browse/HBASE-5299
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor

 RS A, RS B and RS C are 3 region servers.
 RS A - META
 RS B - ROOT
 RS C - NON META and NON ROOT
 Kill RS B and wait for server shutdown handler to start.  
 Start RS B again before assigning ROOT to RS C.
 Now the cluster will try to assign new regions to RS B.  
 But as ROOT is not yet assigned the OpenRegionHandler.updateMeta will fail to 
 update the regions just because ROOT is not online.
 {code}
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:25,126 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Attempting to transition node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:25,159 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Successfully transitioned node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:35,385 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Attempting to transition node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:35,449 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Successfully transitioned node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:24:16,666 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Attempting to transition node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:24:16,701 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Successfully transitioned node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:24:20,788 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Interrupting 
 thread Thread[PostOpenDeployTasks:a87109263ed53e67158377a149c5a7be,5,main]
 2012-01-30 16:24:30,699 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception 
 running postOpenDeployTasks; region=a87109263ed53e67158377a149c5a7be
 org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Interrupted
   at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:439)
   at org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:142)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1382)
   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:221)
 {code}
 So we need to wait for TM to assign the regions again. 


[jira] [Commented] (HBASE-5280) Remove AssignmentManager#clearRegionFromTransition and replace with assignmentManager#regionOffline

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197373#comment-13197373
 ] 

stack commented on HBASE-5280:
--

+1

 Remove AssignmentManager#clearRegionFromTransition and replace with 
 assignmentManager#regionOffline
 ---

 Key: HBASE-5280
 URL: https://issues.apache.org/jira/browse/HBASE-5280
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Jonathan Hsieh

 These two methods are essentially the same and both present in the code base. 
  It was suggested in the review for HBASE-5128 to remove 
 #clearRegionFromTransition in favor of #regionOffline  (HBASE-5128 deprecates 
 this method, but it is internal to the HMaster, so should be safely removable 
 from 0.92 and 0.94).


[jira] [Commented] (HBASE-5309) Hbase web-app /jmx throws an exception

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197393#comment-13197393
 ] 

stack commented on HBASE-5309:
--

If you go to /jmx, are we supposed to dump a view on jmx?

 Hbase web-app /jmx throws an exception
 --

 Key: HBASE-5309
 URL: https://issues.apache.org/jira/browse/HBASE-5309
 Project: HBase
  Issue Type: Bug
Reporter: Hitesh Shah

 hbasemaster:60010/jmx throws a NoSuchMethodError exception


[jira] [Commented] (HBASE-3850) Log more details when a scanner lease expires

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197392#comment-13197392
 ] 

stack commented on HBASE-3850:
--

+1 on patch (fix spacing on the 'else' on commit)

Won't regionname include name of table?

 Log more details when a scanner lease expires
 -

 Key: HBASE-3850
 URL: https://issues.apache.org/jira/browse/HBASE-3850
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Benoit Sigoure
Assignee: Darren Haas
Priority: Critical
 Fix For: 0.94.0

 Attachments: HBASE-3850.trunk.v1.patch


 The message logged by the RegionServer when a Scanner lease expires isn't as 
 useful as it could be.  {{Scanner 4765412385779771089 lease expired}} - most 
 clients don't log their scanner ID, so it's really hard to figure out what 
 was going on.  I think it would be useful to at least log the name of the 
 region on which the Scanner was open, and it would be great to have the 
 ip:port of the client that had that lease too.


[jira] [Commented] (HBASE-3859) Increment a counter when a Scanner lease expires

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197395#comment-13197395
 ] 

stack commented on HBASE-3859:
--

Patch looks good. 

Any evidence it works Mubarak?

 Increment a counter when a Scanner lease expires
 

 Key: HBASE-3859
 URL: https://issues.apache.org/jira/browse/HBASE-3859
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Affects Versions: 0.90.2
Reporter: Benoit Sigoure
Assignee: Mubarak Seyed
Priority: Minor
 Attachments: HBASE-3859.trunk.v1.patch


 Whenever a Scanner lease expires, the RegionServer will close it 
 automatically and log a message to complain.  I would like the RegionServer 
 to increment a counter whenever this happens and expose this counter through 
 the metrics system, so we can plug this into our monitoring system (OpenTSDB) 
 and keep track of how frequently this happens.  It's not supposed to happen 
 frequently so it's good to keep an eye on it.
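
 A minimal sketch of the counter described above, assuming a thread-safe 
 AtomicLong is acceptable. The class and method names are hypothetical and 
 are not taken from the HBase code base or from the attached patch; wiring 
 the value into the metrics system is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; these names do not come from HBase or the attached patch.
final class ScannerLeaseMetrics {
    private final AtomicLong expiredLeases = new AtomicLong();

    // Would be called from the lease-expiry path; safe to call from multiple threads.
    void leaseExpired() {
        expiredLeases.incrementAndGet();
    }

    // Value that a metrics exporter could poll periodically.
    long getExpiredLeases() {
        return expiredLeases.get();
    }
}
```

 A monitoring system would typically graph the rate of change of this value 
 rather than the raw total.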


[jira] [Commented] (HBASE-5283) Request counters may become negative for heavily loaded regions

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197398#comment-13197398
 ] 

stack commented on HBASE-5283:
--

+1 on patch.

 Request counters may become negative for heavily loaded regions
 ---

 Key: HBASE-5283
 URL: https://issues.apache.org/jira/browse/HBASE-5283
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5283.trunk.v1.patch


 Requests counter showing negative count, example under 'Requests' column: 
 -645470239
 {code}
 Name  Region Server   Start Key   End Key Requests
 usertable,user2037516127892189021,1326756873774.16833e4566d1daef109b8fdcd1f4b5a6.
  xxx.com:60030   user2037516127892189021 user2296868939942738705  
-645470239
 {code}
 RegionLoad.readRequestsCount and RegionLoad.writeRequestsCount are of int 
 type. Our Ops team has been running lots of heavy-load operations. 
 RegionLoad.getRequestsCount() overflows Integer.MAX_VALUE. It is set to 
 0xD986E7E1. In table.jsp, RegionLoad.getRequestsCount() is assigned to long 
 type. 0xD986E7E1 is sign-extended to the long 0xFFFFFFFFD986E7E1, which is 
 -645470239 in decimal.
 Suggested fix is to make readRequestsCount and writeRequestsCount long type. 
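
 The arithmetic in the report can be checked with a small standalone program. 
 This is an illustration only: the class below is hypothetical and is not the 
 HBase RegionLoad class. It takes the bit pattern D986E7E1 quoted above, 
 widens it to a long the way an assignment in table.jsp would, and also shows 
 the value the counter would have held had it not wrapped (assuming it 
 wrapped only once).

```java
// Illustration only: a standalone demo, not the HBase RegionLoad class.
final class RequestCounterOverflowDemo {

    // Widening an int to a long sign-extends it, so a wrapped (negative) int stays negative.
    static long widen(int wrappedCount) {
        return wrappedCount;
    }

    // Reinterprets the 32 bits as unsigned; correct only if the counter wrapped at most once.
    static long asUnsigned(int wrappedCount) {
        return wrappedCount & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int wrapped = 0xD986E7E1; // the bit pattern quoted in the report
        System.out.println(widen(wrapped));                   // -645470239
        System.out.println(Long.toHexString(widen(wrapped))); // ffffffffd986e7e1
        System.out.println(asUnsigned(wrapped));              // 3649497057
    }
}
```

 Declaring the counters as long from the start, as the report suggests, avoids 
 the wrap in the first place.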


[jira] [Commented] (HBASE-5283) Request counters may become negative for heavily loaded regions

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197400#comment-13197400
 ] 

stack commented on HBASE-5283:
--

+1 on patch.

 Request counters may become negative for heavily loaded regions
 ---

 Key: HBASE-5283
 URL: https://issues.apache.org/jira/browse/HBASE-5283
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5283.trunk.v1.patch


 Requests counter showing negative count, example under 'Requests' column: 
 -645470239
 {code}
 Name  Region Server   Start Key   End Key Requests
 usertable,user2037516127892189021,1326756873774.16833e4566d1daef109b8fdcd1f4b5a6.
  xxx.com:60030   user2037516127892189021 user2296868939942738705  
-645470239
 {code}
 RegionLoad.readRequestsCount and RegionLoad.writeRequestsCount are of int 
 type. Our Ops team has been running lots of heavy-load operations. 
 RegionLoad.getRequestsCount() overflows Integer.MAX_VALUE. It is set to 
 0xD986E7E1. In table.jsp, RegionLoad.getRequestsCount() is assigned to long 
 type. 0xD986E7E1 is sign-extended to the long 0xFFFFFFFFD986E7E1, which is 
 -645470239 in decimal.
 Suggested fix is to make readRequestsCount and writeRequestsCount long type. 


[jira] [Commented] (HBASE-5281) Should a failure in creating an unassigned node abort the master?

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197445#comment-13197445
 ] 

stack commented on HBASE-5281:
--

At least in 0.92.0 (and 0.89fb) we are using recoverablezk.  It will retry 
'recoverable' zk errors.  Are you suggesting something else Jimmy?  That we 
wrap all zk ops in a higher-level retry loop?
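
For discussion, here is roughly what a higher-level retry loop could look 
like. This is a generic sketch, not HBase's recoverable ZooKeeper wrapper: 
the BoundedRetry class, its parameters, and the fixed pause between attempts 
are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of HBase or ZooKeeper.
final class BoundedRetry {

    // Runs op up to maxAttempts times, sleeping pauseMillis between failed attempts.
    // Rethrows the last failure so the caller can decide whether to abort.
    static <T> T call(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry on interrupt; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

A real wrapper would retry only errors known to be recoverable; this sketch 
retries on any exception other than an interrupt.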

 Should a failure in creating an unassigned node abort the master?
 -

 Key: HBASE-5281
 URL: https://issues.apache.org/jira/browse/HBASE-5281
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: Harsh J
Assignee: Harsh J
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5281.patch


 In {{AssignmentManager}}'s {{CreateUnassignedAsyncCallback}}, we have the 
 following condition:
 {code}
 if (rc != 0) {
   // This is result code.  If non-zero, need to resubmit.
   LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
     "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
   this.zkw.abort("Connectionloss writing unassigned at " + path +
     ", rc=" + rc, null);
   return;
 }
 {code}
 A similar structure inside {{ExistsUnassignedAsyncCallback}} (which the 
 above is linked to) does not have such a forced abort.
 Do we really require the abort statement here, or can we make do without?


[jira] [Commented] (HBASE-5256) Use WritableUtils.readVInt() in RegionLoad.readFields()

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197475#comment-13197475
 ] 

stack commented on HBASE-5256:
--

Will we have to be careful with this patch?  It can't go into a minor version, 
right?  Otherwise you could have a regionserver reporting metrics to a master 
in a format it can't interpret (or vice versa, the master will be expecting 
them as longs but they come over as vlongs).

We up the version on HServerLoad but don't exploit the version change.  We 
could have had HSL self-migrate reading using old code if it got a v1 HSL to 
deserialize.  Then we could have this in a point version.
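
To make the self-migrating idea concrete, here is a minimal sketch of a 
reader that branches on a leading version byte.  It is not HServerLoad or 
RegionLoad: VersionedLoad and its fields are hypothetical, and the varint 
coding below is a simple 7-bits-per-byte scheme written for this example, 
not the format used by WritableUtils.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a versioned load report.
class VersionedLoad {
    static final byte V1 = 1; // counters as fixed 4-byte ints
    static final byte V2 = 2; // counters as variable-length ints

    int readRequests;
    int writeRequests;

    // Always writes the newest format, prefixed by its version byte.
    void write(DataOutput out) throws IOException {
        out.writeByte(V2);
        writeVarInt(out, readRequests);
        writeVarInt(out, writeRequests);
    }

    // Reads either format, using old code when handed a v1 payload.
    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version == V1) {
            readRequests = in.readInt();
            writeRequests = in.readInt();
        } else if (version == V2) {
            readRequests = readVarInt(in);
            writeRequests = readVarInt(in);
        } else {
            throw new IOException("Unknown version " + version);
        }
    }

    // Illustrative varint: 7 payload bits per byte, high bit = "more follows".
    static void writeVarInt(DataOutput out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    static int readVarInt(DataInput in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed varint");
    }

    // Helpers for trying the two formats against each other.
    static byte[] toBytesV1(int reads, int writes) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeByte(V1);
            out.writeInt(reads);
            out.writeInt(writes);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            write(new DataOutputStream(bos));
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static VersionedLoad fromBytes(byte[] bytes) {
        try {
            VersionedLoad load = new VersionedLoad();
            load.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return load;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a reader like this on both sides, a new master can still digest what an 
old regionserver sends, which is what would let the change ride in a point 
release.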

 Use WritableUtils.readVInt() in RegionLoad.readFields()
 ---

 Key: HBASE-5256
 URL: https://issues.apache.org/jira/browse/HBASE-5256
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-5256.trunk.v1.patch


 Currently in.readInt() is used in RegionLoad.readFields()
 More metrics would be added to RegionLoad in the future, we should utilize 
 WritableUtils.readVInt() to reduce the amount of data exchanged between 
 Master and region servers.


[jira] [Commented] (HBASE-5256) Use WritableUtils.readVInt() in RegionLoad.readFields()

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197586#comment-13197586
 ] 

stack commented on HBASE-5256:
--

Yes.  Upgrading.  Seems like it'd be easy to make this self-migrating.  Would 
suggest we do it.  Make a new issue?

 Use WritableUtils.readVInt() in RegionLoad.readFields()
 ---

 Key: HBASE-5256
 URL: https://issues.apache.org/jira/browse/HBASE-5256
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-5256.trunk.v1.patch


 Currently in.readInt() is used in RegionLoad.readFields()
 More metrics would be added to RegionLoad in the future, we should utilize 
 WritableUtils.readVInt() to reduce the amount of data exchanged between 
 Master and region servers.


[jira] [Commented] (HBASE-5309) Hbase web-app /jmx throws an exception

2012-01-31 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197587#comment-13197587
 ] 

stack commented on HBASE-5309:
--

That'd work for us.

 Hbase web-app /jmx throws an exception
 --

 Key: HBASE-5309
 URL: https://issues.apache.org/jira/browse/HBASE-5309
 Project: HBase
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Enis Soztutar

 hbasemaster:60010/jmx throws an NoSuchMethodError exception


[jira] [Commented] (HBASE-1621) merge tool should work on online cluster, but disabled table

2012-02-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197942#comment-13197942
 ] 

stack commented on HBASE-1621:
--

Alexey, would you mind uploading a version of online_merge.rb w/ the patch 
included, to make it easier for others using the script?  Thank you.

 merge tool should work on online cluster, but disabled table
 

 Key: HBASE-1621
 URL: https://issues.apache.org/jira/browse/HBASE-1621
 Project: HBase
  Issue Type: Bug
Reporter: ryan rawson
Assignee: stack
 Fix For: 0.94.0

 Attachments: 1621-trunk.txt, HBASE-1621-v2.patch, HBASE-1621.patch, 
 hbase-onlinemerge.patch, online_merge.rb


 taking down the entire cluster to merge 2 regions is a pain, i dont see why 
 the table or regions specifically couldnt be taken offline, then merged then 
 brought back up.
 this might need a new API to the regionservers so they can take direction 
 from not just the master.


[jira] [Commented] (HBASE-5314) Gracefully rolling restart region servers in rolling-restart.sh

2012-02-01 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198024#comment-13198024
]

stack commented on HBASE-5314:
--

This is interesting. You are putting together rolling_restart.sh and
graceful_stop.sh. Nice.

What version of hbase are you working against?

Should this new behavior be optional?  Otherwise those who might be using
rolling_restart.sh now might get a surprise (IMO, it's unlikely anyone is using
rolling_restart.sh because it is so disruptive... which you are fixing... but
still, I'd think a --graceful option that enables this new facility would be
the way to go).

Patch looks good but for this.

Gracefully rolling restart region servers in rolling-restart.sh
---

Key: HBASE-5314
URL: https://issues.apache.org/jira/browse/HBASE-5314
Project: HBase
Issue Type: Improvement
Components: scripts
Reporter: YiFeng Jiang
Priority: Minor
Attachments: HBASE-5314.patch

The rolling-restart.sh has a --rs-only option which simply restarts all
region servers in the cluster.
Consider improving it to gracefully restart region servers, to avoid offline
time for the regions deployed on each server and to keep the region
distribution the same as it was before the restart.

[jira] [Commented] (HBASE-5286) bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when presented with split packaged Hadoop 0.23 installation

2012-02-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198037#comment-13198037
 ] 

stack commented on HBASE-5286:
--

Why do they have to be in a different dir?  Can't we do some sed foo to purge 
them from the list of lib jars?

If HADOOP_HOME is defined, can we not put the hadoop and zk jars at the front 
of the CLASSPATH?  Or is it that we can't have hadoop jars in the CLASSPATH 
twice, both the new and the old, because we'll find classes we shouldn't?  Or, 
IIRC, that we can't have hadoop jars in front of hbase jars because then we'll 
do things like pick up its webapps instead of ours?

Thanks Roman.

 bin/hbase's logic of adding Hadoop jar files to the classpath is fragile when 
 presented with split packaged Hadoop 0.23 installation
 

 Key: HBASE-5286
 URL: https://issues.apache.org/jira/browse/HBASE-5286
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik

 Here's the bit from bin/hbase that might need TLC now that Hadoop can be 
 spotted in the wild in split-package configuration:
 {noformat}
 #If avail, add Hadoop to the CLASSPATH and to the JAVA_LIBRARY_PATH
 if [ ! -z "$HADOOP_HOME" ]; then
   HADOOPCPPATH=""
   if [ -z "$HADOOP_CONF_DIR" ]; then
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" "${HADOOP_HOME}/conf")
   else
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" "${HADOOP_CONF_DIR}")
   fi
   if [ "`echo ${HADOOP_HOME}/hadoop-core*.jar`" != "${HADOOP_HOME}/hadoop-core*.jar" ] ; then
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-core*.jar | head -1`)
   else
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-common*.jar | head -1`)
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-hdfs*.jar | head -1`)
     HADOOPCPPATH=$(append_path "${HADOOPCPPATH}" `ls ${HADOOP_HOME}/hadoop-mapred*.jar | head -1`)
   fi
 {noformat}
 There's a couple of issues with the above code:
0. HADOOP_HOME is now deprecated in Hadoop 0.23
1. the list of jar files added to the class-path should be revised
2. we need to figure out a more robust way to get the jar files that are 
 needed to the classpath (things like hadoop-mapred*.jar tend to match 
 src/test jars as well)
 Better yet, it would be useful to look into whether we can transition HBase's 
 bin/hbase onto using bin/hadoop as a launcher script instead of direct JAVA 
 invocations (Pig, Hive, Sqoop and Mahout already do that)


[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

2012-02-02 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199528#comment-13199528
]

stack commented on HBASE-3792:
--

@Bryan Yes please. Those are crazy acrobatics you are at just to close a
connection. Thanks.

TableInputFormat leaks ZK connections
-

Key: HBASE-3792
URL: https://issues.apache.org/jira/browse/HBASE-3792
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.90.1
Environment: Java 1.6.0_24, Mac OS X 10.6.7
Reporter: Bryan Keller
Attachments: patch0.90.4, tableinput.patch

The TableInputFormat creates an HTable using a new Configuration object, and
it never cleans it up. When running a Mapper, the TableInputFormat is
instantiated and the ZK connection is created. While this connection is not
explicitly cleaned up, the Mapper process eventually exits and thus the
connection is closed. Ideally the TableRecordReader would close the
connection in its close() method rather than relying on the process to die
for connection cleanup. This is fairly easy to implement by overriding
TableRecordReader, and also overriding TableInputFormat to specify the new
record reader.
The leak occurs when the JobClient is initializing and needs to retrieve the
splits. To get the splits, it instantiates a TableInputFormat. Doing so
creates a ZK connection that is never cleaned up. Unlike the mapper, however,
my job client process does not die. Thus the ZK connections accumulate.
I was able to fix the problem by writing my own TableInputFormat that does
not initialize the HTable in the getConf() method and does not have an HTable
member variable. Rather, it has a variable for the table name. The HTable is
instantiated where needed and then cleaned up. For example, in the
getSplits() method, I create the HTable, then close the connection once the
splits are retrieved. I also create the HTable when creating the record
reader, and I have a record reader that closes the connection when done.
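
The workaround described above (hold only the table name, open the table
where needed, close it when done) can be sketched as follows. This is an
illustration under stated assumptions, not the attached patch: FakeTable
and NameOnlyInputFormat are hypothetical stand-ins, and the open-handle
counter exists only to make the cleanup observable.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a table handle that owns a ZK connection.
class FakeTable implements Closeable {
    // Counts handles that are currently open, so leaks are visible.
    static final AtomicInteger OPEN = new AtomicInteger();
    private final String name;

    FakeTable(String name) {
        this.name = name;
        OPEN.incrementAndGet();
    }

    // Pretend region boundaries; a real table would ask the cluster.
    List<String> regionStartKeys() {
        return Arrays.asList(name + ",", name + ",user5");
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

// Hypothetical input format that keeps the table name, not the table.
class NameOnlyInputFormat {
    private final String tableName;

    NameOnlyInputFormat(String tableName) {
        this.tableName = tableName;
    }

    // Opens the table only for the duration of the call and always
    // closes it, even if computing the splits throws.
    List<String> getSplits() {
        FakeTable table = new FakeTable(tableName);
        try {
            return new ArrayList<String>(table.regionStartKeys());
        } finally {
            table.close();
        }
    }
}
```

Because the handle never outlives getSplits(), a long-lived job client can
call it repeatedly without connections accumulating.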

[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199534#comment-13199534
 ] 

stack commented on HBASE-5325:
--

bq. Similar to the Namenode and Jobtracker, it would be good if the hbase 
master could expose some information through mbeans.

Can we go further (smile)?

Would be cool if, when a regionserver registered, the master started listening 
to (or asking) regionservers for their metrics via jmx.  The master could use 
the collected info when it displays its UI.  Perhaps it would make sense to do 
a cluster-federated mbean that has the sum of all regionserver jmx metrics and 
use this when displaying the master UI, plus a master mbean with the master's 
vitals.

If we got this working, using jmx to query vitals, perhaps we could undo the 
rpc that the regionserver does to the master every second or so to tell it 
about its current load, since HServerLoad is essentially duplicating metrics.  
Static or near-static properties that HServerLoad reports, such as loaded 
coprocessors, could be hoisted up as data in the regionserver's ephemeral 
znode as protobufs/json data.  We could add to what's in HServerLoad by 
reporting system vitals too, like RAM, CPUs, and disks.  The master could do 
the same, hoisting vitals up into its zk znode and/or emitting them in an 
mbean.


 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Minor
 Attachments: HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.


[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199542#comment-13199542
 ] 

stack commented on HBASE-5325:
--

Can the bean implementation be outside of the master?  Just wondering if we can 
save having HMaster implement yet another Interface.  As I'm sure you've 
noticed, this is a bloated class much in need of a refactor; adding more stuff 
to this class makes the DOR -- Day Of Refactor -- an even longer day.

{code}
+mxBean = MBeans.register("HBase", "HBaseMasterInfo", this);
{code}

Is this a good name for our bean?  Should it be at org.apache.hbase -- I don't 
remember how you do that -- and the bean name be just Master?  (I can imagine 
one day the ServerManager and the AssignmentManager registering mbeans...)  

On JSON'ing it, is that the way to go?  Can we not make MBeans instead: an 
MBean per regionserver, or an MBean pointer to a list of all registered MBeans?  
(This would be a federated MBean?  Pardon me if this is all nonsense, it's a 
while since I've done this stuff.)  So, if CoProcessors implemented MBean, you 
could output a list of MBeans as the result of a query on Master 
getCoProcessors.  

getDeadRegionServers should return a List of Strings... or whatever the MBean 
digestible format of an MBean list of Strings is.

Similar getRegionsInTransition.
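
On the "MBean digestible format" question, here is a minimal sketch using 
only the JDK's javax.management classes.  The bean, its attribute, and the 
object name org.apache.hbase:type=Master,name=Demo are hypothetical, and 
this is not the patch under review.  It relies on the standard MXBean type 
mapping, under which a List<String> getter is exposed to JMX clients as a 
String[].

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical demo; the bean lives outside any master class.
class MasterBeanDemo {
    // The MXBean suffix marks this as an MXBean interface. JMX requires
    // the management interface to be public.
    public interface MasterStatusMXBean {
        List<String> getDeadRegionServers();
    }

    // Small standalone implementation holding a snapshot of server names.
    public static class MasterStatus implements MasterStatusMXBean {
        private final List<String> dead;

        public MasterStatus(List<String> dead) {
            this.dead = new ArrayList<String>(dead);
        }

        @Override
        public List<String> getDeadRegionServers() {
            return dead;
        }
    }

    // Registers the bean, reads the attribute back the way a JMX client
    // would, then unregisters it again.
    static String[] registerAndRead(List<String> dead) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name =
                new ObjectName("org.apache.hbase:type=Master,name=Demo");
            server.registerMBean(new MasterStatus(dead), name);
            try {
                return (String[]) server.getAttribute(name, "DeadRegionServers");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the implementation in its own small class, as here, is one way to 
avoid adding another interface to HMaster.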

Else patch looks great.

What do you think about the MBean being actionable, i.e. having setters so we 
could change certain values (not regionservers or coprocessors, but maybe 
future stuff)?

Patch looks great.  Thanks for digging in here.



 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Minor
 Attachments: HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.


[jira] [Commented] (HBASE-5318) Support Eclipse Indigo

2012-02-02 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199545#comment-13199545
]

stack commented on HBASE-5318:
--

Should this get a bit of doc in the reference guide? Around here:
http://hbase.apache.org/book.html#eclipse?

Good stuff.

Support Eclipse Indigo
---

Key: HBASE-5318
URL: https://issues.apache.org/jira/browse/HBASE-5318
Project: HBase
Issue Type: Improvement
Components: build
Affects Versions: 0.94.0
Environment: Eclipse Indigo (1.4.1) which includes m2eclipse (1.0
SR1).
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
Labels: maven
Attachments: mvn_HBASE-5318_r0.patch

The current 'standard' release of Eclipse (indigo) comes with m2eclipse
installed. However, as of m2e v1.0, interesting lifecycle phases are now
handled via a 'connector'. However, several of the plugins we use don't
support connectors. This means that eclipse bails out and won't build the
project or view it as 'working' even though it builds just fine from the
command line.
Since Eclipse is one of the major java IDEs and Indigo has been around
for a while, we should make it easy for new devs to pick up the code and
for older devs to upgrade painlessly. The original build should not be
modified in any significant way.

[jira] [Commented] (HBASE-4336) Convert source tree into maven modules

2012-02-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199551#comment-13199551
 ] 

stack commented on HBASE-4336:
--

@Jesse For this kind of work I'd suggest something like restructure101.  They 
are a good crew and have given apache hbase a license; I could ask them for one 
for you.  Using restructure101, you can see our tangled mess in a pretty 
diagram.  Then you can try out various refactorings w/o actually changing 
code to see how viable a refactor is.  If you want to check it out, come by and 
we'll sit around a screen for an hour.

On your suggestions:

{code}
I'm proposing to follow two general ideas to resolve this:
1) Abstracting out the static methods into a Util class of the same name (so it 
would be something like HLogUtils for static method in HLog).
{code}

... and then the Util class would be hoisted up into the packages above so it 
could be used from multiple packages?  Master and Regionserver packages need to 
know about hlog?  Should we refactor hlog to its own package?  Is that any 
better?  Lots of tools rate complexity by how much interpackage dependency we 
have going on (I suppose we could have an hlog module in an org.hbase.wal 
package that the regionserver and master modules depend on but yeah, we'd have 
a million modules then).

{code} 
2) Moving super general constants to HConstants (like HFile.DEFAULT_BLOCK_SIZE, 
which is scattered liberally throughout) and then more specific constants to 
subclasses within HConstants, meaning you might get something like 
HConstants.HLog.SPLIT_SKIP_ERRORS_DEFAULT.
{code}

I'm not a fan of building up HConstants to be this fat Interface referenced by 
all and sundry.  I'd think that an HFile.DEFAULT_BLOCK_SIZE belongs in HFile.  
What's odd is that there are references to this constant outside of hfile.  Why 
do they reference it?

{code}
My other idea for (2) was to have a constants package that just has the 
constants for each class. This second means smaller files, but less 
easy/immediate/natural access to the constants, but gives you nice separation 
between the constants.
{code}

An org.apache.hadoop.hbase.constants?

 Convert source tree into maven modules
 --

 Key: HBASE-4336
 URL: https://issues.apache.org/jira/browse/HBASE-4336
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Gary Helmling
Priority: Critical
 Fix For: 0.94.0


 When we originally converted the build to maven we had a single core module 
 defined, but later reverted this to a module-less build for the sake of 
 simplicity.
 It now looks like it's time to re-address this, as we have an actual need for 
 modules to:
 * provide a trimmed down client library that applications can make use of
 * more cleanly support building against different versions of Hadoop, in 
 place of some of the reflection machinations currently required
 * incorporate the secure RPC engine that depends on some secure Hadoop classes
 I propose we start simply by refactoring into two initial modules:
 * core - common classes and utilities, and client-side code and interfaces
 * server - master and region server implementations and supporting code
 This would also lay the groundwork for incorporating the HBase security 
 features that have been developed.  Once the module structure is in place, 
 security-related features could then be incorporated into a third module -- 
 security -- after normal review and approval.  The security module could 
 then depend on secure Hadoop, without modifying the dependencies of the rest 
 of the HBase code.


[jira] [Commented] (HBASE-5328) Small changes to Master to make it more testable

2012-02-03 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199888#comment-13199888
 ] 

stack commented on HBASE-5328:
--

bq. I think MockRegionServer.java should be placed under 
src/test/java/org/apache/hadoop/hbase/regionserver/

Not yet.  It's built for testing the Master (as it says in the class comment).  
Later, when it gets fleshed out more, it might make sense to move it elsewhere.

Thanks for the review.  Hold off though till it's more finished.  Good on you 
Ted.



 Small changes to Master to make it more testable
 

 Key: HBASE-5328
 URL: https://issues.apache.org/jira/browse/HBASE-5328
 Project: HBase
  Issue Type: Task
Reporter: stack
 Attachments: 5328.txt


 Here are some small changes in Master that make it more testable.  Included 
 tests stand up a Master and then fake it into thinking that three 
 regionservers are registering making master assign root and meta, etc.


[jira] [Commented] (HBASE-5318) Support Eclipse Indigo

2012-02-03 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200014#comment-13200014
 ] 

stack commented on HBASE-5318:
--

That looks like a CLASSPATH mess-up where the hadoop jar is now before the 
hbase jar (the classloader is finding a class that is in both the hadoop and 
hbase jars in the hadoop jar, and then doing subsequent lookups in the hadoop 
jar).

 Support Eclipse Indigo 
 ---

 Key: HBASE-5318
 URL: https://issues.apache.org/jira/browse/HBASE-5318
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.94.0
 Environment: Eclipse Indigo (1.4.1) which includes m2eclipse (1.0 
 SR1).
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
  Labels: maven
 Attachments: mvn_HBASE-5318_r0.patch


 The current 'standard' release of Eclipse (indigo) comes with m2eclipse 
 installed. However, as of m2e v1.0, interesting lifecycle phases are now 
 handled via a 'connector'. However, several of the plugins we use don't 
 support connectors. This means that eclipse bails out and won't build the 
 project or view it as 'working' even though it builds just fine from the 
 command line.
 Since Eclipse is one of the major java IDEs and Indigo has been around 
 for a while, we should make it easy for new devs to pick up the code and 
 for older devs to upgrade painlessly. The original build should not be 
 modified in any significant way.


[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202103#comment-13202103
]

stack commented on HBASE-5341:
--

bq. This would break the ability to compile HBase 0.92+ against Hadoop releases
without security.

Perhaps we could entertain breaking this for 0.94.0?  I.e. saying we only run
on hadoops w/ security?  (CDH3 has it?  Which hadoop that we want to run on
will lack it by the time 0.94.0 is out?)

On modularization, yes, if hbase-4336 is done soon, security is a natural.
Otherwise, we should do as Enis suggests.

HBase build artifact should include security code by default

Key: HBASE-5341
URL: https://issues.apache.org/jira/browse/HBASE-5341
Project: HBase
Issue Type: Improvement
Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

Hbase 0.92.0 was released with two artifacts, plain and security. The
security code is built with -Psecurity. There are two tarballs, but only the
plain jar in maven repo at repository.a.o.
I see no reason to do a separate artifact for the security related code,
since 0.92 already depends on secure Hadoop 1.0.0, and all of the security
related code is not loaded by default. In this issue, I propose, we merge the
code under /security to src/ and remove the maven profile.

[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-06 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202108#comment-13202108
]

stack commented on HBASE-5325:
--

@Enis metrics2 seems a bit out there for us (hadoop 0.23?). We want to run on
0.23 and 1.0 and 2.0, etc., so it'd be a while before we could lean on it.
metrics2 has facility that would help? (I've not studied it).

@Hitesh Regarding "I am still digging into jmx internals but I could not find
anything which mentions it as an option for pushing information": even if
there was a means (IIRC there is, but I am likely off), I think we'd have the
master pulling.

bq. Having the master pull information from all region servers using jmx ( or
any other point to point protocol ) would likely be a bad idea from a
performance point of view.

Currently every regionserver sends status every (configurable) second. It's a
fat Writable serialization of each regionserver's counters and current state.
IIRC, this mechanism runs mostly independent of and beside our metrics (so
there'll be a Writable serialization of region state and, if something like
tsdb is running, there'll be a JMX serialization of server stats happening too).
It would be an improvement if we did metrics reporting one way only, if
possible.

bq. Also, was your intention to have the HMaster be a metric aggregator for the
RegionServers' metrics?

It does this now for key stats.

bq. I still need to look at nesting of mbeans from various components and also
need to look at the hbase code in more detail to see what kind of management
options could be exposed via jmx.

I'd be interested in what you think. We need to figure out how to config a
running cluster; i.e. change Configuration values and have hbase notice.
Having this go via jmx would likely be like taking the 'killarney road to
dingle' as my grandma used to say (it's shorter if you take the tralee road), so
maybe jmx is read-only rather than 'management'.

Expose basic information about the master-status through jmx beans
---

Key: HBASE-5325
URL: https://issues.apache.org/jira/browse/HBASE-5325
Project: HBase
Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
Fix For: 0.94.0

Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch

Similar to the Namenode and Jobtracker, it would be good if the hbase master
could expose some information through mbeans.

[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202898#comment-13202898
 ] 

stack commented on HBASE-5347:
--

Very nice idea (if the reference counting can be figured).

 GC free memory management in Level-1 Block Cache
 

 Key: HBASE-5347
 URL: https://issues.apache.org/jira/browse/HBASE-5347
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 On eviction of a block from the block-cache, instead of waiting for the 
 garbage collector to reuse its memory, reuse the block right away.
 This will require us to keep reference counts on the HFile blocks. Once we 
 have the reference counts in place we can do our own simple 
 blocks-out-of-slab allocation for the block-cache.
 This will help us with
 * reducing gc pressure, especially in the old generation
 * making it possible to have non-java-heap memory backing the HFile blocks
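
 As a rough illustration of the reference-counting idea, the sketch below 
 hands out fixed-size buffers from a pre-allocated pool and returns a 
 buffer to the pool as soon as its last reference is released. SlabPool and 
 RefCountedBlock are hypothetical names; this is not the proposed 
 implementation, it assumes all blocks fit one fixed size, and it still 
 allocates a small wrapper object per block.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical fixed-size buffer pool.
class SlabPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<byte[]>();

    // Pre-allocates all buffers up front so they can be reused
    // instead of being garbage collected and reallocated.
    SlabPool(int blocks, int blockSize) {
        for (int i = 0; i < blocks; i++) {
            free.push(new byte[blockSize]);
        }
    }

    // Returns a block holding one reference, or null if the pool is empty.
    synchronized RefCountedBlock allocate() {
        byte[] buf = free.poll();
        return buf == null ? null : new RefCountedBlock(this, buf);
    }

    synchronized void recycle(byte[] buf) {
        free.push(buf);
    }

    synchronized int freeBlocks() {
        return free.size();
    }
}

// Hypothetical cache block whose buffer goes back to the pool
// when the last holder releases it.
class RefCountedBlock {
    private final SlabPool pool;
    private final byte[] data;
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCountedBlock(SlabPool pool, byte[] data) {
        this.pool = pool;
        this.data = data;
    }

    byte[] data() {
        return data;
    }

    // Takes an extra reference, e.g. for a reader scanning the block.
    void retain() {
        for (;;) {
            int n = refs.get();
            if (n <= 0) {
                throw new IllegalStateException("block already recycled");
            }
            if (refs.compareAndSet(n, n + 1)) {
                return;
            }
        }
    }

    // Drops a reference; the buffer is recycled on the final release.
    void release() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            pool.recycle(data);
        } else if (n < 0) {
            throw new IllegalStateException("released too many times");
        }
    }
}
```

 In this sketch, eviction would simply call release() on the cache's own 
 reference; the buffer is reused once any in-flight readers release theirs.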


[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-07 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202908#comment-13202908
]

stack commented on HBASE-5341:
--

@Enis 0.92 won't be modularized. HBASE-5288 will be fixed in 0.92.1. Regarding
getting security artifacts into maven, I'm not sure how I'd do that. I suppose
I'd do everything w/ the security profile. Will try it (that's going to be fun
w/ our build taking two hours and mvn rebuilding 4 times, I believe, per
artifact before the artifact gets hoisted to apache staging).

HBase build artifact should include security code by default

[jira] [Commented] (HBASE-5348) Constraint configuration loaded with bloat

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202916#comment-13202916
 ] 

stack commented on HBASE-5348:
--

Why don't you want to load the 'default' config?  What do you get if you don't 
load *.xml files?

 Constraint configuration loaded with bloat
 --

 Key: HBASE-5348
 URL: https://issues.apache.org/jira/browse/HBASE-5348
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, regionserver
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Attachments: java_HBASE-5348.patch, java_HBASE-5348.patch


 Constraints load a configuration, but not the 'correct' one: they 
 instantiate the default configuration (via new Configuration()). It should 
 just be new Configuration(false).


[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202919#comment-13202919
 ] 

stack commented on HBASE-5347:
--

Blocks are not all of the same size.  Will this be an issue?  Blocks of an 
awkward size -- say blocks that happen to have a massive KeyValue in them and 
so massively exceed the default block size -- would need to be treated 
differently?

 GC free memory management in Level-1 Block Cache
 

 Key: HBASE-5347
 URL: https://issues.apache.org/jira/browse/HBASE-5347
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 On eviction of a block from the block-cache, instead of waiting for the 
 garbage collector to reuse its memory, reuse the block right away.
 This will require us to keep reference counts on the HFile blocks. Once we 
 have the reference counts in place we can do our own simple 
 blocks-out-of-slab allocation for the block-cache.
 This will help us with
 * reducing gc pressure, especially in the old generation
 * making it possible to have non-java-heap memory backing the HFile blocks
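
 As a rough illustration of the proposal above, and of the question about
 oversized blocks, here is a minimal sketch of reference-counted blocks
 allocated out of a slab. It is not the patch for this issue: `Slab`,
 `CachedBlock`, `retain` and `release` are hypothetical names, and falling back
 to an ordinary heap buffer for blocks larger than a slot is only one assumed
 way of treating awkward sizes differently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical slab of equal-sized slots; not HBase's block cache code. */
final class Slab {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int slotSize;

    Slab(int slotSize, int slots) {
        this.slotSize = slotSize;
        // One big allocation carved into slots. Using ByteBuffer.allocateDirect
        // here instead would move the slab off the Java heap.
        ByteBuffer backing = ByteBuffer.allocate(slotSize * slots);
        for (int i = 0; i < slots; i++) {
            backing.clear();
            backing.position(i * slotSize);
            backing.limit((i + 1) * slotSize);
            free.push(backing.slice());
        }
    }

    /** Hands out a slot, or a plain heap buffer for oversized blocks or a full slab. */
    synchronized CachedBlock allocate(int size) {
        if (size > slotSize || free.isEmpty()) {
            return new CachedBlock(ByteBuffer.allocate(size), null);
        }
        ByteBuffer slot = free.pop();
        slot.clear();
        slot.limit(size);
        return new CachedBlock(slot, this);
    }

    synchronized void giveBack(ByteBuffer slot) {
        free.push(slot);
    }

    synchronized int freeSlots() {
        return free.size();
    }
}

/** A cached block whose slot is recycled once the last reference is dropped. */
final class CachedBlock {
    final ByteBuffer data;
    private final Slab owner; // null when the block lives on the ordinary heap
    private final AtomicInteger refs = new AtomicInteger(1); // the cache's own reference

    CachedBlock(ByteBuffer data, Slab owner) {
        this.data = data;
        this.owner = owner;
    }

    boolean isSlabBacked() {
        return owner != null;
    }

    /** Called by a reader before it starts using the block. */
    void retain() {
        refs.incrementAndGet();
    }

    /** Called by a reader when done, and by the cache on eviction. */
    void release() {
        if (refs.decrementAndGet() == 0 && owner != null) {
            owner.giveBack(data); // reuse right away, no GC involved
        }
    }
}
```

 In this sketch eviction is just the cache calling `release()`; the slot goes
 back to the free list only after every reader that called `retain()` has also
 released it.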


[jira] [Commented] (HBASE-5348) Constraint configuration loaded with bloat

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202946#comment-13202946
 ] 

stack commented on HBASE-5348:
--

@Jesse Ok.  Make a patch w/ the little paragraph as a comment on why you pass 
false and I'll +1 it (passing a boolean into a Configuration constructor is 
not done elsewhere in the hbase codebase that I know of, so it needs a little 
explanation).  Thanks.
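
For readers unfamiliar with the flag under discussion: Hadoop's
`Configuration(boolean loadDefaults)` constructor skips the default resources
when passed `false`. The sketch below models that difference with a
hypothetical stand-in class, `MiniConfiguration`, so that it runs without
Hadoop on the classpath. The two "default" entries and the constraint key are
illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
final class MiniConfiguration {
    // Stand-in for whatever the *-default.xml / *-site.xml resources contribute.
    private static final Map<String, String> DEFAULT_RESOURCES = new HashMap<>();
    static {
        DEFAULT_RESOURCES.put("io.file.buffer.size", "4096");
        DEFAULT_RESOURCES.put("fs.default.name", "file:///");
    }

    private final Map<String, String> props = new HashMap<>();

    /** Mirrors new Configuration(): default resources are loaded. */
    MiniConfiguration() {
        this(true);
    }

    /** Mirrors new Configuration(false): start empty, no default resources. */
    MiniConfiguration(boolean loadDefaults) {
        if (loadDefaults) {
            props.putAll(DEFAULT_RESOURCES);
        }
    }

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key) {
        return props.get(key);
    }

    int size() {
        return props.size();
    }
}

final class ConstraintConfigDemo {
    public static void main(String[] args) {
        // A constraint only needs its own settings, so it passes false
        // to avoid dragging in every default property.
        MiniConfiguration lean = new MiniConfiguration(false);
        lean.set("constraint.example.key", "value"); // hypothetical key
        MiniConfiguration bloated = new MiniConfiguration();
        System.out.println("lean=" + lean.size() + " bloated=" + bloated.size());
    }
}
```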

 Constraint configuration loaded with bloat
 --

 Key: HBASE-5348
 URL: https://issues.apache.org/jira/browse/HBASE-5348
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, regionserver
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Attachments: java_HBASE-5348.patch, java_HBASE-5348.patch


 Constraints load a configuration, but not the 'correct' one: they 
 instantiate the default configuration (via new Configuration()). It should 
 just be new Configuration(false).


[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202951#comment-13202951
 ] 

stack commented on HBASE-5341:
--

Will do.  Let me peg this issue against 0.92.1 so we don't forget it.

 HBase build artifact should include security code by defult 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.92.1


 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 


[jira] [Commented] (HBASE-5350) Fix jamon generated package names

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202966#comment-13202966
 ] 

stack commented on HBASE-5350:
--

If you run hbase, the webui comes up fine?

Otherwise patch lgtm.

 Fix jamon generated package names
 -

 Key: HBASE-5350
 URL: https://issues.apache.org/jira/browse/HBASE-5350
 Project: HBase
  Issue Type: Bug
  Components: monitoring
Affects Versions: 0.92.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.94.0

 Attachments: jamon_HBASE-5350.patch


 Previously, jamon was creating the template files in org.apache.hbase, but 
 it should be org.apache.hadoop.hbase, so it's in line with the rest of the 
 source files.


[jira] [Commented] (HBASE-5348) Constraint configuration loaded with bloat

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202971#comment-13202971
 ] 

stack commented on HBASE-5348:
--

I see that now.  Agree it would be overkill.  Let me commit.

 Constraint configuration loaded with bloat
 --

 Key: HBASE-5348
 URL: https://issues.apache.org/jira/browse/HBASE-5348
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, regionserver
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Attachments: java_HBASE-5348.patch, java_HBASE-5348.patch


 Constraints load a configuration, but not the 'correct' one: they 
 instantiate the default configuration (via new Configuration()). It should 
 just be new Configuration(false).


[jira] [Commented] (HBASE-5346) Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202986#comment-13202986
 ] 

stack commented on HBASE-5346:
--

Patch looks good to me.  Let me submit the patch to see if it breaks when we 
run on 1.0.0 hadoop.

  Fix testColumnFamilyCompression and test_TIMERANGE in TestHFileOutputFormat
 

 Key: HBASE-5346
 URL: https://issues.apache.org/jira/browse/HBASE-5346
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce, test
Affects Versions: 0.90.4, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5346-v0.patch


 Running
   mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92 (for testColumnFamilyCompression and test_TIMERANGE):
 Failed tests: 
   testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
 The problem is that these tests make incorrect assumptions about the output 
 of mapreduce jobs.  Prior to 0.23, temporary data was in, for example:
 ./_temporary/_attempt___r_00_0/b/1979617994050536795
 Now that has changed.  The correct way to get that path is based on 
 getDefaultWorkFile.
 Also, the data is not moved into the outputPath until both the Task and Job 
 are committed.


[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations

2012-02-07 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203114#comment-13203114
]

stack commented on HBASE-5335:
--

bq. Combined with online schema change, this will allow us to safely iterate on
configuration settings.

So, configuration change would come in via online schema change rather than,
say, via zookeeper callback? The former would be heavyweight by comparison
(closing and reopening regions, which seems overkill for, say, a change in
flush size). Scoping to table and store also means this scheme won't work for
configurations that are neither table- nor store-scoped.

Dynamic Schema Configurations
-

Key: HBASE-5335
URL: https://issues.apache.org/jira/browse/HBASE-5335
Project: HBase
Issue Type: New Feature
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Labels: configuration, schema

Currently, the ability for a core developer to add per-table per-CF
configuration settings is very heavyweight. You need to add a reserved
keyword all the way up the stack, and you have to support this variable
long-term if you're going to expose it explicitly to the user. This has
ended up with using Configuration.get() a lot because it is lightweight and
you can tweak settings while you're trying to understand system behavior
[since there are many config params that may never need to be tuned]. We
need to add the ability to put and read arbitrary KV settings in the HBase
schema. Combined with online schema change, this will allow us to safely
iterate on configuration settings.

[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203273#comment-13203273
 ] 

stack commented on HBASE-5221:
--

So the idea is to put hadoop jars under /share?  Where does this notion come 
from? (Pardon my ignorance)

 bin/hbase script doesn't look for Hadoop jars in the right place in trunk 
 layout
 

 Key: HBASE-5221
 URL: https://issues.apache.org/jira/browse/HBASE-5221
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-5221.txt


 Running against an 0.24.0-SNAPSHOT hadoop:
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or 
 directory
 ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: 
 No such file or directory
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or 
 directory
 The jars are rooted deeper in the hierarchy.


[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations

2012-02-07 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203269#comment-13203269
]

stack commented on HBASE-5335:
--

bq. ...but you'd need to do a large refactor to handle truly online
configuration transitions.

What if we just did the 'important' ones?

On the locking primitives, if the config is a long or something, could we just
make it volatile? That would be more costly than a final long, but we could
make a local copy per method invocation, I suppose.
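
A minimal sketch of the volatile-plus-local-copy idea, assuming a hypothetical
`FlushPolicy` class and `onConfigChange` hook (neither is HBase API). The field
is read once into a local at the top of the method, so a single invocation sees
one consistent value and pays for only one volatile read.

```java
/** Hypothetical holder for one tunable setting; names are illustrative. */
final class FlushPolicy {
    // volatile so a value written by a config-callback thread becomes
    // visible to request threads without taking a lock.
    private volatile long flushSizeBytes;

    FlushPolicy(long initialFlushSizeBytes) {
        this.flushSizeBytes = initialFlushSizeBytes;
    }

    /** Would be invoked by whatever delivers the new setting. */
    void onConfigChange(long newFlushSizeBytes) {
        this.flushSizeBytes = newFlushSizeBytes;
    }

    /** Describes the flush decision using a single snapshot of the setting. */
    String describe(long memstoreBytes) {
        long limit = flushSizeBytes; // local copy: one volatile read per invocation
        boolean flush = memstoreBytes >= limit;
        return (flush ? "flush" : "keep") + " at " + memstoreBytes + "/" + limit;
    }

    boolean shouldFlush(long memstoreBytes) {
        long limit = flushSizeBytes;
        return memstoreBytes >= limit;
    }
}
```

The local copy matters when the value is used more than once in a method, as in
`describe`: without it, a concurrent update could make the two uses disagree.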

bq. Note that, with your region server draining, you can change global config
values in an online fashion on a per-server basis by issuing a 'regionserver
restart --draining' or something like that.

We can do that now, right? Make the change then do our rolling restart with
graceful shedding of regions from the server, and then graceful replacement.

The addition of the HBaseTableConfiguration and HBaseStoreConfiguration would
make it all a little nicer. I'm interested in how the HBaseStoreConfiguration
configs make it into the schema and then get undone on the other side (You
could ask me how a config. change in a zk callback makes it up into a
regionserver Configuration instance... not sure)

bq. ...have you done a lot of testing with killing ZKQuorumPeers on clusters
with a lot of regions

Haven't done any. Sounds like we should? Does the ensemble have to have a
bunch of znodes afloat for you to see the spikes or is it just catching up the
restarted peer's state regardless of the particular znode loading at the time?

Dynamic Schema Configurations
-

Key: HBASE-5335
URL: https://issues.apache.org/jira/browse/HBASE-5335
Project: HBase
Issue Type: New Feature
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Labels: configuration, schema

[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203277#comment-13203277
 ] 

stack commented on HBASE-5292:
--

This patch is for trunk?  Are you going to commit, Mikhail?  Seems like only 
TestReplication failed, which is probably unrelated?

 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
 Attachments: 
 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, 
 D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track # of HFileBlock's 
 read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt 
 vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated for both client initiated 
 Get/Scan operations as well as for compaction related reads. The metric is 
 updated in StoreScanner.java:next() when the Scan query matcher returns an 
 INCLUDE* code via a:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.
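
 A minimal sketch of the intended behavior, not the attached patch:
 `GetSizeMetric` and the `isCompaction` flag are illustrative names, and the
 real scanner would have to know from its own context whether it is serving a
 compaction.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-CF counter of bytes returned to clients. */
final class GetSizeMetric {
    private final AtomicLong bytesReturned = new AtomicLong();

    /**
     * Called when the query matcher includes a KeyValue.
     * Compaction reads are skipped so the metric tracks client traffic only.
     */
    void recordIncluded(int kvLength, boolean isCompaction) {
        if (isCompaction) {
            return;
        }
        bytesReturned.addAndGet(kvLength);
    }

    long value() {
        return bytesReturned.get();
    }
}
```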


[jira] [Commented] (HBASE-5336) Spurious exceptions in HConnectionImplementation

2012-02-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203314#comment-13203314
 ] 

stack commented on HBASE-5336:
--

Are we calling a sync before a new WAL is initialized?

 Spurious exceptions in HConnectionImplementation
 

 Key: HBASE-5336
 URL: https://issues.apache.org/jira/browse/HBASE-5336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl

 I have seen this on the client a few times during heavy write testing:
 java.util.concurrent.ExecutionException: java.io.IOException: 
 java.io.IOException: java.lang.NullPointerException
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1376)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:891)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:743)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:730)
   at NewsFeedCreate.insert(NewsFeedCreate.java:91)
   at NewsFeedCreate$1.run(NewsFeedCreate.java:38)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.io.IOException: java.io.IOException: 
 java.lang.NullPointerException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:212)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1348)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   ... 1 more
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:243)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1289)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1386)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2161)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1954)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3363)
   at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1353)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1351)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
   ... 7 more


[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout

2012-02-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203687#comment-13203687
 ] 

stack commented on HBASE-5221:
--

No worries Jimmy.  That'll do.  Thanks. Let me commit.

 bin/hbase script doesn't look for Hadoop jars in the right place in trunk 
 layout
 

 Key: HBASE-5221
 URL: https://issues.apache.org/jira/browse/HBASE-5221
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-5221.txt


 Running against an 0.24.0-SNAPSHOT hadoop:
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or 
 directory
 ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: 
 No such file or directory
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or 
 directory
 The jars are rooted deeper in the hierarchy.


[jira] [Commented] (HBASE-3537) [site] Make it so each page of manual allows users comment like mysql's manual does

2012-02-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203785#comment-13203785
 ] 

stack commented on HBASE-3537:
--

I don't seem to have filled in here that I couldn't figure out how to make 
facebook comments include the page the comment was made on.  Here is another 
option, http://disqus.com/, that we might embed.  

 [site] Make it so each page of manual allows users comment like mysql's 
 manual does
 ---

 Key: HBASE-3537
 URL: https://issues.apache.org/jira/browse/HBASE-3537
 Project: HBase
  Issue Type: Improvement
Reporter: stack

 I like the way the mysql manuals allow users to comment on, improve or 
 correct mysql manual pages.  We should have the same.


[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout

2012-02-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203801#comment-13203801
 ] 

stack commented on HBASE-5221:
--

@Roman Thanks for catching this.

 bin/hbase script doesn't look for Hadoop jars in the right place in trunk 
 layout
 

 Key: HBASE-5221
 URL: https://issues.apache.org/jira/browse/HBASE-5221
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.94.0

 Attachments: hbase-5221.txt


 Running against an 0.24.0-SNAPSHOT hadoop:
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or 
 directory
 ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: 
 No such file or directory
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or 
 directory
 The jars are rooted deeper in the hierarchy.


[jira] [Commented] (HBASE-4762) ROOT and META region never be assigned if IOE throws in verifyRootRegionLocation

2012-02-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203811#comment-13203811
 ] 

stack commented on HBASE-4762:
--

This should be less likely in TRUNK, right RAM, since we retry in TRUNK but 
not in 0.90.x?

 ROOT and META region never be assigned if IOE throws in 
 verifyRootRegionLocation
 

 Key: HBASE-4762
 URL: https://issues.apache.org/jira/browse/HBASE-4762
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: mingjian
Assignee: mingjian
 Fix For: 0.90.7


 The patch in HBASE-3914 fixed root being assigned on two regionservers. But 
 it seems the root region will never be assigned if verifyRootRegionLocation 
 throws an IOE, as in the following master logs:
 {noformat}
 2011-10-19 19:13:34,873 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
 Caught throwable while processing event M_META_SERVER_SHUTDOWN
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
 yet
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1090)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:256)
 at $Proxy7.getRegionInfo(Unknown Source)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:471)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:90)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:126)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 After this, -ROOT-'s region won't be assigned, like this:
 {noformat}
 2011-10-19 19:18:40,000 DEBUG 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
 locateRegionInMeta parentTable=-ROOT-, metaLocation=address: 
 dw79.kgb.sqa.cm4:60020, regioninfo: -ROOT-,,0.70236052, attempt=0 of 10 
 failed; retrying after sleep of 1000 because: 
 org.apache.hadoop.hbase.NotServingRegionException: Region is not online: 
 -ROOT-,,0
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2771)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1802)
 at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:569)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1091)
 {noformat}
 So we should rewrite the verifyRootRegionLocation method.
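
 One possible shape for such a rewrite, purely as an illustration and not the
 patch for this issue: retry the check a bounded number of times and report
 "not verified" rather than letting the IOException escape, so that the caller
 can go on to assign -ROOT-. `LocationCheck` and `RetryingVerifier` are
 hypothetical names, and the attempt count and sleep are assumptions.

```java
import java.io.IOException;

/** Hypothetical location check that may fail with an IOException. */
interface LocationCheck {
    boolean verify() throws IOException;
}

final class RetryingVerifier {
    /**
     * Runs the check up to maxAttempts times. An IOException (for example
     * "Server is not running yet") counts as a failed attempt, not as a
     * reason to abandon the caller's work.
     */
    static boolean verifyWithRetries(LocationCheck check, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return check.verify();
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        // Location could not be verified; the caller should treat the
        // region as unassigned and assign it.
        return false;
    }
}
```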


[jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers

2012-02-08 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203872#comment-13203872
]

stack commented on HBASE-5353:
--

I'd say just run the master in-process w/ the regionserver. Master doesn't do
much (It used to be heavily loaded when we did log splitting but that's
distributed now or on startup... but even then, should be fine).

Client already tracks master location as you say though we need to undo
this...and just have the client do a read of zk to find master location when it
needs it.

Regarding the UI, we'd collapse it so that there'd be a single webapp rather
than the two we have now. There'd be a 'master' link. If the current
regionserver were not the master, the master link would redirect you to the
current master.

HA/Distributed HMaster via RegionServers

Key: HBASE-5353
URL: https://issues.apache.org/jira/browse/HBASE-5353
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.94.0
Reporter: Jesse Yates
Priority: Minor

Currently, the HMaster node must be considered a 'special' node (single point
of failure), meaning that the node must be protected more than the other
commodity machines. It should be possible to instead have the HMaster be much
more available, either in a distributed sense (meaning a bit rewrite) or with
multiple instances and automatic failover.

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-08 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203921#comment-13203921
]

stack commented on HBASE-5270:
--

I was taking a look through HBASE-5179 and HBASE-4748 again, the two issues
that spawned this one (in synopsis, both are about master failover with a
concurrent servershutdown handler running). I have also been looking at
HBASE-5344, "[89-fb] Scan unassigned region directory on master failover".

HBASE-5179 starts out as: we can miss edits if a server is discovered to be
dead AFTER master failover has started splitting logs, because we'll notice it
is dead and so assign out its regions before we've had a chance to split its
logs. The way fb deals with this in hbase-5344 is not to process zookeeper
events that come in during master failover. They queue them instead and only
start in on the processing after the master is up.

Chunhui does something like this in his original patch by adding any server
currently being processed by server shutdown to the list of regionservers whose
logs we should not split. The fb way of temporarily halting the callback
processing seems more airtight.
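
The queue-then-drain approach described above can be sketched as follows. This
is a generic illustration rather than the 89-fb code: `DeferredEventGate` is a
hypothetical name and events are plain objects handed to a callback.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical gate that holds back callbacks until failover has finished. */
final class DeferredEventGate<E> {
    private final Queue<E> pending = new ArrayDeque<>();
    private final Consumer<E> handler;
    private boolean open = false;

    DeferredEventGate(Consumer<E> handler) {
        this.handler = handler;
    }

    /** Called from the watcher thread for every incoming event. */
    synchronized void onEvent(E event) {
        if (open) {
            handler.accept(event);
        } else {
            pending.add(event); // master still failing over: remember, don't act
        }
    }

    /** Called once the master is up: replay queued events in arrival order. */
    synchronized void open() {
        open = true;
        E event;
        while ((event = pending.poll()) != null) {
            handler.accept(event);
        }
    }
}
```

Both methods hold the same lock, so an event that arrives while the queue is
being drained is handled after the backlog rather than interleaved with it.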

HBASE-5179 is then extended to include in scope the processing of servers
carrying root and meta (hbase-4748) that crash during master failover. We need
to consider the cases where a server crashes AFTER master failover distributed
log splitting has started but before we run the verifications of meta and root
locations.

Currently we'll expire the server that is unresponsive when we go to verify
root and meta locations. The notion is that the meta regions will be assigned
by the server shutdown handler. The fb technique of turning off processing zk
events would mess with our existing handling code here -- but I'm not too
confident the code is going to do the right thing since it has no tests of this
predicament and the scenarios look like they could be pretty varied (root is
offline only, meta server has crashed only, a server with both root and meta
has crashed, etc). In HBASE-5344, fb will go query each regionserver for the
regions it's currently hosting (and look in zk to see which rs are up). Maybe
we need some of this from 89-fb in trunk but I'm not clear on it just yet;
would need more study of the current state of trunk and then of what is
happening over in 89-fb.

One thing I think we should do to lessen the number of code paths we can take
on failover is to do the long-talked-of purge of the root region. This should
cut down on the number of states we need to deal with and make failure states
on failover easier to reason about.

Handle potential data loss due to concurrent processing of processFaileOver
and ServerShutdownHandler
-

Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Fix For: 0.94.0, 0.92.1

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-08 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203923#comment-13203923
]

stack commented on HBASE-5270:
--

HBASE-3171 is the issue to purge root.

[jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers

2012-02-08 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204113#comment-13204113
]

stack commented on HBASE-5353:
--

bq. Except it opens a new can of worms: where do you find the master UI? how do
you monitor your master if it moves around? how do you easily find the master
logs when it could be anywhere in the cluster?

It's not a new can of worms, right? We have the above (mostly unsolved)
problems now if you run with more than one master.

bq. And any cron jobs or nagios alerts you write need to first call some HBase
utility to find the active master's IP via ZK in order to get to it?

They should be doing this now, if multiple masters?

If the master function were lightweight enough, it'd be kinda sweet having one
daemon type only, I'd think; there'd no longer be need for special treatment of
master. Might be tricky having them running in the same JVM what w/ all the
executors afloat and RPCs (I'd rather do all in the one JVM than have RS
start/stop separate Master processes if we were going to go this route).

HA/Distributed HMaster via RegionServers

[jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers

2012-02-08 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204268#comment-13204268
]

stack commented on HBASE-5353:
--

bq. ...you only need to look two places if you have an issue. If you have no
idea where the master is, you have to hunt around the cluster to find it.

I'd imagine it'd be hard getting this patch in if we had no idea where the
master is (And, again, don't we have this problem now if you start up three
masters and one fails? You have to hunt around. We need to build the redirect
piece regardless, such as a link to master on each server page which redirects
to the current master, and a history of who was master when in zk).

You could even make the combined master+regionserver daemon work like our
current multimaster system by having there be affinity for a certain set of
servers.

What kind of nagios alerts would be master particular? We need to add
indirection to these now anyways -- ask zk who the master is -- if more than
one master is running. Metrics could be a little complicated, especially if
the master moved servers over the period of interest, but generally aren't
master metrics of less interest since they are generally just aggregates, and
ganglia or opentsdb do a better job of this anyways?

Logs don't have to be interleaved. That's just a bit of log4j config?

Yes, could be an issue if the daemon is bogged down. The master would be less
responsive, which should be fine for short periods, but if sustained it could
be an issue.

I'm not going to work on this. I do see it as something that could simplify
our deploy story.

HA/Distributed HMaster via RegionServers

Currently, the HMaster node(s) must be considered a 'special' node (though
not a single point of failover), meaning that the node must be protected more
than the other cluster machines or at least specially monitored. Minimally,
we always need to ensure that the master is running, rather than letting the
system handle that internally. It should be possible to instead have the
HMaster be much more available, either in a distributed sense (meaning a big
rewrite) or multiple, dynamically created instances combined with the hot
fail-over of masters.

[jira] [Commented] (HBASE-5355) Compressed RPC's for HBase

2012-02-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204280#comment-13204280
 ] 

stack commented on HBASE-5355:
--

Could we do same prefix compression on the wire?

Todd, in your proposal above, we'd convert KVs to your new form before putting 
the payload on the wire and then undo it on the other side?

Do we care about latency here?

A long time back Ryan tried a custom compression before putting stuff on the 
wire and then undoing it on the other end, hoping it would help some w/ latency, 
but he found it upped latency.  I should dig up a pointer (could be how he went 
about it).
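
For readers unfamiliar with the term, prefix compression of sorted keys can be
sketched as below. This is a generic illustration of the technique, not Todd's
proposal or any HBase wire format: the PrefixCodec class, its Entry type, and
the sample keys are all hypothetical. Each key is stored as the number of
leading characters it shares with the previous key plus the remaining suffix.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixCodec {
    /** One encoded key: shared-prefix length with the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes keys in order; works best when the input is sorted. */
    static List<Entry> encode(List<String> keys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : keys) {
            int max = Math.min(prev.length(), key.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the original keys by re-applying each suffix to the previous key. */
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.shared) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

The encode and decode passes are exactly the "convert before putting on the
wire, undo on the other side" cost that the latency question is about.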

 Compressed RPC's for HBase
 --

 Key: HBASE-5355
 URL: https://issues.apache.org/jira/browse/HBASE-5355
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Affects Versions: 0.89.20100924
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan

 Some applications need the ability to do large batched writes and reads from a 
 remote MR cluster. These eventually get bottlenecked on the network. These 
 results are also pretty compressible sometimes.
 The aim here is to add the ability to do compressed calls to the server on 
 both the send and receive paths.

[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed

2012-02-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204287#comment-13204287
 ] 

stack commented on HBASE-5327:
--

The difference is that we get a nicer message on master crash out?  It's about 
the bad rootdir passed rather than some complaint about a URI parse?

If so, good enough for me.  +1.

 Print a message when an invalid hbase.rootdir is passed
 ---

 Key: HBASE-5327
 URL: https://issues.apache.org/jira/browse/HBASE-5327
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jimmy Xiang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5327.txt


 As seen on the mailing list: 
 http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24124
 If hbase.rootdir doesn't specify a folder on hdfs we crash while opening a 
 path to .oldlogs:
 {noformat}
 2012-02-02 23:07:26,292 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://sv4r11s38:9100.oldlogs
     at org.apache.hadoop.fs.Path.initialize(Path.java:148)
     at org.apache.hadoop.fs.Path.<init>(Path.java:71)
     at org.apache.hadoop.fs.Path.<init>(Path.java:50)
     at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112)
     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
     at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://sv4r11s38:9100.oldlogs
     at java.net.URI.checkPath(URI.java:1787)
     at java.net.URI.<init>(URI.java:735)
     at org.apache.hadoop.fs.Path.initialize(Path.java:145)
     ... 6 more
 {noformat}
 It could also crash anywhere else, this just happens to be the first place we 
 use hbase.rootdir. We need to verify that it's an actual folder.
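
 A minimal sketch of the kind of up-front check being asked for, using only
 java.net.URI. It is not the attached patch: the RootDirCheck class and its
 message text are hypothetical, and it only catches a rootdir with no path
 component (the case in the trace above). Confirming that the directory
 actually exists would need the Hadoop FileSystem API, which is left out here.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RootDirCheck {
    /** Returns the parsed URI, or throws with a message that names hbase.rootdir. */
    static URI validate(String rootDir) {
        final URI uri;
        try {
            uri = new URI(rootDir);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                "hbase.rootdir is not a valid URI: " + rootDir, e);
        }
        // "hdfs://host:9100" parses fine but has an empty path, so appending
        // ".oldlogs" later yields the confusing "Relative path in absolute URI".
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException(
                "hbase.rootdir must include a directory, e.g. hdfs://host:port/hbase, but was: "
                    + rootDir);
        }
        return uri;
    }
}
```

 With this, validate("hdfs://sv4r11s38:9100") fails immediately with a message
 about hbase.rootdir, while validate("hdfs://sv4r11s38:9100/hbase") passes.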

[jira] [Commented] (HBASE-3171) Drop ROOT and instead store META location(s) directly in ZooKeeper

2012-02-08 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204307#comment-13204307
]

stack commented on HBASE-3171:
--

I took a look at what is involved here. I first started out just hacking
-ROOT- out of hbase completely. It was fun while it lasted (The root location
in zk would instead become the meta location w/ a watcher, etc.). On my bike
ride home I figured it'd be better if I could make it so old clients looking
for a -ROOT- no longer there would still sort of work. I dug around but was
unable to figure a way of making this work, not w/o leaving a -ROOT-
placeholder in place and then adding interception code that would take a
different route if we were scanning the now non-existent -ROOT- to find the
meta location, now up in zk rather than in a root table; in other words, adding
more cryptic code. It looked to get ugly fast too.

So, I see no way of doing this w/o breaking backward compatibility. Old code
would go looking for a non-existent -ROOT- region. Do we want to do this?

Drop ROOT and instead store META location(s) directly in ZooKeeper
--

Key: HBASE-3171
URL: https://issues.apache.org/jira/browse/HBASE-3171
Project: HBase
Issue Type: Improvement
Components: client, master, regionserver, zookeeper
Reporter: Jonathan Gray

Rather than storing the ROOT region location in ZooKeeper, going to ROOT, and
reading the META location, we should just store the META location directly in
ZooKeeper.
The purpose of the root region from the bigtable paper was to support
multiple meta regions. Currently, we explicitly only support a single meta
region, so the translation from our current code of a single root location to
a single meta location will be very simple. Long-term, it seems reasonable
that we could store several meta region locations in ZK. There's been some
discussion in HBASE-1755 about actually moving META into ZK, but I think this
jira is a good step towards taking some of the complexity out of how we have
to deal with catalog tables everywhere.
As-is, a new client already requires ZK to get the root location, so this
would not change those requirements in any way.
The primary motivation for this is to simplify things like CatalogTracker.
The way we can handle root in that class is really simple but the tracking of
meta is difficult and a bit hacky. This hack on tracking of the meta
location is what caused one of the bugs over in HBASE-3159.

[jira] [Commented] (HBASE-5363) Add rat check to run automatically on mvn build.

2012-02-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204671#comment-13204671
 ] 

stack commented on HBASE-5363:
--

We already have this.  See our pom:

{code}
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <version>0.7</version>
  <configuration>
    <excludes>
      <exclude>**/.*</exclude>
      <exclude>**/target/**</exclude>
      <exclude>**/CHANGES.txt</exclude>
      <exclude>**/CHANGES.txt</exclude>
      <exclude>**/generated/**</exclude>
      <exclude>**/conf/*</exclude>
      <exclude>**/*.avpr</exclude>
      <exclude>**/control</exclude>
      <exclude>**/conffile</exclude>
      <exclude>**/8e8ab58dcf39412da19833fcd8f687ac</exclude>
      <!--It don't like freebsd license-->
      <exclude>src/site/resources/css/freebsd_docbook.css</exclude>
    </excludes>
  </configuration>
</plugin>
{code}

And see the generated report: http://hbase.apache.org/rat-report.html

Here is the issue where I messed w/ this stuff on trunk: HBASE-4647

 Add rat check to run automatically on mvn build.
 

 Key: HBASE-5363
 URL: https://issues.apache.org/jira/browse/HBASE-5363
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh

 Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
 should add checks, likely in the mvn package phase, so that this becomes a 
 non-issue in the future.
 Here's an example from Whirr:
 https://github.com/apache/whirr/blob/trunk/pom.xml (line 388).

[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed

2012-02-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204722#comment-13204722
 ] 

stack commented on HBASE-5327:
--

+1

 Print a message when an invalid hbase.rootdir is passed
 ---

 Key: HBASE-5327
 URL: https://issues.apache.org/jira/browse/HBASE-5327
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jimmy Xiang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5327.txt, hbase-5327_v2.txt


[jira] [Commented] (HBASE-5363) Add rat check to run automatically on mvn build.

2012-02-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204723#comment-13204723
 ] 

stack commented on HBASE-5363:
--

The rat plugin is pretty crappy in my experience too.  It's hard to get it 
to behave.

 Add rat check to run automatically on mvn build.
 

 Key: HBASE-5363
 URL: https://issues.apache.org/jira/browse/HBASE-5363
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 Some of the recent hbase releases failed rat checks (mvn rat:check).  We 
 should add checks, likely in the mvn package phase, so that this becomes a 
 non-issue in the future.
 Here's an example from Whirr:
 https://github.com/apache/whirr/blob/trunk/pom.xml (line 388).

[jira] [Commented] (HBASE-5365) [book] adding description of compaction file selection to refGuide in Arch/Regions/Store

2012-02-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204744#comment-13204744
 ] 

stack commented on HBASE-5365:
--

This doc patch is excellent.

I thought the default flush size in trunk was 128M, not (134 mb)?  It's done 
twice.  What is '(e.g., 10F)'?  Is that float?



 [book] adding description of compaction file selection to refGuide in 
 Arch/Regions/Store
 

 Key: HBASE-5365
 URL: https://issues.apache.org/jira/browse/HBASE-5365
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
 Attachments: docbkx_hbase_5365.patch


 book.xml
 * adding description of compaction selection algorithm with examples (based 
 on existing unit tests) 
 * also added a few links to the compaction section from other places in the 
 book that already mention compaction.
 configuration.xml
 * added link to compaction section from the entry that discusses configuring 
 major compaction interval.

[jira] [Commented] (HBASE-2375) Make decision to split based on aggregate size of all StoreFiles and revisit related config params

2012-02-09 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204779#comment-13204779
]

stack commented on HBASE-2375:
--

Doing first bullet point only sounds good. Let's file issues for the
other suggestions.

What about the other recommendations made up in the issue regarding
compactionThreshold?

Upping compactionThreshold from 3 to 5, where 5 is greater than the number of
flushes it would take to make us splittable; i.e. the intent is no compaction
before first split.

Should we do this too as part of this issue? We could make our flush size 256M
and compactionThreshold 5. Or perhaps that's too rad (that's a big Map to be
carrying around)? Instead up the compactionThreshold and down the default
regionsize from 1G to 512M and keep flush at 128M?

I took a look at patch and its pretty stale now given changes that have gone in
since.

Make decision to split based on aggregate size of all StoreFiles and revisit
related config params
--

Key: HBASE-2375
URL: https://issues.apache.org/jira/browse/HBASE-2375
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.20.3
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
Labels: moved_from_0_20_5
Attachments: HBASE-2375-v8.patch

Currently we will make the decision to split a region when a single StoreFile
in a single family exceeds the maximum region size. This issue is about
changing the decision to split to be based on the aggregate size of all
StoreFiles in a single family (but still not aggregating across families).
This would move a check to split after flushes rather than after compactions.
This issue should also deal with revisiting our default values for some
related configuration parameters.
The motivating factor for this change comes from watching the behavior of
RegionServers during heavy write scenarios.
Today the default behavior goes like this:
- We fill up regions, and as long as you are not under global RS heap
pressure, you will write out 64MB (hbase.hregion.memstore.flush.size)
StoreFiles.
- After we get 3 StoreFiles (hbase.hstore.compactionThreshold) we trigger a
compaction on this region.
- Compaction queues notwithstanding, this will create a 192MB file, not
triggering a split based on max region size (hbase.hregion.max.filesize).
- You'll then flush two more 64MB MemStores and hit the compactionThreshold
and trigger a compaction.
- You end up with 192 + 64 + 64 in a single compaction. This will create a
single 320MB and will trigger a split.
- While you are performing the compaction (which now writes out 64MB more
than the split size, so is about 5X slower than the time it takes to do a
single flush), you are still taking on additional writes into MemStore.
- Compaction finishes, decision to split is made, region is closed. The
region now has to flush whichever edits made it to MemStore while the
compaction ran. This flushing, in our tests, is by far the dominating factor
in how long data is unavailable during a split. We measured about 1 second
to do the region closing, master assignment, reopening. Flushing could take
5-6 seconds, during which time the region is unavailable.
- The daughter regions re-open on the same RS. Immediately when the
StoreFiles are opened, a compaction is triggered across all of their
StoreFiles because they contain references. Since we cannot currently split
a split, we need to not hang on to these references for long.
This described behavior is really bad because of how often we have to rewrite
data onto HDFS. Imports are usually just IO bound as the RS waits to flush
and compact. In the above example, the first cell to be inserted into this
region ends up being written to HDFS 4 times (initial flush, first compaction
w/ no split decision, second compaction w/ split decision, third compaction
on daughter region). In addition, we leave a large window where we take on
edits (during the second compaction of 320MB) and then must make the region
unavailable as we flush it.
If we increased the compactionThreshold to be 5 and determined splits based
on aggregate size, the behavior becomes:
- We fill up regions, and as long as you are not under global RS heap
pressure, you will write out 64MB (hbase.hregion.memstore.flush.size)
StoreFiles.
- After each MemStore flush, we calculate the aggregate size of all
StoreFiles. We can also check the compactionThreshold. For the first three
flushes, both would not hit the limit. On the fourth flush, we would see
total aggregate size = 256MB and determine to make a split.
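
The arithmetic in the two walkthroughs above can be replayed with a toy model.
This is a sketch under stated assumptions, not HBase code: the SplitSimulation
class is hypothetical, the 256MB split size is inferred from the description's
"64MB more than the split size" remark about the 320MB file, every compaction
is assumed to merge all StoreFiles, and compaction queues, heap pressure, and
the daughter-region compaction are ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSimulation {
    /** Outcome of one simulated run; sizes are in MB. */
    static final class Outcome {
        final int flushes;            // flushes performed before the split decision
        final int writesOfFirstCell;  // times the first cell hit HDFS before the split
        final long sizeAtSplitMb;     // size that triggered the split

        Outcome(int flushes, int writesOfFirstCell, long sizeAtSplitMb) {
            this.flushes = flushes;
            this.writesOfFirstCell = writesOfFirstCell;
            this.sizeAtSplitMb = sizeAtSplitMb;
        }
    }

    /**
     * splitOnAggregate=false models today's rule (check one file after a compaction);
     * splitOnAggregate=true models the proposed rule (check the total after a flush).
     */
    static Outcome run(long flushMb, int compactionThreshold, long maxRegionMb,
                       boolean splitOnAggregate) {
        if (flushMb <= 0 || maxRegionMb <= 0 || compactionThreshold < 2) {
            throw new IllegalArgumentException("sizes must be positive, threshold >= 2");
        }
        List<Long> storeFiles = new ArrayList<>();
        int flushes = 0;
        int writes = 0;
        while (true) {
            storeFiles.add(flushMb);          // a MemStore flush creates a new StoreFile
            flushes++;
            if (flushes == 1) {
                writes = 1;                   // the first cell is written by the first flush
            }
            if (splitOnAggregate && total(storeFiles) >= maxRegionMb) {
                return new Outcome(flushes, writes, total(storeFiles));
            }
            if (storeFiles.size() >= compactionThreshold) {
                long merged = total(storeFiles);
                storeFiles.clear();
                storeFiles.add(merged);       // compaction rewrites everything into one file
                writes++;
                if (!splitOnAggregate && merged >= maxRegionMb) {
                    return new Outcome(flushes, writes, merged);
                }
            }
        }
    }

    private static long total(List<Long> sizes) {
        long sum = 0;
        for (long s : sizes) {
            sum += s;
        }
        return sum;
    }
}
```

Under these assumptions run(64, 3, 256, false) splits after 5 flushes on a
320MB file with the first cell already written 3 times (the description counts
a 4th write for the daughter compaction), while run(64, 5, 256, true) splits
after 4 flushes at 256MB aggregate with the first cell written once.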

[jira] [Commented] (HBASE-5367) [book] small formatting changes to compaction description in Arch/Regions/Store

2012-02-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204782#comment-13204782
 ] 

stack commented on HBASE-5367:
--

In trunk this is 128M:

{code}
   <code>hbase.hregion.memstore.flush.size</code> (64 mb). </listitem>
{code}

{code}
<name>hbase.hregion.memstore.flush.size</name>
<value>134217728</value>
<description>
{code}

 [book] small formatting changes to compaction description in 
 Arch/Regions/Store
 ---

 Key: HBASE-5367
 URL: https://issues.apache.org/jira/browse/HBASE-5367
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_hbase_5367.xml.patch


 Fixing a few small-but-important things that came out of a post-commit 
 comment in HBASE-5365
 book.xml
 * corrected default region flush size (it's actually 64mb)
 * removed trailing 'F' in a ratio discussion.

[jira] [Commented] (HBASE-5367) [book] small formatting changes to compaction description in Arch/Regions/Store

2012-02-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204800#comment-13204800
 ] 

stack commented on HBASE-5367:
--

{code}
long flushSize = this.htableDescriptor.getMemStoreFlushSize();
if (flushSize == HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE) {
  flushSize = conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
 HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
}
{code}

So, looks like DEFAULT_MEMSTORE_FLUSH_SIZE is 64M, which is confusing, and we'll 
use what's in HTD iff it's different from this default.

Yeah, easy to get confused.
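
To make the precedence concrete, here is a self-contained sketch of the same
decision. It mirrors the snippet above but is not the HBase source: the
FlushSizeResolution class is hypothetical, a plain Map stands in for the
Configuration object, and the 64M constant is taken from the comment above
rather than checked against any particular release.

```java
import java.util.HashMap;
import java.util.Map;

final class FlushSizeResolution {
    // Assumed compiled-in default, per the comment above (64M).
    static final long DEFAULT_MEMSTORE_FLUSH_SIZE = 64L * 1024 * 1024;
    static final String FLUSH_SIZE_KEY = "hbase.hregion.memstore.flush.size";

    /** The table-level value wins only when it differs from the compiled-in default. */
    static long resolve(long tableFlushSize, Map<String, Long> conf) {
        if (tableFlushSize == DEFAULT_MEMSTORE_FLUSH_SIZE) {
            // Indistinguishable from "not set on the table", so defer to the site config.
            return conf.getOrDefault(FLUSH_SIZE_KEY, DEFAULT_MEMSTORE_FLUSH_SIZE);
        }
        return tableFlushSize;
    }

    /** Builds a stand-in config holding the value quoted from the config file above. */
    static Map<String, Long> siteConfig() {
        Map<String, Long> conf = new HashMap<>();
        conf.put(FLUSH_SIZE_KEY, 134217728L);
        return conf;
    }
}
```

One consequence of this shape is that a table explicitly set to 64M cannot be
told apart from an unset one, so it ends up with whatever the config says
(134217728 in the sketch).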

 [book] small formatting changes to compaction description in 
 Arch/Regions/Store
 ---

 Key: HBASE-5367
 URL: https://issues.apache.org/jira/browse/HBASE-5367
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_hbase_5367.xml.patch, book_hbase_5367_2.xml.patch


 Fixing a few small-but-important things that came out of a post-commit 
 comment in HBASE-5365
 book.xml
 * corrected default region flush size (it's actually 64mb)
 * removed trailing 'F' in a ratio discussion.
