[jira] [Commented] (HBASE-5489) Add HTable accessor to get regions for a key range

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219318#comment-13219318
]

stack commented on HBASE-5489:
--

Add HTable accessor to get regions for a key range
--

Key: HBASE-5489
URL: https://issues.apache.org/jira/browse/HBASE-5489
Project: HBase
Issue Type: Improvement
Components: client
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Minor

It would be nice to have an accessor to find all regions that overlap with a
particular range of keys. Right now, the only way to accomplish that is to
call HTable.getStartEndKeys(), then follow that with calls to
getRegionLocation() for the range of keys you are interested in. This
algorithm has 2 drawbacks:
* It returns more keys than is necessary most of the time. This is
especially evident if there are a lot of regions comprising the table and the
range of keys is small.
* It always does a scan of .META. via MetaScannerVisitor for at least
HTable.getStartEndKeys(), and perhaps for HRegionLocations that are not
already cached by the client.
An accessor that limited its scans to a specified range could avoid scanning
.META. at all if the HRegionLocations being fetched were already cached by
the client, thereby potentially making this operation faster in common cases.
Here's a proposal for the accessor:
/**
* Get the corresponding regions for an arbitrary range of keys.
* <p>
* @param startKey Starting row in range, inclusive
* @param endKey Ending row in range, inclusive
* @return A list of HRegionLocations corresponding to the regions that
* contain the specified range
* @throws IOException if a remote or network exception occurs
*/
public List<HRegionLocation> getRegionsInRange(final byte [] startKey,
final byte [] endKey) throws IOException
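
A minimal sketch of how such a range lookup could work, assuming the client already holds region start keys in a sorted map. RegionRangeLookup, addRegion and regionsInRange are hypothetical names for illustration only; this is not the HTable implementation and it does not touch .META. at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a client-side cache of region boundaries; not the HBase API. */
class RegionRangeLookup {
    // Region start key -> region name, ordered by unsigned lexicographic byte order.
    private final TreeMap<byte[], String> regionsByStartKey =
            new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));

    void addRegion(byte[] startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Returns the regions that overlap [startKey, endKey], both ends inclusive. */
    List<String> regionsInRange(byte[] startKey, byte[] endKey) {
        List<String> result = new ArrayList<>();
        if (Arrays.compareUnsigned(startKey, endKey) > 0) {
            return result; // inverted range: nothing to return
        }
        // The region holding startKey has the greatest start key <= startKey.
        byte[] first = regionsByStartKey.floorKey(startKey);
        if (first == null) {
            return result; // no known region covers startKey
        }
        // Every region whose start key lies in [first, endKey] overlaps the range.
        for (Map.Entry<byte[], String> e
                : regionsByStartKey.subMap(first, true, endKey, true).entrySet()) {
            result.add(e.getValue());
        }
        return result;
    }
}
```

With regions starting at the empty key, "g" and "p", a lookup for "h" to "q" visits only the last two entries rather than every region in the table.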

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5492) Caching StartKeys and EndKeys of Regions

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219333#comment-13219333
 ] 

stack commented on HBASE-5492:
--

Once filled, how does the cache of table locations get refreshed if a table 
region splits?
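
One possible answer, sketched under assumptions: the cache is dropped whenever a request discovers that a cached region no longer exists (for example after a split), and the next read reloads it. StartKeyCache is a hypothetical class, and the Supplier stands in for a meta scan; neither is taken from the attached patch.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical cache of a table's region start keys; the loader stands in for a meta scan. */
class StartKeyCache {
    private final Supplier<List<String>> metaScan;
    private List<String> cached; // null means "not loaded yet" or "invalidated"
    private int scans;

    StartKeyCache(Supplier<List<String>> metaScan) {
        this.metaScan = metaScan;
    }

    /** Returns the cached start keys, scanning meta only when nothing is cached. */
    synchronized List<String> getStartKeys() {
        if (cached == null) {
            cached = List.copyOf(metaScan.get());
            scans++;
        }
        return cached;
    }

    /** Call when a request fails because a cached region is gone, e.g. after a split. */
    synchronized void invalidate() {
        cached = null;
    }

    /** Number of meta scans performed so far. */
    synchronized int scanCount() {
        return scans;
    }
}
```

Until invalidate() is called the cache keeps serving the old boundaries, which is exactly the staleness the question is about.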

 Caching StartKeys and EndKeys of Regions
 

 Key: HBASE-5492
 URL: https://issues.apache.org/jira/browse/HBASE-5492
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.92.0
 Environment: all
Reporter: honghua zhu
 Fix For: 0.92.1

 Attachments: HBASE-5492.patch


 Each call to HTable.getStartEndKeys reads the meta table.
 In particular, when a client gathers statistics from multiple concurrent
 threads, every request goes through HTable.coprocessorExec ->
 getStartKeysInRange -> getStartEndKeys,
 so the meta table is always scanned.
 This is not necessary:
 we can implement the
 HConnectionManager.HConnectionImplementation.locateRegions(byte[] tableName)
 method,
 and then get the StartKeys and EndKeys from the cachedRegionLocations of
 HConnectionImplementation.
 Combined with https://issues.apache.org/jira/browse/HBASE-5491, can improve 
 the performance of statistical


[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219349#comment-13219349
]

stack commented on HBASE-5270:
--

@Prakash

Presume we pass list from splitLogAfterStartup to joinClusters as you suggest
and presume list of servers included the server that had been hosting .META.

Allow that during or just after splitLogAfterStartup, .META. server 'crashes'
-- it becomes unresponsive. Also allow that somehow, just before it hung up,
during a long running log split, .META. took on a couple of edits saying
regions A, B, and C had split.

In assignRootAndMeta, we'll notice the unresponsiveness, force the expiration
of the server that was carrying .META. (this will queue a ServerShutdownHandler
but will not wait on its completion), and we'll then reassign .META. It's
very likely that .META. will go to one of the other 'good' servers. It's also
likely that the SSH will not have completed its processing before this assign
happens. Thus, on deploy, the .META. will be missing the above A, B, and C
split edits.

Handle potential data loss due to concurrent processing of processFaileOver
and ServerShutdownHandler
-

Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Assignee: chunhui shen
Fix For: 0.92.1, 0.94.0

Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch,
5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch,
5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch,
hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch,
hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt

This JIRA continues the effort from HBASE-5179. Starting with Stack's
comments about patches for 0.92 and TRUNK:
Reviewing 0.92v17
isDeadServerInProgress is a new public method in ServerManager but it does
not seem to be used anywhere.
Does isDeadRootServerInProgress need to be public? Ditto for meta version.
These method param names are not right, e.g. 'definitiveRootServer'; what is
meant by definitive? Do they need this qualifier?
Is there anything in place to stop us expiring a server twice if it's carrying
root and meta?
What is difference between asking assignment manager isCarryingRoot and this
variable that is passed in? Should be doc'd at least. Ditto for meta.
I think I've asked for this a few times - onlineServers needs to be
explained... either in javadoc or in comment. This is the param passed into
joinCluster. How does it arise? I think I know but am unsure. God love the
poor noob that comes awandering this code trying to make sense of it all.
It looks like we get the list by trawling zk for regionserver znodes that
have not checked in. Don't we do this operation earlier in master setup? Are
we doing it again here?
Though distributed split log is configured, we will do in master single
process splitting under some conditions with this patch. It's not explained in
code why we would do this. Why do we think master log splitting 'high
priority' when it could very well be slower? Should we only go this route if
distributed splitting is not going on? Do we know if concurrent distributed
log splitting and master splitting works?
Why would we have dead servers in progress here in master startup? Because a
servershutdownhandler fired?
This patch is different to the patch for 0.90. Should go into trunk first
with tests, then 0.92. Should it be in this issue? This issue is really hard
to follow now. Maybe this issue is for 0.90.x and new issue for more work on
this trunk patch?
This patch needs to have the v18 differences applied.

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219354#comment-13219354
]

stack commented on HBASE-5270:
--

@Chunhui You have a new method splitLogIfOnline which will split the log if the
server was online. Why do you not expire the server? (You remove the
expireIfOnline method).

Now we have this initializing state, do you think we should also stop the
processing of expired servers during this startup phase and instead queue them
up for processing after the master is up? Could do that in another issue maybe
since this issue has been going on too long and your patch is at least an
improvement on what we currently have (This startup sequence needs a big
refactor IMO -- it is way too complicated figuring the sequence in which stuff
runs).

Are there still holes? For example, say the .META. server crashes AFTER we've
verified it up in assignRootAndMeta but before we get to do a scan of .META. to
rebuild user regions list. Could .META. be assigned w/o log splitting
finishing? (I don't think so... .META. would be offline until the
servershutdown handler ran and it would first split logs).

Good stuff.


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219355#comment-13219355
 ] 

stack commented on HBASE-4991:
--

@Mubarak After looking at FATE and what's involved, I think it a bit much to
expect that we build that as a prereq. for this facility.  At the same time,
let's minimize custom code -- code that is particular to the addition of this
feature only.  Let me do another review of your last patch w/ that in mind.

 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219380#comment-13219380
 ] 

stack commented on HBASE-4991:
--

Reviewing this patch again, could we not obtain this patch's objective with
merge?  Merge could take a flag which said whether or not to copy the data
from the old regions into the new merged region.
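
A rough sketch of what such a flag could look like, using a toy in-memory model. ToyRegion, its fields and the copyData parameter are all hypothetical names made up for illustration; nothing here is the existing merge code or the patch under review.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of a region; not the HBase HRegion class. */
class ToyRegion {
    final String startKey;
    final String endKey;
    final List<String> storeFiles;

    ToyRegion(String startKey, String endKey, List<String> storeFiles) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.storeFiles = storeFiles;
    }

    /**
     * Merges two adjacent regions into one spanning both key ranges.
     * With copyData=false the old store files are left behind, which
     * makes the operation behave like a delete of the old regions' data.
     */
    static ToyRegion merge(ToyRegion left, ToyRegion right, boolean copyData) {
        if (!left.endKey.equals(right.startKey)) {
            throw new IllegalArgumentException("regions are not adjacent");
        }
        List<String> files = new ArrayList<>();
        if (copyData) {
            files.addAll(left.storeFiles);
            files.addAll(right.storeFiles);
        }
        return new ToyRegion(left.startKey, right.endKey, files);
    }
}
```

Either way the resulting region covers the whole span, so there is no hole left to plug in the key space.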






[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219412#comment-13219412
]

stack commented on HBASE-4991:
--

After looking again too at the patch, it has too much custom code that is all
about region delete.

It should take a range as Todd suggests earlier rather than list of regions.
This means you can not pass a list of discontinuous regions but that's ok I
think; just do multiple invocations.

This seems to have wrong param name and javadoc:

{code}
+ /**
+ * Gets the count of online regions of the table in a region server.
+ * This method looks at the in-memory onlineRegions.
+ * @param regionName
+ * @return int regions count
+ * @throws IOException
+ */
+ public int getRegionsCount(byte[] regionName) throws IOException;
{code}

When I see MasterDeleteRegionTracker, and the equivalent for regionservers, it
makes me yearn for a generic framework that these things could run on; it
strikes me as too much custom code and custom handlers. This we should fix.
We should come up w/ generics that can be customized to do feature specifics.

Why are we using a janitor to do the delete of regions rather than an executor?

Why do we have this getDeleteRegionTracker?

The generic soln Interface would have a method the balancer would check...

+ if (deleteRegionTracker.isDeleteRegionInProgress()) {

Rather than do the above for every feature we add.
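
To make the idea concrete, here is a minimal sketch of such a generic hook. TableOperationTracker and BalancerGate are hypothetical names invented for this illustration; they are not existing HBase interfaces and are not part of the patch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical generic hook; each feature (delete region, schema change, ...) implements it. */
interface TableOperationTracker {
    boolean isOperationInProgress(String tableName);
}

/** The balancer asks one gate instead of knowing about every feature-specific tracker. */
class BalancerGate {
    private final List<TableOperationTracker> trackers = new CopyOnWriteArrayList<>();

    void register(TableOperationTracker tracker) {
        trackers.add(tracker);
    }

    /** Balancing is allowed only when no registered feature is working on the table. */
    boolean mayBalance(String tableName) {
        for (TableOperationTracker tracker : trackers) {
            if (tracker.isOperationInProgress(tableName)) {
                return false;
            }
        }
        return true;
    }
}
```

A new feature would then register its own tracker, and the balancer code would not need another feature-specific if.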

Should this getDeleteRegionStatusFromDeleteRegionTracker be in the
DeleteRegionTracker? And should it be something that is apart from the Master
rather than in the master?

This seems wrong: getDeleteRegionTracker in the MasterServices Interface.

Why do we add it there? Why can't it be independent of Master? Having to have a
Master makes it harder to test I'm sure.

DeleteRegionHandler should not be dealing w/ balancer. That seems dirty.

This seems racy: waitForInflightSplit

Do we do this every time? +moveStoreFilesToNewRegionDir(byFamily, fs,
tableDir, newRegionInfo);

If so, is this actually a merge and not a delete?

Do these methods need to be in HRegionInfo?

moveDataFromAdjacentRegionToNewRegion
createNewRegionFromAdjacentRegion

Could be in HRegion or in a RegionUtil class? RS is already bloated.

A bunch of these other methods ... adding new region and deleting old region
... would seem to have overlap with existing code where we add regions to meta
after open and also w/ merge code?

We can't have master package referenced in zookeeper package; i.e. see
MasterDeleteRegionTracker.

I've already commented on other stuff in this patch.

In general the patch is well done. It just adds a bunch of custom facility w/o
genericizing at least some aspects so they could be used by other features yet
to come. In particular, this looks to be a specialization on merge. If so,
let's go for merge altogether.


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219421#comment-13219421
]

stack commented on HBASE-4991:
--

bq. Are we going to enhance Merge by allowing to discard data belonging to one
of the regions ?

This feature looks to be adding online merge to me. I need clarification from
Mubarak that that is indeed the case. If so, this issue is mislabeled and the
patch needs redoing.

I was just suggesting that if you want to actually drop a region's data, you
could pass a flag to merge and it would not bother copying over the files from
old regions. That would be an option. This patch as is does not do that. It
seems to copy old region data into new regions. Was just a suggestion.

bq. How should we deal with various failure scenarios in the process of merging
?

Eh... in a manner which is resilient against failures, TBD. I don't get your
question Ted. Are you asking me or the Author of this patch?


[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219428#comment-13219428
]

stack commented on HBASE-5270:
--

@Prakash

bq. then the next step, that you outlined in your scenario, cannot be
allowed

How should we do this boss?

bq. The problem you are outlining is probably still there but the scenario has
to be refined.

What should I add? If we allow that the split could take a long time, it's
possible that on entry to the log splitting the server was good but by the end
it could have gone AWOL.

bq. And then it should initialize everything based on that knowledge which must
not change during initialization.

I think the root issue is that it needs to scan .META. and -ROOT- as part of
the startup; they need to be assigned and up w/ all edits in place. That's
what's proving to be a little tough to ensure.

(Thanks for the review P).


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219511#comment-13219511
]

stack commented on HBASE-4991:
--

bq. Online merge requires table needs to be disabled but deleteRegion
(deleteRange) does not require table needs to be disabled.

We've had an ongoing conversation -- before your time so you are not expected to
have known about it -- on our doing an online merge. It's actually a pretty
critical need. See HBASE-1621 for some history (Ignore its title where it
says table should be offline -- it should be online).

FYI, the current merge code is broken and unused. It works for a unit test but
I'd say it's years since anyone tried to use it to actually do anything useful.

So, we are agreed that conceptually, what's going on here is region merging? If
so, that helps understanding around what's going on here. We should also likely
rename what this issue does to be about merging since that's how we've been
describing this operation over the years.


[jira] [Commented] (HBASE-5491) Remove HBaseConfiguration.create() call from coprocessor.Exec class

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219527#comment-13219527
 ] 

stack commented on HBASE-5491:
--

@Honghua My suggestion would make your patch smaller.

 Remove HBaseConfiguration.create() call from coprocessor.Exec class
 ---

 Key: HBASE-5491
 URL: https://issues.apache.org/jira/browse/HBASE-5491
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Affects Versions: 0.92.0
 Environment: all
Reporter: honghua zhu
 Fix For: 0.92.1

 Attachments: HBASE-5491.patch


 The Exec class has a field: private Configuration conf =
 HBaseConfiguration.create()
 The client creates an Exec instance for each statistics request issued
 through ExecRPCInvoker,
 so HBaseConfiguration.create is called for every request.
 When the server side deserializes the Exec, HBaseConfiguration.create is
 called once more.
 HBaseConfiguration.create is a time-consuming operation.
 private Configuration conf = HBaseConfiguration.create();
 This default is only useful for test code
 (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization);
 everywhere else that uses the Exec class passes a Configuration in,
 so the conf field does not need a default value.
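
The cost described above can be illustrated with stand-in classes. CostlyConf, EagerExec and LeanExec are hypothetical names used only for this sketch; they are not the real Configuration or Exec classes, and the counter simply makes the extra construction visible.

```java
/** Stand-in for an expensive-to-build configuration; counts how often it is created. */
class CostlyConf {
    static int created = 0;

    CostlyConf() {
        created++;
    }
}

/** Before: the field initializer builds a default even when the caller supplies one. */
class EagerExec {
    private CostlyConf conf = new CostlyConf();

    EagerExec(CostlyConf conf) {
        this.conf = conf; // the default built above is thrown away
    }

    CostlyConf getConf() {
        return conf;
    }
}

/** After: no default value; callers pass in the configuration they already hold. */
class LeanExec {
    private final CostlyConf conf;

    LeanExec(CostlyConf conf) {
        this.conf = conf;
    }

    CostlyConf getConf() {
        return conf;
    }
}
```

Constructing an EagerExec with a shared configuration still builds a second, unused one, while LeanExec builds none.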


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219638#comment-13219638
]

stack commented on HBASE-4991:
--

@Lars

bq. This is not the same as merge, right?

Sounds like it is.

bq. The region's data will be gone

This patch seems to copy the data from the deleted region up into the new
hole-plugging region. It doesn't seem to delete it.

As to your 1., 2., 3... yes, that's what this patch does, only the operators
and the classes are all named DeleteRegion* blah when what is happening is
region merging.

I think it's important to get the concept right else it's going to confuse for
ever after.

@Mubarak So, sounds like the command/api could also be named merge rather than
deleteRegion (You are not actually deleting the data, is that right?)?


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219678#comment-13219678
 ] 

stack commented on HBASE-4991:
--

Implementation-wise, this is a merge with the added option that we not copy the 
data of the regions we are merging.


[jira] [Commented] (HBASE-5491) Remove HBaseConfiguration.create() call from coprocessor.Exec class

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219794#comment-13219794
 ] 

stack commented on HBASE-5491:
--

+1

 Remove HBaseConfiguration.create() call from coprocessor.Exec class
 ---

 Key: HBASE-5491
 URL: https://issues.apache.org/jira/browse/HBASE-5491
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Affects Versions: 0.92.0
 Environment: all
Reporter: honghua zhu
 Fix For: 0.92.1

 Attachments: HBASE-5491-2.patch, HBASE-5491.patch



[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219805#comment-13219805
]

stack commented on HBASE-4991:
--

bq. Well, client's deleteRegion call is asynchronous so no fail-over if client
has to do the business.

Fair enough. I was suggesting doing it as a client script because then it'd be
outside of the servers and easier to test. If client dies, restart it, it
looks in zk for work to do and carries on from where the last client was. But
no biggie.

What about my question about why we delegate merge/delete out to the
regionservers? Why not have them do nothing but the close and then have the
master do the remove or merging of fs content and fixup in meta? Would that be
less moving parts?

Let me give some higher level feedback in a sec.

@Jieshan Yes that'll work. How do you do it? You have a patch?


[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219808#comment-13219808
 ] 

stack commented on HBASE-5454:
--

I'm +1 on commit for this.  We can figure something similar for handling of zk 
callbacks in another issue.

 Refuse operations from Admin before master is initialized
 -

 Key: HBASE-5454
 URL: https://issues.apache.org/jira/browse/HBASE-5454
 Project: HBase
  Issue Type: Improvement
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: hbase-5454.patch, hbase-5454v2.patch


 In our testing environment, while the master was initializing, we found
 conflicts between master#assignAllUserRegions and the EnableTable event:
 region assignment threw an exception and the master aborted itself.
 We think we had better refuse operations from Admin, such as CreateTable and
 EnableTable, until the master is initialized. It could reduce errors.


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219839#comment-13219839
]

stack commented on HBASE-4991:
--

@Mubarak I see that this patch is modeled on HBASE-4213, the online schema-edit
patch. I'm not sure that is a good model to follow in the first place -- it's
disabled because it does not currently work in the face of splits though it has
handler code supposedly to manage this and secondly, it's a bunch of custom code
specific to the schema change only. Your patch does a bunch of copy/paste from
the schema patch duplicating the model and then also repeating code except for
some changes in method names and the znodes we wait on up in zk. Don't you
think we should rather be generalizing the common facility and having these two
features share its use rather than making a copy, especially since we now
have two clients in need (It's actually three if you count merge, which IMO,
this feature should be built on)? For example, in both cases we need to
disable table splitting. The schema patch does this with a
waitForInflightSchemaChange check that looks at state in zk and then in the
splitRegion code, we wait by invoking the below:

{code}
waitForSchemaChange(Bytes.toString(regionInfo.getTableName()));
{code}

You come along and do a repeat. You add to the splitRegion code:

{code}
+ waitForDeleteRegion(regionInfo.getEncodedName());
{code}

The list of things to check before we go ahead and split could get pretty long
if we keep on down this route.

Instead we should have a generic disable splitting function that both schema
edit and this patch could use.
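
As a sketch of what I mean, assuming a single shared switch that any feature can flip for a table. SplitSwitch and its method names are hypothetical and exist only for this illustration; in a real implementation the state would presumably live in zk rather than in memory.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shared switch: splitRegion asks one question however many features exist. */
class SplitSwitch {
    // Table name -> names of the features currently blocking splits on that table.
    private final Map<String, Set<String>> blockers = new ConcurrentHashMap<>();

    void disableSplits(String table, String feature) {
        blockers.computeIfAbsent(table, t -> ConcurrentHashMap.newKeySet()).add(feature);
    }

    void enableSplits(String table, String feature) {
        Set<String> features = blockers.get(table);
        if (features != null) {
            features.remove(feature);
        }
    }

    /** Splits are allowed only when no feature holds the table. */
    boolean isSplitAllowed(String table) {
        Set<String> features = blockers.get(table);
        return features == null || features.isEmpty();
    }
}
```

The splitRegion code would then make a single isSplitAllowed check instead of one waitFor* call per feature.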

Going back to your design, I see this:

{code}
4. DeleteRegionTracker (new class in RS side) will process
nodeChildrenChanged(), get the list of regions_to_be_deleted, check that those
regions are being hosted by the RS, if yes then

doDeleteRegion
call deleteRegion() in HRegionServer
disable the region split
close the region
remove from META
bridge the hole in META (extending the span of before or after region)
remove region directory from HDFS
update state in ZK
(zookeeper.znode.parent/delete-region/encoded-region-name)
{code}

Does the above presume all regions for a range are on a single regionserver (If
not, how is the meta editing done -- in particular the bridging of the hole in
.META.?).

I'm asking because I think it's not a good design asking regionservers to do the
merge; it makes this patch more complicated than it need be IMO.

I suggest we go back to the design and work forward from there. Your patch is
fat and has a bunch of good stuff that we can repurpose once we have the design
done.

I suggest a design below. It has some prerequisites: some general functionality
that this feature (and others) could use. The prereqs, if you think them good,
could be done outside of this JIRA.

Here's a suggested rough outline of how I think this feature should run. The
feature I'm describing below covers both merge and deleteRegion, since I see
them as in essence the same thing.

# Client calls merge or deleteRegion API. API is a range of rows.
# Master gets call.
# Master obtains a write lock on the table so it can't be disabled from under us.
The write lock will also disable splitting. This is one of the prereqs, I think.
It's HBASE-5494. (Or we could just do something simpler where we have a flag up
in zk that splitRegion checks, but that's less useful, I think; OR we do the
dynamic configs issue and set splits to off via a config change.) There'd be
a timer for how long we wait on the table lock.
# If we get the lock, write intent to merge a range up into zk. Also hoist
into zk whether it's a pure merge or a merge that drops the region data (a
deleteRegion call).
# Return to the client either our failed attempt at locking the table or an id
of some sort identifying this running operation; the client can use it to query
status.
# Turn off the balancer. TODO/prereq: Do it in a way that is persisted. The
balancer switch is currently in memory only, so if the master crashes, the new
master will come up in balancing mode. (If we had dynamic config, we could hoist
up to zk a config that disables the balancer rather than have a
balancer-specific flag/znode, OR, if a write lock is outstanding on a table, the
balancer does not balance regions in the locked table -- this latter might be
the easiest to do.)
# Write into zk that we just turned off the balancer (if it was on).
# Get regions that are involved in the span
# Hoist the list up into zk.
# Create region to span the range.
# Write that we did this up into zk.
# Close regions in parallel. Confirm close in parallel.
# Write up into zk regions closed (This might not be necessary since can ask if
region is open).
# If a merge and not a delete region, move files under new region. Might
multithread this (moves should go pretty fast). If a deleteregion, we skip this
step.
# On

[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219843#comment-13219843
 ] 

stack commented on HBASE-4991:
--

Oh, I forgot to mention that each step in the above should be repickupable -- 
i.e. if the process running the above crashes, on restart it should continue 
where the previous left off -- up until .META. edits (even here, we should make 
it so we can repair).  We should include a cancel facility.  Anything we 
develop would have to be testable; both the individual steps and then the 
process as a whole.
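
A minimal sketch of the pick-up-where-we-left-off idea, assuming each step records its completion in some durable store before the next one starts. ProgressStore, InMemoryProgressStore and ResumableOperation are hypothetical names used only for illustration; in the design above the store would be zk, not memory:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical durable record of the last completed step (zk in the real design). */
interface ProgressStore {
  int lastCompletedStep();            // -1 if nothing has completed yet
  void markCompleted(int stepIndex);
}

/** In-memory stand-in so the sketch runs without ZooKeeper. */
final class InMemoryProgressStore implements ProgressStore {
  private int last = -1;
  public int lastCompletedStep() { return last; }
  public void markCompleted(int stepIndex) { last = stepIndex; }
}

/** Runs steps in order, skipping any that are already recorded as complete. */
final class ResumableOperation {
  private final List<Runnable> steps;
  private final ProgressStore store;

  ResumableOperation(List<Runnable> steps, ProgressStore store) {
    this.steps = new ArrayList<>(steps);
    this.store = store;
  }

  void run() {
    for (int i = store.lastCompletedStep() + 1; i < steps.size(); i++) {
      steps.get(i).run();       // a step that crashed midway is re-run, so it must be idempotent
      store.markCompleted(i);   // progress is recorded only after the step succeeds
    }
  }
}
```

Because the steps are plain objects, each can be tested alone and the whole sequence can be tested by killing it at any index and calling run() again.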

 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219846#comment-13219846
 ] 

stack commented on HBASE-4991:
--

One more thing, it would be sweet if the above were not hardcoded but instead 
was a set of steps described elsewhere and malleable or even better, if we 
could describe the steps to run on top of some generic operations framework as 
per FATE, but that would be a bunch more work.

How many regions are we talking of merging/deleting at any one time? I think 
the above should work for a big table as long as we did stuff in parallel: closes 
and file moving.  To be confirmed.
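
For the "in parallel" part, a sketch of fanning out region closes over a thread pool and collecting the ones that failed. RegionCloser and ParallelCloser are hypothetical stand-ins for whatever call the master would use to ask a regionserver to close a region; only the java.util.concurrent calls are real:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical hook: asks whoever hosts the region to close it. */
interface RegionCloser {
  boolean close(String encodedRegionName) throws Exception;
}

final class ParallelCloser {
  /** Closes all regions concurrently; returns the names of regions that did not close. */
  static List<String> closeAll(List<String> regions, RegionCloser closer, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Boolean>> tasks = new ArrayList<>();
      for (String region : regions) {
        tasks.add(() -> closer.close(region));
      }
      // invokeAll waits for every task and returns futures in the order of the input list.
      List<Future<Boolean>> results = pool.invokeAll(tasks);
      List<String> failed = new ArrayList<>();
      for (int i = 0; i < results.size(); i++) {
        boolean ok;
        try {
          ok = results.get(i).get();
        } catch (ExecutionException e) {
          ok = false; // an exception from the close counts as a failed close
        }
        if (!ok) {
          failed.add(regions.get(i));
        }
      }
      return failed;
    } finally {
      pool.shutdown();
    }
  }
}
```

The same shape would apply to the file moves; whether it is fast enough on a big table is still the open question.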

[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219849#comment-13219849
 ] 

stack commented on HBASE-4991:
--

@Mubarak Our comments crossed.  See further up in this issue for more on what 
I'm thinking; it should answer the questions you pose.

[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-02-29 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219857#comment-13219857
 ] 

stack commented on HBASE-5424:
--

Zhiyuan, why do you reference hbase-5165?  Do you think it is the cause of this 
patch's failures?

 HTable meet NPE when call getRegionInfo()
 -

 Key: HBASE-5424
 URL: https://issues.apache.org/jira/browse/HBASE-5424
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1, 0.90.5
Reporter: junhua yang
 Attachments: 5424-v3.patch, 5424-v3.patch, HBASE-5424.patch, 
 HBase-5424_v2.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We hit an NPE when calling getRegionInfo() in a testing environment.
 Exception in thread "main" java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
 at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
 at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
 at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418)
 This NPE also prevents table.jsp from showing the region information of this 
 table.

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219858#comment-13219858
]

stack commented on HBASE-5270:
--

bq. I have thought this issue for a long time, and I think preventing
processing of SSH is a clear and simple solution, otherwise we should consider
many cases where meta server died in different time during master initializing.

Would we do the above as part of another issue, Chunhui?

Handle potential data loss due to concurrent processing of processFaileOver
and ServerShutdownHandler
-

Key: HBASE-5270
URL: https://issues.apache.org/jira/browse/HBASE-5270
Project: HBase
Issue Type: Sub-task
Components: master
Reporter: Zhihong Yu
Assignee: chunhui shen
Fix For: 0.92.1, 0.94.0

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-29 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219861#comment-13219861
]

stack commented on HBASE-5270:
--

Also, do you need to make a new version of this patch now that hbase-5454 has
gone in? Thanks.

[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-03-01 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220165#comment-13220165
]

stack commented on HBASE-4991:
--

@Ted

bq. Do you think we should continue discussion on the new framework under
HBASE-5487 ?

It might be time to kill this issue and start up a new one. Not under 5487,
though. What do you think, Mubarak? I'd think that if we started a new issue,
it'd be called online merge and would first work out the design.

@Mubarak

bq. 10 to 18 goes to RS side
bq. We need ZK trackers in both sides, isn't?

I don't think so. Master is just asking the regionservers to close regions.
Master can create regions and do the moving of data from old regions into new.
It's just fs operations. No need of regionserver context, especially not live
regionserver context.

[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220166#comment-13220166
 ] 

stack commented on HBASE-5399:
--

@N 1 out of 29 hunks FAILED -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java.rej

Is it because I just committed the fat 4403 patch?

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5399_inprogress.patch, 5399_inprogress.v14.patch, 
 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 
 5399_inprogress.v3.patch, 5399_inprogress.v9.patch


 The link is often considered an issue, for various reasons. One of them 
 is that there is a limit on the number of connections that ZK can manage. 
 Stack was suggesting as well to remove the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read get master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of threads => now done with a temporary zookeeper connection
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master. => 
 we could make them use a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, a non-connected client will always be 
 much slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the clients seems to make sense for some 
 use cases. However, it won't scale if a TCP connection is required for every 
 client
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue with HBaseAdmin (for both ZK & Master); maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic, however.

[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220167#comment-13220167
 ] 

stack commented on HBASE-5399:
--

@N Probably should post the patch up on reviewboard; it's certainly fat enough.

[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220168#comment-13220168
 ] 

stack commented on HBASE-5424:
--

Should we close this issue then, Zhiyuan, as fixed (or to be fixed) by 5165?  
Thanks.

[jira] [Commented] (HBASE-5501) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220178#comment-13220178
 ] 

stack commented on HBASE-5501:
--

What's going on here, Chunhui?  You've opened a new issue w/ the same title as 5270?  
How do they relate?  Shouldn't this patch be on 5270, not on this new issue?

 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5501
 URL: https://issues.apache.org/jira/browse/HBASE-5501
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5501.patch


 In a live cluster, we do the following steps:
 1. Kill the master.
 2. Start the master; the master is initializing.
 3. Master completes splitLog.
 4. Kill the META server.
 5. Master starts assigning ROOT and META.
 6. Now meta region data will be lost, since we may assign the meta region 
 before SSH finishes splitting the log for the dead META server.

[jira] [Commented] (HBASE-5501) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220180#comment-13220180
 ] 

stack commented on HBASE-5501:
--

Is this patch right?  You have a disablessh flag.  What about other callbacks 
that could come in during the initialization?  I don't see where you queue up 
callbacks for processing post-initialization.
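
To show what I mean by queueing, here is a minimal sketch. DeferredCallbacks is a hypothetical class, not something in the patch or in the master today: callbacks that arrive before initialization finishes are held, then replayed in arrival order once it completes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical holder for callbacks that arrive before initialization is done. */
final class DeferredCallbacks {
  private final Queue<Runnable> pending = new ArrayDeque<>();
  private boolean initialized = false;

  /** Runs the callback immediately if initialized, otherwise queues it. */
  synchronized void submit(Runnable callback) {
    if (initialized) {
      callback.run();
    } else {
      pending.add(callback);
    }
  }

  /** Marks initialization complete and drains queued callbacks in arrival order. */
  synchronized void finishInitialization() {
    initialized = true;
    Runnable next;
    while ((next = pending.poll()) != null) {
      next.run();
    }
  }
}
```

This sketch runs callbacks while holding the lock, which keeps ordering simple; a real version would probably hand them to the existing executor instead. The point is that nothing gets dropped, unlike with a plain boolean flag.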

[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-01 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220184#comment-13220184
]

stack commented on HBASE-5270:
--

bq. I have created a new issue and submit the new patch, what about move
discussion to HBASE-5501 since this one has too long comments

That the issue is long is not a good reason to open a new issue. We need to keep
the decisions and discussion in one place. It makes it easier on the folks who
are following behind us to make sense of why we did what. Would suggest you
close hbase-5501.

[jira] [Commented] (HBASE-5502) region_mover.rb fails to load regions back to original server for regions only containing empty tables.

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220191#comment-13220191
 ] 

stack commented on HBASE-5502:
--

For those following, this patch was committed to trunk mistakenly as part of 
the fat commit of HBASE-4403.

 region_mover.rb fails to load regions back to original server for regions 
 only containing empty tables.
 ---

 Key: HBASE-5502
 URL: https://issues.apache.org/jira/browse/HBASE-5502
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0
 Environment: Ubuntu precise
Reporter: James Page
Priority: Minor
 Fix For: 0.92.1, 0.94.0

 Attachments: HDFS-5502.patch


 The region_mover loadRegion function incorrectly uses 'isSuccessfulScan':
 {noformat}
   for r in regions
     exists = false
     begin
       exists = isSuccessfulScan(admin, r)
     rescue org.apache.hadoop.hbase.NotServingRegionException => e
       $LOG.info("Failed scan of " + e.message)
     end
 {noformat}
 isSuccessfulScan throws an exception when it fails rather than returning 
 status.
 As a result empty regions don't get restored - this is the case in a fresh 
 install (which is how I discovered this) with no user table.
 Modifying the code to set exists IF isSuccessfulScan does not throw an 
 exception worked for me:
 {noformat}
   for r in regions
     exists = false
     begin
       isSuccessfulScan(admin, r)
       exists = true
     rescue org.apache.hadoop.hbase.NotServingRegionException => e
       $LOG.info("Failed scan of " + e.message)
     end
 {noformat}

[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220192#comment-13220192
 ] 

stack commented on HBASE-4403:
--

For those following behind, this patch went in in two batches.  First the doc 
change and then the bulk went in mistakenly as a commit against hbase-5502.  I 
left the commit but then edited the commit message to say the below:

{code}
Author: stack
Revision: 1295710
Modified property: svn:log

Modified: svn:log at Thu Mar  1 17:57:34 2012
--
--- svn:log (original)
+++ svn:log Thu Mar  1 17:57:34 2012
@@ -1 +1 @@
-HBASE-5502 region_mover.rb fails to load regions back to original server for 
regions only containing empty tables.
+HBASE-4403 Adopt interface stability/audience classifications from Hadoop AND 
HBASE-5502 region_mover.rb fails to load regions back to original server for 
regions only containing empty tables
{code}


 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, 
 hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, 
 hbase-4403.patch, hbase-4403.patch


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.

[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220199#comment-13220199
 ] 

stack commented on HBASE-4991:
--

Ted: This has been raised already above as a concern, and some suggestions have 
been made that we parallelize certain ops.  Your suggestion that we farm out the 
closing of regions to the regionservers themselves will make no difference as 
regards how fast regions close, and ditto as regards deletes.

[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220219#comment-13220219
 ] 

stack commented on HBASE-5451:
--

I wonder how Jimmy is doing stuff like the below:

{code}
+import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.ConnectionHeaderProto;
+import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcRequestWithHeaderProto;
+import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcResponseWithHeaderProto;
+import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcExceptionProto;
+import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcRequestProto;
+import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcResponseProto;
{code}

... over in his patch (he's just done the proto stuff so far -- not sure about 
what the generated stuff will look like just yet).

We need to have a convention around this stuff.  Might be worth a chat up on the 
dev list.

For example, I wonder if we need the protobuf package here since all proto 
classes are contained in RPCMessageProtos (and it's got a suffix to identify it 
as proto generated).

The below is a pity but as you say, shouldn't live long:

{code}
+  result.write(d);
+  //makes a copy; but this part of code is not going to live long
+  //hopefully (only until we move all the protocols to protobuf)
+  response.setResponse(ByteString.copyFrom(d.getData()));
{code}

Is that maybeTranslate method repeated?

Exceptions still strings then?

+response.getException().getStackTrace()));

Looks like your pom edit clashes with Jimmy's over in the HRegionInterface redo. 
May the first commit win.

The doc on the proto file is great.

This is shaping up nice.

Jimmy, you should review!

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2





[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-03-01 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220345#comment-13220345
]

stack commented on HBASE-4991:
--

bq. When we change the location of rename() call from master to region server
for distributed log splitting, the duration was shortened from 22 minutes to 7
minutes for the same dataset.

Because rename was done via multiple clients rather than in parallel on master?
You sure it wasn't because of something else? (Distributed splitting is a
different type of process to what is going on here)

What do you want to farm out to the regionservers? We are already farming out
work in the design above. We ask the regionservers to close regions for us.
You want to farm out more than this? Control? To what end other than
complicating the design?

bq. I wonder if you have statistics showing that master-side operation (for
moving/deleting data of old regions) makes no difference in performance w.r.t.
distributed operations.

Well, stands to reason I'd think. I'd put it on you to come up w/ proof that
what seems reasonable actually isn't, at least when talking about the tens of
regions at most which is what I think this issue is about.

Provide capability to delete named region
-

Key: HBASE-4991
URL: https://issues.apache.org/jira/browse/HBASE-4991
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
Fix For: 0.94.0

Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch

See discussion titled 'Able to control routing to Solr shards or not' on
lily-discuss
User may want to quickly dispose of out of date records by deleting specific
regions.

[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220346#comment-13220346
 ] 

stack commented on HBASE-4348:
--

@Himanshu That would be useful

 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob

 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds has the oldest region-in-transition been in transition
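A minimal sketch of how the three metrics listed above could be computed, assuming the master keeps a map from region name to the time (in milliseconds) at which the region entered transition. The class name RitMetrics, the map shape, and the threshold parameter are hypothetical and are not taken from any patch attached to this issue.

```java
import java.util.Map;

/** Hypothetical holder for the three regions-in-transition metrics. */
final class RitMetrics {
  final int total;              // number of regions in transition
  final int overThreshold;      // regions in transition longer than the threshold
  final long oldestAgeSeconds;  // age of the oldest region in transition

  private RitMetrics(int total, int overThreshold, long oldestAgeSeconds) {
    this.total = total;
    this.overThreshold = overThreshold;
    this.oldestAgeSeconds = oldestAgeSeconds;
  }

  /**
   * @param ritStartMillis  region name mapped to the time it entered transition (assumed shape)
   * @param nowMillis       current time, passed in so the computation is easy to test
   * @param thresholdMillis e.g. 60000 for "more than a minute"
   */
  static RitMetrics compute(Map<String, Long> ritStartMillis, long nowMillis, long thresholdMillis) {
    int over = 0;
    long oldestAgeMillis = 0;
    for (long start : ritStartMillis.values()) {
      long age = nowMillis - start;
      if (age > thresholdMillis) {
        over++;
      }
      oldestAgeMillis = Math.max(oldestAgeMillis, age);
    }
    return new RitMetrics(ritStartMillis.size(), over, oldestAgeMillis / 1000);
  }
}
```

Passing the current time in as a parameter, rather than reading the clock inside compute, keeps the calculation deterministic; a real metrics source would presumably sample it on each metrics update.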


[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220366#comment-13220366
 ] 

stack commented on HBASE-4348:
--

Can you make it so the table doesn't bump up against the first one?

Should it be a separate table?  Why not add a sum on the end, and columns to the 
first table showing time in transition?  Flag red the one that has been in 
transition the longest?


 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Attachments: 4348-v1.patch, 4348-v2.patch, RITs.png


 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds has the oldest region-in-transition been in transition


[jira] [Commented] (HBASE-5492) Caching StartKeys and EndKeys of Regions

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220370#comment-13220370
 ] 

stack commented on HBASE-5492:
--

bq. Your question can be converted into who is responsible for refreshing of 
cachedRegionLocations?

You can do that.  It's done as the HTable runs but there are no guarantees the 
cache is complete at any one time, right?

Change +  if (tableLocations.size() == 0) { to use tableLocations.isEmpty(); 
the former can be costly by comparison.
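To illustrate the isEmpty() suggestion: a small sketch, assuming the cache is a concurrent sorted map. java.util.concurrent.ConcurrentSkipListMap is used here purely as a stand-in (its size() is documented as not being a constant-time operation); the actual type of tableLocations in the patch may differ, and the class name RegionCacheCheck is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical helper showing the emptiness check suggested in the review. */
final class RegionCacheCheck {
  /**
   * Returns true when at least one location is cached.
   * isEmpty() only needs to find a first entry, whereas size() on a
   * ConcurrentSkipListMap has to traverse the whole map to count entries.
   */
  static boolean hasCachedLocations(Map<String, String> tableLocations) {
    return !tableLocations.isEmpty();
  }

  /** Builds a tiny stand-in cache: start key mapped to "host:port". */
  static ConcurrentSkipListMap<String, String> sampleCache() {
    ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();
    cache.put("", "rs1.example.org:60020");
    cache.put("m", "rs2.example.org:60020");
    return cache;
  }
}
```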

So, IIUC, you are asking to pull all region locations local in locateRegions?

Does this get all table regions?  Or just the configured next ten regions?

 +prefetchRegionCache(tableName, null);



 Caching StartKeys and EndKeys of Regions
 

 Key: HBASE-5492
 URL: https://issues.apache.org/jira/browse/HBASE-5492
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.92.0
 Environment: all
Reporter: honghua zhu
 Fix For: 0.92.1

 Attachments: HBASE-5492.patch


 Each call for HTable.getStartEndKeys will read meta table.
 In particular, 
 in the case of client side multi-threaded concurrency statistics, 
 we must call HTable.coprocessorExec ==> getStartKeysInRange ==> 
 getStartEndKeys,
 resulting in the need to always scan the meta table.
 This is not necessary,
 we can implement the 
 HConnectionManager.HConnectionImplementation.locateRegions(byte[] tableName) 
 method,
 then, get the StartKeys and EndKeys from the cachedRegionLocations of 
 HConnectionImplementation.
 Combined with https://issues.apache.org/jira/browse/HBASE-5491, can improve 
 the performance of statistical


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220377#comment-13220377
 ] 

stack commented on HBASE-4991:
--

I did already at '01/Mar/12 17:17'

 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 


[jira] [Commented] (HBASE-3909) Add dynamic config

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220383#comment-13220383
 ] 

stack commented on HBASE-3909:
--

@Jimmy

bq. we don't have to poll fs to find changes. We can just put the 
lastmodifieddate of the file in ZK. Once the last modified date is changed, we 
can load the file again.

Why have the fs involved at all?  What would be the advantage?  Why not just 
put the changed configs up in zk?  Because we could lose them (because we do 
not keep permanent data up in zk)?  That'd be ok I think.  If you don't move 
the configs to hbase-site, then it's your own fault if they are lost when zk 
data is cleared.
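A rough sketch of the behaviour described above, with an in-memory map standing in for the overrides that would live in zk. The class OverlayConfig and its methods are hypothetical; no ZooKeeper or HBase API is used here. Dynamic values shadow the site file, and if they are lost, reads fall back to whatever hbase-site provides.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical config view: transient overrides layered over the site config. */
final class OverlayConfig {
  private final Map<String, String> site;                                 // stand-in for hbase-site.xml values
  private final Map<String, String> dynamic = new ConcurrentHashMap<>();  // stand-in for values kept in zk

  OverlayConfig(Map<String, String> site) {
    this.site = site;
  }

  /** Records a dynamic override; not persisted anywhere in this sketch. */
  void setDynamic(String key, String value) {
    dynamic.put(key, value);
  }

  /** Simulates zk data being cleared: all overrides are lost. */
  void clearDynamic() {
    dynamic.clear();
  }

  /** Dynamic value wins; otherwise fall back to the site value (may be null). */
  String get(String key) {
    String value = dynamic.get(key);
    return value != null ? value : site.get(key);
  }
}
```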



 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.96.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001


[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time

2012-03-01 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220680#comment-13220680
]

stack commented on HBASE-5494:
--

I think maybe region locks are for later. Meantime, getting a read lock on the
table when doing a split or merge might carry us a long way before we need
region-specific locks.

@Mubarak Todd has a good point that respecting locking order is important.
Maybe we should use the facility where zk can add the seqid to the name? So
maybe locks are named:

zookeeper.znode.parent/locks/table_name/r-seqid

zookeeper.znode.parent/locks/table_name/w-seqid

where the data then is details on who took the lock?

And you can't take a write lock if any instances of read lock outstanding?

I was thinking that maybe clients would be prepared to wait some time obtaining
a read lock but that they might fail fast if they could not get a read lock?
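One way to read the r-seqid / w-seqid naming is the usual ZooKeeper shared-lock ordering: a reader holds the lock if no writer is queued ahead of it, and a writer holds it only if nothing at all is ahead of it. The sketch below works on a plain list of child names so it runs without ZooKeeper; the class TableLockOrder and the exact name format (prefix, dash, zero-padded sequence number) are assumptions for illustration, not something settled in this issue.

```java
import java.util.List;

/** Hypothetical ordering check for lock znodes named r-<seqid> or w-<seqid>. */
final class TableLockOrder {
  /** Parses the sequence number that follows the first dash. */
  static long seq(String child) {
    return Long.parseLong(child.substring(child.indexOf('-') + 1));
  }

  static boolean isWrite(String child) {
    return child.startsWith("w-");
  }

  /**
   * Decides whether the node "mine" currently holds the lock, given all
   * children under the table's lock znode. Only nodes with a lower sequence
   * number can block: any of them blocks a writer, only writers block a reader.
   */
  static boolean holdsLock(String mine, List<String> children) {
    long mySeq = seq(mine);
    for (String other : children) {
      if (seq(other) >= mySeq) {
        continue; // queued behind us (or is us), cannot block
      }
      if (isWrite(mine) || isWrite(other)) {
        return false;
      }
    }
    return true;
  }
}
```

Because the decision depends only on sequence order, every client that lists the same children reaches the same answer, which is one way of respecting the locking order Todd raised.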

Introduce a zk hosted table-wide read/write lock so only one table operation
at a time
--

Key: HBASE-5494
URL: https://issues.apache.org/jira/browse/HBASE-5494
Project: HBase
Issue Type: Improvement
Reporter: stack

I saw this facility over in the accumulo code base.
Currently we just try to sort out the mess when splits come in during an
online schema edit; somehow we figure we can figure all possible region
transition combinations and make the right call.
We could try and narrow the number of combinations by taking out a zk table
lock when doing table operations.
For example, on split or merge, we could take a read-only lock meaning the
table can't be disabled while these are running.
We could then take a write only lock if we want to ensure the table doesn't
change while disabling or enabling process is happening.
Shouldn't be too hard to add.

[jira] [Commented] (HBASE-5504) Online Merge

2012-03-01 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220689#comment-13220689
]

stack commented on HBASE-5504:
--

bq. Since region splitting is disabled after step 3, how do we deal with the
case where the start of row range lies in the middle of a region ?

Good question. How about, if a merge, we include the hosting region in the
merge. If a delete region, we throw an exception saying you need to specify
region edges.
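A small sketch of that rule, assuming region start keys are held in a sorted set. Strings stand in for HBase's byte[] row keys, and the class RangeAlignment with its two methods is hypothetical.

```java
import java.util.TreeSet;

/** Hypothetical handling of a range start that falls in the middle of a region. */
final class RangeAlignment {
  /**
   * For a merge: widen the range so it begins at the start key of the region
   * hosting rangeStart, i.e. the greatest start key that is <= rangeStart.
   */
  static String alignStartForMerge(TreeSet<String> regionStartKeys, String rangeStart) {
    String hosting = regionStartKeys.floor(rangeStart);
    if (hosting == null) {
      throw new IllegalArgumentException("No region hosts row " + rangeStart);
    }
    return hosting;
  }

  /** For a delete: refuse unless the range starts exactly on a region edge. */
  static void checkStartForDelete(TreeSet<String> regionStartKeys, String rangeStart) {
    if (!regionStartKeys.contains(rangeStart)) {
      throw new IllegalArgumentException(
          "Delete must be specified on region edges; " + rangeStart + " is inside a region");
    }
  }
}
```

The same check would apply symmetrically to the end of the range.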

bq. For step 14, we still need to move data for delete region request because
we should have chosen one of the neighbor regions to cover the hole in .META.

I think this comes of a misunderstanding that you might have Ted. You can't
alter region edges once created. For example, the directory in hdfs is the
hash of regionname which includes at least the startkey and should one day
include the endkey... If you change the delimiting keys, you have to make a new
region. Were you thinking we could change the delimiting keys on an existing
region?

On 'What if the master crashes anywhere between step 6 and step 19 ?', it's what
Mubarak says; the new master comes up and after initializing, tries to pick up
merge/delete from where the previous master left off... Or simpler, it could
just undo it all?
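The "pick up where the previous master left off" option could be sketched as below, assuming each completed step is recorded somewhere durable. A field stands in for the znode here, and the step names are a condensed, hypothetical rendering of the numbered outline in the issue description, not an agreed design.

```java
/** Condensed, hypothetical step list for a merge/deleteRegion operation. */
enum MergeStep {
  LOCK_TABLE, RECORD_INTENT, DISABLE_BALANCER, LIST_REGIONS, CREATE_REGION,
  CLOSE_REGIONS, MOVE_FILES, EDIT_META, TRASH_OLD_REGIONS, ENABLE_BALANCER, UNLOCK_TABLE
}

/** Hypothetical journal; a real one would persist lastCompleted in zk. */
final class MergeJournal {
  private MergeStep lastCompleted; // null until the first step is recorded

  /** Called after a step finishes, before moving to the next one. */
  void markDone(MergeStep step) {
    lastCompleted = step;
  }

  /** The step a newly started master should run next, or null if all are done. */
  MergeStep nextStep() {
    MergeStep[] all = MergeStep.values();
    int next = (lastCompleted == null) ? 0 : lastCompleted.ordinal() + 1;
    return next < all.length ? all[next] : null;
  }
}
```

Resuming this way only works if each step is safe to repeat, since a crash can land between finishing a step and recording it; the simpler alternative of undoing everything avoids that requirement.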

bq. How do we get around if merge/delete-range get stuck (it should not but if
it happens???)

I think we need to make the operation cancelable? In shell/api, there'd be a
cancel operation on table! (Since you need a write lock to do one of these
operations, this would mean one operation at a time per table only. Maybe
no need of there being a lock transaction id because only one is happening at a
time?)

Later we can do something better where if an operation does not complete, the
master operation runner would do the cancel.. but that we could do later?

Online Merge

Key: HBASE-5504
URL: https://issues.apache.org/jira/browse/HBASE-5504
Project: HBase
Issue Type: Brainstorming
Components: client, master, shell, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Fix For: 0.96.0

As discussed, please refer the discussion at
[HBASE-4991|https://issues.apache.org/jira/browse/HBASE-4991]
Design suggestion from Stack:
{quote}
I suggest a design below. It has some prerequisites, some general function
that this feature could use (and others). The prereqs if you think them good,
could be done outside of this JIRA.
Here's a suggested rough outline of how I think this feature should run. The
feature I'm describing below is merge and deleteRegion for I see them as in
essence the same thing.
(C) Client, (M) Master, RS (Region server), ZK (ZooKeeper)
1. Client calls merge or deleteRegion API. API is a range of rows. (C)
2. Master gets call. (M)
3. Master obtains a write lock on table so it can't be disabled from under
us. The write lock will also disable splitting. This is one of the prereqs I
think. It's HBASE-5494 (Or we could just do something simpler where we have a
flag up in zk that splitRegion checks but that's less useful I think; OR we do
the dynamic configs issue and set splits to off via a config. change).
There'd be a timer for how long we wait on the table lock. (M - ZK)
4. If we get the lock, write intent to merge a range up into zk. It also
hoists into zk if its a pure merge or a merge that drops the region data (a
deleteRegion call) (M)
5. Return to the client either our failed attempt at locking the table or an
id of some sort used identifying this running operation; can use it querying
status. (M - C)
6. Turn off balancer. TODO/prereq: Do it in a way that is persisted. Balancer
switch currently in memory only so if master crashes, new master will come up
in balancing mode # (If we had dynamic config. could hoist up to zk a config.
that disables the balancer rather than have a balancer-specific flag/znode OR
if a write lock outstanding on a table, then the balancer does not balance
regions in the locked table - this latter might be the easiest to do) (M)
7. Write into zk that just turned off the balancer (If it was on) (M - ZK)
8. Get regions that are involved in the span (M)
9. Hoist the list up into zk. (M - ZK)
10. Create region to span the range. (M)
11. Write that we did this up into zk. (M - ZK)
12. Close regions in parallel. Confirm close in parallel. (M - RS)
13. Write up into zk regions closed (This might not be necessary since can
ask if region is open). (M - ZK)
14. If a merge and not a delete region, move files under new region. Might
multithread this (moves should go pretty fast). If a deleteregion, we skip
this step. (M)
15. On completion mark zk (though may not be necessary since its easy to look
in fs to see state of move). (M - ZK)

[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220693#comment-13220693
 ] 

stack commented on HBASE-4348:
--

@Himanshu Yeah, a line on the end w/ count of regions > one minute up in RIT 
would be good enough.  You could make them yellow in the listing.  But yeah, 
would be great if they came out as metrics as per Mr. Todd.

 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Attachments: 4348-v1.patch, 4348-v2.patch, RITs.png


 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds has the oldest region-in-transition been in transition


[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220699#comment-13220699
 ] 

stack commented on HBASE-4991:
--

@Mubarak Should we then close this issue as subsumed by the new issue (we can 
reuse your code pasted here over in the new issue after we work through 
design).  Good stuff.

 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 


[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220721#comment-13220721
 ] 

stack commented on HBASE-5399:
--

That's a wide variety in the types of failures. You get the same kind of 
variance absent your patch N?



 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5399_inprogress.patch, 5399_inprogress.v14.patch, 
 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 
 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 
 5399_inprogress.v3.patch, 5399_inprogress.v9.patch


 The link is often considered as an issue, for various reasons. One of them 
 being that there is a limit on the number of connection that ZK can manage. 
 Stack was suggesting as well to remove the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read get master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base done is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of thread => now done with a temporary zookeeper connection
 -
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master. => 
 we could make them using a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, non connected client will always be 
 really slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the client seems to make sense for some 
 Use Cases. However, it won't scale if a TCP connection is required for every 
 client
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue if HBaseAdmin (for both ZK & Master), maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic however.


[jira] [Commented] (HBASE-5419) FileAlreadyExistsException has moved from mapred to fs package

2012-03-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220739#comment-13220739
 ] 

stack commented on HBASE-5419:
--

Well, maybe this is for 0.96 then?  That ok w/ you Dhruba?  0.96 will be the 
singularity: the release that gets the protobuf rpcs will require cluster 
shutdown and restart, but thereafter we should be able to upgrade a running 
hbase across major versions.

 FileAlreadyExistsException has moved from mapred to fs package
 --

 Key: HBASE-5419
 URL: https://issues.apache.org/jira/browse/HBASE-5419
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Fix For: 0.94.0

 Attachments: D1767.1.patch, D1767.1.patch


 The FileAlreadyExistsException has moved from 
 org.apache.hadoop.mapred.FileAlreadyExistsException to 
 org.apache.hadoop.fs.FileAlreadyExistsException. HBase is currently using a 
 class that is deprecated in hadoop trunk.


[jira] [Commented] (HBASE-5504) Online Merge

2012-03-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221023#comment-13221023
 ] 

stack commented on HBASE-5504:
--

bq. I meant that data for the neighbor region we choose should be copied. The 
neighbor region would have new delimiting key.

Sorry, I'm not following, Ted.  You need to bring me along.  Thanks.

 Online Merge
 

 Key: HBASE-5504
 URL: https://issues.apache.org/jira/browse/HBASE-5504
 Project: HBase
  Issue Type: Brainstorming
  Components: client, master, shell, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
 Fix For: 0.96.0


 As discussed, please refer the discussion at 
 [HBASE-4991|https://issues.apache.org/jira/browse/HBASE-4991]
 Design suggestion from Stack:
 {quote}
 I suggest a design below. It has some prerequisites, some general function 
 that this feature could use (and others). The prereqs if you think them good, 
 could be done outside of this JIRA.
 Here's a suggested rough outline of how I think this feature should run. The 
 feature I'm describing below is merge and deleteRegion for I see them as in 
 essence the same thing.
 (C) Client, (M) Master, RS (Region server), ZK (ZooKeeper)
 1. Client calls merge or deleteRegion API. API is a range of rows. (C)
 2. Master gets call. (M)
 3. Master obtains a write lock on table so it can't be disabled from under 
 us. The write lock will also disable splitting. This is one of the prereqs I 
 think. It's HBASE-5494 (Or we could just do something simpler where we have a 
 flag up in zk that splitRegion checks but that's less useful I think; OR we do 
 the dynamic configs issue and set splits to off via a config. change). 
 There'd be a timer for how long we wait on the table lock. (M - ZK)
 4. If we get the lock, write intent to merge a range up into zk. It also 
 hoists into zk if its a pure merge or a merge that drops the region data (a 
 deleteRegion call) (M)
 5. Return to the client either our failed attempt at locking the table or an 
 id of some sort used identifying this running operation; can use it querying 
 status. (M - C)
 6. Turn off balancer. TODO/prereq: Do it in a way that is persisted. Balancer 
 switch currently in memory only so if master crashes, new master will come up 
 in balancing mode # (If we had dynamic config. could hoist up to zk a config. 
 that disables the balancer rather than have a balancer-specific flag/znode OR 
 if a write lock outstanding on a table, then the balancer does not balance 
 regions in the locked table - this latter might be the easiest to do) (M)
 7. Write into zk that just turned off the balancer (If it was on) (M - ZK)
 8. Get regions that are involved in the span (M)
 9. Hoist the list up into zk. (M - ZK)
 10. Create region to span the range. (M)
 11. Write that we did this up into zk. (M - ZK)
 12. Close regions in parallel. Confirm close in parallel. (M - RS)
 13. Write up into zk regions closed (This might not be necessary since can 
 ask if region is open). (M - ZK)
 14. If a merge and not a delete region, move files under new region. Might 
 multithread this (moves should go pretty fast). If a deleteregion, we skip 
 this step. (M)
 15. On completion mark zk (though may not be necessary since its easy to look 
 in fs to see state of move). (M - ZK)
 16. Edit .META. (M)
 17. Confirm edits went in. (M)
 18. Move old regions to hbase trash folder TODO: There is no trash folder 
 under /hbase currently. We should add one. (M)
 19. Enable balancer (if it was off) (M)
 20. Unlock table (M)
 {quote}


[jira] [Commented] (HBASE-4890) fix possible NPE in HConnectionManager

2012-03-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221112#comment-13221112
 ] 

stack commented on HBASE-4890:
--

Any more luck w/ this one J-D (or you got distracted?)

 fix possible NPE in HConnectionManager
 --

 Key: HBASE-4890
 URL: https://issues.apache.org/jira/browse/HBASE-4890
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Priority: Blocker
 Fix For: 0.92.1


 I was running YCSB against a 0.92 branch and encountered this error message:
 {code}
 11/11/29 08:47:16 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=usertable,user3917479014967760871,1322555655231.f78d161e5724495a9723bcd972f97f41., hostname=c0316.hal.cloudera.com, port=57020
 java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1501)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1353)
     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:898)
     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:775)
     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:750)
     at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source)
     at com.yahoo.ycsb.DBWrapper.update(Unknown Source)
     at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionUpdate(Unknown Source)
     at com.yahoo.ycsb.workloads.CoreWorkload.doTransaction(Unknown Source)
     at com.yahoo.ycsb.ClientThread.run(Unknown Source)
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1315)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1327)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1325)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.NullPointerException
     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158)
     at $Proxy4.multi(Unknown Source)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1330)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1328)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1309)
     ... 7 more
 {code}
 It looks like the NPE is caused by server being null in the MultiResponse 
 call() method.
 {code}
 public MultiResponse call() throws IOException {
   return getRegionServerWithoutRetries(
       new ServerCallable<MultiResponse>(connection, tableName, null) {
         public MultiResponse call() throws IOException {
           return server.multi(multi);
         }
         @Override
         public void connect(boolean reload) throws IOException {
           server =
               connection.getHRegionConnection(loc.getHostname(), loc.getPort());
         }
       }
   );
 {code}
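
 The failure mode described above (call() running while server is still null) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: RegionServer and SketchCallable are not the HBase interfaces, and the guard is only one possible way to turn the NPE into a descriptive error. It is not the fix that was committed for this issue.

```java
import java.io.IOException;

// Hypothetical stand-in for the region server RPC proxy; not the HBase interface.
interface RegionServer {
  String multi(String request) throws IOException;
}

// Hypothetical stand-in for ServerCallable: connect() is expected to set 'server'.
abstract class SketchCallable {
  protected RegionServer server;

  abstract void connect(boolean reload) throws IOException;

  // Guarded call: fail with a descriptive IOException instead of an NPE
  // when connect() was never run or did not produce a proxy.
  String call(String request) throws IOException {
    if (server == null) {
      throw new IOException("No region server connection; connect() did not set server");
    }
    return server.multi(request);
  }
}
```

 With a guard like this, a caller that skips connect() gets an IOException it can retry on, rather than a RuntimeException wrapping a NullPointerException.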

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-03-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221157#comment-13221157
 ] 

stack commented on HBASE-5451:
--

Can we have hbase go all-pb for hbase 0.96.0?

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2





[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-03-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221159#comment-13221159
 ] 

stack commented on HBASE-5451:
--

That's a dumb question.  Let me rephrase.  Won't hbase be all pb natively by 
0.96.0?

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2





[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-03-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221170#comment-13221170
 ] 

stack commented on HBASE-5451:
--

Client APIs should be the same but yeah, let's get up on pb before 0.96.0; a 
blocker as per DD.

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2





[jira] [Commented] (HBASE-5504) Online Merge

2012-03-02 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221179#comment-13221179
 ] 

stack commented on HBASE-5504:
--

@Ted that takes me to a comment I made.  I still am without understanding.

 Online Merge
 

 Key: HBASE-5504
 URL: https://issues.apache.org/jira/browse/HBASE-5504
 Project: HBase
  Issue Type: Brainstorming
  Components: client, master, shell, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
 Fix For: 0.96.0


 As discussed, please refer to the discussion at 
 [HBASE-4991|https://issues.apache.org/jira/browse/HBASE-4991]
 Design suggestion from Stack:
 {quote}
 I suggest a design below. It has some prerequisites, some general function 
 that this feature could use (and others). The prereqs if you think them good, 
 could be done outside of this JIRA.
 Here's a suggested rough outline of how I think this feature should run. The 
 feature I'm describing below is merge and deleteRegion for I see them as in 
 essence the same thing.
 (C) Client, (M) Master, RS (Region server), ZK (ZooKeeper)
 1. Client calls merge or deleteRegion API. API is a range of rows. (C)
 2. Master gets call. (M)
 3. Master obtains a write lock on table so it can't be disabled from under 
 us. The write lock will also disable splitting. This is one of the prereqs I 
 think. It's HBASE-5494 (Or we could just do something simpler where we have a 
 flag up in zk that splitRegion checks but that's less useful I think; OR we do 
 the dynamic configs issue and set splits to off via a config. change). 
 There'd be a timer for how long we wait on the table lock. (M - ZK)
 4. If we get the lock, write intent to merge a range up into zk. It also 
 hoists into zk whether it's a pure merge or a merge that drops the region data (a 
 deleteRegion call) (M)
 5. Return to the client either our failed attempt at locking the table or an 
 id of some sort used identifying this running operation; can use it querying 
 status. (M - C)
 6. Turn off balancer. TODO/prereq: Do it in a way that is persisted. Balancer 
 switch currently in memory only so if master crashes, new master will come up 
 in balancing mode # (If we had dynamic config. could hoist up to zk a config. 
 that disables the balancer rather than have a balancer-specific flag/znode OR 
 if a write lock outstanding on a table, then the balancer does not balance 
 regions in the locked table - this latter might be the easiest to do) (M)
 7. Write into zk that just turned off the balancer (If it was on) (M - ZK)
 8. Get regions that are involved in the span (M)
 9. Hoist the list up into zk. (M - ZK)
 10. Create region to span the range. (M)
 11. Write that we did this up into zk. (M - ZK)
 12. Close regions in parallel. Confirm close in parallel. (M - RS)
 13. Write up into zk regions closed (This might not be necessary since can 
 ask if region is open). (M - ZK)
 14. If a merge and not a delete region, move files under new region. Might 
 multithread this (moves should go pretty fast). If a deleteregion, we skip 
 this step. (M)
 15. On completion mark zk (though may not be necessary since it's easy to look 
 in fs to see state of move). (M - ZK)
 16. Edit .META. (M)
 17. Confirm edits went in. (M)
 18. Move old regions to hbase trash folder TODO: There is no trash folder 
 under /hbase currently. We should add one. (M)
 19. Enable balancer (if it was off) (M)
 20. Unlock table (M)
 {quote}
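
 The outline above amounts to an ordered list of phases whose progress is recorded in zk so the operation can be resumed or inspected. A minimal sketch of that idea follows. The phase names are condensed from the numbered steps and are hypothetical, and an in-memory list stands in for the zk znodes; none of this is HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical phase names condensed from the numbered outline; not HBase code.
enum MergePhase {
  LOCK_TABLE, RECORD_INTENT, DISABLE_BALANCER, LIST_REGIONS, CREATE_REGION,
  CLOSE_REGIONS, MOVE_FILES, EDIT_META, TRASH_OLD_REGIONS, ENABLE_BALANCER, UNLOCK_TABLE
}

// Journal of completed phases. An in-memory list stands in for state hoisted into zk.
// Phases are assumed to be marked done strictly in declaration order.
class MergeJournal {
  private final List<MergePhase> completed = new ArrayList<>();

  void markDone(MergePhase phase) {
    completed.add(phase);
  }

  // The phase to run next, or null when every phase has been recorded.
  MergePhase resumeFrom() {
    MergePhase[] all = MergePhase.values();
    return completed.size() < all.length ? all[completed.size()] : null;
  }
}

class MergeRunner {
  // Walks the remaining phases in order and returns the ones actually executed.
  // A deleteRegion call skips the file move, as in step 14 of the outline.
  static List<MergePhase> run(MergeJournal journal, boolean deleteRegion) {
    List<MergePhase> executed = new ArrayList<>();
    for (MergePhase p = journal.resumeFrom(); p != null; p = journal.resumeFrom()) {
      if (!(deleteRegion && p == MergePhase.MOVE_FILES)) {
        executed.add(p); // real work for the phase would happen here
      }
      journal.markDone(p);
    }
    return executed;
  }
}
```

 Because the journal records each phase as it completes, a new master reading the same journal would pick up at the first unrecorded phase instead of starting over.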


[jira] [Commented] (HBASE-4890) fix possible NPE in HConnectionManager

2012-03-03 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221643#comment-13221643
 ] 

stack commented on HBASE-4890:
--

This NPE is a bit too easy to manufacture.  Should we hold up 0.92.1 till 
fixed?  Can work on it Monday?

 fix possible NPE in HConnectionManager
 --

 Key: HBASE-4890
 URL: https://issues.apache.org/jira/browse/HBASE-4890
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Priority: Blocker
 Fix For: 0.92.1


 I was running YCSB against a 0.92 branch and encountered this error message:
 {code}
 11/11/29 08:47:16 WARN client.HConnectionManager$HConnectionImplementation: 
 Failed all from 
 region=usertable,user3917479014967760871,1322555655231.f78d161e5724495a9723bcd972f97f41.,
  hostname=c0316.hal.cloudera.com, port=57020
 java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
 java.lang.NullPointerException
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
 at java.util.concurrent.FutureTask.get(FutureTask.java:83)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1501)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1353)
 at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:898)
 at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:775)
 at org.apache.hadoop.hbase.client.HTable.put(HTable.java:750)
 at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source)
 at com.yahoo.ycsb.DBWrapper.update(Unknown Source)
 at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionUpdate(Unknown 
 Source)
 at com.yahoo.ycsb.workloads.CoreWorkload.doTransaction(Unknown Source)
 at com.yahoo.ycsb.ClientThread.run(Unknown Source)
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1315)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1327)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1325)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158)
 at $Proxy4.multi(Unknown Source)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1330)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1328)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1309)
 ... 7 more
 {code}
 It looks like the NPE is caused by server being null in the MultiResponse 
 call() method.
 {code}
 public MultiResponse call() throws IOException {
   return getRegionServerWithoutRetries(
       new ServerCallable<MultiResponse>(connection, tableName, null) {
         public MultiResponse call() throws IOException {
           return server.multi(multi);
         }
         @Override
         public void connect(boolean reload) throws IOException {
           server =
               connection.getHRegionConnection(loc.getHostname(), loc.getPort());
         }
       }
   );
 {code}


[jira] [Commented] (HBASE-5371) Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API

2012-03-05 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222634#comment-13222634
 ] 

stack commented on HBASE-5371:
--

Please integrate into 0.92 Ted.  Thanks.

 Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) 
 API
 

 Key: HBASE-5371
 URL: https://issues.apache.org/jira/browse/HBASE-5371
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.0

 Attachments: HBASE-5371-addendum_v1.patch, HBASE-5371_v2.patch, 
 HBASE-5371_v3-noprefix.patch, HBASE-5371_v3.patch


 We need to introduce something like 
 AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
 that clients can check access rights before carrying out the operations. We 
 need this kind of operation for HCATALOG-245, which introduces authorization 
 providers for hbase over hcat. We cannot use getUserPermissions() since it 
 requires ADMIN permissions on the global/table level.


[jira] [Commented] (HBASE-5358) HBaseObjectWritable should be able to serialize/deserialize generic arrays

2012-03-05 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222701#comment-13222701
 ] 

stack commented on HBASE-5358:
--

This will do.  +1

 HBaseObjectWritable should be able to serialize/deserialize generic arrays
 --

 Key: HBASE-5358
 URL: https://issues.apache.org/jira/browse/HBASE-5358
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors, io
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.0

 Attachments: 5358-92.txt, HBASE-5358_v3.patch


 HBaseObjectWritable can encode Writable[]s, but cannot encode A[] where 
 A extends Writable. This becomes an issue for example when adding a 
 coprocessor method which takes A[] (see HBASE-5352). 
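
 The distinction between Writable[] and A[] can be shown with plain Java reflection. This is a sketch under assumptions: SketchWritable and SketchA are hypothetical stand-ins, and the exact-class comparison is only one plausible way a serializer could end up rejecting A[]; the issue text does not say how HBaseObjectWritable does its lookup.

```java
import java.lang.reflect.Array;

// Hypothetical stand-ins for Writable and a class A that extends it.
interface SketchWritable {}

class SketchA implements SketchWritable {}

class ArrayTypeSketch {
  // Exact-class check: true only for SketchWritable[] itself, false for SketchA[].
  static boolean exactMatch(Object o) {
    return o.getClass() == SketchWritable[].class;
  }

  // Component-type check: accepts any X[] where X implements SketchWritable.
  static boolean componentMatch(Object o) {
    Class<?> c = o.getClass();
    return c.isArray() && SketchWritable.class.isAssignableFrom(c.getComponentType());
  }

  // Creates an array with the same component type, as a deserializer would
  // need to do in order to hand back an A[] rather than a Writable[].
  static Object newArrayLike(Object o, int length) {
    return Array.newInstance(o.getClass().getComponentType(), length);
  }
}
```

 The point of newArrayLike is that the component type has to travel with the data; otherwise the receiving side can only rebuild the generic array type and a cast to A[] fails.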


[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223350#comment-13223350
 ] 

stack commented on HBASE-5494:
--

@Ram Yes sir.  Thanks.  Resolved hbase-5373 as duplicate of this.

 Introduce a zk hosted table-wide read/write lock so only one table operation 
 at a time
 --

 Key: HBASE-5494
 URL: https://issues.apache.org/jira/browse/HBASE-5494
 Project: HBase
  Issue Type: Improvement
Reporter: stack

 I saw this facility over in the accumulo code base.
 Currently we just try to sort out the mess when splits come in during an 
 online schema edit; somehow we figure we can figure all possible region 
 transition combinations and make the right call.
 We could try and narrow the number of combinations by taking out a zk table 
 lock when doing table operations.
 For example, on split or merge, we could take a read-only lock meaning the 
 table can't be disabled while these are running.
 We could then take a write only lock if we want to ensure the table doesn't 
 change while disabling or enabling process is happening.
 Shouldn't be too hard to add.
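
 The read/write semantics described above can be sketched in-process. This is only an illustration of the locking rules (many readers, one writer, writer excluded while readers are active); the class and method names are hypothetical, and a real implementation would keep this state in zk rather than in fields of one JVM.

```java
// Hypothetical in-process model of a table-wide read/write lock.
// A zk-hosted lock would hold the same state in znodes instead of fields.
class TableLockSketch {
  private int readers = 0;
  private boolean writer = false;

  // Taken by split or merge: fails only while a writer holds the table.
  synchronized boolean tryReadLock() {
    if (writer) {
      return false;
    }
    readers++;
    return true;
  }

  synchronized void readUnlock() {
    if (readers > 0) {
      readers--;
    }
  }

  // Taken by disable or enable: fails while any reader or another writer is active.
  synchronized boolean tryWriteLock() {
    if (writer || readers > 0) {
      return false;
    }
    writer = true;
    return true;
  }

  synchronized void writeUnlock() {
    writer = false;
  }
}
```

 Under these rules a disable cannot start while a split is running, and a split cannot start while a disable is in progress, which is the narrowing of region transition combinations the description is after.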


[jira] [Commented] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223401#comment-13223401
 ] 

stack commented on HBASE-5531:
--

+1 on patch and +1 on commit to all of the branches cited above.  Thanks Ram.

 Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
 -

 Key: HBASE-5531
 URL: https://issues.apache.org/jira/browse/HBASE-5531
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.2
Reporter: Laxman
  Labels: build
 Fix For: 0.92.2, 0.96.0

 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch


 Current profile is still pointing to 0.23.1-SNAPSHOT. 
 This is failing to build as 23.1 is already released and snapshot is not 
 available anymore.
 We can update this to 0.23.2-SNAPSHOT.


[jira] [Commented] (HBASE-5436) Right-size the map when reading attributes.

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223446#comment-13223446
 ] 

stack commented on HBASE-5436:
--

Please commit to both Lars.  Thanks.

 Right-size the map when reading attributes.
 ---

 Key: HBASE-5436
 URL: https://issues.apache.org/jira/browse/HBASE-5436
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Trivial
  Labels: performance
 Fix For: 0.94.0

 Attachments: 0001-Right-size-the-map-when-reading-attributes.patch





[jira] [Commented] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223493#comment-13223493
 ] 

stack commented on HBASE-5531:
--

I added him (See 'Administration' in JIRA.  You should have access.  Once in 
administration screens, look for people along the left.. the rest should be 
plain... bug me if you can't figure it).


 Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
 -

 Key: HBASE-5531
 URL: https://issues.apache.org/jira/browse/HBASE-5531
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.2
Reporter: Laxman
  Labels: build
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch


 Current profile is still pointing to 0.23.1-SNAPSHOT. 
 This is failing to build as 23.1 is already released and snapshot is not 
 available anymore.
 We can update this to 0.23.2-SNAPSHOT.


[jira] [Commented] (HBASE-5533) Add more metrics to HBase

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223523#comment-13223523
 ] 

stack commented on HBASE-5533:
--

Did you mean the below:

{code}
-# hbase.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
-# hbase.period=10
-# hbase.fileName=/tmp/metrics_hbase.log
+hbase.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
+hbase.period=10
+hbase.fileName=/tmp/metrics_hbase.log
{code}

Will there be a bunch of contention on these additions:

{code}
+  static volatile BlockingQueue<Long> fsReadLatenciesNanos = new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE);
{code}

Could this fill the logs with thousands of repeated messages:

{code}
+  if (!stored) {
+    LOG.warn("Dropping fs latency stat since buffer is full");
+  }
{code}

Could we use the cliff click counters instead of AtomicLong?  They are on the 
classpath IIRC:

{code}
+  private final Map<String, AtomicLong> counts;
{code}

These additions would be great to have.
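
On the question above about the warning flooding the logs, one option is to count drops and warn only on the first drop and every N-th drop after that. A minimal sketch, assuming a bounded queue like the one in the patch; the class name, the warnEvery parameter, and the warnings counter (which stands in for LOG.warn) are all hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bounded latency buffer that rate-limits its "buffer full" warning.
class LatencyBuffer {
  private final BlockingQueue<Long> latenciesNanos;
  private final AtomicLong dropped = new AtomicLong();
  private final AtomicLong warnings = new AtomicLong(); // stands in for LOG.warn calls
  private final long warnEvery;

  LatencyBuffer(int capacity, long warnEvery) {
    this.latenciesNanos = new ArrayBlockingQueue<Long>(capacity);
    this.warnEvery = warnEvery;
  }

  // Non-blocking insert; returns false when the sample was dropped.
  boolean record(long nanos) {
    boolean stored = latenciesNanos.offer(nanos);
    if (!stored) {
      long n = dropped.incrementAndGet();
      // Warn on drop 1, warnEvery + 1, 2 * warnEvery + 1, and so on.
      if ((n - 1) % warnEvery == 0) {
        warnings.incrementAndGet();
      }
    }
    return stored;
  }

  long droppedCount() {
    return dropped.get();
  }

  long warningCount() {
    return warnings.get();
  }
}
```

The dropped counter could also be exported as a metric of its own, so the log stays quiet while the drop rate is still visible.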

 Add more metrics to HBase
 -

 Key: HBASE-5533
 URL: https://issues.apache.org/jira/browse/HBASE-5533
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
 Attachments: hbase-5533-0.92.patch


 To debug/monitor production clusters, there are some more metrics I wish I 
 had available.
 In particular:
 - Although the average FS latencies are useful, a 'histogram' of recent 
 latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) 
 would be more useful
 - Similar histograms of latencies on common operations (GET, PUT, DELETE) 
 would be useful
 - Counting the number of accesses to each region to detect hotspotting
 - Exposing the current number of HLog files
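
 The 'histogram' of recent latencies described above boils down to percentiles over a window of samples. A minimal sketch using the nearest-rank method on a sorted copy of the samples; the class and method names are hypothetical, and the attached patch may compute its histogram differently.

```java
import java.util.Arrays;

// Hypothetical helper: percentiles over a snapshot of recent latency samples.
class LatencyPercentiles {
  // Nearest-rank percentile: the smallest sample such that at least
  // p percent of all samples are less than or equal to it.
  static long percentile(long[] samples, double p) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    long[] sorted = samples.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```

 A statement such as "99% of reads completed in under 200ms" then corresponds to percentile(samples, 99) being below 200ms for the current window.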


[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223615#comment-13223615
 ] 

stack commented on HBASE-5399:
--

@LarsH I think it's too radical a change in client behavior for 0.94.  If we 
target it for 0.96, it'll be a ripple only compared to rpc changes; it won't be 
noticed.

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 
 5399.v40.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 
 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 
 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 
 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 
 5399_inprogress.v32.patch, 5399_inprogress.v9.patch


 The link is often considered as an issue, for various reasons. One of them 
 being that there is a limit on the number of connections that ZK can manage. 
 Stack was suggesting as well to remove the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read get master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of threads => now done with a temporary zookeeper connection
 -
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master.  => 
 we could make them using a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, a non-connected client will always be 
 much slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the clients seems to make sense for some 
 Use Cases. However, it won't scale if a TCP connection is required for every 
 client
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue with HBaseAdmin (for both ZK & Master), maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic however.
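
 The "temporary zookeeper connection" pattern in the list above can be sketched as open, use, close around each read. This is an illustration only: CoordinationSession is a hypothetical stand-in for a ZooKeeper session, not the real ZooKeeper or ZooKeeperWatcher API, and the patches attached to this issue may be structured differently.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for a ZooKeeper session; not the real client API.
interface CoordinationSession extends AutoCloseable {
  String read(String path);

  @Override
  void close(); // narrowed so that callers need not handle a checked exception
}

class TemporaryConnections {
  // Opens a session, runs one piece of work against it, and always closes it,
  // so the client holds no long-lived connection to the ensemble.
  static <T> T withSession(Supplier<CoordinationSession> opener,
                           Function<CoordinationSession, T> work) {
    try (CoordinationSession session = opener.get()) {
      return work.apply(session);
    }
  }
}
```

 The trade-off is the one the description names: each call pays the cost of establishing a connection, in exchange for not keeping one open per client.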


[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223660#comment-13223660
 ] 

stack commented on HBASE-5399:
--

@nkeywal yes, agree, good to deprecate in 0.94 rather than 0.96 so more time to 
move off the old methods

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 
 5399.v40.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 
 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 
 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 
 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 
 5399_inprogress.v32.patch, 5399_inprogress.v9.patch


 The link is often considered as an issue, for various reasons. One of them 
 being that there is a limit on the number of connections that ZK can manage. 
 Stack was suggesting as well to remove the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read get master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of threads => now done with a temporary zookeeper connection
 -
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master.  => 
 we could make them using a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, a non-connected client will always be 
 much slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the clients seems to make sense for some 
 Use Cases. However, it won't scale if a TCP connection is required for every 
 client
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue with HBaseAdmin (for both ZK & Master), maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic however.


[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223662#comment-13223662
 ] 

stack commented on HBASE-5399:
--

... so it seems like there needs to be a separate patch for 0.94?

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 
 5399.v40.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 
 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 
 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 
 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 
 5399_inprogress.v32.patch, 5399_inprogress.v9.patch


 The link is often considered an issue, for various reasons. One of them
 is that there is a limit on the number of connections that ZK can manage.
 Stack also suggested removing the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper
 when closing the connection => we have to deprecate this but keep it.
 - read master address to create a master => now done with a temporary
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and
 create a pool of threads => now done with a temporary zookeeper connection
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master. =>
 we could make them use a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a
 strongly coupled architecture ;-). This can be changed, but requires a lot of
 modifications in these classes (likely adding a class in the middle of the
 hierarchy, something like that). Anyway, a non-connected client will always be
 much slower, because it needs a tcp connection, and establishing a tcp
 connection is slow.
 - having a link between ZK and all the clients seems to make sense for some
 use cases. However, it won't scale if a TCP connection is required for every
 client.
 - if we move the table descriptor part away from the client, we need to find
 a new place for it.
 - we will have the same issue in HBaseAdmin (for both ZK & Master); maybe we
 can put a timeout on the connection. That would make the whole system less
 deterministic, however.
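
 The "temporary connection" pattern the description keeps returning to can be
 sketched roughly as below. This is only an illustration, not the patch:
 TemporaryConnections, Connection and withTemporaryConnection are made-up
 names, and the assumption is that each read opens a connection, uses it once
 and closes it straight away.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch: open a connection, do one read, close it again. */
final class TemporaryConnections {

  /** Stand-in for a ZooKeeper or master connection (not an HBase type). */
  interface Connection extends AutoCloseable {
    String read(String path);

    @Override
    void close(); // narrowed so callers need not handle a checked exception
  }

  /**
   * Opens a connection via {@code opener}, applies {@code reader} to it and
   * always closes the connection before returning, even if the read throws.
   */
  static <T> T withTemporaryConnection(Supplier<Connection> opener,
      Function<Connection, T> reader) {
    try (Connection connection = opener.get()) {
      return reader.apply(connection);
    }
  }
}
```

 The cost is the one the description already names: every call pays for
 establishing a new tcp connection.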


[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223705#comment-13223705
 ] 

stack commented on HBASE-5074:
--

@Dhruba Try resubmitting your patch too.  We regularly see three of these mr 
tests fail.  Fixed in hadoop 1.0.2 apparently.

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, 
 D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, 
 D1521.11.patch, D1521.11.patch, D1521.12.patch, D1521.12.patch, 
 D1521.13.patch, D1521.13.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, 
 D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, 
 D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, 
 D1521.8.patch, D1521.9.patch, D1521.9.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.


[jira] [Commented] (HBASE-5534) HBase shell's return value is almost always 0

2012-03-06 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223887#comment-13223887
 ] 

stack commented on HBASE-5534:
--

You should be able to do:

{code}
hbase shell script.rb
{code}

... if that's of any help, Alex (may still have the issue w/ exit code).

 HBase shell's return value is almost always 0
 -

 Key: HBASE-5534
 URL: https://issues.apache.org/jira/browse/HBASE-5534
 Project: HBase
  Issue Type: Improvement
Reporter: Alex Newman
Assignee: Alex Newman

 So I was trying to write some simple scripts to verify client connections to 
 HBase using the shell and I noticed that the HBase shell always returns 0 
 even when it can't connect to an HBase server. I'm not sure if this is the 
 best option. What would be neat is if you had some capability to run commands 
 like
 hbase shell --command='disable table;\ndrop table;' and it would error out if 
 any of the commands fail to succeed. echo disable table | hbase shell could 
 continue to work as it does now.


[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-03-07 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225041#comment-13225041
 ] 

stack commented on HBASE-5480:
--

I don't have this class in 0.92 Andrew, so 0.92 should be fine (Thanks for 
flagging it)

 Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
 -

 Key: HBASE-5480
 URL: https://issues.apache.org/jira/browse/HBASE-5480
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.94.0

 Attachments: HBASE-5480.patch


 There are two issues:
 - StatusReporter has a new method getProgress()
 - Mapper and reducer context objects can no longer be directly instantiated.
 See attached patch. I'm not thrilled with the added reflection but it was the 
 minimally intrusive change.
 Raised the priority to critical because compilation fails.


[jira] [Commented] (HBASE-5540) Update apache jenkins to include 0.94 and builds against Hadoop 0.23

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225322#comment-13225322
 ] 

stack commented on HBASE-5540:
--

Done.  I missed adding it to the hbase group.

 Update apache jenkins to include 0.94 and builds against Hadoop 0.23
 

 Key: HBASE-5540
 URL: https://issues.apache.org/jira/browse/HBASE-5540
 Project: HBase
  Issue Type: Task
  Components: build, test
Reporter: Jonathan Hsieh
  Labels: jenkins

 Currently there is no hbase 0.94 apache jenkins build and the trunk on hadoop 
 0.23 builds are disabled.   Ideally we should add the former and re-enable 
 the latter.


[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225566#comment-13225566
 ] 

stack commented on HBASE-5543:
--

Yeah, it looks like it's inevitable that we'll ask the server to do legitimate
stuff that will take longer than the rpctimeout while the server is still making
headway: e.g. the reproducing test case, though a little artificial, for
'HBASE-4890 fix possible NPE in HConnectionManager' was asking the
regionserver to open 3k regions.

If it's a task like the above, there should be a facility for telling the client
we're still alive, or we should just refuse the request because it will take too
long (The latter we need to do t -- from Benoiit.  If the server is going to
take too long servicing a request, so long that the client will be gone by the
time it's done its work, then refuse the request... don't do the increment or
update that the updating client will not be around to see).
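
The "refuse the request" option might look something like the sketch below. It
is not HBase's rpc code: DeadlineAwareHandler, clientDeadlineMillis and
estimatedMillis are hypothetical names, and it assumes the client sends its
deadline along with the call and that the server can estimate the cost up front.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: refuse work the client will not be around to see. */
final class DeadlineAwareHandler {

  /** Thrown instead of starting work that cannot finish before the deadline. */
  static final class RequestRefusedException extends RuntimeException {
    RequestRefusedException(String message) {
      super(message);
    }
  }

  /**
   * Runs {@code work} only if its estimated cost fits into the time the client
   * has left; otherwise refuses before any side effect has happened.
   */
  static <T> T handle(long nowMillis, long clientDeadlineMillis,
      long estimatedMillis, Supplier<T> work) {
    long remainingMillis = clientDeadlineMillis - nowMillis;
    if (estimatedMillis > remainingMillis) {
      throw new RequestRefusedException("needs ~" + estimatedMillis
          + "ms but the client only waits another " + remainingMillis + "ms");
    }
    return work.get();
  }
}
```

The point of checking before calling the work is that a refused increment or
update is never applied at all.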

 Add a keepalive option for IPC connections
 --

 Key: HBASE-5543
 URL: https://issues.apache.org/jira/browse/HBASE-5543
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors, ipc
Reporter: Andrew Purtell

 On the user list someone wrote in with a connection failure due to a long 
 running coprocessor:
 {quote}
 On Wed, Mar 7, 2012 at 10:59 PM, raghavendhra rahul wrote:
 2012-03-08 12:03:09,475 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 Responder, call execCoprocessor([B@50cb21, getProjection(), rpc version=1, 
 client version=0, methodsFingerPrint=0), rpc version=1, client version=29, 
 methodsFingerPrint=54742778 from 10.184.17.26:46472: output error
 2012-03-08 12:03:09,476 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 7 on 60020 caught: java.nio.channels.ClosedChannelException
 {quote}
 I suggested in response we might consider giving our RPC a keepalive option for
 calls that may run for a long time (like execCoprocessor).
 LarsH +1ed the idea:
 {quote}
 +1 on keepalive. It's a shame (especially for long running server code) to 
 do all the work, just to find out at the end that the client has given up.
 Or maybe there should be a way to cancel an operation if the client decides
 it does not want to wait any longer (PostgreSQL does that for example). Here
 that would mean the server would need to check periodically and coprocessors
 would need to be written to support that - so maybe that's a non-starter.
 {quote}


[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225654#comment-13225654
 ] 

stack commented on HBASE-5074:
--

Wahoo!!

Lars, you want to pull it into 0.94? (Does this mean 0.94 is good to go?  
Should we put up an RC?)

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, 
 D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, 
 D1521.11.patch, D1521.11.patch, D1521.12.patch, D1521.12.patch, 
 D1521.13.patch, D1521.13.patch, D1521.14.patch, D1521.14.patch, 
 D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, 
 D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, 
 D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, 
 D1521.9.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.


[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225818#comment-13225818
 ] 

stack commented on HBASE-5543:
--

bq. Instead of adding to the rpc to make it keep alive longer, maybe make it
async, returning some sort of uuid token that the client can poll (or get
notified) for progress instead?

I like this idea.
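
A rough sketch of what the token approach could look like, using plain
java.util.concurrent rather than our rpc. AsyncCallRegistry and its methods
are made-up names for illustration; notification, token expiry and the wire
protocol are all left out.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of "submit, get a token back, poll for the result". */
final class AsyncCallRegistry implements AutoCloseable {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final Map<UUID, Future<String>> calls = new ConcurrentHashMap<>();

  /** Starts the work in the background and returns a token immediately. */
  UUID submit(Callable<String> work) {
    UUID token = UUID.randomUUID();
    calls.put(token, pool.submit(work));
    return token;
  }

  /** Cheap progress check the client can call as often as it likes. */
  boolean isDone(UUID token) {
    return lookup(token).isDone();
  }

  /** Waits for the result, then forgets the token. */
  String result(UUID token) {
    Future<String> call = lookup(token);
    try {
      String value = call.get();
      calls.remove(token);
      return value;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } catch (ExecutionException e) {
      calls.remove(token);
      throw new IllegalStateException("call failed", e.getCause());
    }
  }

  private Future<String> lookup(UUID token) {
    Future<String> call = calls.get(token);
    if (call == null) {
      throw new IllegalArgumentException("unknown token " + token);
    }
    return call;
  }

  @Override
  public void close() {
    pool.shutdownNow();
  }
}
```

No single request has to outlive the rpctimeout this way; the long-running
work is only ever referenced by its token.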

 Add a keepalive option for IPC connections
 --

 Key: HBASE-5543
 URL: https://issues.apache.org/jira/browse/HBASE-5543
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors, ipc
Reporter: Andrew Purtell

 On the user list someone wrote in with a connection failure due to a long 
 running coprocessor:
 {quote}
 On Wed, Mar 7, 2012 at 10:59 PM, raghavendhra rahul wrote:
 2012-03-08 12:03:09,475 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 Responder, call execCoprocessor([B@50cb21, getProjection(), rpc version=1, 
 client version=0, methodsFingerPrint=0), rpc version=1, client version=29, 
 methodsFingerPrint=54742778 from 10.184.17.26:46472: output error
 2012-03-08 12:03:09,476 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 7 on 60020 caught: java.nio.channels.ClosedChannelException
 {quote}
 I suggested in response we might consider giving our RPC a keepalive option for
 calls that may run for a long time (like execCoprocessor).
 LarsH +1ed the idea:
 {quote}
 +1 on keepalive. It's a shame (especially for long running server code) to 
 do all the work, just to find out at the end that the client has given up.
 Or maybe there should be a way to cancel an operation if the client decides
 it does not want to wait any longer (PostgreSQL does that for example). Here
 that would mean the server would need to check periodically and coprocessors
 would need to be written to support that - so maybe that's a non-starter.
 {quote}


[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225821#comment-13225821
 ] 

stack commented on HBASE-5325:
--

This patch adds beans at org.apache.hbase, right?  HBase and RegionServer are 
bad names for mbeans.

 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.92.1, 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.


[jira] [Commented] (HBASE-5541) Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225882#comment-13225882
 ] 

stack commented on HBASE-5541:
--

Commit is fine.

What about case where the edits are applied, sync succeeds, but the cp 
postdelete and postput fail?  Then mvcc will have been updated but we have 
removed the edits from memstore.

I suppose it's ok?  The read point is advanced but if we take this edit in
isolation, no mutations made it through?

 Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks
 

 Key: HBASE-5541
 URL: https://issues.apache.org/jira/browse/HBASE-5541
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5541-v2.txt, 5541.txt


 Currently mutateRowsWithLocks holds the row lock while the HLog is sync'ed.
 Similar to what we do in doMiniBatchPut, we should create the log entry with 
 the lock held, but only sync the HLog after the lock is released, along with 
 rollback logic in case the sync'ing fails.
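
 The ordering described above (append under the lock, sync after releasing
 it, roll back on a failed sync) can be sketched as follows. This is a toy
 model, not HRegion: DeferredSyncRegion, Wal and the list used as a memstore
 are hypothetical stand-ins, and a single lock plays the part of the row lock.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: append under the row lock, sync after releasing it. */
final class DeferredSyncRegion {

  /** Stand-in for the write-ahead log. */
  interface Wal {
    /** Buffers the edit and returns its sequence id; cheap, no disk sync. */
    long append(String edit);

    /** Makes everything up to {@code seqId} durable; the slow part. */
    void sync(long seqId) throws IOException;
  }

  private final ReentrantLock rowLock = new ReentrantLock();
  private final List<String> memstore = new ArrayList<>();
  private final Wal wal;

  DeferredSyncRegion(Wal wal) {
    this.wal = wal;
  }

  void mutate(String edit) throws IOException {
    long seqId;
    rowLock.lock();
    try {
      // Only the cheap steps happen while the row lock is held.
      seqId = wal.append(edit);
      synchronized (memstore) {
        memstore.add(edit);
      }
    } finally {
      rowLock.unlock();
    }
    try {
      // The slow sync runs after the lock is released.
      wal.sync(seqId);
    } catch (IOException e) {
      // Rollback: take the edit back out so readers never see it as durable.
      synchronized (memstore) {
        memstore.remove(edit);
      }
      throw e;
    }
  }

  /** Copy of the edits currently applied. */
  List<String> snapshot() {
    synchronized (memstore) {
      return new ArrayList<>(memstore);
    }
  }
}
```

 Other writers to the same row only wait for the append, not for the disk.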


[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225892#comment-13225892
 ] 

stack commented on HBASE-5325:
--

HBase and RegionServer should be under org.apache.hbase too I'd think.

 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.92.1, 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.


[jira] [Commented] (HBASE-5541) Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks

2012-03-08 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225900#comment-13225900
 ] 

stack commented on HBASE-5541:
--

Grand

 Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks
 

 Key: HBASE-5541
 URL: https://issues.apache.org/jira/browse/HBASE-5541
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5541-v2.txt, 5541.txt


 Currently mutateRowsWithLocks holds the row lock while the HLog is sync'ed.
 Similar to what we do in doMiniBatchPut, we should create the log entry with 
 the lock held, but only sync the HLog after the lock is released, along with 
 rollback logic in case the sync'ing fails.


[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226172#comment-13226172
 ] 

stack commented on HBASE-5552:
--

Let me fix formatting:

{code}
hadoop
  HBase
  Master
  RegionServer
{code}

It should be...

{code}
hadoop
  hbase
    master
    regionserver
{code}

 Clean up our jmx view; its a bit of a mess
 --

 Key: HBASE-5552
 URL: https://issues.apache.org/jira/browse/HBASE-5552
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Attachments: 0.92.0jmx.png


 Fix before we release 0.92.1


[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226182#comment-13226182
 ] 

stack commented on HBASE-5552:
--

As is, our naming is just broke.  You can't have instances of different
clusters on one machine.  Our master bean is not uniquely named so you can't
have multiple masters on the one machine; ditto regionservers (The rpc
servers publish their own bean distinguished by the port they run on, which is
better, only they should probably have a master or regionserver prefix).

The name of our master bean is 'MasterStatistics' though it's metrics only (even
the operation is a reset on metrics).

Doing the minimum in this issue so we can get 0.92.1 out, but this stuff needs
a revamp.
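
For the uniqueness problem, one possible shape is to put the role and port
into the bean name, along the lines of the sketch below. The key names
(service, role, port) and the BeanNames class are assumptions for illustration
only; they are not what the patch registers.

```java
import java.util.Hashtable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical sketch: one distinct JMX name per server role and port. */
final class BeanNames {

  /**
   * Builds a name under the "hadoop" domain whose key properties tell apart
   * several masters or regionservers running on the same machine.
   */
  static ObjectName forServer(String role, int port) {
    Hashtable<String, String> keys = new Hashtable<>();
    keys.put("service", "hbase");
    keys.put("role", role);                   // e.g. "master" or "regionserver"
    keys.put("port", Integer.toString(port)); // distinguishes instances on one host
    try {
      return new ObjectName("hadoop", keys);
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("bad bean name for role " + role, e);
    }
  }
}
```

With names built this way, two masters on different ports get different
ObjectNames and so could both be registered in one MBean server.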

 Clean up our jmx view; its a bit of a mess
 --

 Key: HBASE-5552
 URL: https://issues.apache.org/jira/browse/HBASE-5552
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Attachments: 0.92.0jmx.png


 Fix before we release 0.92.1


[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226200#comment-13226200
 ] 

stack commented on HBASE-5325:
--

I took a look.  It's broke (I'm to blame I'd say for it being broke).  Did basic
fixup and committed in hbase-5552.  Made new issue to revisit our jmx view.
It's always been broke.

 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.92.1, 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.


[jira] [Commented] (HBASE-5533) Add more metrics to HBase

2012-03-09 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226238#comment-13226238
]

stack commented on HBASE-5533:
--

Line lengths are usually 80 chars in our src code Shaneal.

Minor: assign and declare the below on the one line rather than wait till the
constructor?

{code}
+this.counts = new MapMaker().makeComputingMap(new Function<String, Counter>() {
+  @Override
+  public Counter apply(String input) {
+    return new Counter();
+  }
+});
{code}

This looks cool... ExponentiallyDecayingSample

Snapshot looks excellent.

UniformSample looks sweet

On this...

{code}
+final long startTime = System.nanoTime();
{code}

... aren't we getting a currentTimeMillis soon after? Do we want to be doing
all this time getting? We should minimize as much as we can? Check it out I'd
say. Also, there is EdgeEnvironment or EnvironmentEdge that we use for system
things like getting time because we want to have a layer between us and system
especially when testing ... so we can mess things up. You might want to check
it out.
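
To show what I mean by a layer between us and the system clock, here is a
minimal sketch. TimeSource, ManualTimeSource and LatencyTimer are made-up
names for illustration; this is not the EnvironmentEdge code, just the same
idea in miniature.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a clock abstraction (not the HBase class). */
interface TimeSource {
  long nanoTime();
}

/** Production clock: delegates to the system. */
final class SystemTimeSource implements TimeSource {
  @Override
  public long nanoTime() {
    return System.nanoTime();
  }
}

/** Test clock: only moves when the test says so. */
final class ManualTimeSource implements TimeSource {
  private final AtomicLong now = new AtomicLong();

  @Override
  public long nanoTime() {
    return now.get();
  }

  void advanceNanos(long delta) {
    now.addAndGet(delta);
  }
}

/** Times an operation against whichever clock it was given. */
final class LatencyTimer {
  private final TimeSource time;

  LatencyTimer(TimeSource time) {
    this.time = time;
  }

  /** Reads the clock once before and once after the operation. */
  long timeNanos(Runnable operation) {
    long start = time.nanoTime();
    operation.run();
    return time.nanoTime() - start;
  }
}
```

In a test the manual clock makes the measured latency exact, and the timer
reads the clock twice per operation and no more.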

Tests look good.

How does this stuff look for a server under load? Will, say, tsdb be able to
make sense of it? How's it look in the ui? I suppose I could just apply the
patch and try it but you might have some pictures lying around.

How did we make it this far w/o this stuff?

Add more metrics to HBase
-

Key: HBASE-5533
URL: https://issues.apache.org/jira/browse/HBASE-5533
Project: HBase
Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
Attachments: BlockingQueueContention.java, hbase-5533-0.92.patch,
hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch

To debug/monitor production clusters, there are some more metrics I wish I
had available.
In particular:
- Although the average FS latencies are useful, a 'histogram' of recent
latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc)
would be more useful
- Similar histograms of latencies on common operations (GET, PUT, DELETE)
would be useful
- Counting the number of accesses to each region to detect hotspotting
- Exposing the current number of HLog files
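
The "90% of reads completed in under 100ms" style of figure in the first item
can be computed as in the sketch below. It is a plain nearest-rank percentile
over a full window of samples, which is an assumption made for illustration;
it is not the sampling approach the patch takes, and LatencyPercentiles is a
made-up name.

```java
import java.util.Arrays;

/** Hypothetical sketch: nearest-rank percentiles over recorded latencies. */
final class LatencyPercentiles {

  /**
   * Returns the smallest recorded latency such that at least {@code pct}
   * percent of the samples are less than or equal to it.
   */
  static long percentile(long[] latenciesMillis, double pct) {
    if (latenciesMillis.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    if (pct <= 0 || pct > 100) {
      throw new IllegalArgumentException("pct must be in (0, 100]");
    }
    long[] sorted = latenciesMillis.clone(); // leave the caller's array alone
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(pct * sorted.length / 100.0);
    return sorted[rank - 1];
  }
}
```

Keeping every sample like this gets expensive on a busy server, so a real
metric would presumably work from a bounded sample instead.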

[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226296#comment-13226296
 ] 

stack commented on HBASE-5552:
--

It's not an incompatible change, Todd, because we've not had a release w/ these
new mbeans yet.

On the rename, yeah, over in 'HBASE-5553 Revisit our jmx view', it suggests 
we'd have to wait till the singularity.

 Clean up our jmx view; its a bit of a mess
 --

 Key: HBASE-5552
 URL: https://issues.apache.org/jira/browse/HBASE-5552
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.92.1, 0.94.0

 Attachments: 0.92.0jmx.png, 5552.txt, currentjmxview.png, 
 patchedjmxview.png


 Fix before we release 0.92.1


[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226341#comment-13226341
 ] 

stack commented on HBASE-5552:
--

bq. Oh, does this not affect the other metrics published at /jmx in the 
servlet? Sorry for confusion.

It should, but it doesn't (smile).  It's just a renaming of the beans added by
HBASE-5325 (0.92.1 is their first airing in a release).

I should have been more clear.

 Clean up our jmx view; its a bit of a mess
 --

 Key: HBASE-5552
 URL: https://issues.apache.org/jira/browse/HBASE-5552
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.92.1, 0.94.0

 Attachments: 0.92.0jmx.png, 5552.txt, currentjmxview.png, 
 patchedjmxview.png


 Fix before we release 0.92.1


[jira] [Commented] (HBASE-4542) add filter info to slow query logging

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226356#comment-13226356
 ] 

stack commented on HBASE-4542:
--

+1 on commit (Thanks for taking care of this, Mikhail)

 add filter info to slow query logging
 -

 Key: HBASE-4542
 URL: https://issues.apache.org/jira/browse/HBASE-4542
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: 
 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, 
 Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, 
 D1263.2.patch, D1539.1.patch


 Slow query log doesn't report filters in effect.
 For example:
 {code}
 (operationTooSlow): \
 {processingtimems:3468,client:10.138.43.206:40035,timeRange: 
 [0,9223372036854775807],\
 starttimems:1317772005821,responsesize:42411, \
 class:HRegionServer,table:myTable,families:{CF1:ALL]},\
 row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\
 method:get,totalColumns:1,maxVersions:1,storeLimit:-1}
 {code}
 The above would suggest that all columns of myTable:CF1 are being requested
 for the given row. But in reality there could be filters in effect (such as
 ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter() etc.). We should
 enhance the slow query log to capture & report this information.


[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226379#comment-13226379
 ] 

stack commented on HBASE-5542:
--

I took a look.  Patch looks good.  I think this kinda work is really important 
to do -- now while it's fresh in everyone's minds what's going on here -- if 
we're not to let the mess get out of hand.  Thanks Scott.

Yeah Lars, this is your old stomping ground.  Could do w/ a review by you.

 Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
 

 Key: HBASE-5542
 URL: https://issues.apache.org/jira/browse/HBASE-5542
 Project: HBase
  Issue Type: Improvement
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.96.0

 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.2.patch


 mutateRowsWithLocks() does atomic mutations on multiple rows.
 processRow() does atomic read-modify-writes on a single row.
 It will be useful to generalize both and have a
 processRowsWithLocks() that does atomic read-modify-writes on multiple rows.
 This also helps reduce some redundancy in the code.


[jira] [Commented] (HBASE-5213) hbase master stop does not bring down backup masters

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226385#comment-13226385
 ] 

stack commented on HBASE-5213:
--

Thanks G.

 hbase master stop does not bring down backup masters
 --

 Key: HBASE-5213
 URL: https://issues.apache.org/jira/browse/HBASE-5213
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.92.2

 Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
 HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch


 Typing hbase master stop produces the following message:
 stop   Start cluster shutdown; Master signals RegionServer shutdown
 It seems like backup masters should be considered part of the cluster, but 
 they are not brought down by hbase master stop.
 stop-hbase.sh does correctly bring down the backup masters.
 The same behavior is observed when a client app makes use of the client API 
 HBaseAdmin.shutdown() 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()
  -- this isn't too surprising since I think hbase master stop just calls 
 this API.
 It seems like HBASE-1448 addressed this; perhaps there was a regression?


[jira] [Commented] (HBASE-5520) Support reseek() at RegionScanner

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226414#comment-13226414
 ] 

stack commented on HBASE-5520:
--

{code}
+  if (kv == null) {
+    throw new IllegalArgumentException("Row cannot be null.");
+  }
{code}

Should the message be "kv cannot be null"?

I don't understand what this means:

{code}
The kv that is required to seek must be given
+   * explicitly for reseek. Should not be used to seek to a key which may come
+   * before the current position.
+   * Note : Recommended to use knowing how seek can be done on different kvs.
+   * Suggested to seek to row boundaries like start of a row or end of a row.
+   * Seeking to the middle of a row may lead to inconsistencies across stores.
{code}

... it's probably because I'm slow but I'm sure there will be slow fellows 
following behind me.  Is it saying you need to create a KV to do this?

How would I know what the next row is in advance?  I suppose with the mvcc, you 
have some hope of isolating this row... but  if we are going for row 
boundaries, this patch looks to be missing some heft.

 Support reseek() at RegionScanner
 -

 Key: HBASE-5520
 URL: https://issues.apache.org/jira/browse/HBASE-5520
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch


 reseek() is not supported currently at the RegionScanner level. We can 
 support the same.
 This is created following the discussion under HBASE-2038


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226477#comment-13226477
 ] 

stack commented on HBASE-4608:
--

The TestLRUDictionary test looks like it could be fatter.  Looks like you 
should be able to throw a bunch more combinations at it, and better 
exercise the new BidirectionalLRUMap type.  Better to find the issues here 
in unit test than

What's the difference between

{code}
+  public static int hashBytes(byte[] bytes, int offset, int length) {
{code}

and the existing

{code}
  public static int hashCode(final byte [] b, final int length) {
{code}

They look to do the same thing?  We should remove the new one if so.
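
A minimal sketch of the comparison, assuming both methods use the common 31-multiplier byte hash. The method names mirror the two signatures quoted above, but the bodies and the HashSketch class are hypothetical and not taken from the patch. If the only difference is the start offset, the older method could delegate to the new one rather than duplicate the loop:

```java
// Illustrative only: neither method body is copied from HBase or from the patch.
final class HashSketch {
  /** Hashes the bytes in the range [offset, offset + length). */
  static int hashBytes(byte[] bytes, int offset, int length) {
    int hash = 1;
    for (int i = offset; i < offset + length; i++) {
      hash = (31 * hash) + bytes[i];
    }
    return hash;
  }

  /** Hashes the first length bytes; identical to hashBytes with an offset of 0. */
  static int hashCode(byte[] b, int length) {
    return hashBytes(b, 0, length);
  }
}
```

Under that assumption the two are the same function whenever the offset is 0, so keeping a single implementation avoids the duplication.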

We will have a keycontext when we are deserializing?  How's that work?

So we compress at the individual entry level?  Why not file at a time? (Sorry 
if this has been explained earlier)

Is this right in the WALReader?

{code}
+compression = conf.getBoolean(HConstants.ENABLE_WAL_COMPRESSION, false);
{code}

How does that work if the WAL was written compressed but this flag is false?  
We break?  Shouldn't this instead be keyed off the entries themselves?  Should 
it be a sequence file attribute saying this is a compressed file?

Do we foresee replication being able to use this facility?  Seems like a 
natural having it ship compressed entries.

Good stuff.

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 
 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 
 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.
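
A rough sketch of the dictionary idea described above, assuming the writer and the reader each build the same dictionary in the same order. The DictionarySketch class and its methods are hypothetical and are not the data structure from the attached patches (which use an LRU-bounded dictionary). A repeated value such as a table or column family name is written in full the first time and as a small index afterwards:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary encoding; unbounded, unlike an LRU dictionary.
final class DictionarySketch {
  private final Map<String, Integer> indexByValue = new HashMap<>();
  private final List<String> valueByIndex = new ArrayList<>();

  /**
   * Returns the index of a value already in the dictionary, or -1 after
   * adding it as a new entry (the caller then writes the value in full).
   */
  int findOrAdd(String value) {
    Integer idx = indexByValue.get(value);
    if (idx != null) {
      return idx;
    }
    indexByValue.put(value, valueByIndex.size());
    valueByIndex.add(value);
    return -1;
  }

  /** Resolves an index written by the other side back to the full value. */
  String lookup(int index) {
    return valueByIndex.get(index);
  }
}
```

A real implementation would also need to bound the dictionary size and record in the file that entries are compressed, both of which this sketch leaves out.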


[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226522#comment-13226522
 ] 

stack commented on HBASE-5399:
--

That looks ok N?  Will I commit?

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 
 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 
 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 
 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 
 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 
 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch


 The link is often considered an issue, for various reasons. One of them 
 is that there is a limit on the number of connections that ZK can manage. 
 Stack also suggested removing the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read get master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of threads => now done with a temporary zookeeper connection
 -
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master. => 
 we could make them use a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, a non-connected client will always 
 be much slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and all the clients seems to make sense for some 
 use cases. However, it won't scale if a TCP connection is required for every 
 client
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue with HBaseAdmin (for both ZK & Master); maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic however.


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-09 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226532#comment-13226532
]

stack commented on HBASE-4608:
--

bq. The above method allows to start computation at specified offset while
existing hashCode() doesn't have this parameter.

Should have at least the same name as the other two methods that do the same 
(pity WritableComparator.hashBytes w/ start offset doesn't exist).

bq. Looking at SequenceFile.Sorter.cloneFileAttributes(), I don't see a
convenient way for doing above.

When you create a writer on a SequenceFile, you can pass metadata:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Metadata.html

bq. For HLogKey, can we designate version of -2 for representing compressed
HLogKey ? If HLogKey isn't compressed, we write -1.

I don't know what this is in response to.

What about my other items?

HLog Compression

Key: HBASE-4608
URL: https://issues.apache.org/jira/browse/HBASE-4608
Project: HBase
Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
Fix For: 0.94.0

Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt,
4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt,
4608v6.txt, 4608v7.txt, 4608v8fixed.txt

The current bottleneck to HBase write speed is replicating the WAL appends
across different datanodes. We can speed up this process by compressing the
HLog. Current plan involves using a dictionary to compress table name, region
id, cf name, and possibly other bits of repeated data. Also, HLog format may
be changed in other ways to produce a smaller HLog.

[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226552#comment-13226552
 ] 

stack commented on HBASE-5548:
--

I don't get this comment:

{code}
+#define the command name in 'where' namespace
+#which actually just delegates to the shell instance
{code}

Instead of
{code}
+ret = translate_hbase_exceptions(*args) { command(*args) }
+return ret
{code}

.. why not just
{code}
+return translate_hbase_exceptions(*args) { command(*args) }
{code}

Don't need this anymore

+# Copyright 2010 The Apache Software Foundation


I don't see table.help.  Is it missing from this patch?

Can you include snippet of a session using this new facility in shell?

Good on you Jesse

 Add ability to get a table in the shell
 ---

 Key: HBASE-5548
 URL: https://issues.apache.org/jira/browse/HBASE-5548
 Project: HBase
  Issue Type: Improvement
  Components: shell
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: ruby_HBASE-5528-v0.patch


 Currently, all the commands that operate on a table in the shell first have 
 to take the table name as input. 
 There are two main considerations:
 * It is annoying to have to write the table name every time, when you should 
 just be able to get a reference to a table
 * the current implementation is very wasteful - it creates a new HTable for 
 each call (but reuses the connection since it uses the same configuration)
 We should be able to get a handle to a single HTable and then operate on that.


[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226569#comment-13226569
 ] 

stack commented on HBASE-5480:
--

bq. But this should go into 0.94, no?

Yes.

 Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
 -

 Key: HBASE-5480
 URL: https://issues.apache.org/jira/browse/HBASE-5480
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Critical
 Fix For: 0.94.0

 Attachments: HBASE-5480.patch


 There are two issues:
 - StatusReporter has a new method getProgress()
 - Mapper and reducer context objects can no longer be directly instantiated.
 See attached patch. I'm not thrilled with the added reflection but it was the 
 minimally intrusive change.
 Raised the priority to critical because compilation fails.


[jira] [Commented] (HBASE-5555) add a pointer to a dns verification utility in hbase book/dns

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226616#comment-13226616
 ] 

stack commented on HBASE-5555:
--

It should come in as a utility so you could do ./bin/hbase dnscheck or 
something.  Also, I don't think we need reverse dns to work in 0.92+ so we 
wouldn't want this tool screaming reverse is needed when app doesn't require it.

 add a pointer to a dns verification utility in hbase book/dns
 -

 Key: HBASE-5555
 URL: https://issues.apache.org/jira/browse/HBASE-5555
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Reporter: Sujee Maniyam
Assignee: Sujee Maniyam
Priority: Minor
 Fix For: 0.96.0

 Attachments: .txt


 DNS should work correctly in a Hbase cluster.  I have a simple DNS checker 
 utility, that verifies DNS on all machines of the cluster. 
 https://github.com/sujee/hadoop-dns-checker
 add a pointer to the tool in hbase book : 
 http://hbase.apache.org/book.html#dns


[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226617#comment-13226617
 ] 

stack commented on HBASE-5509:
--

I added some.

 MR based copier for copying HFiles (trunk version)
 --

 Key: HBASE-5509
 URL: https://issues.apache.org/jira/browse/HBASE-5509
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Lars Hofhansl
 Fix For: 0.94.0, 0.96.0

 Attachments: 5509-v2.txt, 5509.txt


 This copier is a modification of the distcp tool in HDFS. It does the 
 following:
 1. List out all the regions in the HBase cluster for the required table
 2. Write the above out to a file
 3. Each mapper 
3.1 lists all the HFiles for a given region by querying the regionserver
3.2 copies all the HFiles
3.3 outputs success if the copy succeeded, failure otherwise. Failed 
 regions are retried in another loop
 4. Mappers are placed on nodes which have maximum locality for a given region 
 to speed up copying


[jira] [Commented] (HBASE-5539) asynchbase PerformanceEvaluation

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226620#comment-13226620
 ] 

stack commented on HBASE-5539:
--

So in below

{code}
+@Override
+void testRow(final int i) throws IOException {
+  final GetRequest get = new GetRequest(TABLE_NAME,
+      getRandomRow(this.rand, this.totalRows));
+  get.family(FAMILY_NAME).qualifier(QUALIFIER_NAME);
+
+  client().get(get).addCallback(readCallback).addErrback(errback);
+}
{code}

...we don't complete until the callback (or error callback) is called?

 asynchbase PerformanceEvaluation
 

 Key: HBASE-5539
 URL: https://issues.apache.org/jira/browse/HBASE-5539
 Project: HBase
  Issue Type: New Feature
  Components: performance
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Minor
  Labels: benchmark
 Attachments: 0001-asynchbase-PerformanceEvaluation.patch


 I plugged [asynchbase|https://github.com/stumbleupon/asynchbase] into 
 {{PerformanceEvaluation}}.  This enables testing asynchbase from 
 {{PerformanceEvaluation}} and comparing its performance to {{HTable}}.  Also 
 asynchbase doesn't come with any benchmark, so it was good that I was able to 
 plug it into {{PerformanceEvaluation}} relatively easily.
 I am in the process of collecting results on a dev cluster running 0.92.1 
 and will publish them once they're ready.


[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226631#comment-13226631
 ] 

stack commented on HBASE-5292:
--

+1 on commit.

 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
 Attachments: 
 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, 
 D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch, 
 jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track # of HFileBlock's 
 read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt 
 vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated for both client initiated 
 Get/Scan operations as well as for compaction related reads. The metric is 
 updated in StoreScanner.java:next() when the Scan query matcher returns an 
 INCLUDE* code via a:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.
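
One possible shape of the fix, as a minimal sketch: it assumes the scanner knows whether it is serving a compaction. The GetSizeMetricSketch class, the onIncluded method and the isCompaction flag are all hypothetical and are not taken from the attached patches. The counter is only bumped for client reads:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; stands in for the per-CF getsize metric update.
final class GetSizeMetricSketch {
  private final AtomicLong getSizeBytes = new AtomicLong();

  /**
   * Called for each KeyValue the scanner includes. Compaction reads are
   * skipped so the metric only reflects bytes returned to clients.
   */
  void onIncluded(int kvLength, boolean isCompaction) {
    if (!isCompaction) {
      getSizeBytes.addAndGet(kvLength);
    }
  }

  long getSizeBytes() {
    return getSizeBytes.get();
  }
}
```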


[jira] [Commented] (HBASE-5528) Change retrying splitting log forever if throws IOException to numbered times, and abort master when retries exhausted

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226637#comment-13226637
 ] 

stack commented on HBASE-5528:
--

This is for 0.90?

 Change retrying splitting log forever  if throws IOException to numbered 
 times, and abort master when retries exhausted
 ---

 Key: HBASE-5528
 URL: https://issues.apache.org/jira/browse/HBASE-5528
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5528.patch, hbase-5528v2.patch, hbase-5528v3.patch


 The current log-splitting retry logic retries forever if an IOException is 
 thrown. I think we'd better change it to a bounded number of retries, and 
 abort the master when the retries are exhausted.
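
The proposed behaviour can be sketched as follows. This is an illustration only: BoundedRetrySketch, Attempt and runWithRetries are hypothetical names, and the actual patch may structure the loop differently. The attempt is retried up to a fixed limit, and the last IOException is rethrown so the caller can abort:

```java
import java.io.IOException;

// Hypothetical sketch of bounded retries; not code from the attached patches.
final class BoundedRetrySketch {
  /** Stand-in for one log-splitting attempt. */
  interface Attempt {
    void run() throws IOException;
  }

  /**
   * Runs the attempt up to maxAttempts times. Returns the number of attempts
   * used on success, or rethrows the last IOException once the limit is hit.
   */
  static int runWithRetries(Attempt attempt, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int i = 1; i <= maxAttempts; i++) {
      try {
        attempt.run();
        return i;
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last;
  }
}
```

The caller would catch the final IOException and abort the master; a real version would probably also log each failure and read the limit from configuration.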


[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226643#comment-13226643
 ] 

stack commented on HBASE-5209:
--

Sorry.  I missed that it was resolved.

 HConnection/HMasterInterface should allow for way to get hostname of 
 currently active master in multi-master HBase setup
 

 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.5, 0.92.0, 0.94.0
Reporter: Aditya Acharya
Assignee: David S. Wang
 Fix For: 0.92.1, 0.94.0

 Attachments: 5209.addendum, HBASE_5209_v5.diff


 I have a multi-master HBase set up, and I'm trying to programmatically 
 determine which of the masters is currently active. But the API does not 
 allow me to do this. There is a getMaster() method in the HConnection class, 
 but it returns an HMasterInterface, whose methods do not allow me to find out 
 which master won the last race. The API should have a 
 getActiveMasterHostname() or something to that effect.


[jira] [Commented] (HBASE-5525) Truncate and preserve region boundaries option

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226649#comment-13226649
 ] 

stack commented on HBASE-5525:
--

Disable table, remove all content from fs under the table's dir, then reenable?

 Truncate and preserve region boundaries option
 --

 Key: HBASE-5525
 URL: https://issues.apache.org/jira/browse/HBASE-5525
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Daniel Cryans
 Fix For: 0.96.0


 A tool that would be useful for testing (and maybe in prod too) would be a 
 truncate option to keep the current region boundaries. Right now what you 
 have to do is completely kill the table and recreate it with the correct 
 regions.


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226647#comment-13226647
 ] 

stack commented on HBASE-4608:
--

Does v21 fix the bad decompress that you found above testing with PE?

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 
 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 
 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.


[jira] [Commented] (HBASE-2817) Allow separate HBASE_REGIONSERVER_HEAPSIZE and HBASE_MASTER_HEAPSIZE

2012-03-09 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226653#comment-13226653
 ] 

stack commented on HBASE-2817:
--

@Adrian So, IIRC, doesn't the first -Xmx win so if you pass it in as an *_OPTS, 
will that give you what you need?  Else, want to make a patch w/ what you want 
in it?  Thanks.

 Allow separate HBASE_REGIONSERVER_HEAPSIZE and HBASE_MASTER_HEAPSIZE
 

 Key: HBASE-2817
 URL: https://issues.apache.org/jira/browse/HBASE-2817
 Project: HBase
  Issue Type: Improvement
  Components: scripts
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Minor

 Right now we have a single HBASE_HEAPSIZE configuration. This isn't that 
 great, since the HMaster doesn't really need much ram compared to the region 
 servers. We should allow different java options and heapsize for the 
 different daemon types.
 Probably worth breaking out THRIFT, REST, AVRO, etc, as well.

