[jira] [Commented] (HBASE-5489) Add HTable accessor to get regions for a key range
[ https://issues.apache.org/jira/browse/HBASE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219318#comment-13219318 ] stack commented on HBASE-5489: -- +1 Add HTable accessor to get regions for a key range -- Key: HBASE-5489 URL: https://issues.apache.org/jira/browse/HBASE-5489 Project: HBase Issue Type: Improvement Components: client Reporter: David S. Wang Assignee: David S. Wang Priority: Minor It would be nice to have an accessor to find all regions that overlap with a particular range of keys. Right now, the only way to accomplish that is to call HTable.getStartEndKeys(), then follow that with calls to getRegionLocation() for the range of keys you are interested in. This algorithm has 2 drawbacks: * It returns more keys than is necessary most of the time. This is especially evident if there are a lot of regions comprising the table and the range of keys is small. * It always does a scan of .META. via MetaScannerVisitor for at least HTable.getStartEndKeys(), and perhaps for HRegionLocations that are not already cached by the client. An accessor that limited its scans to a specified range could avoid scanning .META. at all if the HRegionLocations being fetched were already cached by the client, thereby potentially making this operation faster in common cases. Here's a proposal for the accessor: /** * Get the corresponding regions for an arbitrary range of keys. * p * @param startRow Starting row in range, inclusive * @param endRow Ending row in range, inclusive * @return A list of HRegionLocations corresponding to the regions that * contain the specified range * @throws IOException if a remote or network exception occurs */ public ListHRegionLocation getRegionsInRange(final byte [] startKey, final byte [] endKey) throws IOException -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5492) Caching StartKeys and EndKeys of Regions
[ https://issues.apache.org/jira/browse/HBASE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219333#comment-13219333 ] stack commented on HBASE-5492: -- Once filled, how does the cache of table locations get refreshed if a table region splits? Caching StartKeys and EndKeys of Regions Key: HBASE-5492 URL: https://issues.apache.org/jira/browse/HBASE-5492 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5492.patch Each call for HTable.getStartEndKeys will read meta table. In particular, in the case of client side multi-threaded concurrency statistics, we must call HTable.coprocessorExec== getStartKeysInRange == getStartEndKeys, resulting in the need to always scan the meta table. This is not necessary, we can implement the HConnectionManager.HConnectionImplementation.locateRegions(byte[] tableName) method, then, get the StartKeys and EndKeys from the cachedRegionLocations of HConnectionImplementation. Combined with https://issues.apache.org/jira/browse/HBASE-5491, can improve the performance of statistical -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219349#comment-13219349 ] stack commented on HBASE-5270: -- @Prakash Presume we pass list from splitLogAfterStartup to joinClusters as you suggest and presume list of servers included the server that had been hosting .META. Allow that during or just after splitLogAfterStartup, .META. server 'crashes' -- it becomes unresponsive. Also allow that somehow, just before it hung up, during a long running log split, .META. took on a couple of edits saying regions A, B, and C had split. In assignRootAndMeta, we'll notice the unresponsiveness, force the expiration of the server that was carrying .META. (this will queue a ServerShutdownHandler but will not wait on its completion), and we'll then reassign of .META. Its very likely that .META. will go to one of the other 'good' servers. Its also likely that the SSH will not have completed its processing before this assign happens. Thus, on deploy, the .META. will be missing the above A, B, and C split edits. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219354#comment-13219354 ] stack commented on HBASE-5270: -- @Chunhui You have a new method splitLogIfOnline which will split the log if the server was online. Why do you not expire the server? (You remove the expireIfOnline method). Now we have this initializing state, do you think we should also stop the processing of expired servers during this startup phase and instead queue them up for processing after the master is up? Could do that in another issue maybe since this issue has been going on too long and your patch is at least an improvement on what we currently have (This startup sequence needs a big refactor IMO -- it is way too complicated figuring the sequence in which stuff runs). Are there still holes? For example, say the .META. server crashes AFTER we've verified it up in assignRootAndMeta but before we get to do a scan of .META. to rebuild user regions list. Could .META. be assigned w/o log splitting finishing? (I don't think so... .META. would be offline until the servershutdown handler ran and it would first split logs). Good stuff. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219355#comment-13219355 ] stack commented on HBASE-4991: -- @Mubarak After looking at FATE and whats involved, I think it a bit much to expect that we build that as a prereq. for this facility. At the same time, lets minimize custom code -- code that is particular to the addition of this feature only. Let me do another review of your last patch w/ that in mind. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219380#comment-13219380 ] stack commented on HBASE-4991: -- Reviewing this patch again, could we not obtain this patches's objective with merge? Merge could take a flag which said True/false copy the data from old regions into the new merge region Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219412#comment-13219412 ] stack commented on HBASE-4991: -- After looking again too at the patch, it has too much custom code that is all about region delete. It should take a range as Todd suggests earlier rather than list of regions. This means you can not pass a list of discontinuous regions but thats ok I think; just do multiple invocations. This seems to have wrong param name and javadoc: {code} + /** + * Gets the count of online regions of the table in a region server. + * This method looks at the in-memory onlineRegions. + * @param regionName + * @return int regions count + * @throws IOException + */ + public int getRegionsCount(byte[] regionName) throws IOException; {code} When I see MasterDeleteRegionTracker, and the equivalent for regionservers, it makes me yearn for a generic framework that these things could run on; it strikes me as too much custom code and custom handlers. This we should fix. We should come up w/ generics that can be customized to do feature specifics. Why are we using a janitor to do the delete of regions rather than an executor? Why we have this getDeleteRegionTracker ? The generic soln Interface would have a method the balancer would check... + if (deleteRegionTracker.isDeleteRegionInProgress()) { Rather than do the above for every feature we add. Should this getDeleteRegionStatusFromDeleteRegionTracker be in the DeleteRegionTracker? And should it be something that is apart from the Master rather than in the master? This seems wrong: getDeleteRegionTracker in the MasterServices Interface. Why we add it there? Why can't it be independent of Master? Having to have a Master makes it harder to test I'm sure. DeleteRegionHandler should not be dealing w/ balancer. That seems dirty. This seems racy: waitForInflightSplit Do we do this every time? +moveStoreFilesToNewRegionDir(byFamily, fs, tableDir, newRegionInfo); If so, is this actually a merge and not a delete? Do these methods need to be in HREgionInfo? moveDataFromAdjacentRegionToNewRegion createNewRegionFromAdjacentRegion Could be in HRegion or in a RegionUtil class? RS is already bloated. A bunch of these other methods ... adding new region and deleting old region ... would seem to have overlap with existing code where we add regions to meta after open and also w/ merge code? We can't have master package refernced in zookeeper package; i.e. see MasterDeleteRegionTracker. I've already commented on other stuff in this patch. In general the patch is well done. It just adds a bunch of custom facility w/o genericizing at least some aspects so could be used by other features yet to come. In particular, this looks to be a specialization on merge. If so, lets go for merge altogether. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219421#comment-13219421 ] stack commented on HBASE-4991: -- bq. Are we going to enhance Merge by allowing to discard data belonging to one of the regions ? This feature looks to be adding online merge to me. I need clarification from Mubarak that that is indeed the case. If so, this issue is mislabeled and the patch needs redoing. I was just suggesting that if you want to actually drop a regions data, you could pass a flag to merge and it would not bother copying over the files from old regions. That would be an option. This patch as is does not do that. It seems to copy old region data into new regions. Was just a suggestion. bq. How should we deal with various failure scenarios in the process of merging ? Eh... in a manner which is resilient against failures, TBD. I don't get your question Ted. Are you asking me or the Author of this patch? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219428#comment-13219428 ] stack commented on HBASE-5270: -- @Prakash bq. then the next step, that you outlined in your scenario, cannot be allowed How should we do this boss? bq. The problem you are outlining is probably still there but the scenario has to be refined. What should I add? If we allow that the split could take a long time, its possible that on entry to the log splitting the server was good but by the end it could have gone AWOL. bq. And then it should initialize everything based on that knowledge which must not change during initialization. I think the root issue is that it needs to scan .META. and -ROOT- as part of the startup; they need to be assigned and up w/ all edits in place. Thats whats proving to be a little tough to ensure. (Thanks for the review P). Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219511#comment-13219511 ] stack commented on HBASE-4991: -- bq. Online merge requires table needs to be disabled but deleteRegion (deleteRange) does not require table needs to be disabled. We've had ongoing conversation -- before your time so you are not expected to have known about it -- on our doing an online merge. Its actually pretty critical need. See HBASE-1621 for some history (Ignore its title where it says table should be offline -- it should be online). FYI, the current merge code is broke and unused. It works for a unit test but I'd say its years since anyone tried to use it to actually do anything useful. So, we are agreed that conceptually, whats going on here is region merging? If so, that helps understanding around whats going on here. We should also likely rename what this issue does to be about merging since thats how we've been describing this operation over the years. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5491) Remove HBaseConfiguration.create() call from coprocessor.Exec class
[ https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219527#comment-13219527 ] stack commented on HBASE-5491: -- @Honghua My suggestion would make your patch smaller. Remove HBaseConfiguration.create() call from coprocessor.Exec class --- Key: HBASE-5491 URL: https://issues.apache.org/jira/browse/HBASE-5491 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5491.patch Exec class has a field: private Configuration conf = HBaseConfiguration.create() Client side generates an Exec instance of the class, each initiated Statistics request by ExecRPCInvoker Is so HBaseConfiguration.create for each request needs to call When the server side deserialize the Exec Called once HBaseConfiguration.create in, HBaseConfiguration.create is a time consuming operation. private Configuration conf = HBaseConfiguration.create(); This code is only useful for testing code (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization), other places with the Exec class, pass a Configuration come, so no need to conf field a default value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219638#comment-13219638 ] stack commented on HBASE-4991: -- @Lars bq. This is not the same as merge, right? Sounds like it is. bq. The region's data will be gone This patch seems to copy the data from the deleted region up into the new hole-plugging region. It doesn't seem to delete it. As to your 1., 2., 3... yes, thats what this patch does only the operators and the classes are all named DeleteRegion* blah when what is happening is region merging. I think its important to get the concept right else its going to confuse for ever after. @Mubarak So, sounds like the command/api could also be named merge rather than deleteRegion (You are not actually deleting the data, is that right?)? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219678#comment-13219678 ] stack commented on HBASE-4991: -- Implementation-wise, this is a merge with the added option that we not copy the data of the regions we are merging. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5491) Remove HBaseConfiguration.create() call from coprocessor.Exec class
[ https://issues.apache.org/jira/browse/HBASE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219794#comment-13219794 ] stack commented on HBASE-5491: -- +1 Remove HBaseConfiguration.create() call from coprocessor.Exec class --- Key: HBASE-5491 URL: https://issues.apache.org/jira/browse/HBASE-5491 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5491-2.patch, HBASE-5491.patch Exec class has a field: private Configuration conf = HBaseConfiguration.create() Client side generates an Exec instance of the class, each initiated Statistics request by ExecRPCInvoker Is so HBaseConfiguration.create for each request needs to call When the server side deserialize the Exec Called once HBaseConfiguration.create in, HBaseConfiguration.create is a time consuming operation. private Configuration conf = HBaseConfiguration.create(); This code is only useful for testing code (org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint.testExecDeserialization), other places with the Exec class, pass a Configuration come, so no need to conf field a default value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219805#comment-13219805 ] stack commented on HBASE-4991: -- bq. Well, client's deleteRegion call is asynchronous so no fail-over if client has to do the business. Fair enough. I was suggesting doing it as a client script because then it'd be outside of the servers and easier to test. If client dies, restart it, it looks in zk for work to do and carries on from where the last client was. But no biggie. What about my question about why we delegate merge/delete out to the regionservers? Why not have them do nothing but the close and then have the master do the remove or merging of fs content and fixup in meta? Would that be less moving parts? Let me give some higher level feedback in a sec. @Jieshan Yes that'll work. How you do it? You have a patch? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219808#comment-13219808 ] stack commented on HBASE-5454: -- I'm +1 on commit for this. We can figure something similar for handling of zk callbacks in another issue. Refuse operations from Admin before master is initialized - Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: hbase-5454.patch, hbase-5454v2.patch In our testing environment, When master is initializing, we found conflict problems between master#assignAllUserRegions and EnableTable event, causing assigning region throw exception so that master abort itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable,etc, It could reduce error. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219839#comment-13219839 ] stack commented on HBASE-4991: -- @Mubarak I see that this patch is modeled on HBASE-4213, the online schema-edit patch. I'm not sure that is a good model to follow in the first place -- its disabled because it does not currently work in the face of splits though it has handler code supposedly to manage this and secondly, its a bunch of custom code specific to the schema change only. Your patch does a bunch of copy/paste from the schema patch duplicating the model and then also repeating code except for some changes in method names and the znodes we wait on up in zk. Rather don't you think we should be generalizing the common facility and having these two features share its use rather than making a copy, especially since we now we have two clients in need (Its actually three if you count merge, which IMO, this feature should be built on). For example, in both cases we need to disable table splitting. In the schema patch it does this with a waitForInflightSchemaChange check that looks at state in zk and then in the splitRegion code, we wait by invoking the below: {code} waitForSchemaChange(Bytes.toString(regionInfo.getTableName())); {code} You come along and do a repeat. You add to the splitRegion code: {code} + waitForDeleteRegion(regionInfo.getEncodedName()); {code} The list of things to check before we go ahead and split could get pretty long if we keep on down this route. Instead we should have a generic disable splitting function that both schema edit and this patch could use. Going back to your design, I see this: {code} 4. DeleteRegionTracker (new class in RS side) will process nodeChildrenChanged(), get the list of regions_to_be_deleted, check that those regions are being hosted by the RS, if yes then doDeleteRegion call deleteRegion() in HRegionServer disable the region split close the region remove from META bridge the whole in META (extending the span of before or after region) remove region directory from HDFS update state in ZK (zookeeper.znode.parent/delete-region/encoded-region-name) {code} Does the above presume all regions for a range are on a single regionserver (If not, how is the meta editing done -- in particular the bridging of the hole in .META.?). I'm asking because I think its not a good design asking regionservers to do the merge; it makes this patch more complicated than it need be IMO. I suggest we go back to the design and work forward from there. Your patch is fat and has a bunch of good stuff that we can repurpose once we have the design done. I suggest a design below. It has some prerequisites, some general function that this feature could use (and others). The prereqs if you think them good, could be done outside of this JIRA. Here's a suggested rough outline of how I think this feature should run. The feature I'm describing below is merge and deleteRegion for I see them as in essence the same thing. # Client calls merge or deleteRegion API. API is a range of rows. # Master gets call. # Master obtains a write lock on table so it can't be disabled from under us. The write lock will also disable splitting. This is one of the prereqs I think. Its HBASE-5494 (Or we could just do something simpler where we have a flag up in zk that splitRegion checks but thats less useful I think; OR we do the dynamic configs issue and set splits to off via a config. change). There'd be a timer for how long we wait on the table lock. # If we get the lock, write intent to merge a range up into zk. It also hoists into zk if its a pure merge or a merge that drops the region data (a deleteRegion call) # Return to the client either our failed attempt at locking the table or an id of some sort used identifying this running operation; can use it querying status. # Turn off balancer. TODO/prereq: Do it in a way that is persisted. Balancer switch currently in memory only so if master crashes, new master will come up in balancing mode # (If we had dynamic config. could hoist up to zk a config. that disables the balancer rather than have a balancer-specific flag/znode OR if a write lock outstanding on a table, then the balancer does not balance regions in the locked table -- this latter might be the easiest to do) # Write into zk that just turned off the balancer (If it was on) # Get regions that are involved in the span # Hoist the list up into zk. # Create region to span the range. # Write that we did this up into zk. # Close regions in parallel. Confirm close in parallel. # Write up into zk regions closed (This might not be necessary since can ask if region is open). # If a merge and not a delete region, move files under new region. Might multithread this (moves should go pretty fast). If a deleteregion, we skip this step. # On
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219843#comment-13219843 ] stack commented on HBASE-4991: -- Oh, I forgot to mention that each step in the above should be repickupable -- i.e. if the process running the above crashes, on restart it should continue where the previous left off -- up until .META. edits (even here, we should make it so we can repair). We should include a cancel facility. Anything we develop would have to be testable; both the individual steps and then the process as a whole. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219846#comment-13219846 ] stack commented on HBASE-4991: -- One more thing, it would be sweet if the above were not hardcoded but instead was a set of steps described elsewhere and malleable or even better, if we could describe the steps to run on top of some generic operations framework as per FATE, but that would be a bunch more work. How many regions are we talking of merging/deleting at any one time? I think above should work for a big table as long was we did stuff in parallel; closes and file moving. To be confirmed. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219845#comment-13219845 ] stack commented on HBASE-4991: -- One more thing, it would be sweet if the above were not hardcoded but instead was a set of steps described elsewhere and malleable or even better, if we could describe the steps to run on top of some generic operations framework as per FATE, but that would be a bunch more work. How many regions are we talking of merging/deleting at any one time? I think above should work for a big table as long was we did stuff in parallel; closes and file moving. To be confirmed. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219849#comment-13219849 ] stack commented on HBASE-4991: -- @Mubarak Our comments crossed. See further up in this issue for more on what I'm thinking; it should answer the questions you pose. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219857#comment-13219857 ] stack commented on HBASE-5424: -- Zhiyuan Why do you reference hbase-5165? You think it the cause of this patch's failures? HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: 5424-v3.patch, 5424-v3.patch, HBASE-5424.patch, HBase-5424_v2.patch Original Estimate: 48h Remaining Estimate: 48h We meet NPE when call getRegionInfo() in testing environment. Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119) at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73) at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418) This NPE also make the table.jsp can't show the region information of this table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219858#comment-13219858 ] stack commented on HBASE-5270: -- bq. I have thought this issue for a long time, and I think preventing processing of SSH is a clear and simple solution, otherwise we should consider many cases where meta server died in different time during master initializing. Would we do the above as part of another issue Chunhui? Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219861#comment-13219861 ] stack commented on HBASE-5270: -- Also, do you need to make a new version of this patch now hbase-5454 has gone in? Thanks. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220165#comment-13220165 ] stack commented on HBASE-4991: -- @Ted bq. Do you think we should continue discussion on the new framework under HBASE-5487 ? It might be time to kill this issue and start up a new one. Not under 5487 though. What you think Mubarak? I'd think that if we started a new issue, it'd be called online merge and would first work out the design. @Mubarak bq. 10 to 18 goes to RS sideWe need ZK trackers in both sides, isn't? I don't think so. Master is just asking the regionservers to close regions. Master can create regions and do the moving of data from old regions into new. Its just fs operations. No need of regionserver context, especially not live regionserver context. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220166#comment-13220166 ] stack commented on HBASE-5399: -- @N 1 out of 29 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java.rej Is it because I just committed fat 4403 patch? Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220167#comment-13220167 ] stack commented on HBASE-5399: -- @N Probably should post the patch up on reviewboard.. its certainly fat enough Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5424) HTable meet NPE when call getRegionInfo()
[ https://issues.apache.org/jira/browse/HBASE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220168#comment-13220168 ] stack commented on HBASE-5424: -- Should we close this issue then Zhiyuan as fixed or will be fixed by 5165? Thanks. HTable meet NPE when call getRegionInfo() - Key: HBASE-5424 URL: https://issues.apache.org/jira/browse/HBASE-5424 Project: HBase Issue Type: Bug Affects Versions: 0.90.1, 0.90.5 Reporter: junhua yang Attachments: 5424-v3.patch, 5424-v3.patch, HBASE-5424.patch, HBase-5424_v2.patch Original Estimate: 48h Remaining Estimate: 48h We meet NPE when call getRegionInfo() in testing environment. Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75) at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119) at org.apache.hadoop.hbase.client.HTable$2.processRow(HTable.java:395) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73) at org.apache.hadoop.hbase.client.HTable.getRegionsInfo(HTable.java:418) This NPE also make the table.jsp can't show the region information of this table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5501) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220178#comment-13220178 ] stack commented on HBASE-5501: -- Whats going on here Chunhui? You've opened a new issue w/ same title as 5270? How do they relate? Shouldn't this patch be on 5270, not on this new issue? Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5501 URL: https://issues.apache.org/jira/browse/HBASE-5501 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-5501.patch In a live cluster, we do the following step 1.kill the master; 1.start the master, and master is initializingï¼› 3.master complete splitLog 4.kill the META server 5.master start assigning ROOT and META 6.Now meta region data will loss since we may assign meta region before SSH finish split log for dead META server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5501) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220180#comment-13220180 ] stack commented on HBASE-5501: -- Is this patch right? You have a disablessh flag. What about other callbacks that could come in during the initialization? I don't see where you queue up callbacks for processing post-initialization. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5501 URL: https://issues.apache.org/jira/browse/HBASE-5501 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-5501.patch In a live cluster, we do the following step 1.kill the master; 1.start the master, and master is initializingï¼› 3.master complete splitLog 4.kill the META server 5.master start assigning ROOT and META 6.Now meta region data will loss since we may assign meta region before SSH finish split log for dead META server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220184#comment-13220184 ] stack commented on HBASE-5270: -- bq. I have created a new issue and submit the new patch, what about move discussion to HBASE-5501 since this one has too long comments That the issue is long is not a good reason to open new issue. We need to keep the decisions and discussion in one place. It makes it easier on the folks who are following behind us to make sense of why we did what. Would suggest you close hbase-5501. Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler - Key: HBASE-5270 URL: https://issues.apache.org/jira/browse/HBASE-5270 Project: HBase Issue Type: Sub-task Components: master Reporter: Zhihong Yu Assignee: chunhui shen Fix For: 0.92.1, 0.94.0 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt This JIRA continues the effort from HBASE-5179. Starting with Stack's comments about patches for 0.92 and TRUNK: Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times - onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5502) region_mover.rb fails to load regions back to original server for regions only containing empty tables.
[ https://issues.apache.org/jira/browse/HBASE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220191#comment-13220191 ] stack commented on HBASE-5502: -- For those following, this patch was committed to trunk mistakenly as part of the fat commit of HBASE-4403. region_mover.rb fails to load regions back to original server for regions only containing empty tables. --- Key: HBASE-5502 URL: https://issues.apache.org/jira/browse/HBASE-5502 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.0 Environment: Ubuntu precise Reporter: James Page Priority: Minor Fix For: 0.92.1, 0.94.0 Attachments: HDFS-5502.patch The region_mover loadRegion function incorrectly uses 'isSuccessfulScan': {noformat} for r in regions exists = false begin exists = isSuccessfulScan(admin, r) rescue org.apache.hadoop.hbase.NotServingRegionException = e $LOG.info(Failed scan of + e.message) end {noformat} isSuccessfulScan throws an exception when it fails rather than returning status. As a result empty regions don't get restored - this is the case in a fresh install (which is how I discovered this) with no user table. Modifying the code to set exists IF isSuccessfulScan does not throw an exception worked for me: {noformat} for r in regions exists = false begin isSuccessfulScan(admin, r) exists = true rescue org.apache.hadoop.hbase.NotServingRegionException = e $LOG.info(Failed scan of + e.message) end {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220192#comment-13220192 ] stack commented on HBASE-4403: -- For those following behind, this patch went in in two batches. First the doc change and then the bulk went in mistakenly as a commit against hbase-5502. I left the commit but then edited the commit message to say the below: {code} Author: stack Revision: 1295710 Modified property: svn:log Modified: svn:log at Thu Mar 1 17:57:34 2012 -- --- svn:log (original) +++ svn:log Thu Mar 1 17:57:34 2012 @@ -1 +1 @@ -HBASE-5502 region_mover.rb fails to load regions back to original server for regions only containing empty tables. +HBASE-4403 Adopt interface stability/audience classifications from Hadoop AND HBASE-5502 region_mover.rb fails to load regions back to original server for regions only containing empty tables {code} Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, hbase-4403.patch, hbase-4403.patch As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220199#comment-13220199 ] stack commented on HBASE-4991: -- Ted: This has been raised already above as a concern and some suggestions have been made that we // certain ops. Your suggestion that we farm out the closing of regions to the regionservers themselves will make no difference regards how fast regions close and ditto regards deletes. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220219#comment-13220219 ] stack commented on HBASE-5451: -- I wonder how Jimmy is doing stuff like the below: {code} +import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.ConnectionHeaderProto; +import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcRequestWithHeaderProto; +import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcResponseWithHeaderProto; +import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcExceptionProto; +import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcRequestProto; +import org.apache.hadoop.hbase.ipc.protobuf.RPCMessageProtos.RpcResponseProto; {code} ... over in his patch (he's just done the proto stuff so far -- not sure about what the generated stuff will look like just yet). We need to have convention around this stuff. Might be worth chat up on dev list. For example, I wonder if we need the protobuf package here since all proto classes are contained in RPCMessageProtos (and its got a suffix to identify it as proto generated). The below is a pity but as you say, shouldn't live long: {code} + result.write(d); + //makes a copy; but this part of code is not going to live long + //hopefully (only until we move all the protocols to protobuf) + response.setResponse(ByteString.copyFrom(d.getData())); {code} Is that maybeTranslate method repeated? Exceptions still strings then? +response.getException().getStackTrace())); Looks like your pom edit clashes with Jimmys over in the HRegionInterface redo. May the first commit win. The doc on the proto file is great. This is shaping up nice. Jimmy, you should review! Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Assignee: Devaraj Das Fix For: 0.96.0 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220345#comment-13220345 ] stack commented on HBASE-4991: -- bq. When we change the location of rename() call from master to region server for distributed log splitting, the duration was shortened from 22 minutes to 7 minutes for the same dataset. Because rename was done via multiple clients rather than in parallel on master? You sure it wasn't because of something else? (Distributed splitting is a different type of process to what is going on here) What do you want to farm out to the regionservers? We are already farming out work in the design above. We ask the regionservers to close regions for us. You want to farm out more than this? Control? To what end other than complicating the design? bq. I wonder if you have statistics showing that master-side operation (for moving/deleting data of old regions) makes no difference in performance w.r.t. distributed operations. Well, stands to reason I'd think. I'd put it on you to come up w/ proof that what seems reasonable actually isn't, at least when talking about the tens of regions at most which is what I think this issue is about. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220346#comment-13220346 ] stack commented on HBASE-4348: -- @Himanshu That would be useful Add metrics for regions in transition - Key: HBASE-4348 URL: https://issues.apache.org/jira/browse/HBASE-4348 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Himanshu Vashishtha Priority: Minor Labels: noob The following metrics would be useful for monitoring the master: - the number of regions in transition - the number of regions in transition that have been in transition for more than a minute - how many seconds has the oldest region-in-transition been in transition -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220366#comment-13220366 ] stack commented on HBASE-4348: -- Can you make it so the table doesn't bump up against the first one? Should it be a separate table? Why not add sum on end and columns to the first table showing time in transition? Flag red the one that has been in transition the longest? Add metrics for regions in transition - Key: HBASE-4348 URL: https://issues.apache.org/jira/browse/HBASE-4348 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Himanshu Vashishtha Priority: Minor Labels: noob Attachments: 4348-v1.patch, 4348-v2.patch, RITs.png The following metrics would be useful for monitoring the master: - the number of regions in transition - the number of regions in transition that have been in transition for more than a minute - how many seconds has the oldest region-in-transition been in transition -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5492) Caching StartKeys and EndKeys of Regions
[ https://issues.apache.org/jira/browse/HBASE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220370#comment-13220370 ] stack commented on HBASE-5492: -- bq. Your question can be converted into who is responsible for refreshing of cachedRegionLocations? You can do that. Its done as the HTable runs but there are no guarantees the cache is complete at any one time, right? Change + if (tableLocations.size() == 0) { to tableLocations.isEmpty().. the former can be costly compared. So, IIUC, you are asking to pull all region locations local in locateRegions? Does this get all table regions? Or just the configured next ten regions? +prefetchRegionCache(tableName, null); Caching StartKeys and EndKeys of Regions Key: HBASE-5492 URL: https://issues.apache.org/jira/browse/HBASE-5492 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.92.0 Environment: all Reporter: honghua zhu Fix For: 0.92.1 Attachments: HBASE-5492.patch Each call for HTable.getStartEndKeys will read meta table. In particular, in the case of client side multi-threaded concurrency statistics, we must call HTable.coprocessorExec== getStartKeysInRange == getStartEndKeys, resulting in the need to always scan the meta table. This is not necessary, we can implement the HConnectionManager.HConnectionImplementation.locateRegions(byte[] tableName) method, then, get the StartKeys and EndKeys from the cachedRegionLocations of HConnectionImplementation. Combined with https://issues.apache.org/jira/browse/HBASE-5491, can improve the performance of statistical -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220377#comment-13220377 ] stack commented on HBASE-4991: -- I did already at '01/Mar/12 17:17' Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220383#comment-13220383 ] stack commented on HBASE-3909: -- @Jimmy bq. we don't have to poll fs to find changes. We can just put the lastmodifieddate of the file in ZK. Once the last modified date is changed, we can load the file again. Why have the fs involved at all. What would be the advantage? Why not just put the changed configs. up in zk? Because we could lose them (because we do not keep permanent data up in zk)? That'd be ok I think. If you don't move the configs to hbase-site, then its your own fault if they are lost when zk data is cleared. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.96.0 I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no hard this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we lookd at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220680#comment-13220680 ] stack commented on HBASE-5494: -- I think maybe region locks is for later. Meantime, getting a read lock on the table when doing a split or merge might carry us a long way.before we need region specific locks. @Mubarak Todd has a good point that respecting locking order is important. Maybe we should use the facility where zk can add the seqid to the name? So maybe locks are named: zookeeper.znode.parent/locks/table_name/r-seqid or zookeeper.znode.parent/locks/table_name/w-seqid where the data then is details on who took the lock? And you can't take a write lock if any instances of read lock outstanding? I was thinking that maybe clients would be prepared to wait some time obtaining a read lock but that they might fail fast if they could not get a read lock? Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running. We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening. Shouldn't be too hard to add. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5504) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220689#comment-13220689 ] stack commented on HBASE-5504: -- bq. Since region splitting is disabled after step 3, how do we deal with the case where the start of row range lies in the middle of a region ? Good question. How about, if a merge, we include the hosting region in the merge. If a delete region, we throw an exception saying you need to specify region edges. bq. For step 14, we still need to move data for delete region request because we should have chosen one of the neighbor regions to cover the hole in .META. I think this comes of a misunderstanding that you might have Ted. You can't alter region edges once created. For example, the directory in hdfs is the hash of regionname which includes at least the startkey and should one day include the endkey... If you change the delimiting keys, you have to make a new region. Were you thinking we could change the delimiting keys on an existing region? On 'What if the master crashes anywhere between step 6 and step 19 ?', its what Mubarak says; the new master comes up and after initializing, tries to pick up merge/delete from where the previous master left off... Or simpler, it could just undo it all? bq. How do we get around if merge/delete-range get stuck (it should not but if it happens???) I think we need to make the operation cancelable? In shell/api, there'd be a cancel operation on table! (Since you need a write lock to do one of these operations, this would mean one operation at a time per table only. maybe no need of there being a lock transaction id because only one happening at a time?) Later we can do something better where if an operation does not complete, the master operation runner would do the cancel.. but that we could do later? Online Merge Key: HBASE-5504 URL: https://issues.apache.org/jira/browse/HBASE-5504 Project: HBase Issue Type: Brainstorming Components: client, master, shell, zookeeper Affects Versions: 0.94.0 Reporter: Mubarak Seyed Fix For: 0.96.0 As discussed, please refer the discussion at [HBASE-4991|https://issues.apache.org/jira/browse/HBASE-4991] Design suggestion from Stack: {quote} I suggest a design below. It has some prerequisites, some general function that this feature could use (and others). The prereqs if you think them good, could be done outside of this JIRA. Here's a suggested rough outline of how I think this feature should run. The feature I'm describing below is merge and deleteRegion for I see them as in essence the same thing. (C) Client, (M) Master, RS (Region server), ZK (ZooKeeper) 1. Client calls merge or deleteRegion API. API is a range of rows. (C) 2. Master gets call. (M) 3. Master obtains a write lock on table so it can't be disabled from under us. The write lock will also disable splitting. This is one of the prereqs I think. Its HBASE-5494 (Or we could just do something simpler where we have a flag up in zk that splitRegion checks but thats less useful I think; OR we do the dynamic configs issue and set splits to off via a config. change). There'd be a timer for how long we wait on the table lock. (M - ZK) 4. If we get the lock, write intent to merge a range up into zk. It also hoists into zk if its a pure merge or a merge that drops the region data (a deleteRegion call) (M) 5. Return to the client either our failed attempt at locking the table or an id of some sort used identifying this running operation; can use it querying status. (M - C) 6. Turn off balancer. TODO/prereq: Do it in a way that is persisted. Balancer switch currently in memory only so if master crashes, new master will come up in balancing mode # (If we had dynamic config. could hoist up to zk a config. that disables the balancer rather than have a balancer-specific flag/znode OR if a write lock outstanding on a table, then the balancer does not balance regions in the locked table - this latter might be the easiest to do) (M) 7. Write into zk that just turned off the balancer (If it was on) (M - ZK) 8. Get regions that are involved in the span (M) 9. Hoist the list up into zk. (M - ZK) 10. Create region to span the range. (M) 11. Write that we did this up into zk. (M - ZK) 12. Close regions in parallel. Confirm close in parallel. (M - RS) 13. Write up into zk regions closed (This might not be necessary since can ask if region is open). (M - ZK) 14. If a merge and not a delete region, move files under new region. Might multithread this (moves should go pretty fast). If a deleteregion, we skip this step. (M) 15. On completion mark zk (though may not be necessary since its easy to look in fs to see state of move). (M - ZK)
[jira] [Commented] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220693#comment-13220693 ] stack commented on HBASE-4348: -- @Himanshu Yeah, a line on the end w/ count of regions one minute up in RIT would be good enough. You could make them yellow in the listing. But yeah, would be great if they came out as metrics as per Mr. Todd. Add metrics for regions in transition - Key: HBASE-4348 URL: https://issues.apache.org/jira/browse/HBASE-4348 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Himanshu Vashishtha Priority: Minor Labels: noob Attachments: 4348-v1.patch, 4348-v2.patch, RITs.png The following metrics would be useful for monitoring the master: - the number of regions in transition - the number of regions in transition that have been in transition for more than a minute - how many seconds has the oldest region-in-transition been in transition -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220699#comment-13220699 ] stack commented on HBASE-4991: -- @Mubarak Should we then close this issue as subsumed by the new issue (we can reuse your code pasted here over in the new issue after we work through design). Good stuff. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220721#comment-13220721 ] stack commented on HBASE-5399: -- Thats a wide variety in the types of failures. You get same kind of variance absent your patch N? Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5419) FileAlreadyExistsException has moved from mapred to fs package
[ https://issues.apache.org/jira/browse/HBASE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220739#comment-13220739 ] stack commented on HBASE-5419: -- Well, maybe this is for 0.96 then? That ok w/ you Dhruba? 0.96 will be the singularity, the release that gets the protobuf rpcs will require cluster shutdown and restart but thereafter, we should be able to upgrade running hbase across major versions. FileAlreadyExistsException has moved from mapred to fs package -- Key: HBASE-5419 URL: https://issues.apache.org/jira/browse/HBASE-5419 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Fix For: 0.94.0 Attachments: D1767.1.patch, D1767.1.patch The FileAlreadyExistsException has moved from org.apache.hadoop.mapred.FileAlreadyExistsException to org.apache.hadoop.fs.FileAlreadyExistsException. HBase is currently using a class that is deprecated in hadoop trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5504) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221023#comment-13221023 ] stack commented on HBASE-5504: -- bq. I meant that data for the neighbor region we choose should be copied. The neighbor region would have new delimiting key. Sorry, I'm not following Ted. You need to bring me a long. Thanks. Online Merge Key: HBASE-5504 URL: https://issues.apache.org/jira/browse/HBASE-5504 Project: HBase Issue Type: Brainstorming Components: client, master, shell, zookeeper Affects Versions: 0.94.0 Reporter: Mubarak Seyed Fix For: 0.96.0 As discussed, please refer the discussion at [HBASE-4991|https://issues.apache.org/jira/browse/HBASE-4991] Design suggestion from Stack: {quote} I suggest a design below. It has some prerequisites, some general function that this feature could use (and others). The prereqs if you think them good, could be done outside of this JIRA. Here's a suggested rough outline of how I think this feature should run. The feature I'm describing below is merge and deleteRegion for I see them as in essence the same thing. (C) Client, (M) Master, RS (Region server), ZK (ZooKeeper) 1. Client calls merge or deleteRegion API. API is a range of rows. (C) 2. Master gets call. (M) 3. Master obtains a write lock on table so it can't be disabled from under us. The write lock will also disable splitting. This is one of the prereqs I think. Its HBASE-5494 (Or we could just do something simpler where we have a flag up in zk that splitRegion checks but thats less useful I think; OR we do the dynamic configs issue and set splits to off via a config. change). There'd be a timer for how long we wait on the table lock. (M - ZK) 4. If we get the lock, write intent to merge a range up into zk. It also hoists into zk if its a pure merge or a merge that drops the region data (a deleteRegion call) (M) 5. Return to the client either our failed attempt at locking the table or an id of some sort used identifying this running operation; can use it querying status. (M - C) 6. Turn off balancer. TODO/prereq: Do it in a way that is persisted. Balancer switch currently in memory only so if master crashes, new master will come up in balancing mode # (If we had dynamic config. could hoist up to zk a config. that disables the balancer rather than have a balancer-specific flag/znode OR if a write lock outstanding on a table, then the balancer does not balance regions in the locked table - this latter might be the easiest to do) (M) 7. Write into zk that just turned off the balancer (If it was on) (M - ZK) 8. Get regions that are involved in the span (M) 9. Hoist the list up into zk. (M - ZK) 10. Create region to span the range. (M) 11. Write that we did this up into zk. (M - ZK) 12. Close regions in parallel. Confirm close in parallel. (M - RS) 13. Write up into zk regions closed (This might not be necessary since can ask if region is open). (M - ZK) 14. If a merge and not a delete region, move files under new region. Might multithread this (moves should go pretty fast). If a deleteregion, we skip this step. (M) 15. On completion mark zk (though may not be necessary since its easy to look in fs to see state of move). (M - ZK) 16. Edit .META. (M) 17. Confirm edits went in. (M) 18. Move old regions to hbase trash folder TODO: There is no trash folder under /hbase currently. We should add one. (M) 19. Enable balancer (if it was off) (M) 20. Unlock table (M) {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4890) fix possible NPE in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221112#comment-13221112 ] stack commented on HBASE-4890: -- Any more luck w/ this one J-D (or you got distracted?) fix possible NPE in HConnectionManager -- Key: HBASE-4890 URL: https://issues.apache.org/jira/browse/HBASE-4890 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.92.1 I was running YCSB against a 0.92 branch and encountered this error message: {code} 11/11/29 08:47:16 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=usertable,user3917479014967760871,1322555655231.f78d161e5724495a9723bcd972f97f41., hostname=c0316.hal.cloudera.com, port=57020 java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1501) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1353) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:898) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:775) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:750) at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source) at com.yahoo.ycsb.DBWrapper.update(Unknown Source) at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionUpdate(Unknown Source) at com.yahoo.ycsb.workloads.CoreWorkload.doTransaction(Unknown Source) at com.yahoo.ycsb.ClientThread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1315) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1327) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1325) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158) at $Proxy4.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1330) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1328) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1309) ... 7 more {code} It looks like the NPE is caused by server being null in the MultiRespone call() method. {code} public MultiResponse call() throws IOException { return getRegionServerWithoutRetries( new ServerCallableMultiResponse(connection, tableName, null) { public MultiResponse call() throws IOException { return server.multi(multi); } @Override public void connect(boolean reload) throws IOException { server = connection.getHRegionConnection(loc.getHostname(), loc.getPort()); } } ); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221157#comment-13221157 ] stack commented on HBASE-5451: -- Can we have hbase go all-pb for hbase 0.96.0? Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Assignee: Devaraj Das Fix For: 0.96.0 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221159#comment-13221159 ] stack commented on HBASE-5451: -- Thats a dumb question. Let me rephrase. Won't hbase be all pb natively by 0.96.0? Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Assignee: Devaraj Das Fix For: 0.96.0 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221170#comment-13221170 ] stack commented on HBASE-5451: -- Client APIs should be the same but yeah, lets get up on pb before 0.96.0; a blocker as per DD. Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Assignee: Devaraj Das Fix For: 0.96.0 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5504) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221179#comment-13221179 ] stack commented on HBASE-5504: -- @Ted that takes me to a comment I made. I still am without understanding. Online Merge Key: HBASE-5504 URL: https://issues.apache.org/jira/browse/HBASE-5504 Project: HBase Issue Type: Brainstorming Components: client, master, shell, zookeeper Affects Versions: 0.94.0 Reporter: Mubarak Seyed Fix For: 0.96.0 As discussed, please refer the discussion at [HBASE-4991|https://issues.apache.org/jira/browse/HBASE-4991] Design suggestion from Stack: {quote} I suggest a design below. It has some prerequisites, some general function that this feature could use (and others). The prereqs if you think them good, could be done outside of this JIRA. Here's a suggested rough outline of how I think this feature should run. The feature I'm describing below is merge and deleteRegion for I see them as in essence the same thing. (C) Client, (M) Master, RS (Region server), ZK (ZooKeeper) 1. Client calls merge or deleteRegion API. API is a range of rows. (C) 2. Master gets call. (M) 3. Master obtains a write lock on table so it can't be disabled from under us. The write lock will also disable splitting. This is one of the prereqs I think. Its HBASE-5494 (Or we could just do something simpler where we have a flag up in zk that splitRegion checks but thats less useful I think; OR we do the dynamic configs issue and set splits to off via a config. change). There'd be a timer for how long we wait on the table lock. (M - ZK) 4. If we get the lock, write intent to merge a range up into zk. It also hoists into zk if its a pure merge or a merge that drops the region data (a deleteRegion call) (M) 5. Return to the client either our failed attempt at locking the table or an id of some sort used identifying this running operation; can use it querying status. (M - C) 6. Turn off balancer. TODO/prereq: Do it in a way that is persisted. Balancer switch currently in memory only so if master crashes, new master will come up in balancing mode # (If we had dynamic config. could hoist up to zk a config. that disables the balancer rather than have a balancer-specific flag/znode OR if a write lock outstanding on a table, then the balancer does not balance regions in the locked table - this latter might be the easiest to do) (M) 7. Write into zk that just turned off the balancer (If it was on) (M - ZK) 8. Get regions that are involved in the span (M) 9. Hoist the list up into zk. (M - ZK) 10. Create region to span the range. (M) 11. Write that we did this up into zk. (M - ZK) 12. Close regions in parallel. Confirm close in parallel. (M - RS) 13. Write up into zk regions closed (This might not be necessary since can ask if region is open). (M - ZK) 14. If a merge and not a delete region, move files under new region. Might multithread this (moves should go pretty fast). If a deleteregion, we skip this step. (M) 15. On completion mark zk (though may not be necessary since its easy to look in fs to see state of move). (M - ZK) 16. Edit .META. (M) 17. Confirm edits went in. (M) 18. Move old regions to hbase trash folder TODO: There is no trash folder under /hbase currently. We should add one. (M) 19. Enable balancer (if it was off) (M) 20. Unlock table (M) {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4890) fix possible NPE in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221643#comment-13221643 ] stack commented on HBASE-4890: -- This NPE is a bit too easy to manufacture. Should we hold up 0.92.1 till fixed? Can work on it monday? fix possible NPE in HConnectionManager -- Key: HBASE-4890 URL: https://issues.apache.org/jira/browse/HBASE-4890 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.92.1 I was running YCSB against a 0.92 branch and encountered this error message: {code} 11/11/29 08:47:16 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=usertable,user3917479014967760871,1322555655231.f78d161e5724495a9723bcd972f97f41., hostname=c0316.hal.cloudera.com, port=57020 java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1501) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1353) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:898) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:775) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:750) at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source) at com.yahoo.ycsb.DBWrapper.update(Unknown Source) at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionUpdate(Unknown Source) at com.yahoo.ycsb.workloads.CoreWorkload.doTransaction(Unknown Source) at com.yahoo.ycsb.ClientThread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1315) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1327) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1325) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158) at $Proxy4.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1330) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1328) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1309) ... 7 more {code} It looks like the NPE is caused by server being null in the MultiRespone call() method. {code} public MultiResponse call() throws IOException { return getRegionServerWithoutRetries( new ServerCallableMultiResponse(connection, tableName, null) { public MultiResponse call() throws IOException { return server.multi(multi); } @Override public void connect(boolean reload) throws IOException { server = connection.getHRegionConnection(loc.getHostname(), loc.getPort()); } } ); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5371) Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API
[ https://issues.apache.org/jira/browse/HBASE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222634#comment-13222634 ] stack commented on HBASE-5371: -- Please integrate into 0.92 Ted. Thanks. Introduce AccessControllerProtocol.checkPermissions(Permission[] permissons) API Key: HBASE-5371 URL: https://issues.apache.org/jira/browse/HBASE-5371 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.92.1, 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.0 Attachments: HBASE-5371-addendum_v1.patch, HBASE-5371_v2.patch, HBASE-5371_v3-noprefix.patch, HBASE-5371_v3.patch We need to introduce something like AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so that clients can check access rights before carrying out the operations. We need this kind of operation for HCATALOG-245, which introduces authorization providers for hbase over hcat. We cannot use getUserPermissions() since it requires ADMIN permissions on the global/table level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5358) HBaseObjectWritable should be able to serialize/deserialize generic arrays
[ https://issues.apache.org/jira/browse/HBASE-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222701#comment-13222701 ] stack commented on HBASE-5358: -- This will do. +1 HBaseObjectWritable should be able to serialize/deserialize generic arrays -- Key: HBASE-5358 URL: https://issues.apache.org/jira/browse/HBASE-5358 Project: HBase Issue Type: Improvement Components: coprocessors, io Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.0 Attachments: 5358-92.txt, HBASE-5358_v3.patch HBaseObjectWritable can encode Writable[]'s but, but cannot encode A[] where A extends Writable. This becomes an issue for example when adding a coprocessor method which takes A[] (see HBASE-5352). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223350#comment-13223350 ] stack commented on HBASE-5494: -- @Ram Yes sir. Thanks. Resolved hbase-5373 as duplicate of this. Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running. We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening. Shouldn't be too hard to add. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
[ https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223401#comment-13223401 ] stack commented on HBASE-5531: -- +1 on patch and +1 on commit to all of the branches cited above. Thanks Ram. Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot - Key: HBASE-5531 URL: https://issues.apache.org/jira/browse/HBASE-5531 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.92.2 Reporter: Laxman Labels: build Fix For: 0.92.2, 0.96.0 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch Current profile is still pointing to 0.23.1-SNAPSHOT. This is failing to build as 23.1 is already released and snapshot is not available anymore. We can update this to 0.23.2-SNAPSHOT. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5436) Right-size the map when reading attributes.
[ https://issues.apache.org/jira/browse/HBASE-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223446#comment-13223446 ] stack commented on HBASE-5436: -- Please commit to both Lars. Thanks. Right-size the map when reading attributes. --- Key: HBASE-5436 URL: https://issues.apache.org/jira/browse/HBASE-5436 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Trivial Labels: performance Fix For: 0.94.0 Attachments: 0001-Right-size-the-map-when-reading-attributes.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
[ https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223493#comment-13223493 ] stack commented on HBASE-5531: -- I added him (See 'Administration' in JIRA. You should have access. Once in administration screens, look for people along the left.. the rest should be plain... bug me if you can't figure it). Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot - Key: HBASE-5531 URL: https://issues.apache.org/jira/browse/HBASE-5531 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.92.2 Reporter: Laxman Labels: build Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch Current profile is still pointing to 0.23.1-SNAPSHOT. This is failing to build as 23.1 is already released and snapshot is not available anymore. We can update this to 0.23.2-SNAPSHOT. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223523#comment-13223523 ] stack commented on HBASE-5533: -- Did you mean the below: {code} -# hbase.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext -# hbase.period=10 -# hbase.fileName=/tmp/metrics_hbase.log +hbase.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext +hbase.period=10 +hbase.fileName=/tmp/metrics_hbase.log {code} Will there be a bunch of contention on these additions: {code} + static volatile BlockingQueueLong fsReadLatenciesNanos = new ArrayBlockingQueueLong(LATENCY_BUFFER_SIZE); {code} Could this fill the logs with thousands of repeated messages: {code} + if (!stored) { +LOG.warn(Dropping fs latency stat since buffer is full); + } {code} Could we use the cliff click counters instead of AtomicLong? They are on the classpath IIRC: {code} + private final MapString, AtomicLong counts; {code} These additions would be great to have. Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Attachments: hbase-5533-0.92.patch To debub/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223615#comment-13223615 ] stack commented on HBASE-5399: -- @LarsH I think its too radical a change in client behavior for 0.94. If we target it for 0.96, it'll be a ripple only compared to rpc changes; it won't be noticed. Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223660#comment-13223660 ] stack commented on HBASE-5399: -- @nkeywal yes, agree, good to deprecate in 0.94 rather than 0.96 so more time to move off the old methods Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223662#comment-13223662 ] stack commented on HBASE-5399: -- ... so it seems like there needs to be a separate patch for 0.94? Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223705#comment-13223705 ] stack commented on HBASE-5074: -- @Dhruba Try resubmitting your patch too. We regularly see three of these mr tests fail. Fixed in hadoop 1.0.2 apparently. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.11.patch, D1521.11.patch, D1521.12.patch, D1521.12.patch, D1521.13.patch, D1521.13.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5534) HBase shell's return value is almost always 0
[ https://issues.apache.org/jira/browse/HBASE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13223887#comment-13223887 ] stack commented on HBASE-5534: -- You should be able to do: {code} hbase shell script.rb {code} ... if thats of any help Alex (may still have the issue w/ exit code). HBase shell's return value is almost always 0 - Key: HBASE-5534 URL: https://issues.apache.org/jira/browse/HBASE-5534 Project: HBase Issue Type: Improvement Reporter: Alex Newman Assignee: Alex Newman So I was trying to write some simple scripts to verify client connections to HBase using the shell and I noticed that the HBase shell always returns 0 even when it can't connect to an HBase server. I'm not sure if this is the best option. What would be neat is if you had some capability to run commands like hbase shell --command='disable table;\ndrop table;' and it would error out if any of the commands fail to succeed. echo disable table | hbase shell could continue to work as it does now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225041#comment-13225041 ] stack commented on HBASE-5480: -- I don't have this class in 0.92 Andrew, so 0.92 should be fine (Thanks for flagging it) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Critical Fix For: 0.94.0 Attachments: HBASE-5480.patch There are two issues: - StatusReporter has a new method getProgress() - Mapper and reducer context objects can no longer be directly instantiated. See attached patch. I'm not thrilled with the added reflection but it was the minimally intrusive change. Raised the priority to critical because compilation fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5540) Update apache jenkins to include 0.94 and builds against Hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225322#comment-13225322 ] stack commented on HBASE-5540: -- Done. I missed adding it to the hbase group. Update apache jenkins to include 0.94 and builds against Hadoop 0.23 Key: HBASE-5540 URL: https://issues.apache.org/jira/browse/HBASE-5540 Project: HBase Issue Type: Task Components: build, test Reporter: Jonathan Hsieh Labels: jenkins Currently there is no hbase 0.94 apache jenkins build and the trunk on hadoop 0.23 builds are disabled. Ideally we should add the former and re-enable the latter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections
[ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225566#comment-13225566 ] stack commented on HBASE-5543: -- Yeah, it looks like its inevitable that we'll ask the server to do legitimate stuff that will take longer than the rpctimeout yet the server is making headway: e.g. the reproducing test case, though a little artificial, for HBASE-4890 fix possible NPE in HConnectionManager was asking the regionserver to open 3k regions. If its a task like the above, there should be a facility for telling client we're alive still or we should just refuse the request because it will take too long (The latter we need to do t -- from Benoiit. If server is going to take too long servicing a request, so long the client will be gone by the time its done its work, then refuse the request... don't do the increment or update that the updating client will not be around to see). Add a keepalive option for IPC connections -- Key: HBASE-5543 URL: https://issues.apache.org/jira/browse/HBASE-5543 Project: HBase Issue Type: Improvement Components: client, coprocessors, ipc Reporter: Andrew Purtell On the user list someone wrote in with a connection failure due to a long running coprocessor: {quote} On Wed, Mar 7, 2012 at 10:59 PM, raghavendhra rahul wrote: 2012-03-08 12:03:09,475 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call execCoprocessor([B@50cb21, getProjection(), rpc version=1, client version=0, methodsFingerPrint=0), rpc version=1, client version=29, methodsFingerPrint=54742778 from 10.184.17.26:46472: output error 2012-03-08 12:03:09,476 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020 caught: java.nio.channels.ClosedChannelException {quote} I suggested in response we might consider give our RPC a keepalive option for calls that may run for a long time (like execCoprocessor). LarsH +1ed the idea: {quote} +1 on keepalive. It's a shame (especially for long running server code) to do all the work, just to find out at the end that the client has given up. Or maybe there should be a way to cancel an operation if the clients decides it does not want to wait any longer (PostgreSQL does that for example). Here that would mean the server would need to check periodically and coprocessors would need to be written to support that - so maybe that's no-starter. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225654#comment-13225654 ] stack commented on HBASE-5074: -- Wahoo!! Lars, you want to pull it into 0.94? (Does this mean 0.94 is good to go? Should we put up an RC?) support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.11.patch, D1521.11.patch, D1521.12.patch, D1521.12.patch, D1521.13.patch, D1521.13.patch, D1521.14.patch, D1521.14.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5543) Add a keepalive option for IPC connections
[ https://issues.apache.org/jira/browse/HBASE-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225818#comment-13225818 ] stack commented on HBASE-5543: -- bq. Instead of adding to the rpc to make it keep alive longer, maybe be make it async, returning some sort of uuid token that the client can poll (or get notified) for progress instead? I like this idea. Add a keepalive option for IPC connections -- Key: HBASE-5543 URL: https://issues.apache.org/jira/browse/HBASE-5543 Project: HBase Issue Type: Improvement Components: client, coprocessors, ipc Reporter: Andrew Purtell On the user list someone wrote in with a connection failure due to a long running coprocessor: {quote} On Wed, Mar 7, 2012 at 10:59 PM, raghavendhra rahul wrote: 2012-03-08 12:03:09,475 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call execCoprocessor([B@50cb21, getProjection(), rpc version=1, client version=0, methodsFingerPrint=0), rpc version=1, client version=29, methodsFingerPrint=54742778 from 10.184.17.26:46472: output error 2012-03-08 12:03:09,476 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020 caught: java.nio.channels.ClosedChannelException {quote} I suggested in response we might consider give our RPC a keepalive option for calls that may run for a long time (like execCoprocessor). LarsH +1ed the idea: {quote} +1 on keepalive. It's a shame (especially for long running server code) to do all the work, just to find out at the end that the client has given up. Or maybe there should be a way to cancel an operation if the clients decides it does not want to wait any longer (PostgreSQL does that for example). Here that would mean the server would need to check periodically and coprocessors would need to be written to support that - so maybe that's no-starter. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225821#comment-13225821 ] stack commented on HBASE-5325: -- This patch adds beans at org.apache.hbase, right? HBase and RegionServer are bad names for mbeans. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5541) Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks
[ https://issues.apache.org/jira/browse/HBASE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225882#comment-13225882 ] stack commented on HBASE-5541: -- Commit is fine. What about case where the edits are applied, sync succeeds, but the cp postdelete and postput fail? Then mvcc will have been updated but we have removed the edits from memstore. i suppose its ok? The read point is advanced but if we take this edit in isolation, no mutations made it through? Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks Key: HBASE-5541 URL: https://issues.apache.org/jira/browse/HBASE-5541 Project: HBase Issue Type: Sub-task Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5541-v2.txt, 5541.txt Currently mutateRowsWithLocks holds the row lock while the HLog is sync'ed. Similar to what we do in doMiniBatchPut, we should create the log entry with the lock held, but only sync the HLog after the lock is released, along with rollback logic in case the sync'ing fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225892#comment-13225892 ] stack commented on HBASE-5325: -- HBase and RegionServer should be under org.apache.hbase too I'd think. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5541) Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks
[ https://issues.apache.org/jira/browse/HBASE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13225900#comment-13225900 ] stack commented on HBASE-5541: -- Grand Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks Key: HBASE-5541 URL: https://issues.apache.org/jira/browse/HBASE-5541 Project: HBase Issue Type: Sub-task Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5541-v2.txt, 5541.txt Currently mutateRowsWithLocks holds the row lock while the HLog is sync'ed. Similar to what we do in doMiniBatchPut, we should create the log entry with the lock held, but only sync the HLog after the lock is released, along with rollback logic in case the sync'ing fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess
[ https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226172#comment-13226172 ] stack commented on HBASE-5552: -- Let me fix formatting: {code} hadoop HBase Master RegionServer {code} It should be... {code} hadoop hbase master regionserver {code} Clean up our jmx view; its a bit of a mess -- Key: HBASE-5552 URL: https://issues.apache.org/jira/browse/HBASE-5552 Project: HBase Issue Type: Bug Reporter: stack Priority: Blocker Attachments: 0.92.0jmx.png Fix before we release 0.92.1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess
[ https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226182#comment-13226182 ] stack commented on HBASE-5552: -- As is, our naming is just broke. You can't have instances of different clusters on one machine. Our master bean is not uniquely named so you can't have multiple masters on the one machine ditto regionservers (The rpc servers publish their own bean distingushed by the port they run on which is better only should probably have master or regionserver prefix). The name of our master bean is 'MasterStatistics' though its metrics only (even the operation is a reset on metrics). Doing minimum so can get 0.92.1 in this issue but this stuff needs a revamp. Clean up our jmx view; its a bit of a mess -- Key: HBASE-5552 URL: https://issues.apache.org/jira/browse/HBASE-5552 Project: HBase Issue Type: Bug Reporter: stack Priority: Blocker Attachments: 0.92.0jmx.png Fix before we release 0.92.1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226200#comment-13226200 ] stack commented on HBASE-5325: -- I took a look. Its broke (I'm to blame I'd say for it being broke). Did basic fixup and committed in hbase-5552. Made new issue to revisit our jmx view. Its always been broke. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226238#comment-13226238 ] stack commented on HBASE-5533: -- Line lengths are usually 80 chars in our src code Shaneal. Minor, assign and declare on the one line the below rather than wait till constructor? {code} +this.counts = new MapMaker().makeComputingMap(new FunctionString, Counter() { + @Override + public Counter apply(String input) { +return new Counter(); + } +}); {code} This looks cool... ExponentiallyDecayingSample Snapshot looks excellent. UniformSample looks sweet On this... {code} +final long startTime = System.nanoTime(); {code} ... aren't we getting a currentTimeMillis soon after? Do we want to be doing all this time getting? We should minimize as much as we can? Check it out I'd say. Also, there is EdgeEnvironment or EnvironmentEdge that we use for system things like getting time because we want to have a layer between us and system especially when testing ... so we can mess things up. You might want to check it out. Tests look good. How does this stuff look for a server under load? Will say tsdb be able to make sense of it? Hows it look in the ui? I suppose I could just apply the patch and try it but you might have some pictures laying around. How did we make it this far w/o this stuff? Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Attachments: BlockingQueueContention.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch To debub/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess
[ https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226296#comment-13226296 ] stack commented on HBASE-5552: -- Its not incompatible change Todd because we've not had a release w/ these new mbeans yet. On the rename, yeah, over in 'HBASE-5553 Revisit our jmx view', it suggests we'd have to wait till the singularity. Clean up our jmx view; its a bit of a mess -- Key: HBASE-5552 URL: https://issues.apache.org/jira/browse/HBASE-5552 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.92.1, 0.94.0 Attachments: 0.92.0jmx.png, 5552.txt, currentjmxview.png, patchedjmxview.png Fix before we release 0.92.1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5552) Clean up our jmx view; its a bit of a mess
[ https://issues.apache.org/jira/browse/HBASE-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226341#comment-13226341 ] stack commented on HBASE-5552: -- bq. Oh, does this not affect the other metrics published at /jmx in the servlet? Sorry for confusion. It should, but it doesn't (smile). Its just a renaming of the beans added by HBASE-5325 (0.92.1 is their first airing in a release). I should have been more clear. Clean up our jmx view; its a bit of a mess -- Key: HBASE-5552 URL: https://issues.apache.org/jira/browse/HBASE-5552 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.92.1, 0.94.0 Attachments: 0.92.0jmx.png, 5552.txt, currentjmxview.png, patchedjmxview.png Fix before we release 0.92.1 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4542) add filter info to slow query logging
[ https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226356#comment-13226356 ] stack commented on HBASE-4542: -- +1 on commit (Thanks for taking care of the this Mikhail) add filter info to slow query logging - Key: HBASE-4542 URL: https://issues.apache.org/jira/browse/HBASE-4542 Project: HBase Issue Type: Improvement Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, D1263.2.patch, D1539.1.patch Slow query log doesn't report filters in effect. For example: {code} (operationTooSlow): \ {processingtimems:3468,client:10.138.43.206:40035,timeRange: [0,9223372036854775807],\ starttimems:1317772005821,responsesize:42411, \ class:HRegionServer,table:myTable,families:{CF1:ALL]},\ row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\ method:get,totalColumns:1,maxVersions:1,storeLimit:-1} {code} the above would suggest that all columns of myTable:CF1 are being requested for the given row. But in reality there could be filters in effect (such as ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter() etc.). We should enhance the slow query log to capture report this information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
[ https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226379#comment-13226379 ] stack commented on HBASE-5542: -- I took a look. Patch looks good. I think this kinda work is really important to do -- now while its fresh in everyone's minds whats going on here -- if we're not to let the mess get out of hand. Thanks Scott. Yeah Lars, this is your old stomping ground. Could do w/ a review by you. Unify HRegion.mutateRowsWithLocks() and HRegion.processRow() Key: HBASE-5542 URL: https://issues.apache.org/jira/browse/HBASE-5542 Project: HBase Issue Type: Improvement Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.96.0 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.2.patch mutateRowsWithLocks() does atomic mutations on multiple rows. processRow() does atomic read-modify-writes on a single row. It will be useful to generalize both and have a processRowsWithLocks() that does atomic read-modify-writes on multiple rows. This also helps reduce some redundancy in the codes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5213) hbase master stop does not bring down backup masters
[ https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226385#comment-13226385 ] stack commented on HBASE-5213: -- Thanks G. hbase master stop does not bring down backup masters -- Key: HBASE-5213 URL: https://issues.apache.org/jira/browse/HBASE-5213 Project: HBase Issue Type: Bug Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.92.2 Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch Typing hbase master stop produces the following message: stop Start cluster shutdown; Master signals RegionServer shutdown It seems like backup masters should be considered part of the cluster, but they are not brought down by hbase master stop. stop-hbase.sh does correctly bring down the backup masters. The same behavior is observed when a client app makes use of the client API HBaseAdmin.shutdown() http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown() -- this isn't too surprising since I think hbase master stop just calls this API. It seems like HBASE-1448 address this; perhaps there was a regression? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5520) Support reseek() at RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226414#comment-13226414 ] stack commented on HBASE-5520: -- {code} + if (kv == null) {$ +throw new IllegalArgumentException(Row cannot be null.);$ + }$ {code} Should be kv cannot be null? I don't understand what this means: {code} The kv that is required to seek must be given$ + * explicitly for reseek. Should not be used to seek to a key which may come$ + * before the current position.$ + * Note : Recommended to use knowing how seek can be done on different kvs.$ + * Suggested to seek to row boundaries like start of a row or end of a row.$ + * Seeking to the middle of a row may lead to inconsistencies across stores.$ {code} ... its probably because I'm slow but I'm sure there will be slow fellows following behind me. Is it saying you need to create a KV to do this? How would I know what the next row is in advance? I suppose with the mvcc, you have some hope of isolating this row... but if we are going for row boundaries, this patch looks to be missing some heft. Support reseek() at RegionScanner - Key: HBASE-5520 URL: https://issues.apache.org/jira/browse/HBASE-5520 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch reseek() is not supported currently at the RegionScanner level. We can support the same. This is created following the discussion under HBASE-2038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226477#comment-13226477 ] stack commented on HBASE-4608: -- The TestLRUDictionary test looks like it could be fatter. Looks like you should be able to throw at it a bunch more combinations. And better excercising of new BidirectionalLRUMap type. Better to find the issues here in unit test than Whats the difference between {code} + public static int hashBytes(byte[] bytes, int offset, int length) { {code} and the existing {code} public static int hashCode(final byte [] b, final int length) { {code} They look to do the same thing? We should remove the new one if so. We will have a keycontext when we are deserializing? Hows that work? So we compress at the individual entry level? Why not file at a time? (Sorry if this has been explained earlier) Is this right in the WALReader? {code} +compression = conf.getBoolean(HConstants.ENABLE_WAL_COMPRESSION, false); {code} How does that work if the WAL was written compressed but this flag is false? We break? Shouldn't this instead be keyed off the entries themselves? Should it be a sequence file attribute saying this a compressed file? Do we foresee replication being able to use this facility? Seems like a natural having it ship compressed entries. Good stuff. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226522#comment-13226522 ] stack commented on HBASE-5399: -- That looks ok N? Will I commit? Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for: - public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection = we have to deprecate this but keep it. - read get master address to create a master = now done with a temporary zookeeper connection - read root location = now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked. - read cluster id = now done once with a temporary zookeeper connection. - check if base done is available = now done once with a zookeeper connection given as a parameter - isTableDisabled/isTableAvailable = public functions, now done with a temporary zookeeper connection. - Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them using a temporary master connection as well. Main points are: - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue if HBaseAdmin (for both ZK Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226532#comment-13226532 ] stack commented on HBASE-4608: -- bq. The above method allows to start computation at specified offset while existing hashCode() doesn't have this parameter. Should have at least the same name as the other two methods that do same (pity WritableComparator.hashBytes w/ start offset doesn't exist). bq. Looking at SequenceFile.Sorter.cloneFileAttributes(), I don't see a convenient way for doing above. When you create a write on a sequencefile, you can pass metadata: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Metadata.html bq. For HLogKey, can we designate version of -2 for representing compressed HLogKey ? If HLogKey isn't compressed, we write -1. I don't know what this is in response to. What about my other items? HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226552#comment-13226552 ] stack commented on HBASE-5548: -- I don't get this comment: {code} +#define the command name in 'where' namespace +#which actually just delegates to the shell instance {code} Instead of {code} +ret = translate_hbase_exceptions(*args) { command(*args) } +return ret {code} .. why not just {code} +return translate_hbase_exceptions(*args) { command(*args) } {code} Don't need this anymore +# Copyright 2010 The Apache Software Foundation I don't see table.help. Is it missing from this patch? Can you include snippet of a session using this new facility in shell? Good on you Jesse Add ability to get a table in the shell --- Key: HBASE-5548 URL: https://issues.apache.org/jira/browse/HBASE-5548 Project: HBase Issue Type: Improvement Components: shell Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: ruby_HBASE-5528-v0.patch Currently, all the commands that operate on a table in the shell first have to take the table as name as input. There are two main considerations: * It is annoying to have to write the table name every time, when you should just be able to get a reference to a table * the current implementation is very wasteful - it creates a new HTable for each call (but reuses the connection since it uses the same configuration) We should be able to get a handle to a single HTable and then operate on that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5480) Fixups to MultithreadedTableMapper for Hadoop 0.23.2+
[ https://issues.apache.org/jira/browse/HBASE-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226569#comment-13226569 ] stack commented on HBASE-5480: -- bq. But this should go into 0.94, no? Yes. Fixups to MultithreadedTableMapper for Hadoop 0.23.2+ - Key: HBASE-5480 URL: https://issues.apache.org/jira/browse/HBASE-5480 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Critical Fix For: 0.94.0 Attachments: HBASE-5480.patch There are two issues: - StatusReporter has a new method getProgress() - Mapper and reducer context objects can no longer be directly instantiated. See attached patch. I'm not thrilled with the added reflection but it was the minimally intrusive change. Raised the priority to critical because compilation fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5555) add a pointer to a dns verification utility in hbase book/dns
[ https://issues.apache.org/jira/browse/HBASE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226616#comment-13226616 ] stack commented on HBASE-: -- It should come in as a utilty so you could do ./bin/hbase dnscheck or something. Also, I don't think we need reverse dns to work in 0.92+ so we wouldn't want this tool screaming reverse is needed when app doesn't require it. add a pointer to a dns verification utility in hbase book/dns - Key: HBASE- URL: https://issues.apache.org/jira/browse/HBASE- Project: HBase Issue Type: Improvement Components: documentation Reporter: Sujee Maniyam Assignee: Sujee Maniyam Priority: Minor Fix For: 0.96.0 Attachments: .txt DNS should work correctly in a Hbase cluster. I have a simple DNS checker utility, that verifies DNS on all machines of the cluster. https://github.com/sujee/hadoop-dns-checker add a pointer to the tool in hbase book : http://hbase.apache.org/book.html#dns -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5555) add a pointer to a dns verification utility in hbase book/dns
[ https://issues.apache.org/jira/browse/HBASE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226613#comment-13226613 ] stack commented on HBASE-: -- It should come in as a utilty so you could do ./bin/hbase dnscheck or something. Also, I don't think we need reverse dns to work in 0.92+ so we wouldn't want this tool screaming reverse is needed when app doesn't require it. add a pointer to a dns verification utility in hbase book/dns - Key: HBASE- URL: https://issues.apache.org/jira/browse/HBASE- Project: HBase Issue Type: Improvement Components: documentation Reporter: Sujee Maniyam Assignee: Sujee Maniyam Priority: Minor Fix For: 0.96.0 Attachments: .txt DNS should work correctly in a Hbase cluster. I have a simple DNS checker utility, that verifies DNS on all machines of the cluster. https://github.com/sujee/hadoop-dns-checker add a pointer to the tool in hbase book : http://hbase.apache.org/book.html#dns -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)
[ https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226617#comment-13226617 ] stack commented on HBASE-5509: -- I added some. MR based copier for copying HFiles (trunk version) -- Key: HBASE-5509 URL: https://issues.apache.org/jira/browse/HBASE-5509 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Lars Hofhansl Fix For: 0.94.0, 0.96.0 Attachments: 5509-v2.txt, 5509.txt This copier is a modification of the distcp tool in HDFS. It does the following: 1. List out all the regions in the HBase cluster for the required table 2. Write the above out to a file 3. Each mapper 3.1 lists all the HFiles for a given region by querying the regionserver 3.2 copies all the HFiles 3.3 outputs success if the copy succeeded, failure otherwise. Failed regions are retried in another loop 4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5539) asynchbase PerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226620#comment-13226620 ] stack commented on HBASE-5539: -- So in below {code} +@Override +void testRow(final int i) throws IOException { + final GetRequest get = new GetRequest(TABLE_NAME, getRandomRow(this.rand, this.totalRows)); + get.family(FAMILY_NAME).qualifier(QUALIFIER_NAME); + + client().get(get).addCallback(readCallback).addErrback(errback); +} {code} ...we don't complete the callback (or error callback) are called? asynchbase PerformanceEvaluation Key: HBASE-5539 URL: https://issues.apache.org/jira/browse/HBASE-5539 Project: HBase Issue Type: New Feature Components: performance Reporter: Benoit Sigoure Assignee: Benoit Sigoure Priority: Minor Labels: benchmark Attachments: 0001-asynchbase-PerformanceEvaluation.patch I plugged [asynchbase|https://github.com/stumbleupon/asynchbase] into {{PerformanceEvaluation}}. This enables testing asynchbase from {{PerformanceEvaluation}} and comparing its performance to {{HTable}}. Also asynchbase doesn't come with any benchmark, so it was good that I was able to plug it into {{PerformanceEvaluation}} relatively easily. I am in the processing of collecting results on a dev cluster running 0.92.1 and will publish them once they're ready. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226631#comment-13226631 ] stack commented on HBASE-5292: -- +1 on commit. getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Attachments: 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch, jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track # of HFileBlock's read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated for both client initiated Get/Scan operations as well for compaction related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code via a: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in case of compactions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5528) Change retrying splitting log forever if throws IOException to numbered times, and abort master when retries exhausted
[ https://issues.apache.org/jira/browse/HBASE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226637#comment-13226637 ] stack commented on HBASE-5528: -- This is for 0.90? Change retrying splitting log forever if throws IOException to numbered times, and abort master when retries exhausted --- Key: HBASE-5528 URL: https://issues.apache.org/jira/browse/HBASE-5528 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-5528.patch, hbase-5528v2.patch, hbase-5528v3.patch In current log-splitting retry logic, it will retry forever if throws IOException, I think we'd better change it to numbered times, and abort master when retries exhausted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup
[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226643#comment-13226643 ] stack commented on HBASE-5209: -- Sorry. I missed that it was resolved. HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup Key: HBASE-5209 URL: https://issues.apache.org/jira/browse/HBASE-5209 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.5, 0.92.0, 0.94.0 Reporter: Aditya Acharya Assignee: David S. Wang Fix For: 0.92.1, 0.94.0 Attachments: 5209.addendum, HBASE_5209_v5.diff I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5525) Truncate and preserve region boundaries option
[ https://issues.apache.org/jira/browse/HBASE-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226649#comment-13226649 ] stack commented on HBASE-5525: -- Disable table, remove all content from fs under the table's dir, then reenable? Truncate and preserve region boundaries option -- Key: HBASE-5525 URL: https://issues.apache.org/jira/browse/HBASE-5525 Project: HBase Issue Type: New Feature Reporter: Jean-Daniel Cryans Fix For: 0.96.0 A tool that would be useful for testing (and maybe in prod too) would be a truncate option to keep the current region boundaries. Right now what you have to do is completely kill the table and recreate it with the correct regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226647#comment-13226647 ] stack commented on HBASE-4608: -- Does v21 fix the bad decompress that you found above testing with PE? HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Fix For: 0.94.0 Attachments: 4608-v19.txt, 4608-v20.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2817) Allow separate HBASE_REGIONSERVER_HEAPSIZE and HBASE_MASTER_HEAPSIZE
[ https://issues.apache.org/jira/browse/HBASE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13226653#comment-13226653 ] stack commented on HBASE-2817: -- @Adrian So, IIRC, doesn't the first -Xmx win so if you pass it in as an *_OPTS, will that give you want you need? Else, want to make a patch w/ what you want in it? Thanks. Allow separate HBASE_REGIONSERVER_HEAPSIZE and HBASE_MASTER_HEAPSIZE Key: HBASE-2817 URL: https://issues.apache.org/jira/browse/HBASE-2817 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.90.0 Reporter: Todd Lipcon Priority: Minor Right now we have a single HBASE_HEAPSIZE configuration. This isn't that great, since the HMaster doesn't really need much ram compared to the region servers. We should allow different java options and heapsize for the different daemon types. Probably worth breaking out THRIFT, REST, AVRO, etc, as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira