[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640748#comment-13640748 ] Karthik Ranganathan commented on HBASE-6874: This has been implemented and checked into the 0.89-fb branch. Implement prefetching for scanners -- Key: HBASE-6874 URL: https://issues.apache.org/jira/browse/HBASE-6874 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50%, from 26MB/s to 39MB/s. The idea is to perform the next() call in a background thread and keep the result ready. When the scanner's next() comes in, return the pre-computed result and issue another background read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
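The prefetch idea described above can be sketched as a thin wrapper that always keeps one next() result precomputed in a background thread. This is a minimal illustration under assumed names, not the 0.89-fb implementation; the BatchScanner interface here is a hypothetical stand-in for HBase's ResultScanner.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical minimal scanner interface standing in for HBase's ResultScanner.
interface BatchScanner {
    List<String> next();              // returns null when exhausted
}

// Keeps one batch precomputed: next() hands back the ready result and
// immediately issues another background read, as the JIRA describes.
class PrefetchingScanner implements BatchScanner, AutoCloseable {
    private final BatchScanner delegate;
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private Future<List<String>> pending;

    PrefetchingScanner(BatchScanner delegate) {
        this.delegate = delegate;
        this.pending = pool.submit(delegate::next);   // warm up the first batch
    }

    @Override public List<String> next() {
        try {
            List<String> ready = pending.get();       // usually already computed
            pending = pool.submit(delegate::next);    // issue the next background read
            return ready;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    @Override public void close() { pool.shutdownNow(); }
}
```

The win comes from overlapping the server-side read for batch N+1 with the client consuming batch N, which matches the 26MB/s to 39MB/s observation for in-memory data.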
[jira] [Commented] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562811#comment-13562811 ] Karthik Ranganathan commented on HBASE-6770: Hey Terry, we're not working actively on the trunk port... [~saint@gmail.com] would be able to tell you if someone is. If you are interested in trying to port the patch, I can definitely help out with reviews. Allow scanner setCaching to specify size instead of number of rows -- Key: HBASE-6770 URL: https://issues.apache.org/jira/browse/HBASE-6770 Project: HBase Issue Type: Sub-task Components: Client, regionserver Reporter: Karthik Ranganathan Assignee: Chen Jin Currently, we have the following APIs to customize the behavior of scans: setCaching() - how many rows to cache on the client to speed up scans; setBatch() - max columns to return per row, to prevent a very large response. Ideally, we should be able to specify a memory buffer size instead, because: 1. that would take care of both of these use cases; 2. it does not need any knowledge of the size of the rows or cells, as the final thing we are worried about is the available memory.
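The proposed behavior can be sketched as a batching loop that stops at either a row-count cap or a byte budget, whichever is hit first. The class and parameter names are illustrative assumptions, not the HBASE-6770 patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Accumulates rows until either maxRows or maxBytes is reached - the
// "respect the more restrictive of the two" rule from the JIRA discussion.
class SizeBoundedBatcher {
    static List<byte[]> nextBatch(Iterator<byte[]> rows, int maxRows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long bytes = 0;
        while (rows.hasNext() && batch.size() < maxRows) {
            byte[] row = rows.next();
            batch.add(row);
            bytes += row.length;
            if (bytes >= maxBytes) break;   // byte budget hit: stop early
        }
        return batch;
    }
}
```

Because the loop measures actual bytes as it goes, the caller needs no prior knowledge of row or cell sizes, which is the point made in the issue description.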
[jira] [Commented] (HBASE-7478) Create a multi-threaded responder
[ https://issues.apache.org/jira/browse/HBASE-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562817#comment-13562817 ] Karthik Ranganathan commented on HBASE-7478: Interesting... I thought the processResponse(..., false) variant does not write to the channel when there are a lot of writes; only the processResponse(..., true) variant does. So in effect we are only single-threaded when we are pumping out a lot of data over multiple connections. Create a multi-threaded responder - Key: HBASE-7478 URL: https://issues.apache.org/jira/browse/HBASE-7478 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Currently, we have multi-threaded readers and handlers, but a single-threaded responder, which is a bottleneck. ipc.server.reader.count : number of reader threads to read data off the wire; ipc.server.handler.count : number of handler threads that process the request. We need the ability to specify an ipc.server.responder.count to set the number of responder threads.
[jira] [Commented] (HBASE-7477) Remove Proxy instance from HBase RPC
[ https://issues.apache.org/jira/browse/HBASE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551205#comment-13551205 ] Karthik Ranganathan commented on HBASE-7477: [~saint@gmail.com] Totally, feel free to open a new one for trunk. Will definitely check out HBASE-7460. One other change that has happened in the past (which makes this easier) is that we have done away with proxy objects per conf on the client side - it used to be a singleton. Now with this patch, it's just a straight-up object instance. Remove Proxy instance from HBase RPC Key: HBASE-7477 URL: https://issues.apache.org/jira/browse/HBASE-7477 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Attachments: 7477experiment.txt, HBASE-7477.patch Currently, we use HBaseRPC.getProxy() to get an Invoker object to serialize the RPC parameters. This is pretty inefficient, as it uses reflection to look up the current method name. The aim is to break the proxy up into an actual proxy implementation so that: 1. we can make it more efficient by eliminating reflection; 2. we can re-write some parts of the protocol to make it even better.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548688#comment-13548688 ] Karthik Ranganathan commented on HBASE-5416: I think the specific description (of making filters apply to only some CFs) is a good idea. But if we continue down this path of generalizing filters, it could lead to an explosion of ad-hoc filters. In that case, it might be better to expose more co-processor hooks. Overall, +1 (only skimmed the changes though). Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: Filters, Performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Sergey Shelukhin Fix For: 0.96.0 Attachments: 5416-0.94-v1.txt, 5416-0.94-v2.txt, 5416-Filtered_scans_v6.patch, 5416-v13.patch, 5416-v14.patch, 5416-v15.patch, 5416-v16.patch, 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.1.patch, Filtered_scans_v5.patch, Filtered_scans_v7.patch, HBASE-5416-v10.patch, HBASE-5416-v11.patch, HBASE-5416-v12.patch, HBASE-5416-v12.patch, HBASE-5416-v7-rebased.patch, HBASE-5416-v8.patch, HBASE-5416-v9.patch When a scan is performed, the whole row is loaded into the result list, and after that the filter (if one exists) is applied to decide whether the row is needed. But when a scan is performed on several CFs and the filter checks only data from a subset of those CFs, data from the unchecked CFs is not needed at the filter stage - only once we have decided to include the current row. In such cases we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap.
Snap is very large (10s of GB) and quite costly to scan. If we need only rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that lets a filter specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter, and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50x. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed.
[jira] [Commented] (HBASE-7477) Remove Proxy instance from HBase RPC
[ https://issues.apache.org/jira/browse/HBASE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548711#comment-13548711 ] Karthik Ranganathan commented on HBASE-7477: The pb Service 'fit' is not perfect though – it drags along some other stuff we do not want and it is missing a means of passing extra stuff unless we do some hackery – so reluctant to take it on though it does away with reflection. Couldn't agree more. My thought was that HBase only exposes simple APIs like get, put, delete and scan. Each of these in turn takes in one object (Get/Put/Delete/Scan) and a couple of filters. The serialization of the latter objects already seems to be versioned. So protobufs might be expensive for just eliminating reflection, but it might help with automatic versioning for future enhancements. I think you said the same thing here: Given the above, protobuf Service starts to look better. It has kinks but would enforce a strong pattern – and we are most of the way there already with our use of the Service#BlockingInterface. I can do better than just explaining - I can put up an initial patch that works for gets only. Will upload it next, but the changes are actually not very invasive. Here is an outline of the steps: - Replace the proxy with an HRegionInterfaceSerializerV1. It implements the RPC serialization when the method calls are made. - On the server side, you would have the HRegionInterfaceDeserializerV1 object. You would use the method name to call the right function in this object, which deserializes the params. In the current incarnation, every method would do the same thing (read the params count, param classes, etc). - Change the ser and deser objects to v2, bump up the RPC version, substitute byte codes for the method names, and make the serialization/deserialization of the params specific to the method that is called.
IMO, if you look at my next diff (where I reconstructed the HBase RPC protocol), it's pretty verbose and inefficient. It roughly does the following: * get the class name, method name, num params, and param classes by reflection * write the class name (twice, most of the time) * write the num params * write the types of each param * then serialize each param. This makes coding nice, but hurts runtime perf. Remove Proxy instance from HBase RPC Key: HBASE-7477 URL: https://issues.apache.org/jira/browse/HBASE-7477 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Attachments: 7477experiment.txt Currently, we use HBaseRPC.getProxy() to get an Invoker object to serialize the RPC parameters. This is pretty inefficient, as it uses reflection to look up the current method name. The aim is to break the proxy up into an actual proxy implementation so that: 1. we can make it more efficient by eliminating reflection; 2. we can re-write some parts of the protocol to make it even better.
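The verbosity argument above can be made concrete with a small sketch that contrasts a string-based call header with a byte-code header. The layout and the method code are illustrative assumptions, not the actual wire format of either RPC version.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Contrast the verbose header (class/method names and per-param class names
// written as strings, per the outline in the comment) with a compact header
// (one byte code per method, with param types implied by the method).
class RpcHeaderDemo {
    static byte[] verboseHeader() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("org.apache.hadoop.hbase.ipc.HRegionInterface"); // class name
        out.writeUTF("get");                                          // method name
        out.writeInt(2);                                              // num params
        out.writeUTF("[B");                                           // param 1 type (byte[])
        out.writeUTF("org.apache.hadoop.hbase.client.Get");           // param 2 type
        return buf.toByteArray();
    }

    static byte[] compactHeader() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(1);   // hypothetical byte code for "get"
        out.writeByte(2);   // num params; types are implied by the method
        return buf.toByteArray();
    }
}
```

The string-based header runs to roughly a hundred bytes before any parameter data is written, while the byte-code header is two bytes, which is the kind of saving the v2 step in the outline is after.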
[jira] [Updated] (HBASE-7477) Remove Proxy instance from HBase RPC
[ https://issues.apache.org/jira/browse/HBASE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-7477: --- Attachment: HBASE-7477.patch In this patch, HRegionInterfaceProxyImpl is the serializer that eliminates the proxy. This eliminates a decent chunk of CPU on the HBase client and shifts the bottleneck entirely to the server side. I was able to push the max get ops/sec to around 196K with this on the client plus other server-side changes. Will write this up in detail sometime. Remove Proxy instance from HBase RPC Key: HBASE-7477 URL: https://issues.apache.org/jira/browse/HBASE-7477 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Attachments: 7477experiment.txt, HBASE-7477.patch Currently, we use HBaseRPC.getProxy() to get an Invoker object to serialize the RPC parameters. This is pretty inefficient, as it uses reflection to look up the current method name. The aim is to break the proxy up into an actual proxy implementation so that: 1. we can make it more efficient by eliminating reflection; 2. we can re-write some parts of the protocol to make it even better.
[jira] [Commented] (HBASE-7477) Remove Proxy instance from HBase RPC
[ https://issues.apache.org/jira/browse/HBASE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548839#comment-13548839 ] Karthik Ranganathan commented on HBASE-7477: Yes - I was able to get it to 170-185K with HBASE-7100 and HBASE-7163 without this, and the client was the bottleneck. Now it's at 196K and the server seems to be the bottleneck. Remove Proxy instance from HBase RPC Key: HBASE-7477 URL: https://issues.apache.org/jira/browse/HBASE-7477 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Attachments: 7477experiment.txt, HBASE-7477.patch Currently, we use HBaseRPC.getProxy() to get an Invoker object to serialize the RPC parameters. This is pretty inefficient, as it uses reflection to look up the current method name. The aim is to break the proxy up into an actual proxy implementation so that: 1. we can make it more efficient by eliminating reflection; 2. we can re-write some parts of the protocol to make it even better.
[jira] [Created] (HBASE-7477) Remove Proxy instance from HBase RPC
Karthik Ranganathan created HBASE-7477: -- Summary: Remove Proxy instance from HBase RPC Key: HBASE-7477 URL: https://issues.apache.org/jira/browse/HBASE-7477 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Currently, we use HBaseRPC.getProxy() to get an Invoker object to serialize the RPC parameters. This is pretty inefficient, as it uses reflection to look up the current method name. The aim is to break the proxy up into an actual proxy implementation so that: 1. we can make it more efficient by eliminating reflection; 2. we can re-write some parts of the protocol to make it even better.
[jira] [Created] (HBASE-7478) Create a multi-threaded responder
Karthik Ranganathan created HBASE-7478: -- Summary: Create a multi-threaded responder Key: HBASE-7478 URL: https://issues.apache.org/jira/browse/HBASE-7478 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Currently, we have multi-threaded readers and handlers, but a single-threaded responder, which is a bottleneck. ipc.server.reader.count : number of reader threads to read data off the wire; ipc.server.handler.count : number of handler threads that process the request. We need the ability to specify an ipc.server.responder.count to set the number of responder threads.
[jira] [Updated] (HBASE-7163) Low-hanging perf improvements in HBase client
[ https://issues.apache.org/jira/browse/HBASE-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-7163: --- Summary: Low-hanging perf improvements in HBase client (was: Change cachedRegionsLocations in HConnectionManager from SoftValueSortedMap to ConcurrentSkipListMap) Low-hanging perf improvements in HBase client -- Key: HBASE-7163 URL: https://issues.apache.org/jira/browse/HBASE-7163 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan This change saves 15% CPU on the client side per profiling. Using the ConcurrentSkipListMap, we can do: tableLocations.floorEntry(row).getValue() instead of: SortedMap<byte[], HRegionLocation> matchingRegions = tableLocations.floorEntry(row).getValue(); if (!matchingRegions.isEmpty()) { HRegionLocation possibleRegion = matchingRegions.get(matchingRegions.lastKey()); }
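The floorEntry lookup can be seen in a self-contained sketch, with plain byte[] start keys and strings standing in for HBase's region-location cache. Arrays::compare gives signed-byte lexicographic order (fine for the ASCII keys used here; HBase's own comparator is unsigned).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Region-location cache sketch: keys are region start rows; floorEntry(row)
// returns the entry with the greatest start key <= row in one call, with no
// emptiness check or lastKey() dance needed.
class RegionCacheDemo {
    static final ConcurrentSkipListMap<byte[], String> cache =
        new ConcurrentSkipListMap<>(Arrays::compare);   // lexicographic byte[] order

    static String locate(String row) {
        Map.Entry<byte[], String> e = cache.floorEntry(row.getBytes(StandardCharsets.UTF_8));
        return e == null ? null : e.getValue();         // null: row before the first region
    }
}
```

Besides the shorter lookup, ConcurrentSkipListMap is lock-free for readers, which is where the profiled client-side CPU saving over SoftValueSortedMap comes from.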
[jira] [Updated] (HBASE-7163) Low-hanging perf improvements in HBase client
[ https://issues.apache.org/jira/browse/HBASE-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-7163: --- Description: 1. Change cachedRegionsLocations in HConnectionManager from SoftValueSortedMap to ConcurrentSkipListMap: This change saves 15% CPU on the client side per profiling. Using the ConcurrentSkipListMap, we can do: tableLocations.floorEntry(row).getValue() instead of: SortedMap<byte[], HRegionLocation> matchingRegions = tableLocations.floorEntry(row).getValue(); if (!matchingRegions.isEmpty()) { HRegionLocation possibleRegion = matchingRegions.get(matchingRegions.lastKey()); } 2. NetUtils.getDefaultSocketFactory is very inefficient, use (was: This change saves 15% CPU on the client side per profiling. Using the ConcurrentSkipListMap, we can do: tableLocations.floorEntry(row).getValue() instead of: SortedMap<byte[], HRegionLocation> matchingRegions = tableLocations.floorEntry(row).getValue(); if (!matchingRegions.isEmpty()) { HRegionLocation possibleRegion = matchingRegions.get(matchingRegions.lastKey()); }) Low-hanging perf improvements in HBase client -- Key: HBASE-7163 URL: https://issues.apache.org/jira/browse/HBASE-7163 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan 1. Change cachedRegionsLocations in HConnectionManager from SoftValueSortedMap to ConcurrentSkipListMap: This change saves 15% CPU on the client side per profiling. Using the ConcurrentSkipListMap, we can do: tableLocations.floorEntry(row).getValue() instead of: SortedMap<byte[], HRegionLocation> matchingRegions = tableLocations.floorEntry(row).getValue(); if (!matchingRegions.isEmpty()) { HRegionLocation possibleRegion = matchingRegions.get(matchingRegions.lastKey()); } 2. NetUtils.getDefaultSocketFactory is very inefficient, use
[jira] [Commented] (HBASE-7163) Low-hanging perf improvements in HBase client
[ https://issues.apache.org/jira/browse/HBASE-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498416#comment-13498416 ] Karthik Ranganathan commented on HBASE-7163: @Ted - yes, that's the part; the fix has another component to it though. Also, I changed this task to add one more perf improvement. Low-hanging perf improvements in HBase client -- Key: HBASE-7163 URL: https://issues.apache.org/jira/browse/HBASE-7163 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan 1. Change cachedRegionsLocations in HConnectionManager from SoftValueSortedMap to ConcurrentSkipListMap: This change saves 15% CPU on the client side per profiling. Using the ConcurrentSkipListMap, we can do: tableLocations.floorEntry(row).getValue() instead of: SortedMap<byte[], HRegionLocation> matchingRegions = tableLocations.floorEntry(row).getValue(); if (!matchingRegions.isEmpty()) { HRegionLocation possibleRegion = matchingRegions.get(matchingRegions.lastKey()); } 2. NetUtils.getDefaultSocketFactory is very inefficient, use
[jira] [Commented] (HBASE-7163) Low-hanging perf improvements in HBase client
[ https://issues.apache.org/jira/browse/HBASE-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498501#comment-13498501 ] Karthik Ranganathan commented on HBASE-7163: Yes, thanks for explicitly mentioning it - I forgot to mention that point. The thought was that the overhead of caching all regions would not be too large. Low-hanging perf improvements in HBase client -- Key: HBASE-7163 URL: https://issues.apache.org/jira/browse/HBASE-7163 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan 1. Change cachedRegionsLocations in HConnectionManager from SoftValueSortedMap to ConcurrentSkipListMap: This change saves 15% CPU on the client side per profiling. Using the ConcurrentSkipListMap, we can do: tableLocations.floorEntry(row).getValue() instead of: SortedMap<byte[], HRegionLocation> matchingRegions = tableLocations.floorEntry(row).getValue(); if (!matchingRegions.isEmpty()) { HRegionLocation possibleRegion = matchingRegions.get(matchingRegions.lastKey()); } 2. NetUtils.getDefaultSocketFactory is very inefficient, use
[jira] [Created] (HBASE-7163) Change cachedRegionsLocations in HConnectionManager from SoftValueSortedMap to ConcurrentSkipListMap
Karthik Ranganathan created HBASE-7163: -- Summary: Change cachedRegionsLocations in HConnectionManager from SoftValueSortedMap to ConcurrentSkipListMap Key: HBASE-7163 URL: https://issues.apache.org/jira/browse/HBASE-7163 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan This change saves 15% CPU on the client side per profiling. Using the ConcurrentSkipListMap, we can do: tableLocations.floorEntry(row).getValue() instead of: SortedMap<byte[], HRegionLocation> matchingRegions = tableLocations.floorEntry(row).getValue(); if (!matchingRegions.isEmpty()) { HRegionLocation possibleRegion = matchingRegions.get(matchingRegions.lastKey()); }
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491695#comment-13491695 ] Karthik Ranganathan commented on HBASE-6874: Lars - the dependency on HBASE-6770 is more to make the code simpler. Currently, the HRegionServer loops over numRows, and the RegionScanner loops over the columns in the various CFs, but for one row. HBASE-6770 will move the looping over numRows into the RegionScanner itself, because we need to track both memory size and number of rows - in order to respect the more restrictive of the two. Once that happens, we can implement prefetching in the RegionScanner itself, instead of spreading the logic across HRegionServer as well. So it's more of a code-simplicity and not-having-to-resolve-conflicts thing. Implement prefetching for scanners -- Key: HBASE-6874 URL: https://issues.apache.org/jira/browse/HBASE-6874 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50%, from 26MB/s to 39MB/s. The idea is to perform the next() call in a background thread and keep the result ready. When the scanner's next() comes in, return the pre-computed result and issue another background read.
[jira] [Created] (HBASE-7105) RS throws NPE on forcing compaction from HBase shell on a single bulk imported file.
Karthik Ranganathan created HBASE-7105: -- Summary: RS throws NPE on forcing compaction from HBase shell on a single bulk imported file. Key: HBASE-7105 URL: https://issues.apache.org/jira/browse/HBASE-7105 Project: HBase Issue Type: Bug Components: regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan In StoreFile, we have: private AtomicBoolean majorCompaction = null; In StoreFile.open(), we do: b = metadataMap.get(MAJOR_COMPACTION_KEY); if (b != null) { // init majorCompaction variable } Because the file was bulk imported, this key is absent, so majorCompaction is never initialized, and any subsequent call to isMajorCompaction() NPEs. The fix is to initialize it to false.
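The bug and fix from the description can be shown in a self-contained sketch; the field and key names follow the issue text, while the surrounding class is a simplified stand-in for StoreFile.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified StoreFile: majorCompaction is only set when the
// MAJOR_COMPACTION_KEY metadata entry exists. Bulk-imported files lack the
// key, so with the old "= null" initializer isMajorCompaction() would NPE.
class StoreFileSketch {
    static final String MAJOR_COMPACTION_KEY = "MAJOR_COMPACTION_KEY";

    // The fix: initialize to false instead of null.
    private final AtomicBoolean majorCompaction = new AtomicBoolean(false);

    void open(Map<String, byte[]> metadataMap) {
        byte[] b = metadataMap.get(MAJOR_COMPACTION_KEY);
        if (b != null) {
            majorCompaction.set(b[0] != 0);   // present only for normally-written files
        }
        // Bulk-imported file: key absent, field safely stays false.
    }

    boolean isMajorCompaction() {
        return majorCompaction.get();         // would NPE here if the field were null
    }
}
```

With the false default, a bulk-imported file simply reports "not major compacted", which is the correct answer for a file that was never written by a compaction.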
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490898#comment-13490898 ] Karthik Ranganathan commented on HBASE-6874: Awesome, then layering in multi-pre-fetch should be very easy! Implement prefetching for scanners -- Key: HBASE-6874 URL: https://issues.apache.org/jira/browse/HBASE-6874 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50%, from 26MB/s to 39MB/s. The idea is to perform the next() call in a background thread and keep the result ready. When the scanner's next() comes in, return the pre-computed result and issue another background read.
[jira] [Created] (HBASE-7100) Allow multiple connections from HBaseClient to each remote endpoint
Karthik Ranganathan created HBASE-7100: -- Summary: Allow multiple connections from HBaseClient to each remote endpoint Key: HBASE-7100 URL: https://issues.apache.org/jira/browse/HBASE-7100 Project: HBase Issue Type: Sub-task Components: Client Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Allowing multiple connections gives a *huge* boost while benchmarking performance. In a production setup, many nodes query a single regionserver, but one connection is not enough for a single HBase client to push a single regionserver.
[jira] [Commented] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488683#comment-13488683 ] Karthik Ranganathan commented on HBASE-6925: Where is the chunking (that JIRA had a lot of stuff to parse)? Right now, in 89-fb, the client's nio send buffers are at 128K, and the input stream that reads from the nio buffer is only 8K. This change is on the server side. I would hypothesize that scans (which return a lot of data) will benefit from this. Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Sub-task Components: Performance Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6925.patch Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput.
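The role of NIO_BUFFER_LIMIT - writing a large response to the socket in bounded chunks rather than handing the channel the whole buffer at once - can be sketched as follows. This is an illustrative standalone version, not the actual HBaseServer code, and it assumes a blocking channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Write a large buffer in chunks of at most NIO_BUFFER_LIMIT bytes, so each
// individual channel write stays bounded. This is the knob the JIRA bumps
// from 8K to 64K: bigger chunks mean fewer write calls per large scan response.
class ChunkedWriter {
    static final int NIO_BUFFER_LIMIT = 64 * 1024;

    static int channelWrite(WritableByteChannel ch, ByteBuffer buf) throws IOException {
        int written = 0;
        int originalLimit = buf.limit();
        while (buf.hasRemaining()) {
            // Restrict the visible window to at most NIO_BUFFER_LIMIT bytes.
            buf.limit(Math.min(buf.position() + NIO_BUFFER_LIMIT, originalLimit));
            written += ch.write(buf);
            buf.limit(originalLimit);     // restore for the next iteration
        }
        return written;
    }
}
```

With an 8K chunk size, a 128K response costs sixteen write calls; at 64K it costs two, which is consistent with the observed scan-throughput improvement.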
[jira] [Commented] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488687#comment-13488687 ] Karthik Ranganathan commented on HBASE-6925: Also, I was able to get the throughput of a single-threaded scan from a client, for a block in the block cache, to break 100MB/s (on whatever SKU I am using) - it started out around 20MB/s. Will write a blog post about the various changes in detail if there's interest. I think there is scope to do even better. Prefetching is huge of course - it helps even more when the block has to be read from disk. Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Sub-task Components: Performance Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6925.patch Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput.
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488689#comment-13488689 ] Karthik Ranganathan commented on HBASE-6874: Actually, I did this analysis and enhancement for an online analytics use-case as well (and search indexing), and most of what you say maps one-to-one. The only difference, I guess, is that so far we are not relying heavily on server-side filtering, so we decided to punt on the prefetching=n case for now (we actually discussed this). Implement prefetching for scanners -- Key: HBASE-6874 URL: https://issues.apache.org/jira/browse/HBASE-6874 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50%, from 26MB/s to 39MB/s. The idea is to perform the next() call in a background thread and keep the result ready. When the scanner's next() comes in, return the pre-computed result and issue another background read.
[jira] [Commented] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488991#comment-13488991 ] Karthik Ranganathan commented on HBASE-6925: No, I don't think that would matter; this is more about the socket transfer size into an underlying buffer. Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Sub-task Components: Performance Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6925.patch Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput.
[jira] [Commented] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13489042#comment-13489042 ] Karthik Ranganathan commented on HBASE-6925: Go for the commit Lars! Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Sub-task Components: Performance Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6925.patch Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13488459#comment-13488459 ] Karthik Ranganathan commented on HBASE-6925: Missed Stack's question - this change alone gave a 25% best-case improvement in scan throughput. Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Sub-task Components: Performance Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.94.3, 0.96.0 Attachments: HBASE-6925.patch Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
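For context, the constant under discussion caps how many bytes the server hands the socket channel per write() call. When given a heap ByteBuffer, the JDK channel implementation copies the remaining bytes into a temporary direct buffer before the syscall, so the cap bounds that copy; raising it from 8K to 64K means fewer write() calls per response. A simplified, hypothetical rendition of the chunked write (not the exact HBaseServer code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

class ChunkedWriter {
    // The constant under discussion: bytes handed to the channel per write().
    // 8K was the old value; the patch raises it to 64K.
    static final int NIO_BUFFER_LIMIT = 64 * 1024;

    // Write buf in NIO_BUFFER_LIMIT-sized slices by temporarily shrinking the
    // buffer's limit, so each write() sees at most NIO_BUFFER_LIMIT bytes.
    static int channelWrite(WritableByteChannel ch, ByteBuffer buf) {
        int written = 0;
        try {
            while (buf.hasRemaining()) {
                int chunk = Math.min(NIO_BUFFER_LIMIT, buf.remaining());
                int oldLimit = buf.limit();
                buf.limit(buf.position() + chunk);   // expose only one chunk
                written += ch.write(buf);
                buf.limit(oldLimit);                 // restore the full view
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```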
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13488463#comment-13488463 ] Karthik Ranganathan commented on HBASE-6874: Thought about the N scanners; it's a complicated change - you would have to change the entire scan protocol. The next calls in a scan are not numbered, so you could go out of whack when prefetching N (and throw in exceptions). There is also the basic issue right now that scans do retries, which is wrong. Also, reasoning about it another way: if your in-memory scan throughput exceeds the time to read from disk, you're probably good. I found that there are other unrelated bottlenecks preventing this from being the case. Of course, if the filtering is very heavy then this will break down... you probably want to implement prefetching based on the number of filtered rows, which should not be too hard. I have a patch I have tested with, but it's waiting on HBASE-6770 - that is going to refactor scans quite a bit. Will put a patch out once that is done. Implement prefetching for scanners -- Key: HBASE-6874 URL: https://issues.apache.org/jira/browse/HBASE-6874 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50% from 26MB/s to 39MB/s. The idea is to perform the next in a background thread, and keep the result ready. When the scanner's next comes in, return the pre-computed result and issue another background read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7068) Create a Get benchmark
Karthik Ranganathan created HBASE-7068: -- Summary: Create a Get benchmark Key: HBASE-7068 URL: https://issues.apache.org/jira/browse/HBASE-7068 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7067) HBase Get perf improvements
Karthik Ranganathan created HBASE-7067: -- Summary: HBase Get perf improvements Key: HBASE-7067 URL: https://issues.apache.org/jira/browse/HBASE-7067 Project: HBase Issue Type: Umbrella Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Umbrella task for improving Get performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7026) Make metrics collection in StoreScanner.java more efficient
Karthik Ranganathan created HBASE-7026: -- Summary: Make metrics collection in StoreScanner.java more efficient Key: HBASE-7026 URL: https://issues.apache.org/jira/browse/HBASE-7026 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Per the benchmarks I ran, the following block of code seems to be inefficient: StoreScanner.java: public synchronized boolean next(List<KeyValue> outResult, int limit, String metric) throws IOException { // ... // update the counter if (addedResultsSize > 0 && metric != null) { HRegion.incrNumericMetric(this.metricNamePrefix + metric, addedResultsSize); } // ... Removing this block increased throughput by 10%. We should move this to the outer layer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7029) Result array serialization improvements
Karthik Ranganathan created HBASE-7029: -- Summary: Result array serialization improvements Key: HBASE-7029 URL: https://issues.apache.org/jira/browse/HBASE-7029 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan The Result[] is very inefficiently serialized - there are 2 for loops over each result and we instantiate every object. A better way is to make it a data block, and use delta block encoding to make it more efficient. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480344#comment-13480344 ] Karthik Ranganathan commented on HBASE-6925: Yes, this is committed into 0.89.fb already; it's a super-trivial change but does improve perf quite a bit. Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBASE-6925.patch Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6923) Create scanner benchmark
[ https://issues.apache.org/jira/browse/HBASE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480347#comment-13480347 ] Karthik Ranganathan commented on HBASE-6923: I am still in the process of improving scan performance, so still work in progress (though I have made an initial commit). Am planning on more modifications to this. Create scanner benchmark Key: HBASE-6923 URL: https://issues.apache.org/jira/browse/HBASE-6923 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: TestStorePerformance.java Create a simple program to benchmark performance/throughput of scanners, and print some results at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5783) Faster HBase bulk loader
[ https://issues.apache.org/jira/browse/HBASE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477137#comment-13477137 ] Karthik Ranganathan commented on HBASE-5783: No, we track only the last (highest) one per region. Also, in the actual implementation, we did it with just timestamps from the RS. So, after doing all the puts, the loader gets the time on the RS (t1). The server tracks the start time of the last successfully completed flush (t2). Querying that and making sure t2 > t1 is enough. Of course - if the region has moved gracefully, that's considered a success too as an optimization. We used the term MR Bulk Loader simply to say that the load of the data should be repeatable in case of failure (as opposed to an online use case). Faster HBase bulk loader Key: HBASE-5783 URL: https://issues.apache.org/jira/browse/HBASE-5783 Project: HBase Issue Type: New Feature Components: Client, IPC/RPC, Performance, regionserver Reporter: Karthik Ranganathan Assignee: Amitanand Aiyer We can get a 3x to 4x gain based on a prototype demonstrating this approach in effect (hackily) over the MR bulk loader for very large data sets by doing the following: 1. Do direct multi-puts from HBase client using GZIP compressed RPC's 2. Turn off WAL (we will ensure no data loss in another way) 3. For each bulk load client, we need to: 3.1 do a put 3.2 get back a tracking cookie (memstoreTs or HLogSequenceId) per put 3.3 be able to ask the RS if the tracking cookie has been flushed to disk 4. For each client, we can succeed it if the tracking cookie for the last put it did (for every RS) makes it to disk. Otherwise the map task fails and is retried. 5. If the last put did not make it to disk for a timeout (say a second or so) we issue a manual flush. 
Enhancements: - Increase the memstore size so that we flush larger files - Decrease the compaction ratios (say increase the number of files to compact) Quick background: The bottlenecks in the multiput approach are that the data is transferred *uncompressed* twice over the top-of-rack: once from the client to the RS (on the multi put call) and again because of WAL (HDFS replication). We reduced the former with RPC compression and eliminated the latter above while still guaranteeing that data won't be lost. This is better than the MR bulk loader at a high level because we don't need to merge sort all the files for a given region and then make it an HFile - that's the equivalent of bulk loading AND major compacting in one shot. Also there is much more disk involved in the MR method (sort/spill). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
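The timestamp-based durability check described in the comment above (t1 = RS time after the loader's last put, t2 = start time of the last completed flush, success once t2 > t1, with a manual flush on timeout per step 5) can be sketched roughly as follows; the class and method names are hypothetical.

```java
// Hypothetical sketch of the loader's per-region durability check: once a
// flush that STARTED after the last put has COMPLETED, every put is on disk.
class FlushTracker {
    private volatile long lastFlushStartTime;   // updated by the flush path

    void flushCompleted(long flushStartTime) { lastFlushStartTime = flushStartTime; }

    /** Poll until a flush that started after t1 completes; force a flush on timeout. */
    boolean waitForDurability(long t1, long timeoutMs, Runnable manualFlush) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (lastFlushStartTime <= t1) {
            if (System.currentTimeMillis() >= deadline) {
                manualFlush.run();          // step 5: issue a manual flush after the timeout
                deadline = Long.MAX_VALUE;  // the flush is in flight; keep waiting for it
            }
            try { Thread.sleep(10); } catch (InterruptedException e) { return false; }
        }
        return true;
    }
}
```

Comparing flush start time (not end time) against t1 is what makes the check safe: a flush that began after the last put necessarily covers that put's data.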
[jira] [Commented] (HBASE-6980) Parallel Flushing Of Memstores
[ https://issues.apache.org/jira/browse/HBASE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477194#comment-13477194 ] Karthik Ranganathan commented on HBASE-6980: @ramakrishna - this should not be necessary for ensuring no data loss right? Once we have a snapshot memstore, we automatically should know the max seq id to which it has data - that would never change. 1. From what I remember of the code (when I was looking into something unrelated), we track the *min* seq id from the current memstore instead of the max seq id from the snapshot memstore to put into the HLog when it's rolled after a flush. So this synchronization becomes necessary - if we store the max seq id along with the memstore that is flushed, we should be able to eliminate the locks. 2. Also, it's arguable whether we need the absolute correct max-seq-id flushed. In a very small % of cases, we would end up rolling logs a bit slower. As long as we are conservative with updating the max seq id in the HLog we should be good, right? Parallel Flushing Of Memstores -- Key: HBASE-6980 URL: https://issues.apache.org/jira/browse/HBASE-6980 Project: HBase Issue Type: New Feature Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan For write dominated workloads, single threaded memstore flushing is an unnecessary bottleneck. With a single flusher thread, we are basically not setup to take advantage of the aggregate throughput that multi-disk nodes provide. * For puts with WAL enabled, the bottleneck is more likely the single WAL per region server. So this particular fix may not buy as much unless we unlock that bottleneck with multiple commit logs per region server. (Topic for a separate JIRA-- HBASE-6981). * But for puts with WAL disabled (e.g., when using HBASE-5783 style fast bulk imports), we should be able to support much better ingest rates with parallel flushing of memstores. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6619) Do no unregister and re-register interest ops in RPC
[ https://issues.apache.org/jira/browse/HBASE-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469802#comment-13469802 ] Karthik Ranganathan commented on HBASE-6619: No, this is fine - it's already committed into 89-fb. Do no unregister and re-register interest ops in RPC Key: HBASE-6619 URL: https://issues.apache.org/jira/browse/HBASE-6619 Project: HBase Issue Type: Bug Components: IPC/RPC, Performance Reporter: Karthik Ranganathan Assignee: Michal Gregorczyk Priority: Critical Attachments: 0001-jira-HBASE-6619-89-fb-Do-no-unregister-and-re-regist.patch While investigating perf of HBase, Michal noticed that we could cut about 5-40% (depending on number of threads) from the total get time in the RPC on the server side if we eliminated re-registering for interest ops. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6923) Create scanner benchmark
Karthik Ranganathan created HBASE-6923: -- Summary: Create scanner benchmark Key: HBASE-6923 URL: https://issues.apache.org/jira/browse/HBASE-6923 Project: HBase Issue Type: Improvement Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Create a simple program to benchmark performance/throughput of scanners, and print some results at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6922) HBase scanner performance improvements
Karthik Ranganathan created HBASE-6922: -- Summary: HBase scanner performance improvements Key: HBASE-6922 URL: https://issues.apache.org/jira/browse/HBASE-6922 Project: HBase Issue Type: Umbrella Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Umbrella task for improving throughput in HBase scanners. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6923) Create scanner benchmark
[ https://issues.apache.org/jira/browse/HBASE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-6923: --- Issue Type: Sub-task (was: Improvement) Parent: HBASE-6922 Create scanner benchmark Key: HBASE-6923 URL: https://issues.apache.org/jira/browse/HBASE-6923 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Create a simple program to benchmark performance/throughput of scanners, and print some results at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-6874: --- Issue Type: Sub-task (was: Improvement) Parent: HBASE-6922 Implement prefetching for scanners -- Key: HBASE-6874 URL: https://issues.apache.org/jira/browse/HBASE-6874 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50% from 26MB/s to 39MB/s. The idea is to perform the next in a background thread, and keep the result ready. When the scanner's next comes in, return the pre-computed result and issue another background read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-6770: --- Issue Type: Sub-task (was: Bug) Parent: HBASE-6922 Allow scanner setCaching to specify size instead of number of rows -- Key: HBASE-6770 URL: https://issues.apache.org/jira/browse/HBASE-6770 Project: HBase Issue Type: Sub-task Components: Client, regionserver Reporter: Karthik Ranganathan Assignee: Michal Gregorczyk Currently, we have the following api's to customize the behavior of scans: setCaching() - how many rows to cache on client to speed up scans setBatch() - max columns per row to return per row to prevent a very large response. Ideally, we should be able to specify a memory buffer size because: 1. that would take care of both of these use cases. 2. it does not need any knowledge of the size of the rows or cells, as the final thing we are worried about is the available memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6066) some low hanging read path improvement ideas
[ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-6066: --- Issue Type: Sub-task (was: Improvement) Parent: HBASE-6922 some low hanging read path improvement ideas - Key: HBASE-6066 URL: https://issues.apache.org/jira/browse/HBASE-6066 Project: HBase Issue Type: Sub-task Components: Performance Reporter: Kannan Muthukkaruppan Assignee: Michal Gregorczyk Priority: Critical Labels: noob Fix For: 0.96.0 Attachments: 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, metric-stringbuilder-fix.patch I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations... We seem to be doing several wasteful iterations over and/or building of temporary lists. 1) One such is the following code in HRegionServer.next(): {code} boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE); if (!values.isEmpty()) { for (KeyValue kv : values) { // wasteful in most cases currentScanResultSize += kv.heapSize(); } results.add(new Result(values)); {code} By default the maxScannerResultSize is Long.MAX_VALUE. In those cases, we can avoid the unnecessary iteration to compute currentScanResultSize. 2) An example of a wasteful temporary array is results in RegionScanner.next(). {code} results.clear(); boolean returnResult = nextInternal(limit, metric); outResults.addAll(results); {code} results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults. 3) Another almost similar example of a wasteful array is results in StoreScanner.next(), which eventually also copies its results into outResults. 4) Reduce overhead of size metric maintained in StoreScanner.next(). {code} if (metric != null) { HRegion.incrNumericMetric(this.metricNamePrefix + metric, copyKv.getLength()); } results.add(copyKv); {code} A single call to next() might fetch a lot of KVs. 
We can first add up the size of those KVs in a local variable and then in a finally clause increment the metric one shot, rather than updating AtomicLongs for each KV. 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone(), which is also a synchronized method. We should factor the code to avoid these nested synchronized methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
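Item 4's suggestion (accumulate the per-KV sizes in a local variable and publish once per next() call, inside a finally clause) might look like the following sketch; the metric registry here is hypothetical and stands in for HRegion.incrNumericMetric.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative metric registry (hypothetical; stands in for HRegion's metrics).
class Metrics {
    static final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
    static void incr(String name, long delta) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }
}

class ScanLoop {
    // Instead of bumping the shared AtomicLong once per KeyValue, accumulate
    // the size in a local and publish it once when the batch completes.
    static void nextBatch(int[] kvLengths, String metric) {
        long batchSize = 0;
        try {
            for (int len : kvLengths) {
                batchSize += len;      // cheap local add, no contention
            }
        } finally {
            if (metric != null && batchSize > 0) {
                Metrics.incr(metric, batchSize);   // one atomic update per batch
            }
        }
    }
}
```

The finally clause mirrors the comment's point: even if next() exits early, whatever was accumulated so far is still published with a single atomic update.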
[jira] [Commented] (HBASE-6923) Create scanner benchmark
[ https://issues.apache.org/jira/browse/HBASE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468039#comment-13468039 ] Karthik Ranganathan commented on HBASE-6923: Hey Todd, nice! I too have written a benchmark with interesting results. Would be interesting to compare :) Create scanner benchmark Key: HBASE-6923 URL: https://issues.apache.org/jira/browse/HBASE-6923 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: TestStorePerformance.java Create a simple program to benchmark performance/throughput of scanners, and print some results at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
Karthik Ranganathan created HBASE-6925: -- Summary: Change socket write size from 8K to 64K for HBaseServer Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Improvement Reporter: Karthik Ranganathan Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-6925: --- Issue Type: Sub-task (was: Improvement) Parent: HBASE-6922 Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Sub-task Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6925) Change socket write size from 8K to 64K for HBaseServer
[ https://issues.apache.org/jira/browse/HBASE-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan reassigned HBASE-6925: -- Assignee: Karthik Ranganathan Change socket write size from 8K to 64K for HBaseServer --- Key: HBASE-6925 URL: https://issues.apache.org/jira/browse/HBASE-6925 Project: HBase Issue Type: Improvement Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Creating a JIRA for this, but the change is trivial: change NIO_BUFFER_LIMIT from 8K to 64K in HBaseServer. This seems to increase scan throughput. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan reassigned HBASE-6770: -- Assignee: Chen Jin (was: Michal Gregorczyk) Allow scanner setCaching to specify size instead of number of rows -- Key: HBASE-6770 URL: https://issues.apache.org/jira/browse/HBASE-6770 Project: HBase Issue Type: Sub-task Components: Client, regionserver Reporter: Karthik Ranganathan Assignee: Chen Jin Currently, we have the following api's to customize the behavior of scans: setCaching() - how many rows to cache on client to speed up scans setBatch() - max columns per row to return per row to prevent a very large response. Ideally, we should be able to specify a memory buffer size because: 1. that would take care of both of these use cases. 2. it does not need any knowledge of the size of the rows or cells, as the final thing we are worried about is the available memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6874) Implement prefetching for scanners
Karthik Ranganathan created HBASE-6874: -- Summary: Implement prefetching for scanners Key: HBASE-6874 URL: https://issues.apache.org/jira/browse/HBASE-6874 Project: HBase Issue Type: Improvement Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50% from 26MB/s to 39MB/s. The idea is to perform the next in a background thread, and keep the result ready. When the scanner's next comes in, return the pre-computed result and issue another background read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457823#comment-13457823 ] Karthik Ranganathan commented on HBASE-6770: Yes, good estimate is the intention. Across different use-cases (or sometimes different column families in the same table), the kv sizes are so different it gets hard to come up with good estimates that would not OOM the client in all cases. Allow scanner setCaching to specify size instead of number of rows -- Key: HBASE-6770 URL: https://issues.apache.org/jira/browse/HBASE-6770 Project: HBase Issue Type: Bug Components: client, regionserver Reporter: Karthik Ranganathan Assignee: Michal Gregorczyk Currently, we have the following api's to customize the behavior of scans: setCaching() - how many rows to cache on client to speed up scans setBatch() - max columns per row to return per row to prevent a very large response. Ideally, we should be able to specify a memory buffer size because: 1. that would take care of both of these use cases. 2. it does not need any knowledge of the size of the rows or cells, as the final thing we are worried about is the available memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5783) Faster HBase bulk loader
[ https://issues.apache.org/jira/browse/HBASE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-5783: --- Assignee: Amitanand Aiyer (was: Nicolas Spiegelberg) Faster HBase bulk loader Key: HBASE-5783 URL: https://issues.apache.org/jira/browse/HBASE-5783 Project: HBase Issue Type: New Feature Components: client, ipc, performance, regionserver Reporter: Karthik Ranganathan Assignee: Amitanand Aiyer We can get a 3x to 4x gain based on a prototype demonstrating this approach in effect (hackily) over the MR bulk loader for very large data sets by doing the following: 1. Do direct multi-puts from HBase client using GZIP compressed RPC's 2. Turn off WAL (we will ensure no data loss in another way) 3. For each bulk load client, we need to: 3.1 do a put 3.2 get back a tracking cookie (memstoreTs or HLogSequenceId) per put 3.3 be able to ask the RS if the tracking cookie has been flushed to disk 4. For each client, we can succeed it if the tracking cookie for the last put it did (for every RS) makes it to disk. Otherwise the map task fails and is retried. 5. If the last put did not make it to disk for a timeout (say a second or so) we issue a manual flush. Enhancements: - Increase the memstore size so that we flush larger files - Decrease the compaction ratios (say increase the number of files to compact) Quick background: The bottlenecks in the multiput approach are that the data is transferred *uncompressed* twice over the top-of-rack: once from the client to the RS (on the multi put call) and again because of WAL (HDFS replication). We reduced the former with RPC compression and eliminated the latter above while still guaranteeing that data wont be lost. This is better than the MR bulk loader at a high level because we dont need to merge sort all the files for a given region and then make it a HFile - thats the equivalent of bulk loading AND majorcompacting in one shot. 
Also there is much more disk involved in the MR method (sort/spill). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457179#comment-13457179 ] Karthik Ranganathan commented on HBASE-6770: Agreed. If that's the only issue, then passing a hint makes it easier to use - do something like setPartialRowScanning(true) if we want to respect that. But in any case, I am not suggesting removing the existing API, just adding the new ones. Allow scanner setCaching to specify size instead of number of rows -- Key: HBASE-6770 URL: https://issues.apache.org/jira/browse/HBASE-6770 Project: HBase Issue Type: Bug Components: client, regionserver Reporter: Karthik Ranganathan Currently, we have the following api's to customize the behavior of scans: setCaching() - how many rows to cache on client to speed up scans setBatch() - max columns per row to return per row to prevent a very large response. Ideally, we should be able to specify a memory buffer size because: 1. that would take care of both of these use cases. 2. it does not need any knowledge of the size of the rows or cells, as the final thing we are worried about is the available memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-6770: --- Assignee: Michal Gregorczyk Allow scanner setCaching to specify size instead of number of rows -- Key: HBASE-6770 URL: https://issues.apache.org/jira/browse/HBASE-6770 Project: HBase Issue Type: Bug Components: client, regionserver Reporter: Karthik Ranganathan Assignee: Michal Gregorczyk Currently, we have the following APIs to customize the behavior of scans: setCaching() - how many rows to cache on the client to speed up scans setBatch() - max columns to return per row, to prevent a very large response. Ideally, we should be able to specify a memory buffer size because: 1. that would take care of both of these use cases. 2. it does not need any knowledge of the size of the rows or cells, as the final thing we are worried about is the available memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows
Karthik Ranganathan created HBASE-6770: -- Summary: Allow scanner setCaching to specify size instead of number of rows Key: HBASE-6770 URL: https://issues.apache.org/jira/browse/HBASE-6770 Project: HBase Issue Type: Bug Components: client, regionserver Reporter: Karthik Ranganathan Currently, we have the following APIs to customize the behavior of scans: setCaching() - how many rows to cache on the client to speed up scans setBatch() - max columns to return per row, to prevent a very large response. Ideally, we should be able to specify a memory buffer size because: 1. that would take care of both of these use cases. 2. it does not need any knowledge of the size of the rows or cells, as the final thing we are worried about is the available memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
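The size-based caching proposed above can be illustrated with a short sketch: keep fetching rows until an estimated byte budget is hit, regardless of row count. `SizeBoundedScanner` and `fillCache` are hypothetical names, not HBase client APIs; `row.length` stands in for a real per-KeyValue heap-size estimate.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SizeBoundedScanner {
    // Fill the client-side cache up to maxBytes of row data, however many
    // rows that turns out to be. This replaces a fixed setCaching() row count.
    static List<byte[]> fillCache(Iterator<byte[]> rows, long maxBytes) {
        List<byte[]> cache = new ArrayList<>();
        long used = 0;
        while (rows.hasNext() && used < maxBytes) {
            byte[] row = rows.next();
            cache.add(row);
            used += row.length; // stand-in for KeyValue.heapSize()
        }
        return cache;
    }
}
```

The point of the issue is exactly this inversion: the caller budgets memory, and the row/column counts fall out of the data's actual sizes.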
[jira] [Created] (HBASE-6619) Do no unregister and re-register interest ops in RPC
Karthik Ranganathan created HBASE-6619: -- Summary: Do no unregister and re-register interest ops in RPC Key: HBASE-6619 URL: https://issues.apache.org/jira/browse/HBASE-6619 Project: HBase Issue Type: Bug Components: ipc Reporter: Karthik Ranganathan Assignee: Michal Gregorczyk While investigating perf of HBase, Michal noticed that we could cut about 5-40% (depending on number of threads) from the total get time in the RPC on the server side if we eliminated re-registering for interest ops. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
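The pattern the issue targets can be shown with plain java.nio. A conventional handler clears OP_READ while a request is being processed and re-adds it afterwards; the optimization is to leave the interest set alone. This demo is illustrative only, not the HBase RPC server code.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class InterestOpsDemo {
    public static int toggledOps() {
        try {
            Selector selector = Selector.open();
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

            // The costly pattern: drop interest while handling a request,
            // then re-register. Each change forces selector bookkeeping and
            // (in the server's case) wakeups that the get path pays for.
            key.interestOps(0);
            key.interestOps(SelectionKey.OP_READ);

            int ops = key.interestOps();
            selector.close();
            pipe.source().close();
            pipe.sink().close();
            return ops;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Eliminating the `interestOps(0)` / `interestOps(OP_READ)` round trip per request is what yields the 5-40% server-side savings mentioned above.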
[jira] [Created] (HBASE-6583) Enhance Hbase load test tool to automatically create cf's if not present
Karthik Ranganathan created HBASE-6583: -- Summary: Enhance Hbase load test tool to automatically create cf's if not present Key: HBASE-6583 URL: https://issues.apache.org/jira/browse/HBASE-6583 Project: HBase Issue Type: Bug Components: test Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan The load test tool currently disables the table and applies any changes to the cf descriptor if any, but does not create the cf if not present. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6583) Enhance Hbase load test tool to automatically create cf's if not present
[ https://issues.apache.org/jira/browse/HBASE-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-6583: --- Assignee: (was: Karthik Ranganathan) Enhance Hbase load test tool to automatically create cf's if not present Key: HBASE-6583 URL: https://issues.apache.org/jira/browse/HBASE-6583 Project: HBase Issue Type: Bug Components: test Reporter: Karthik Ranganathan The load test tool currently disables the table and applies any changes to the cf descriptor if any, but does not create the cf if not present. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6578) Make HDFS block size configurable for HBase WAL
Karthik Ranganathan created HBASE-6578: -- Summary: Make HDFS block size configurable for HBase WAL Key: HBASE-6578 URL: https://issues.apache.org/jira/browse/HBASE-6578 Project: HBase Issue Type: Bug Components: regionserver Reporter: Karthik Ranganathan Right now, because sync-on-block-close is enabled, HLog causes the disk to stall out on large writes (esp when we cross block boundary). We currently use 256MB blocks. The idea is that if we use smaller block sizes, we should be able to spray the data across more disks (because of round robin scheduling) and this would cause more uniform disk usage. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6578) Make HDFS block size configurable for HBase WAL
[ https://issues.apache.org/jira/browse/HBASE-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan reassigned HBASE-6578: -- Assignee: Li Pi Make HDFS block size configurable for HBase WAL --- Key: HBASE-6578 URL: https://issues.apache.org/jira/browse/HBASE-6578 Project: HBase Issue Type: Bug Components: regionserver Reporter: Karthik Ranganathan Assignee: Li Pi Right now, because sync-on-block-close is enabled, HLog causes the disk to stall out on large writes (esp when we cross block boundary). We currently use 256MB blocks. The idea is that if we use smaller block sizes, we should be able to spray the data across more disks (because of round robin scheduling) and this would cause more uniform disk usage. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6486) Enhance load test to print throughput measurements
Karthik Ranganathan created HBASE-6486: -- Summary: Enhance load test to print throughput measurements Key: HBASE-6486 URL: https://issues.apache.org/jira/browse/HBASE-6486 Project: HBase Issue Type: Bug Reporter: Karthik Ranganathan Assignee: Aurick Qiao Idea is to know how many MB/sec of throughput we are able to get by writing into HBase using a simple tool. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6423) Writes should not block reads on blocking updates to memstores
Karthik Ranganathan created HBASE-6423: -- Summary: Writes should not block reads on blocking updates to memstores Key: HBASE-6423 URL: https://issues.apache.org/jira/browse/HBASE-6423 Project: HBase Issue Type: Bug Reporter: Karthik Ranganathan Assignee: Amitanand Aiyer We have a big data use case where we turn off WAL and have a ton of reads and writes. We found that: 1. flushing a memstore takes a while (GZIP compression) 2. incoming writes cause the new memstore to grow in an unbounded fashion 3. this triggers blocking memstore updates 4. in turn, this causes all the RPC handler threads to block on writes to that memstore 5. we are not able to read during this time as RPC handlers are blocked At a higher level, we should not hold up the RPC threads while blocking updates, and we should build in some sort of rate control. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
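One way to realize the "don't hold up the RPC threads" idea is non-blocking admission control: a write must acquire a permit reflecting memstore headroom, and if none is available it is rejected quickly (so the client backs off and retries) instead of parking the handler thread, leaving handlers free to serve reads. This is a minimal sketch under that assumption; `WriteAdmission` is a hypothetical name, not HBase code.

```java
import java.util.concurrent.Semaphore;

public class WriteAdmission {
    private final Semaphore memstoreHeadroom;

    WriteAdmission(int permits) { memstoreHeadroom = new Semaphore(permits); }

    // Returns false instead of blocking when the memstore is over its bound;
    // in a real server the permit would only return on flush, not per edit.
    boolean tryWrite(Runnable applyEdit) {
        if (!memstoreHeadroom.tryAcquire()) return false; // tell client to retry
        try {
            applyEdit.run();
            return true;
        } finally {
            memstoreHeadroom.release();
        }
    }
}
```

`Semaphore.tryAcquire()` returns immediately, which is the whole point: the slow flush (GZIP compression) never translates into blocked handler threads.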
[jira] [Assigned] (HBASE-6066) some low hanging read path improvement ideas
[ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan reassigned HBASE-6066: -- Assignee: Michal Gregorczyk (was: Aurick Qiao) some low hanging read path improvement ideas - Key: HBASE-6066 URL: https://issues.apache.org/jira/browse/HBASE-6066 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Assignee: Michal Gregorczyk Priority: Critical Labels: noob Attachments: metric-stringbuilder-fix.patch I was running some single-threaded scan performance tests for a table with small-sized rows that is fully cached. Some observations... We seem to be doing several wasteful iterations over and/or building of temporary lists. 1) One such is the following code in HRegionServer.next():
{code}
boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
if (!values.isEmpty()) {
  for (KeyValue kv : values) { // wasteful in most cases
    currentScanResultSize += kv.heapSize();
  }
  results.add(new Result(values));
{code}
By default the maxScannerResultSize is Long.MAX_VALUE. In those cases, we can avoid the unnecessary iteration to compute currentScanResultSize. 2) An example of a wasteful temporary array is results in RegionScanner.next():
{code}
results.clear();
boolean returnResult = nextInternal(limit, metric);
outResults.addAll(results);
{code}
results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults. 3) Another almost similar example of a wasteful array is results in StoreScanner.next(), which eventually also copies its results into outResults. 4) Reduce the overhead of the size metric maintained in StoreScanner.next():
{code}
if (metric != null) {
  HRegion.incrNumericMetric(this.metricNamePrefix + metric, copyKv.getLength());
}
results.add(copyKv);
{code}
A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then, in a finally clause, increment the metric in one shot, rather than updating AtomicLongs for each KV. 5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone(), which is also a synchronized method. We should factor the code to avoid these nested synchronized methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
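Item 4 above can be sketched directly: sum the KV sizes in a local variable and publish to the shared counter once per next() call. `MetricBatching`, `recordNext`, and the static AtomicLong are illustrative stand-ins for `HRegion.incrNumericMetric` and its backing counter.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class MetricBatching {
    // Stand-in for the shared per-metric AtomicLong inside HRegion.
    static final AtomicLong nextSizeMetric = new AtomicLong();

    static void recordNext(List<byte[]> kvs) {
        long bytes = 0;
        try {
            // Local accumulation: no cross-thread contention per KeyValue.
            for (byte[] kv : kvs) bytes += kv.length;
        } finally {
            // One atomic update per next() call instead of one per KV.
            nextSizeMetric.addAndGet(bytes);
        }
    }
}
```

For a scan returning hundreds of KVs per next(), this turns hundreds of contended atomic increments into one.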
[jira] [Created] (HBASE-6360) Thrift proxy does not emit runtime metrics
Karthik Ranganathan created HBASE-6360: -- Summary: Thrift proxy does not emit runtime metrics Key: HBASE-6360 URL: https://issues.apache.org/jira/browse/HBASE-6360 Project: HBase Issue Type: Bug Components: thrift Reporter: Karthik Ranganathan Assignee: Michal Gregorczyk Open jconsole against a thrift proxy, and you will not find the runtime stats that it should be exporting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)
[ https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397163#comment-13397163 ] Karthik Ranganathan commented on HBASE-5509: I know :) but I don't get the reason though. Going to put in a couple more comments, but if it's a no-go - then oh well. MR based copier for copying HFiles (trunk version) -- Key: HBASE-5509 URL: https://issues.apache.org/jira/browse/HBASE-5509 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Lars Hofhansl Attachments: 5509-v2.txt, 5509.txt This copier is a modification of the distcp tool in HDFS. It does the following: 1. List out all the regions in the HBase cluster for the required table 2. Write the above out to a file 3. Each mapper 3.1 lists all the HFiles for a given region by querying the regionserver 3.2 copies all the HFiles 3.3 outputs success if the copy succeeded, failure otherwise. Failed regions are retried in another loop 4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)
[ https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396510#comment-13396510 ] Karthik Ranganathan commented on HBASE-5509: @Lars - I ripped out some code which used the hardlinking - we have implemented it internally. I believe we are planning on open-sourcing this, otherwise you'd have to wait for native hardlinks. The current copy approach still works, though, for a few tens of TBs. MR based copier for copying HFiles (trunk version) -- Key: HBASE-5509 URL: https://issues.apache.org/jira/browse/HBASE-5509 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Lars Hofhansl Attachments: 5509-v2.txt, 5509.txt This copier is a modification of the distcp tool in HDFS. It does the following: 1. List out all the regions in the HBase cluster for the required table 2. Write the above out to a file 3. Each mapper 3.1 lists all the HFiles for a given region by querying the regionserver 3.2 copies all the HFiles 3.3 outputs success if the copy succeeded, failure otherwise. Failed regions are retried in another loop 4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4667) Importer for exported tables
[ https://issues.apache.org/jira/browse/HBASE-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan resolved HBASE-4667. Resolution: Duplicate Assignee: Karthik Ranganathan This is already covered by HBASE-5509 (trunk version) and HBASE-4663 (89-fb version) Importer for exported tables Key: HBASE-4667 URL: https://issues.apache.org/jira/browse/HBASE-4667 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Once HBase tables are backed up to a well known location, we need to be able to import them. A few flavors need to be supported here: 1. Running cluster or a cluster that is not up and running 2. Same tablename or a different one -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13286809#comment-13286809 ] Karthik Ranganathan commented on HBASE-4655: I think we should add this doc to the HBase book. The code parts of this HBase backups feature are already done. I think the next step is to implement a simple wrapper script, and document that as well. The tasks are already created, see HBASE-4618 for a list of sub-tasks (tasks 1, 2, 4 and 6 are done, 4 needs to be checked in and closed out). The next one to look at would be HBASE-4664. Let me add some comments in there about what we came up with internally, and then we can go ahead from there. Document architecture of backups Key: HBASE-4655 URL: https://issues.apache.org/jira/browse/HBASE-4655 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBase Backups Architecture v2.docx, HBase Backups Architecture.docx Basic idea behind the backup architecture for HBase -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281684#comment-13281684 ] Karthik Ranganathan commented on HBASE-4655: Marking as resolved, feel free to send more comments my way in case something is not clear. Document architecture of backups Key: HBASE-4655 URL: https://issues.apache.org/jira/browse/HBASE-4655 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBase Backups Architecture v2.docx, HBase Backups Architecture.docx Basic idea behind the backup architecture for HBase -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan resolved HBASE-4655. Resolution: Fixed Document architecture of backups Key: HBASE-4655 URL: https://issues.apache.org/jira/browse/HBASE-4655 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBase Backups Architecture v2.docx, HBase Backups Architecture.docx Basic idea behind the backup architecture for HBase -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4663) MR based copier for copying HFiles
[ https://issues.apache.org/jira/browse/HBASE-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281685#comment-13281685 ] Karthik Ranganathan commented on HBASE-4663: See https://reviews.facebook.net/D1965 for the diff. Also, see HBASE-5509 for the trunk version. MR based copier for copying HFiles -- Key: HBASE-4663 URL: https://issues.apache.org/jira/browse/HBASE-4663 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan This copier is a modification of the distcp tool in HDFS. It does the following: 1. List out all the regions in the HBase cluster for the required table 2. Write the above out to a file 3. Each mapper 3.1 lists all the HFiles for a given region by querying the regionserver 3.2 copies all the HFiles 3.3 outputs success if the copy succeeded, failure otherwise. Failed regions are retried in another loop 4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
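The copier's control flow (steps 1-3.3 above: list HFiles per region, copy them, retry failed regions in another pass) can be sketched as below. `RegionCopier` and the `copyHFile` predicate are illustrative stand-ins; the real tool is built on the HDFS distcp machinery and runs the copies in mappers.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class RegionCopier {
    // Returns the set of regions still failing after maxAttempts passes.
    static Set<String> copyAll(Map<String, List<String>> hfilesByRegion,
                               Predicate<String> copyHFile, int maxAttempts) {
        Set<String> pending = new HashSet<>(hfilesByRegion.keySet());
        for (int attempt = 0; attempt < maxAttempts && !pending.isEmpty(); attempt++) {
            Iterator<String> it = pending.iterator();
            while (it.hasNext()) {
                String region = it.next();
                // Step 3.2/3.3: a region succeeds only if every HFile copies.
                boolean ok = hfilesByRegion.get(region).stream().allMatch(copyHFile);
                if (ok) it.remove(); // done; failures stay for the retry loop
            }
        }
        return pending;
    }
}
```

Step 4 (locality-aware mapper placement) is a scheduling concern and does not appear in this sketch.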
[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113847#comment-13113847 ] Karthik Ranganathan commented on HBASE-4463: @Stack - we can find the exact amount of data we are writing to the dfs (only hfile blocks will contribute to this during compactions). So adding a threshold like this is not too hard... but there could be disk iops pressure (instead of network bandwidth) and detecting that would be hard. So we would still need to set an off-peak time. I was trying to come up with a more generic solution, but that involves setting up a feedback loop inside the regionserver - keep track of max, min and average latencies over the last k days (we would have to store this in META or some other location as it needs to persist beyond restarts), and remove any spikes in the values. When we run an aggressive compaction, we need to make sure the latencies are still acceptable, otherwise don't run aggressive compactions. This is much harder to get right though. Run more aggressive compactions during off peak hours - Key: HBASE-4463 URL: https://issues.apache.org/jira/browse/HBASE-4463 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan The number of iops on the disk and the top-of-rack bandwidth utilization at off-peak hours is much lower than at peak hours, depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.2 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4463) Run more aggressive compactions during off peak hours
Run more aggressive compactions during off peak hours - Key: HBASE-4463 URL: https://issues.apache.org/jira/browse/HBASE-4463 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours
[ https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113017#comment-13113017 ] Karthik Ranganathan commented on HBASE-4463: Initially we are going to specify a start and stop for off peak hours... a more automatic detection based on response latencies and data read/transferred could be done, but is much harder to get right. Run more aggressive compactions during off peak hours - Key: HBASE-4463 URL: https://issues.apache.org/jira/browse/HBASE-4463 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan The number of iops on the disk and the top of the rack bandwidth utilization at off peak hours is much lower than at peak hours depending on the application usage pattern. We can utilize this knowledge to improve the performance of the HBase cluster by increasing the compact selection ratio to a much larger value during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the average number of files per store. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
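The start/stop window mentioned above reduces to a small selection function: within the configured off-peak hours, use the larger ratio. A minimal sketch, assuming a configured [start, end) hour window that may wrap past midnight; `CompactionRatio` is an illustrative name, not HBase code.

```java
public class CompactionRatio {
    // Pick the compact selection ratio for the current hour of day.
    // peakRatio ~ hbase.hstore.compaction.ratio,
    // offPeakRatio ~ hbase.hstore.compaction.ratio.offpeak.
    static double ratioForHour(int hour, int offPeakStart, int offPeakEnd,
                               double peakRatio, double offPeakRatio) {
        boolean offPeak = (offPeakStart <= offPeakEnd)
                ? (hour >= offPeakStart && hour < offPeakEnd)
                : (hour >= offPeakStart || hour < offPeakEnd); // wraps midnight
        return offPeak ? offPeakRatio : peakRatio;
    }
}
```

A larger ratio admits more (and more unevenly sized) files into a single compaction, which is exactly the extra work one wants deferred to quiet hours.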
[jira] Commented: (HBASE-3375) Move away from jruby; build our shell elsewise either on another foundation or build up our own
[ https://issues.apache.org/jira/browse/HBASE-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974406#action_12974406 ] Karthik Ranganathan commented on HBASE-3375: Awesome discussion. I think the only way to make scripting for HBase take off is to allow scripting in any language. Language lock-in for scripting takes away the real advantage of scripting - all the time is spent in looking up the syntax (unless the person writing is committed to learning the language). So in that sense, REST + JSON is awesome. On a tangential note, REST+JSON also allows us to easily write HBase clients (that have ZK integration) in languages other than Java (aka C++). This would allow efficiently interacting with HBase from non-Java services. If we are agreed on the REST+JSON approach - now it's only a matter of how to write the shell the fastest in any language. I am not familiar with where the REST gateway stands today, and how much work it is to move all the structures to JSON. If these are easy to get out the door, then we should only think about the fastest way to write the shell. Move away from jruby; build our shell elsewise either on another foundation or build up our own --- Key: HBASE-3375 URL: https://issues.apache.org/jira/browse/HBASE-3375 Project: HBase Issue Type: Task Components: shell Reporter: stack Fix For: 0.92.0 JRuby has been sullied; it's been shipping *GPL jars for a while now. A hack-up to remove these jars is being done elsewhere (HBASE-3374). This issue is about casting our shell anew atop a foundation that is other than JRuby, or writing a shell of our own from scratch. JRuby has gotten us this far. It provides a shell and it was also used for scripting HBase. It would be nice if we could get scripting and shell in the redo. Apart from the licensing issue above, and that the fix will be reverting our JRuby to a version that is old and no longer supported, another reason to move off JRuby is that, while it's nice having Ruby to hand when scripting, the JRuby complete jar is 10 or more MB in size. It's bloated, at least from our small shell perspective. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3375) Move away from jruby; build our shell elsewise either on another foundation or build up our own
[ https://issues.apache.org/jira/browse/HBASE-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973922#action_12973922 ] Karthik Ranganathan commented on HBASE-3375: Hey guys, like Jonathan said, I think ANTLR would be good... and once the framework is set, the changes are relatively easy to get in. Also, the core grammar does not change that much - usually only enhancements to some commands here and there. Also, if the META entries are all JSON (as we are increasingly moving towards) and we are able to expose REST APIs for most of the operations, then building a shell in any language/framework will become trivial. Move away from jruby; build our shell elsewise either on another foundation or build up our own --- Key: HBASE-3375 URL: https://issues.apache.org/jira/browse/HBASE-3375 Project: HBase Issue Type: Task Components: shell Reporter: stack Fix For: 0.92.0 JRuby has been sullied; it's been shipping *GPL jars for a while now. A hack-up to remove these jars is being done elsewhere (HBASE-3374). This issue is about casting our shell anew atop a foundation that is other than JRuby, or writing a shell of our own from scratch. JRuby has gotten us this far. It provides a shell and it was also used for scripting HBase. It would be nice if we could get scripting and shell in the redo. Apart from the licensing issue above, and that the fix will be reverting our JRuby to a version that is old and no longer supported, another reason to move off JRuby is that, while it's nice having Ruby to hand when scripting, the JRuby complete jar is 10 or more MB in size. It's bloated, at least from our small shell perspective. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3329) HLog splitting after RS/cluster death should directly create HFiles
[ https://issues.apache.org/jira/browse/HBASE-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970288#action_12970288 ] Karthik Ranganathan commented on HBASE-3329: Yes, true. This is much more useful in the distributed log splitting context. My fault - forgot to add that... HLog splitting after RS/cluster death should directly create HFiles --- Key: HBASE-3329 URL: https://issues.apache.org/jira/browse/HBASE-3329 Project: HBase Issue Type: Bug Components: regionserver Reporter: Karthik Ranganathan After a RS dies or the cluster goes down and we are recovering, we first split HLogs into the logs for the regions. Then the region servers that host the regions replay the logs and open the regions. This can be made more efficient by directly creating HFiles from the HLogs (instead of producing a split HLogs file). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3329) HLog splitting after RS/cluster death should directly create HFiles
[ https://issues.apache.org/jira/browse/HBASE-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970325#action_12970325 ] Karthik Ranganathan commented on HBASE-3329: @Ryan - didn't get that... At a higher level, I was thinking that the current steps are: 1. Open and read hlogs 2. Split them and create per-region edits files 3. The RS that opens the regions reads the split edits files and then dumps them into hfiles I was thinking we could change this sequence to something like: 1. Open hlogs 2. Create hfiles for the regions And that would give us a big gain by avoiding one extra write and read of the edits. HLog splitting after RS/cluster death should directly create HFiles --- Key: HBASE-3329 URL: https://issues.apache.org/jira/browse/HBASE-3329 Project: HBase Issue Type: Bug Components: regionserver Reporter: Karthik Ranganathan After a RS dies or the cluster goes down and we are recovering, we first split HLogs into the logs for the regions. Then the region servers that host the regions replay the logs and open the regions. This can be made more efficient by directly creating HFiles from the HLogs (instead of producing a split HLogs file). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
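The proposed two-step sequence can be sketched as a single pass over the log that buckets edits straight into per-region, key-ordered outputs (the shape an HFile writer requires), skipping the intermediate split-edits files. `DirectSplit`, `LogEntry`, and the in-memory "hfiles" are illustrative stand-ins, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirectSplit {
    static class LogEntry {
        final String region;
        final String rowKey;
        LogEntry(String region, String rowKey) {
            this.region = region;
            this.rowKey = rowKey;
        }
    }

    // One pass over the recovered log; each region's output is sorted,
    // as an HFile must be, so it can be handed straight to an HFile writer.
    static Map<String, List<String>> splitToHFiles(List<LogEntry> hlog) {
        Map<String, List<String>> byRegion = new HashMap<>();
        for (LogEntry e : hlog) {
            byRegion.computeIfAbsent(e.region, r -> new ArrayList<>()).add(e.rowKey);
        }
        byRegion.values().forEach(Collections::sort); // HFiles are key-ordered
        return byRegion;
    }
}
```

The saving is exactly the one stated in the comment: the edits are never written to, and never re-read from, intermediate split files.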
[jira] Commented: (HBASE-3150) Allow some column to not write WALs
[ https://issues.apache.org/jira/browse/HBASE-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970339#action_12970339 ] Karthik Ranganathan commented on HBASE-3150: Yes, a co-processor-based implementation would totally work. Allow some column to not write WALs --- Key: HBASE-3150 URL: https://issues.apache.org/jira/browse/HBASE-3150 Project: HBase Issue Type: Improvement Reporter: Karthik Ranganathan Priority: Minor We have this unique requirement where some column families hold data that is indexed from other existing column families. The index data is very large, and we end up writing these inserts into the WAL and then into the store files. In addition to taking more iops, this also slows down splitting files for recovery, etc. Creating this task to have an option to suppress WAL logging on a per CF basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3325) Optimize log splitter to not output obsolete edits
[ https://issues.apache.org/jira/browse/HBASE-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969863#action_12969863 ] Karthik Ranganathan commented on HBASE-3325: Yes +1 indeed!! Optimize log splitter to not output obsolete edits -- Key: HBASE-3325 URL: https://issues.apache.org/jira/browse/HBASE-3325 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.92.0 Reporter: Todd Lipcon Currently when the master splits logs, it outputs all edits it finds, even those that have already been obsoleted by flushes. At replay time on the RS we discard the edits that have already been flushed. We could do a pretty simple optimization here - basically the RS should replicate a map of region id -> last flushed seq id into ZooKeeper (this can be asynchronous by some seconds without any problems). Then when doing log splitting, if we have this map available, we can discard any edits found in the logs that were already flushed, and thus output a much smaller amount of data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
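[Editor's note] The filter described in the issue reduces to a single predicate. A minimal sketch, with hypothetical names (this is not the HBase splitter API): given the asynchronously replicated map of region id -> last flushed sequence id, the splitter keeps an edit only when it cannot have been covered by a flush.

```java
import java.util.Map;

// Sketch of the obsolete-edit filter: drop any edit whose sequence id is at
// or below the region's last flushed sequence id, since that data already
// lives in HFiles. Names here are illustrative only.
public class ObsoleteEditFilter {

    // A missing map entry means we cannot prove the edit was flushed, so we
    // conservatively keep it (replay on the RS discards it later anyway, as
    // the description notes).
    public static boolean shouldKeep(long editSeqId,
                                     Map<String, Long> lastFlushedSeqId,
                                     String regionId) {
        Long flushedUpTo = lastFlushedSeqId.get(regionId);
        return flushedUpTo == null || editSeqId > flushedUpTo;
    }
}
```

Because the map may lag by a few seconds, the filter can only under-discard, never over-discard, which is why the staleness is harmless.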
[jira] Created: (HBASE-3327) For increment workloads, retain memstores in memory after flushing them
For increment workloads, retain memstores in memory after flushing them --- Key: HBASE-3327 URL: https://issues.apache.org/jira/browse/HBASE-3327 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan This is an improvement based on our observation of what happens in an increment workload. The working set is typically small and is contained in the memstores. 1. The reason the memstores get flushed is because the number of wal logs limit gets hit. 2. This in turn triggers compactions, which evicts the block cache. 3. Flushing of memstore and eviction of the block cache causes disk reads for increments coming in after this because the data is no longer in memory. We could solve this elegantly by retaining the memstores AFTER they are flushed into files. This would mean we can quickly populate the new memstore with the working set of data from memory itself without having to hit disk. We can throttle the number of such memstores we retain, or the memory allocated to it. In fact, allocating a percentage of the block cache to this would give us a huge boost. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
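[Editor's note] The retention-with-throttling idea above can be sketched as a small in-memory structure. This is a hypothetical model, not the committed implementation: reads consult retained snapshots newest-first under a byte budget, and a null result stands in for falling through to the block cache / HFiles.

```java
import java.util.*;

// Minimal sketch of retaining flushed memstore snapshots under a memory cap,
// as proposed: keep recent snapshots so the working set stays in memory, and
// evict the oldest snapshots when the budget is exceeded.
public class RetainedSnapshots {

    private static class Snapshot {
        final NavigableMap<String, String> data;
        final long bytes;
        Snapshot(NavigableMap<String, String> data, long bytes) {
            this.data = data; this.bytes = bytes;
        }
    }

    private final long maxBytes;           // e.g. a slice of the block cache
    private long usedBytes = 0;
    private final Deque<Snapshot> snapshots = new ArrayDeque<>(); // newest first

    public RetainedSnapshots(long maxBytes) { this.maxBytes = maxBytes; }

    // Called after a flush: retain the snapshot, evicting oldest if over budget.
    public void retain(NavigableMap<String, String> flushed, long sizeBytes) {
        snapshots.addFirst(new Snapshot(flushed, sizeBytes));
        usedBytes += sizeBytes;
        while (usedBytes > maxBytes && !snapshots.isEmpty()) {
            usedBytes -= snapshots.removeLast().bytes;
        }
    }

    // Newest-first lookup; null means "not retained, go to disk".
    public String get(String row) {
        for (Snapshot s : snapshots) {
            String v = s.data.get(row);
            if (v != null) return v;
        }
        return null;
    }

    public int retainedCount() { return snapshots.size(); }
}
```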
[jira] Commented: (HBASE-3327) For increment workloads, retain memstores in memory after flushing them
[ https://issues.apache.org/jira/browse/HBASE-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12969892#action_12969892 ] Karthik Ranganathan commented on HBASE-3327: True - I mentioned the HLog limit because that is how we observed it, but this would address the underlying issue for any of the reasons to flush. Additionally, this makes it resilient in the face of compactions, which HLog compactions would not help with. HLog compactions would also be most effective for the ICV kind of workload (frequent updates to existing data) right? For increment workloads, retain memstores in memory after flushing them --- Key: HBASE-3327 URL: https://issues.apache.org/jira/browse/HBASE-3327 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan This is an improvement based on our observation of what happens in an increment workload. The working set is typically small and is contained in the memstores. 1. The reason the memstores get flushed is because the number of wal logs limit gets hit. 2. This in turn triggers compactions, which evicts the block cache. 3. Flushing of memstore and eviction of the block cache causes disk reads for increments coming in after this because the data is no longer in memory. We could solve this elegantly by retaining the memstores AFTER they are flushed into files. This would mean we can quickly populate the new memstore with the working set of data from memory itself without having to hit disk. We can throttle the number of such memstores we retain, or the memory allocated to it. In fact, allocating a percentage of the block cache to this would give us a huge boost. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3327) For increment workloads, retain memstores in memory after flushing them
[ https://issues.apache.org/jira/browse/HBASE-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970047#action_12970047 ] Karthik Ranganathan commented on HBASE-3327: Ryan: was talking to Kannan as well about this. The only thing is that writing into the block cache only works for flushes. But for compactions, it gets a bit complicated - and any algorithm will become a little dependent on the compaction policy. For increment workloads, retain memstores in memory after flushing them --- Key: HBASE-3327 URL: https://issues.apache.org/jira/browse/HBASE-3327 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan This is an improvement based on our observation of what happens in an increment workload. The working set is typically small and is contained in the memstores. 1. The reason the memstores get flushed is because the number of wal logs limit gets hit. 2. This in turn triggers compactions, which evicts the block cache. 3. Flushing of memstore and eviction of the block cache causes disk reads for increments coming in after this because the data is no longer in memory. We could solve this elegantly by retaining the memstores AFTER they are flushed into files. This would mean we can quickly populate the new memstore with the working set of data from memory itself without having to hit disk. We can throttle the number of such memstores we retain, or the memory allocated to it. In fact, allocating a percentage of the block cache to this would give us a huge boost. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3329) HLog splitting after RS/cluster death should directly create HFiles
HLog splitting after RS/cluster death should directly create HFiles --- Key: HBASE-3329 URL: https://issues.apache.org/jira/browse/HBASE-3329 Project: HBase Issue Type: Bug Components: regionserver Reporter: Karthik Ranganathan After a RS dies or the cluster goes down and we are recovering, we first split HLogs into the logs for the regions. Then the region servers that host the regions replay the logs and open the regions. This can be made more efficient by directly creating HFiles from the HLogs (instead of producing a split HLogs file). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3156) Special case distributed log splitting on fresh cluster startup
Special case distributed log splitting on fresh cluster startup --- Key: HBASE-3156 URL: https://issues.apache.org/jira/browse/HBASE-3156 Project: HBase Issue Type: New Feature Reporter: Karthik Ranganathan If the entire HBase goes down (not a graceful stop - example namenode dies) then on a subsequent restart, the HMaster can hand off the hlog splitting to the respective region servers. This would parallelize the log splitting and maintain region server hfile locality. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HBASE-3156) Special case distributed log splitting on fresh cluster startup
[ https://issues.apache.org/jira/browse/HBASE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan reassigned HBASE-3156: -- Assignee: Karthik Ranganathan Special case distributed log splitting on fresh cluster startup --- Key: HBASE-3156 URL: https://issues.apache.org/jira/browse/HBASE-3156 Project: HBase Issue Type: New Feature Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan If the entire HBase goes down (not a graceful stop - example namenode dies) then on a subsequent restart, the HMaster can hand off the hlog splitting to the respective region servers. This would parallelize the log splitting and maintain region server hfile locality. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925090#action_12925090 ] Karthik Ranganathan commented on HBASE-3149: Yes, agreed that the memory implication is different. Eventually, is it not better to enforce the memory limit by using a combination of flush sizes and restricting the number of regions we create? Because ideally we should allow different flush sizes for the different CF's, as the KV sizes could be way different... Shall I just make this an option in the conf for now, with the default the way it is? Make flush decisions per column family -- Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
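[Editor's note] The per-CF option discussed above can be sketched as a flush policy. This is a hypothetical illustration, not the actual patch: instead of flushing every family when the aggregate region size crosses one limit, each family is checked against its own threshold, which can differ per family since KV sizes differ.

```java
import java.util.*;

// Sketch of a per-column-family flush decision: return only the families
// whose memstore is over its own configured limit (falling back to a
// default), so a small CF is no longer flushed just because a large CF grew.
public class PerCfFlushPolicy {

    public static List<String> familiesToFlush(Map<String, Long> cfMemstoreBytes,
                                               Map<String, Long> cfFlushLimit,
                                               long defaultLimit) {
        List<String> toFlush = new ArrayList<>();
        for (Map.Entry<String, Long> e : cfMemstoreBytes.entrySet()) {
            long limit = cfFlushLimit.getOrDefault(e.getKey(), defaultLimit);
            if (e.getValue() >= limit) toFlush.add(e.getKey());
        }
        return toFlush;
    }
}
```

Under the aggregate policy, a 900-byte "big" family and a 10-byte "small" family would both be flushed once the sum crossed the limit; here only "big" is.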
[jira] Commented: (HBASE-3156) Special case distributed log splitting on fresh cluster startup
[ https://issues.apache.org/jira/browse/HBASE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925150#action_12925150 ] Karthik Ranganathan commented on HBASE-3156: Awesome! I was thinking of doing this without MR if possible - since each RS would replay all the HLogs in a directory, there is no need to split files and then replay the logs... Special case distributed log splitting on fresh cluster startup --- Key: HBASE-3156 URL: https://issues.apache.org/jira/browse/HBASE-3156 Project: HBase Issue Type: New Feature Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan If the entire HBase goes down (not a graceful stop - example namenode dies) then on a subsequent restart, the HMaster can hand off the hlog splitting to the respective region servers. This would parallelize the log splitting and maintain region server hfile locality. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3149) Make flush decisions per column family
Make flush decisions per column family -- Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3150) Allow some column to not write WALs
Allow some column to not write WALs --- Key: HBASE-3150 URL: https://issues.apache.org/jira/browse/HBASE-3150 Project: HBase Issue Type: Improvement Reporter: Karthik Ranganathan Priority: Minor We have this unique requirement where some column families hold data that is indexed from other existing column families. The index data is very large, and we end up writing these inserts into the WAL and then into the store files. In addition to taking more iops, this also slows down splitting files for recovery, etc. Creating this task to have an option to suppress WAL logging on a per CF basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
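[Editor's note] The per-CF suppression requested above reduces to one decision on the write path. A minimal sketch with hypothetical names (not the HBase API): edits to families flagged as "no WAL" skip the log append and go straight to the memstore, trading recoverability of that re-derivable index data for fewer iops and smaller logs to split on recovery.

```java
import java.util.*;

// Sketch of per-family WAL opt-out: given the families touched by a write
// and the set flagged to skip the WAL, return only the families whose edits
// must still be appended to the log.
public class PerFamilyWal {

    public static Set<String> familiesNeedingWal(Set<String> familiesInWrite,
                                                 Set<String> walDisabled) {
        Set<String> needWal = new TreeSet<>(familiesInWrite);
        needWal.removeAll(walDisabled);
        return needWal;
    }
}
```

The design point is that the skipped data is an index derived from other families, so after a crash it can be rebuilt rather than replayed.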
[jira] Updated: (HBASE-2931) Do not throw RuntimeExceptions in RPC/HbaseObjectWritable code, ensure we log and rethrow as IOE
[ https://issues.apache.org/jira/browse/HBASE-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-2931: --- Attachment: HBASE-2931.patch Simple patch - posting directly instead of review board. Moves the newly added class down to the end of the object writeable opcode list so that all subsequent op-codes do not change. Also added some logging. Do not throw RuntimeExceptions in RPC/HbaseObjectWritable code, ensure we log and rethrow as IOE Key: HBASE-2931 URL: https://issues.apache.org/jira/browse/HBASE-2931 Project: HBase Issue Type: Bug Reporter: Jonathan Gray Priority: Critical Fix For: 0.90.0 Attachments: HBASE-2931.patch When there are issues with RPC and HbaseObjectWritable, primarily when server and client have different jars, the only thing that happens is the client will receive an EOF exception. The server does not log what happened at all and the client does not receive a server trace, rather the server seems to close the connection and the client gets an EOF because it tries to read off of a closed stream. We need to ensure that we catch, log, and rethrow as IOE any exceptions that may occur because of an issue with RPC or HbaseObjectWritable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HBASE-2931) Do not throw RuntimeExceptions in RPC/HbaseObjectWritable code, ensure we log and rethrow as IOE
[ https://issues.apache.org/jira/browse/HBASE-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan reassigned HBASE-2931: -- Assignee: Karthik Ranganathan Do not throw RuntimeExceptions in RPC/HbaseObjectWritable code, ensure we log and rethrow as IOE Key: HBASE-2931 URL: https://issues.apache.org/jira/browse/HBASE-2931 Project: HBase Issue Type: Bug Reporter: Jonathan Gray Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.90.0 Attachments: HBASE-2931.patch When there are issues with RPC and HbaseObjectWritable, primarily when server and client have different jars, the only thing that happens is the client will receive an EOF exception. The server does not log what happened at all and the client does not receive a server trace, rather the server seems to close the connection and the client gets an EOF because it tries to read off of a closed stream. We need to ensure that we catch, log, and rethrow as IOE any exceptions that may occur because of an issue with RPC or HbaseObjectWritable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2812) Disable 'table' fails to complete frustrating my ability to test easily
[ https://issues.apache.org/jira/browse/HBASE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897082#action_12897082 ] Karthik Ranganathan commented on HBASE-2812: +1 Patch looks good to me. Disable 'table' fails to complete frustrating my ability to test easily --- Key: HBASE-2812 URL: https://issues.apache.org/jira/browse/HBASE-2812 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.20.6, 0.89.20100621 Environment: 0.89, non-distributed mode Reporter: Sam Pullara Fix For: 0.20.7, 0.90.0 Attachments: HBASE-2812.patch I see this in the client after it gives up: hbase(main):006:0 disable 'test_schema' ERROR: org.apache.hadoop.hbase.RegionException: Retries exhausted, it took too long to wait for the table test_schema to be disabled. Here is some help for this command: Disable the named table: e.g. hbase disable 't1' and this in the server log, a set of about 5 reports it is closing per disable call: 2010-07-03 15:19:47,554 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-07-03 15:19:47,554 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-07-03 15:19:47,555 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. to setClosing list 2010-07-03 15:19:47,576 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 2010-07-03 15:19:47,576 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 
2010-07-03 15:19:48,567 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-07-03 15:19:48,567 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-07-03 15:19:48,568 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. to setClosing list 2010-07-03 15:19:48,577 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 2010-07-03 15:19:48,578 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 2010-07-03 15:19:49,580 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-07-03 15:19:49,580 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-07-03 15:19:49,581 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. to setClosing list 2010-07-03 15:19:50,580 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 2010-07-03 15:19:50,581 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 2010-07-03 15:19:50,592 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-07-03 15:19:50,592 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-07-03 15:19:50,593 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 
to setClosing list 2010-07-03 15:19:51,581 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 2010-07-03 15:19:51,581 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE: test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. 2010-07-03 15:19:52,605 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-07-03 15:19:52,605 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-07-03 15:19:52,606 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region test_schema,,1278195322074.65c77aedf2f2a08d161a188dd2dd5081. to setClosing list 2010-07-03 15:19:52,703 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 3.0 2010-07-03 15:19:52,863 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 192.168.2.1:54389, regionname: -ROOT-,,0.70236052, startKey: }
[jira] Commented: (HBASE-2866) Region permanently offlined
[ https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891649#action_12891649 ] Karthik Ranganathan commented on HBASE-2866: Hey Stack, Have a fix ready - testing now, will put it up in a bit. Fix is simple: we get into this situation because we update the same region in transition in ZK again and again, which bumps up the revision number of the ZNode. This causes the update to fail. So if the ZNode is already in the target state, do not update it again. The above explanation is super-cryptic :), so will sync up with you on the issue and the fix. Region permanently offlined Key: HBASE-2866 URL: https://issues.apache.org/jira/browse/HBASE-2866 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Karthik Ranganathan Priority: Blocker Attachments: master.log After split, master attempts to reassign a region to a region server. Occasionally, such a region can get permanently offlined. Master: - {code} 2010-07-22 01:26:00,914 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS: test1,651220,1279784117114.6466481aa931f8c1fa87622735487a72.: Daughters; test1,651220,1279787158624.6ead25ae677116cc88fc5420bb39d52e., test1,653179,1279787\ 158624.8d5490bfc166c687657cb09203bd7d44. from test024.test.xyz.com,60020,1279780567744; 1 of 1 2010-07-22 01:26:00,935 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 8d5490bfc166c687657cb09203bd7d44 in state = M2ZK_REGION_OFFLINE 2010-07-22 01:26:00,935 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 8d5490bfc166c687657cb09203bd7d44 in state = M2ZK_REGION_OFFLINE 2010-07-22 01:26:00,945 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. 
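[Editor's note] The "super-cryptic" fix above can be made concrete with a toy model (this is not ZooKeeper client code): every write to a znode bumps its version, so redundant master updates that re-write the same state advance the version past what the region server expects, and the RS's version-checked setData fails with BadVersion. Skipping the write when the node is already in the target state keeps the version stable.

```java
// Toy model of a versioned UNASSIGNED znode illustrating the bug and fix.
public class UnassignedZnode {

    private String state;
    private int version = 0;

    public UnassignedZnode(String initialState) { this.state = initialState; }

    // Master side, with the fix: only write (and bump the version) on a
    // real state change; a redundant update becomes a no-op.
    public boolean setStateIfChanged(String target) {
        if (state.equals(target)) return false;
        state = target;
        version++;
        return true;
    }

    // Region server side: conditional update mirroring ZooKeeper's setData
    // with an expected version; a mismatch models BadVersion (returns false).
    public boolean setDataExpectingVersion(String target, int expectedVersion) {
        if (version != expectedVersion) return false;
        state = target;
        version++;
        return true;
    }

    public int version() { return version; }
}
```

Without the fix, the master's duplicate M2ZK_REGION_OFFLINE write would bump the version to 1 and the RS's expected-version-0 update would fail, matching the BadVersionException in the attached logs.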
to test024.test.xyz.com,60020,1279780567744 2010-07-22 01:26:00,949 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: While updating UNASSIGNED region 8d5490bfc166c687657cb09203bd7d44 exists, state = M2ZK_REGION_OFFLINE 2010-07-22 01:26:00,954 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. in state M2ZK_REGION_OFFLINE {code} --- Region Server: {code} 2010-07-22 01:26:00,947 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. 2010-07-22 01:26:00,947 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: test1,651220,1279787158624.6ead25ae677116cc88fc5420bb39d52e. 2010-07-22 01:26:00,947 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. 2010-07-22 01:26:00,948 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44 with [RS2ZK_REGION_OPENING] expected version = 0 2010-07-22 01:26:00,952 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44 2010-07-22 01:26:00,974 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: msgstorectrl001.test.xyz.com,msgstorectrl021.test.xyz.com,msgstorectrl041.test.xyz.com,msgstorectrl061.test.xyz.com,msgstorectrl081.ash2.facebook\ .com:/hbase,test024.test.xyz.com,60020,1279780567744Failed to write data to ZooKeeper org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44 at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at 
org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062) at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161) at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115) at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1428) at
[jira] Commented: (HBASE-2866) Region permanently offlined
[ https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891804#action_12891804 ] Karthik Ranganathan commented on HBASE-2866: Stack - just uploaded a review at http://review.hbase.org/r/380/ Region permanently offlined Key: HBASE-2866 URL: https://issues.apache.org/jira/browse/HBASE-2866 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Karthik Ranganathan Priority: Blocker Attachments: master.log After split, master attempts to reassign a region to a region server. Occasionally, such a region can get permanently offlined. Master: - {code} 2010-07-22 01:26:00,914 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT_INCLUDES_DAUGHTERS: test1,651220,1279784117114.6466481aa931f8c1fa87622735487a72.: Daughters; test1,651220,1279787158624.6ead25ae677116cc88fc5420bb39d52e., test1,653179,1279787\ 158624.8d5490bfc166c687657cb09203bd7d44. from test024.test.xyz.com,60020,1279780567744; 1 of 1 2010-07-22 01:26:00,935 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 8d5490bfc166c687657cb09203bd7d44 in state = M2ZK_REGION_OFFLINE 2010-07-22 01:26:00,935 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 8d5490bfc166c687657cb09203bd7d44 in state = M2ZK_REGION_OFFLINE 2010-07-22 01:26:00,945 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. to test024.test.xyz.com,60020,1279780567744 2010-07-22 01:26:00,949 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: While updating UNASSIGNED region 8d5490bfc166c687657cb09203bd7d44 exists, state = M2ZK_REGION_OFFLINE 2010-07-22 01:26:00,954 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. 
in state M2ZK_REGION_OFFLINE {code} --- Region Server: {code} 2010-07-22 01:26:00,947 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. 2010-07-22 01:26:00,947 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: test1,651220,1279787158624.6ead25ae677116cc88fc5420bb39d52e. 2010-07-22 01:26:00,947 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. 2010-07-22 01:26:00,948 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44 with [RS2ZK_REGION_OPENING] expected version = 0 2010-07-22 01:26:00,952 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44 2010-07-22 01:26:00,974 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: msgstorectrl001.test.xyz.com,msgstorectrl021.test.xyz.com,msgstorectrl041.test.xyz.com,msgstorectrl061.test.xyz.com,msgstorectrl081.ash2.facebook\ .com:/hbase,test024.test.xyz.com,60020,1279780567744Failed to write data to ZooKeeper org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/8d5490bfc166c687657cb09203bd7d44 at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062) at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161) at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1428) at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1337) at java.lang.Thread.run(Thread.java:619) 2010-07-22 01:26:00,975 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening test1,653179,1279787158624.8d5490bfc166c687657cb09203bd7d44. java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for
[jira] Created: (HBASE-2872) Investigate why regions in transition are updated to the same state multiple times
Investigate why regions in transition are updated to the same state multiple times -- Key: HBASE-2872 URL: https://issues.apache.org/jira/browse/HBASE-2872 Project: HBase Issue Type: Bug Components: master Reporter: Karthik Ranganathan This is related to HBASE-2866 Regions going permanently offline. The fix prevented multiple duplicate updates from going to ZK. But the master still tries to update these regions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-2833) Write a unit test for HBASE-2781
Write a unit test for HBASE-2781 Key: HBASE-2833 URL: https://issues.apache.org/jira/browse/HBASE-2833 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.89.20100621 Reporter: Karthik Ranganathan Need a test case to verify the fix for HBASE-2781 ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-2781) ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state
[ https://issues.apache.org/jira/browse/HBASE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-2781: --- Attachment: HBASE-2781-0.21.patch Adding the fix here, I will open a separate JIRA for adding a test case for this issue. ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state - Key: HBASE-2781 URL: https://issues.apache.org/jira/browse/HBASE-2781 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.90.0 Attachments: HBASE-2781-0.21.patch In ZKW.createUnassignedRegion I see this comment: {code}
// check if this node already exists -
// - it should not exist
// - if it does, it should be in the CLOSED state
{code} And what I got is: {noformat} 2010-06-23 15:42:05,823 INFO [IPC Server handler 3 on 60362] master.ServerManager(457): Processing MSG_REPORT_PROCESS_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. from h136.sfo.stumble.net,60365,1277332849712; 1 of 4 2010-06-23 15:42:05,867 INFO [RegionServer:1.worker] regionserver.HRegionServer$Worker(1338): Worker: MSG_REGION_OPEN: test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. 
2010-06-23 15:42:05,870 DEBUG [RegionServer:1.worker] regionserver.RSZookeeperUpdater(157): Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENING] expected version = 0 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.HMaster(1158): Event NodeDataChanged with state SyncConnected with path /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.ZKMasterAddressWatcher(64): Got event NodeDataChanged with path /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 2010-06-23 15:42:05,871 DEBUG [main-EventThread] master.ZKUnassignedWatcher(95): ZK-EVENT-PROCESS: Got zkEvent NodeDataChanged state:SyncConnected path:/1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 2010-06-23 15:42:05,872 INFO [main-EventThread] regionserver.HRegionServer(379): Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 2010-06-23 15:42:05,872 DEBUG [MASTER_OPENREGION-10.10.1.136:60362-1] handler.MasterOpenRegionHandler(77): Event = RS2ZK_REGION_OPENING, region = 13bef4950ac6827ac32d87682b8b2464 2010-06-23 15:42:05,874 DEBUG [RegionServer:1.worker] regionserver.HRegion(297): Creating region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. 2010-06-23 15:42:06,154 INFO [RegionServer:1.worker] regionserver.HRegion(366): Onlined test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.; next sequenceid=1 2010-06-23 15:42:06,154 DEBUG [RegionServer:1.worker] regionserver.RSZookeeperUpdater(157): Updating ZNode /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 with [RS2ZK_REGION_OPENED] expected version = 1\ org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 2010-06-23 15:42:06,249 ERROR [RegionServer:1.worker] regionserver.HRegionServer(1488): Failed to mark region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. 
as opened java.io.IOException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /1/UNASSIGNED/13bef4950ac6827ac32d87682b8b2464 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegionServer(1569): closing region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(487): Closing test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464.: disabling compactions flushes 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(512): Updates disabled for region, no outstanding scanners on test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. 2010-06-23 15:42:06,993 DEBUG [RegionServer:1] regionserver.HRegion(519): No more row locks outstanding on region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. 2010-06-23 15:42:06,994 INFO [RegionServer:1] regionserver.HRegion(531): Closed test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. 2010-06-23 15:42:09,105 INFO [master] master.ProcessServerShutdown(126): Region test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464. was in transition name=test,lll,1277332918248.13bef4950ac6827ac32d87682b8b2464., state=PENDING_OPEN on dead server
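The guard that the quoted comment describes (the unassigned znode should not exist, and if it does it must be in the CLOSED state) can be sketched as a small, self-contained simulation. This is illustrative only: the Map stands in for ZooKeeper, and the class, enum, and method names are hypothetical, not the real ZKW API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the missing check in createUnassignedRegion:
// reject an existing znode unless it is in the CLOSED state.
// The Map stands in for ZooKeeper; names are hypothetical, not the ZKW API.
class UnassignedZNodes {
    enum State { CLOSED, PENDING_OPEN, OPENING, OPENED }

    final Map<String, State> znodes = new HashMap<>();

    void createUnassignedRegion(String region, State initial) {
        State existing = znodes.get(region);
        if (existing != null && existing != State.CLOSED) {
            // This is the guard the JIRA says is not being made.
            throw new IllegalStateException("znode for " + region
                + " already exists in state " + existing);
        }
        znodes.put(region, initial);
    }

    public static void main(String[] args) {
        UnassignedZNodes zk = new UnassignedZNodes();
        zk.createUnassignedRegion("13bef495", State.PENDING_OPEN); // fresh znode: ok
        zk.znodes.put("13bef495", State.OPENING);                  // region now mid-open
        try {
            zk.createUnassignedRegion("13bef495", State.PENDING_OPEN);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the guard in place, the second create attempt against an OPENING znode is rejected instead of silently clobbering the state, which is the failure mode visible in the log above.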
[jira] Commented: (HBASE-2781) ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state
[ https://issues.apache.org/jira/browse/HBASE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883239#action_12883239 ] Karthik Ranganathan commented on HBASE-2781: Just wanted to update - working on the test case for this, will upload patch along with the JUnit test. ZKW.createUnassignedRegion doesn't make sure existing znode is in the right state - Key: HBASE-2781 URL: https://issues.apache.org/jira/browse/HBASE-2781 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.21.0
[jira] Updated: (HBASE-2737) CME in ZKW introduced in HBASE-2694
[ https://issues.apache.org/jira/browse/HBASE-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-2737: --- Attachment: HBASE-2737-0.21.patch Making the register and unregister methods synchronized. Unit tests are passing. This change is so simple that I am not putting it up on Review Board. CME in ZKW introduced in HBASE-2694 --- Key: HBASE-2737 URL: https://issues.apache.org/jira/browse/HBASE-2737 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Karthik Ranganathan Fix For: 0.21.0 Attachments: HBASE-2737-0.21.patch Saw this while tailing a log for something else: {code} 2010-06-15 17:30:03,769 ERROR [main-EventThread] zookeeper.ClientCnxn$EventThread(490): Error while calling watcher java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.process(ZooKeeperWrapper.java:235) {code} Looks like the listeners list's iterator is used in an unprotected manner.
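For illustration, here is a minimal, self-contained simulation of the race and one way to make listener iteration safe. It uses CopyOnWriteArrayList rather than the synchronized-methods approach the patch takes (both avoid the CME); the class and method names are hypothetical stand-ins, not the real ZooKeeperWrapper API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for ZooKeeperWrapper's listener list. With a plain
// ArrayList, mutating the list while process() iterates it throws
// ConcurrentModificationException; CopyOnWriteArrayList's iterator walks an
// immutable snapshot, so concurrent register/unregister is safe.
class ListenerList {
    interface Listener { void process(String event); }

    final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void register(Listener l)   { listeners.add(l); }
    void unregister(Listener l) { listeners.remove(l); }

    void process(String event) {
        for (Listener l : listeners) {  // iterates a snapshot, never throws CME
            l.process(event);
        }
    }

    public static void main(String[] args) {
        ListenerList zkw = new ListenerList();
        Listener a = e -> System.out.println("a got " + e);
        zkw.register(a);
        zkw.register(e -> zkw.unregister(a));  // mutates the list during iteration
        zkw.process("NodeDataChanged");        // completes without CME
        System.out.println("remaining listeners = " + zkw.listeners.size());
    }
}
```

Swapping the backing list this way trades a copy on every register/unregister for lock-free iteration, which suits a list that is read on every ZK event but mutated rarely; synchronizing all three methods, as the patch does, is the simpler equivalent fix.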
[jira] Assigned: (HBASE-2695) HMaster cleanup and refactor
[ https://issues.apache.org/jira/browse/HBASE-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan reassigned HBASE-2695: -- Assignee: Karthik Ranganathan HMaster cleanup and refactor Key: HBASE-2695 URL: https://issues.apache.org/jira/browse/HBASE-2695 Project: HBase Issue Type: Sub-task Components: master Reporter: Jonathan Gray Assignee: Karthik Ranganathan Priority: Critical Fix For: 0.21.0 Before doing the more significant changes to HMaster, it would benefit greatly from some cleanup, commenting, and a bit of refactoring. One motivation is to nail down the initialization flow and comment each step. Another is to add a couple of new classes that break functionality out into helpers and reduce the size of HMaster (for example, pushing all filesystem operations into their own class). Lastly, to stop the practice of passing references to HMaster around everywhere and instead pass along only what is necessary.
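The "pass only what is necessary" point can be sketched briefly: instead of handing helpers a reference to the whole HMaster, give them a narrow object that owns exactly the state they need. The class and method names below are hypothetical, chosen only to illustrate the filesystem-helper example from the description.

```java
// Hypothetical helper owning the master's filesystem layout. Callers depend
// on this narrow object rather than on a full HMaster reference, so the
// helper can be constructed and tested without a running master.
class MasterFileSystemSketch {
    private final String rootDir;

    MasterFileSystemSketch(String rootDir) { this.rootDir = rootDir; }

    // Derives a region's directory from the root; no master state needed.
    String regionDir(String table, String encodedRegionName) {
        return rootDir + "/" + table + "/" + encodedRegionName;
    }

    public static void main(String[] args) {
        MasterFileSystemSketch mfs = new MasterFileSystemSketch("/hbase");
        System.out.println(mfs.regionDir("test", "13bef495"));
    }
}
```

The payoff is that each helper's dependencies are visible in its constructor, rather than hidden behind an HMaster reference that grants access to everything.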
[jira] Updated: (HBASE-2694) Move RS to Master region open/close messaging into ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Ranganathan updated HBASE-2694: --- Attachment: HBASE-2694-OPENSOURCE-TRUNK-zk-based-messaging-v2.patch Second pass at the patch. Incorporates changes from in-person review with Todd and Stack. Unit tests pass. Move RS to Master region open/close messaging into ZooKeeper Key: HBASE-2694 URL: https://issues.apache.org/jira/browse/HBASE-2694 Project: HBase Issue Type: Sub-task Components: master, regionserver Reporter: Jonathan Gray Priority: Critical Fix For: 0.21.0 Attachments: HBASE-2694-OPENSOURCE-TRUNK-zk-based-messaging-v2.patch, HBASE-2694-OPENSOURCE-TRUNK-zk-based-messaging.patch As a first step towards HBASE-2485, this issue is about changing the message flow of opening and closing of regions without actually changing the implementation of what happens on both the Master and RegionServer sides. This way we can debug the messaging changes before the introduction of more significant changes to the master architecture and handling of regions in transition.
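The "expected version" values visible in the RSZookeeperUpdater log lines earlier in this thread (RS2ZK_REGION_OPENING at expected version 0, RS2ZK_REGION_OPENED at expected version 1) behave like a compare-and-set on the znode. A minimal stdlib simulation of that versioned-update discipline, with hypothetical names standing in for ZooKeeper's setData(path, data, version):

```java
// Minimal simulation of ZooKeeper's versioned setData: every writer names
// the version it expects, so a stale writer fails loudly instead of
// silently clobbering a newer region-transition state. Names illustrative.
class VersionedZNode {
    private String data;
    private int version = 0;

    synchronized int setData(String newData, int expectedVersion) {
        if (expectedVersion != version) {
            throw new IllegalStateException("BadVersion: expected "
                + expectedVersion + " but znode is at " + version);
        }
        data = newData;
        return ++version;  // ZooKeeper bumps the znode version on each write
    }

    public static void main(String[] args) {
        VersionedZNode node = new VersionedZNode();
        System.out.println("v=" + node.setData("RS2ZK_REGION_OPENING", 0));
        System.out.println("v=" + node.setData("RS2ZK_REGION_OPENED", 1));
        try {
            node.setData("RS2ZK_REGION_OPENING", 0);  // stale writer loses
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is what makes the ZK-based messaging safer than plain master/RS RPC messages: duplicate or out-of-order transition updates are rejected at the znode rather than applied twice.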