[jira] [Commented] (HBASE-6611) Forcing region state offline cause double assignment
[ https://issues.apache.org/jira/browse/HBASE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454133#comment-13454133 ] Jacques commented on HBASE-6611:

Reminders from the PowWow yesterday... JD requested that you verify that force close continues to function despite the changes. JD and Andrew both requested that you run some performance tests to ensure that region assignment doesn't take substantially longer than in 0.94: something along the lines of bulk assignment of 10,000 regions, and also checking that region failover isn't substantially slower.

Forcing region state offline cause double assignment
Key: HBASE-6611 URL: https://issues.apache.org/jira/browse/HBASE-6611 Project: HBase Issue Type: Bug Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0

In assigning a region, the assignment manager forces the region state offline if it is not. This could cause double assignment: for example, if the region is already assigned and in the Open state, the master should not simply change its state to Offline and assign it again. I think this could be the root cause of all double assignments, if the region state is reliable. After this loophole is closed, TestHBaseFsck should come up with a different way to create assignment inconsistencies, for example, calling a region server to open a region directly.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
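The guard HBASE-6611 argues for can be sketched as a simple state check: only force a region offline when it is not already assigned or in transition. The sketch below is purely illustrative; `Assigner` and `RegionState` are hypothetical stand-ins for the real AssignmentManager types, not the actual HBase code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Assigner {
    public enum RegionState { OFFLINE, PENDING_OPEN, OPENING, OPEN }

    private final Map<String, RegionState> states = new ConcurrentHashMap<>();

    public void setState(String region, RegionState s) {
        states.put(region, s);
    }

    /**
     * Returns true if the region was moved to OFFLINE and may now be assigned.
     * Returns false (with no state change) when the region is already OPEN or
     * in transition -- exactly the double-assignment case to avoid.
     */
    public boolean forceOfflineForAssign(String region) {
        RegionState s = states.getOrDefault(region, RegionState.OFFLINE);
        if (s == RegionState.OPEN || s == RegionState.OPENING
                || s == RegionState.PENDING_OPEN) {
            return false; // already assigned or being assigned: do not re-assign
        }
        states.put(region, RegionState.OFFLINE);
        return true;
    }
}
```

The point of the sketch is the early return: the old behavior unconditionally set the state to Offline and assigned, which is where the double assignment came from.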
[jira] [Commented] (HBASE-5993) Add a no-read Append
[ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285764#comment-13285764 ] Jacques commented on HBASE-5993:

The reason this can make sense is data overhead. In a case where we are capturing a large number of small values, the KeyValue overhead is substantial. The original use case is one where I'm adding to a list of documents that contain a certain term (a search index). Let's say each document number is a four-byte int. Right now there are two options: use the existing Append, which means one becomes swamped with reads as the cell value grows over time (this would also wreak havoc on memstore flushes as the cell value becomes megabytes in size and we're just adding another four bytes once a day); or use separate columns, which creates a substantial amount of overhead for each value added. The utility of this functionality also extends to situations where people are capturing a long sequence of small values in monitoring applications. (Sematext is basically trying to create this functionality already with their HBaseHUT work.)

Yes, an additional KeyValue.Type is needed. When this type is read, the read path goes and gets all the appended values (and the last full value) and combines them on return. As compactions are done, the complete merged values are created. I'm swamped right now but am going to try to write up a short design doc in the next couple of weeks and get you guys to review my approach, since this will have to touch a number of components. I also need to make sure to manage edge cases, like what happens if you do a no-read append when no existing value exists (probably OK, even though read-back performance will be poor).
Add a no-read Append
Key: HBASE-5993 URL: https://issues.apache.org/jira/browse/HBASE-5993 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.0 Reporter: Jacques Priority: Critical

HBASE-4102 added an atomic append. For high-performance situations, it would be helpful to be able to do appends that don't actually require a read of the existing value. This would be useful in building a growing set of values. Our original use case was implementing a form of search in HBase, where a cell would contain a list of document IDs associated with a particular keyword. However, it seems like it would also provide substantial performance improvements for most Append scenarios. Within the client API, the simplest way to implement this would be to leverage the existing Append API: if the Append is marked with setReturnResults(false), use this code path; if result return is requested, use the existing Append implementation.
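The merge-on-read behavior sketched in the comment above (combine the last full value with subsequent appended fragments on return, and materialize the merged value at compaction time) can be modeled in a few lines. This is an illustrative sketch of the semantics only, not the actual regionserver read path; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class NoReadAppendMerge {
    /**
     * Concatenates the last complete value (if any) with the no-read append
     * fragments in arrival order. The null-base case models the edge case
     * mentioned above: a no-read append against a cell with no existing value.
     */
    public static byte[] merge(byte[] base, List<byte[]> fragments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (base != null) {
            out.write(base, 0, base.length);   // last complete value
        }
        for (byte[] f : fragments) {           // no-read appends, oldest first
            out.write(f, 0, f.length);
        }
        return out.toByteArray();
    }
}
```

A compaction would perform the same concatenation once and write the merged result back as a single complete value, so later reads no longer pay the fragment-gathering cost.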
[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding
[ https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285793#comment-13285793 ] Jacques commented on HBASE-4676:

Random thought on this. I was talking to Nicolas Spiegelberg, and he mentioned that they are exploring changing encoding schemes as data becomes more permanent. Don't quote me on this, but it sounded like they were considering using gzip compression on major compactions. I was wondering if something similar made sense here: basically, use less compression the shorter the likely lifetime of the files, and use the Trie compression only on major compactions. That way the performance difference could be less of a serious issue, since you're paying for it fewer times. I glanced around and didn't see any JIRAs about a two-tiered approach to data storage formats, but it seems like that would be a prerequisite for a hybrid/tiered approach.

Prefix Compression - Trie data block encoding
Key: HBASE-4676 URL: https://issues.apache.org/jira/browse/HBASE-4676 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Affects Versions: 0.90.6 Reporter: Matt Corgan Assignee: Matt Corgan Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, hbase-prefix-trie-0.1.jar

The HBase data block format has room for 2 significant improvements for applications that have high block cache hit ratios. First, there is no prefix compression, and the current KeyValue format is somewhat metadata heavy, so there can be tremendous memory bloat for many common data layouts, specifically those with long keys and short values. Second, there is no random access to KeyValues inside data blocks. This means that every time you double the data block size, average seek time (or average cpu consumption) goes up by a factor of 2.
The standard 64KB block size is ~10x slower for random seeks than a 4KB block size, but block sizes as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB or more may be more efficient from a disk access and block-cache perspective in many big-data applications, but doing so is infeasible from a random seek perspective. The PrefixTrie block encoding format attempts to solve both of these problems. Some features:
* trie format for row key encoding completely eliminates duplicate row keys and encodes similar row keys into a standard trie structure, which also saves a lot of space
* the column family is currently stored once at the beginning of each block; this could easily be modified to allow multiple family names per block
* all qualifiers in the block are stored in their own trie format, which caters nicely to wide rows; duplicate qualifiers between rows are eliminated; the size of this trie determines the width of the block's qualifier fixed-width-int
* the minimum timestamp is stored at the beginning of the block, and deltas are calculated from it; the maximum delta determines the width of the block's timestamp fixed-width-int
The block is structured with metadata at the beginning, then a section for the row trie, then the column trie, then the timestamp deltas, and then all the values. Most work is done in the row trie, where every leaf node (corresponding to a row) contains a list of offsets/references corresponding to the cells in that row. Each cell is fixed-width to enable binary searching and is represented by [1 byte operationType, X bytes qualifier offset, X bytes timestamp delta offset]. If all operation types are the same for a block, there will be zero per-cell overhead. Same for timestamps. Same for qualifiers, when I get a chance. So the compression aspect is very strong, but it makes a few small sacrifices on VarInt size to enable faster binary searches in trie fan-out nodes.
A more compressed but slower version might build on this by also applying further (suffix, etc.) compression to the trie nodes, at the cost of slower write speed. Even further compression could be obtained by using all VInts instead of FInts, with a sacrifice in random seek speed (though not a huge one). One current drawback is the write speed. While programmed with good constructs like TreeMaps, ByteBuffers, binary searches, etc., it's not programmed with the same level of optimization as the read path. Work will need to be done to optimize the data structures used for encoding, which could probably show a 10x improvement. It will still be slower than delta encoding, but with a much higher decode speed. I have not yet created a thorough benchmark for write speed or sequential read speed.
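The fixed-width-int mechanism the format description relies on (the largest timestamp delta or qualifier-trie size in a block determines a per-block field width, so every cell entry stays fixed-width and binary-searchable) can be sketched as follows. This is written from the description above, not from the actual PrefixTrie encoder, and the names are illustrative.

```java
public class FixedWidthInts {
    /** Minimum number of bytes needed to represent maxDelta (at least 1). */
    public static int widthFor(long maxDelta) {
        int bytes = 1;
        while (maxDelta >>> (8 * bytes) != 0) {
            bytes++;
        }
        return bytes;
    }

    /** Encodes value big-endian into exactly width bytes. */
    public static byte[] encode(long value, int width) {
        byte[] out = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (byte) value; // low byte last
            value >>>= 8;
        }
        return out;
    }
}
```

Because every delta in a block is padded to the same width, a reader can jump to the i-th cell by plain multiplication instead of walking variable-length ints, which is the random-seek advantage of FInts over VInts that the trade-off above describes.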
[jira] [Commented] (HBASE-5993) Add a no-read Append
[ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285800#comment-13285800 ] Jacques commented on HBASE-5993:

The implementation of HBASE-4218, HBASE-4676 and HBASE-6093 reduces the storage overhead of the multi-column approach to solving this problem, but the network bandwidth and processing overhead will still exist. Using encoding schemes to solve this problem is nice because the changes are constrained, as opposed to cross-cutting. That being said, it seems a bit like boiling the ocean to make a cup of tea. Let me put a design doc together and then we can reevaluate. My intuition is that this type of functionality could open up a new set of use cases for HBase.
[jira] [Commented] (HBASE-5993) Add a no-read Append
[ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284074#comment-13284074 ] Jacques commented on HBASE-5993:

Exactly. If you have five megs of values in a cell and then want to append another few bytes regularly, it would be best if HBase didn't have to read the existing value every time we wanted to add a few more bytes. Using multiple columns to pseudo-accomplish this functionality creates a lot of data overhead.
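The data-overhead argument above can be made concrete with back-of-the-envelope arithmetic. Assuming each KeyValue carries roughly two 4-byte length ints, a 2-byte row length, a 1-byte family length, an 8-byte timestamp and a 1-byte type (~20 bytes of fixed overhead plus the row, family, and qualifier bytes per cell), storing each small value in its own column pays that full cost per value, while appending into one cell pays it once. The constant below is an assumption for illustration, not the exact on-disk KeyValue layout.

```java
public class OverheadEstimate {
    // Assumed per-cell fixed overhead: keylen + valuelen ints, row length,
    // family length, timestamp, type. Illustrative, not the exact format.
    static final int KV_FIXED = 4 + 4 + 2 + 1 + 8 + 1; // = 20 bytes

    /** Bytes to store n values of valueSize bytes each, one cell per value. */
    public static long multiColumnBytes(int n, int valueSize,
                                        int rowLen, int famLen, int qualLen) {
        return (long) n * (KV_FIXED + rowLen + famLen + qualLen + valueSize);
    }

    /** Bytes to store the same n values appended into a single cell. */
    public static long singleCellBytes(int n, int valueSize,
                                       int rowLen, int famLen, int qualLen) {
        return KV_FIXED + rowLen + famLen + qualLen + (long) n * valueSize;
    }
}
```

With a 16-byte row key, 1-byte family, 8-byte qualifier and four-byte values, 1,000 values cost about 49,000 bytes as separate columns versus about 4,000 bytes as one appended cell under these assumptions, which is the roughly order-of-magnitude overhead the comment is pointing at.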
[jira] [Created] (HBASE-5993) Add a no-read Append
Jacques created HBASE-5993:
Summary: Add a no-read Append Key: HBASE-5993 URL: https://issues.apache.org/jira/browse/HBASE-5993 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.0 Reporter: Jacques