[jira] [Commented] (HBASE-6611) Forcing region state offline cause double assignment

2012-09-12 Thread Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454133#comment-13454133
 ] 

Jacques commented on HBASE-6611:


Reminders from the PowWow yesterday... 

JD requested that you verify that force close continues to function despite 
changes.

JD & Andrew both requested that you run some performance tests to ensure that 
region assignment doesn't take substantially longer than in 0.94.  Something along 
the lines of bulk assignment of 10,000 regions, and also checking to ensure that 
region failover isn't substantially slower.

 Forcing region state offline cause double assignment
 

 Key: HBASE-6611
 URL: https://issues.apache.org/jira/browse/HBASE-6611
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0


 In assigning a region, the assignment manager forces the region state offline if 
 it is not already.  This could cause double assignment: for example, if the region 
 is already assigned and in the Open state, we should not just change its state 
 to Offline and assign it again.
 I think this could be the root cause of all double assignments IF the region 
 state is reliable.
 After this loophole is closed, TestHBaseFsck should come up with a different way 
 to create assignment inconsistencies, for example, calling the region server 
 to open a region directly. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5993) Add a no-read Append

2012-05-30 Thread Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285764#comment-13285764
 ] 

Jacques commented on HBASE-5993:


The reason this can make sense is data overhead.  In a case where we are 
capturing a large number of small values, the KeyValue overhead is substantial. 
 The original use case is one where I'm adding to a list of documents that 
contain a certain term (a search index).  Let's say that each document number is 
a four-byte int.  Right now there are two options: use the existing Append, 
which means one will become swamped with reads as the cell value grows over 
time (this would also wreak havoc on memstore flushes as the cell value becomes 
megabytes in size and we're just adding another four bytes once a day).  On the 
flip side, using separate columns creates a substantial amount of overhead for 
each value added.  The utility of this functionality also extends to 
situations where people are capturing a large sequence of small values in 
monitoring applications.  (Sematext is basically trying to create this 
functionality already with their HBaseHUT work.)  
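To make that overhead concrete, here is a rough back-of-envelope sketch.  It 
assumes the classic KeyValue serialization (4-byte key length, 4-byte value 
length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type); 
the 16-byte row and 8-byte qualifier are made-up illustrative sizes, not numbers 
from this issue.

```java
public class KeyValueOverhead {
    // Approximate per-cell cost in the classic KeyValue layout:
    // keyLen(4) + valLen(4) + rowLen(2) + row + famLen(1) + family
    // + qualifier + timestamp(8) + type(1) + value
    static int cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return 4 + 4 + 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1 + valueLen;
    }

    public static void main(String[] args) {
        int docId = 4; // one four-byte document number
        int perCell = cellSize(16, 1, 8, docId);
        // Storing each doc id as its own cell: 49 bytes for 4 bytes of payload,
        // i.e. roughly a 12x blowup versus appending 4 bytes to one growing value.
        System.out.println("bytes per doc-id cell: " + perCell);
        System.out.println("overhead bytes: " + (perCell - docId));
    }
}
```

With numbers like these, the separate-column option pays ~45 bytes of key and 
framing per four-byte document id, which is exactly the overhead the append-based 
approach avoids.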

Yes, an additional KeyValue.Type is needed.  When this type is read, the read 
path goes and gets all the appended values (and the last full value) 
and then combines them on return.  As compactions are done, the complete merged 
values are created.  

I'm swamped right now, but am going to try to write up a short design doc in the 
next couple of weeks and get you guys to review my approach, since this will 
have to touch a number of components.  I also need to make sure to manage edge 
cases like what happens if you do a no-read append when no existing value exists 
(probably OK, even though read-back performance will be poor).  
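The read/compaction semantics above can be sketched in plain Java.  Everything 
here (class and method names, the in-memory lists) is hypothetical illustration 
of the merge-on-read idea, not actual HBase code.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class NoReadAppendSketch {
    private byte[] baseValue = new byte[0];                   // last fully merged value
    private final List<byte[]> fragments = new ArrayList<>(); // no-read append fragments

    // A no-read append just records the fragment; nothing is read back.
    public void append(byte[] fragment) {
        fragments.add(fragment);
    }

    // A read fetches the last full value plus all fragments and combines
    // them, in arrival order, on return.
    public byte[] read() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(baseValue, 0, baseValue.length);
        for (byte[] f : fragments) out.write(f, 0, f.length);
        return out.toByteArray();
    }

    // A compaction materializes the same merge into a single complete value.
    public void compact() {
        baseValue = read();
        fragments.clear();
    }
}
```

A read stays correct at any point because fragments combine in arrival order; 
compaction just folds that merge down so later reads touch a single value.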



 Add a no-read Append
 

 Key: HBASE-5993
 URL: https://issues.apache.org/jira/browse/HBASE-5993
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jacques
Priority: Critical

 HBASE-4102 added an atomic append.  For high-performance situations, it would 
 be helpful to be able to do appends that don't actually require a read of the 
 existing value.  This would be useful in building a growing set of values.  
 Our original use case was for implementing a form of search in HBase where a 
 cell would contain a list of document ids associated with a particular 
 keyword.  However, it seems like it would also be useful to provide 
 substantial performance improvements for most Append scenarios.
 Within the client API, the simplest way to implement this would be to 
 leverage the existing Append API.  If the Append is marked as 
 setReturnResults(false), use this code path.  If result return is requested, 
 use the existing Append implementation.  





[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-05-30 Thread Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285793#comment-13285793
 ] 

Jacques commented on HBASE-4676:


Random thought on this.  I was talking to Nicolas Spiegelberg and he was 
talking about how they are exploring changing encoding schemes when data 
becomes more permanent.  Don't quote me on this, but it sounded like they were 
considering using gzip compression on major compactions.  I was wondering if 
something similar made sense here.  Basically, use less compression the shorter 
the likely lifetime of the files, and use the Trie compression only on major 
compactions.  This way the performance difference could be less of a serious 
issue since you're paying for it fewer times.  I glanced around and 
didn't see any JIRAs about a two-tiered approach to data storage formats, but it 
seems like that would be a prerequisite for a hybrid/tiered approach.

 Prefix Compression - Trie data block encoding
 ---------------------------------------------

 Key: HBASE-4676
 URL: https://issues.apache.org/jira/browse/HBASE-4676
 Project: HBase
  Issue Type: New Feature
  Components: io, performance, regionserver
Affects Versions: 0.90.6
Reporter: Matt Corgan
Assignee: Matt Corgan
 Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, 
 PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, 
 hbase-prefix-trie-0.1.jar


 The HBase data block format has room for 2 significant improvements for 
 applications that have high block cache hit ratios.  
 First, there is no prefix compression, and the current KeyValue format is 
 somewhat metadata heavy, so there can be tremendous memory bloat for many 
 common data layouts, specifically those with long keys and short values.
 Second, there is no random access to KeyValues inside data blocks.  This 
 means that every time you double the datablock size, average seek time (or 
 average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
 size is ~10x slower for random seeks than a 4KB block size, but block sizes 
 as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
 or more may be more efficient from a disk access and block-cache perspective 
 in many big-data applications, but doing so is infeasible from a random seek 
 perspective.
 The PrefixTrie block encoding format attempts to solve both of these 
 problems.  Some features:
 * trie format for row key encoding completely eliminates duplicate row keys 
 and encodes similar row keys into a standard trie structure which also saves 
 a lot of space
 * the column family is currently stored once at the beginning of each block.  
 this could easily be modified to allow multiple family names per block
 * all qualifiers in the block are stored in their own trie format, which 
 caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
  the size of this trie determines the width of the block's qualifier 
 fixed-width-int
 * the minimum timestamp is stored at the beginning of the block, and deltas 
 are calculated from that.  the maximum delta determines the width of the 
 block's timestamp fixed-width-int
 The block is structured with metadata at the beginning, then a section for 
 the row trie, then the column trie, then the timestamp deltas, and then 
 all the values.  Most work is done in the row trie, where every leaf node 
 (corresponding to a row) contains a list of offsets/references corresponding 
 to the cells in that row.  Each cell is fixed-width to enable binary 
 searching and is represented by [1 byte operationType, X bytes qualifier 
 offset, X bytes timestamp delta offset].
 If all operation types are the same for a block, there will be zero per-cell 
 overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
 So, the compression aspect is very strong, but makes a few small sacrifices 
 on VarInt size to enable faster binary searches in trie fan-out nodes.
 A more compressed but slower version might build on this by also applying 
 further (suffix, etc) compression on the trie nodes at the cost of slower 
 write speed.  Even further compression could be obtained by using all VInts 
 instead of FInts with a sacrifice on random seek speed (though not huge).
 One current drawback is write speed.  While programmed with good 
 constructs like TreeMaps, ByteBuffers, binary searches, etc., it's not 
 programmed with the same level of optimization as the read path.  Work will 
 need to be done to optimize the data structures used for encoding, which could 
 probably yield a 10x increase.  It will still be slower than delta encoding, 
 but with a much higher decode speed.  I have not yet created a thorough 
 benchmark for write speed or sequential read speed.
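 As a toy illustration of why trie/prefix encoding of sorted row keys saves so 
 much space, here is a deliberately simplified scheme that stores, for each key, 
 only a one-byte shared-prefix length plus the suffix that differs from the 
 previous key.  The real PrefixTrie format above is a proper trie with fan-out 
 nodes and fixed-width offsets; the keys below are invented examples.

```java
import java.util.List;

public class PrefixEncodeSketch {
    // Bytes needed when each sorted key stores only one length byte
    // (toy scheme: assumes shared prefixes < 256 bytes) plus its suffix.
    public static int encodedSize(List<String> sortedKeys) {
        int total = 0;
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) shared++;
            total += 1 + (key.length() - shared); // prefix-len byte + differing suffix
            prev = key;
        }
        return total;
    }

    // Bytes needed when every key is stored in full.
    public static int rawSize(List<String> keys) {
        int total = 0;
        for (String k : keys) total += k.length();
        return total;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("user123-2012", "user123-2013", "user124-2012");
        System.out.println("raw:     " + rawSize(rows));     // 36 bytes
        System.out.println("encoded: " + encodedSize(rows)); // 22 bytes
    }
}
```

 Even this crude delta scheme shrinks the example keys from 36 to 22 bytes; a 
 real trie does better still because shared prefixes are stored once for the 
 whole block, not re-derived pairwise, and its fan-out nodes keep binary search 
 fast.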
 

[jira] [Commented] (HBASE-5993) Add a no-read Append

2012-05-30 Thread Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285800#comment-13285800
 ] 

Jacques commented on HBASE-5993:


The implementations of HBASE-4218, HBASE-4676 and HBASE-6093 reduce the storage 
overhead of the multi-column approach to solving this problem.  Network 
bandwidth and processing overhead will still exist.  Using encoding schemes to 
solve this problem is nice because the changes are constrained, as opposed to 
cross-cutting.  That being said, it seems a bit like boiling the ocean to make 
a cup of tea.  Let me put a design doc together and then we can reevaluate.  My 
intuition is that this type of functionality could open up a new set of use 
cases for HBase.  

 Add a no-read Append
 

 Key: HBASE-5993
 URL: https://issues.apache.org/jira/browse/HBASE-5993
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jacques
Priority: Critical






[jira] [Commented] (HBASE-5993) Add a no-read Append

2012-05-26 Thread Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284074#comment-13284074
 ] 

Jacques commented on HBASE-5993:


Exactly. If you have five megs of values in a cell and then want to append 
another few bytes regularly, it would be best if HBase didn't have to read the 
existing value every time we wanted to add a few more bytes.  Using multiple 
columns to pseudo-accomplish this functionality creates a lot of data overhead.

 Add a no-read Append
 

 Key: HBASE-5993
 URL: https://issues.apache.org/jira/browse/HBASE-5993
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jacques
Priority: Critical






[jira] [Created] (HBASE-5993) Add a no-read Append

2012-05-12 Thread Jacques (JIRA)
Jacques created HBASE-5993:
--

 Summary: Add a no-read Append
 Key: HBASE-5993
 URL: https://issues.apache.org/jira/browse/HBASE-5993
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jacques


HBASE-4102 added an atomic append.  For high performance situations, it would 
be helpful to be able to do appends that don't actually require a read of the 
existing value.  This would be useful in building a growing set of values.  Our 
original use case was for implementing a form of search in HBase where a cell 
would contain a list of document ids associated with a particular keyword for 
search.  However it seems like it would also be useful to provide substantial 
performance improvements for most Append scenarios.

Within the client API, the simplest way to implement this would be to leverage 
the existing Append api.  If the Append is marked as setReturnResults(false), 
use this code path.  If result return is requested, use the existing Append 
implementation.  


