[jira] [Commented] (HBASE-6611) Forcing region state offline cause double assignment
[ https://issues.apache.org/jira/browse/HBASE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454133#comment-13454133 ] Jacques commented on HBASE-6611:

Reminders from the PowWow yesterday... JD requested that you verify that force close continues to function despite the changes. JD and Andrew both requested that you run some performance tests to ensure that region assignment doesn't take substantially longer than in 0.94: something along the lines of bulk assignment of 10,000 regions, and also checking that region failover isn't substantially slower.

Forcing region state offline cause double assignment
Key: HBASE-6611 URL: https://issues.apache.org/jira/browse/HBASE-6611 Project: HBase Issue Type: Bug Components: master Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0

In assigning a region, the assignment manager forces the region state offline if it is not. This could cause double assignment: for example, if the region is already assigned and in the Open state, the master should not simply change its state to Offline and assign it again. I think this could be the root cause of all double assignments, if the region state is reliable. After this loophole is closed, TestHBaseFsck should come up with a different way to create assignment inconsistencies, for example, calling a region server to open a region directly.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
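The guard HBASE-6611 argues for can be sketched as a simple state check: only force a region offline when it is not already assigned or in transition. The sketch below is purely illustrative; `Assigner` and `RegionState` are hypothetical stand-ins for the real AssignmentManager types, not the actual HBase code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Assigner {
    public enum RegionState { OFFLINE, PENDING_OPEN, OPENING, OPEN }

    private final Map<String, RegionState> states = new ConcurrentHashMap<>();

    public void setState(String region, RegionState s) {
        states.put(region, s);
    }

    /**
     * Returns true if the region was moved to OFFLINE and may now be assigned.
     * Returns false (with no state change) when the region is already OPEN or
     * in transition -- exactly the double-assignment case to avoid.
     */
    public boolean forceOfflineForAssign(String region) {
        RegionState s = states.getOrDefault(region, RegionState.OFFLINE);
        if (s == RegionState.OPEN || s == RegionState.OPENING
                || s == RegionState.PENDING_OPEN) {
            return false; // already assigned or being assigned: do not re-assign
        }
        states.put(region, RegionState.OFFLINE);
        return true;
    }
}
```

The point of the sketch is the early return: the old behavior unconditionally set the state to Offline and assigned, which is where the double assignment came from.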
[jira] [Commented] (HBASE-5993) Add a no-read Append
[ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285764#comment-13285764 ] Jacques commented on HBASE-5993:

The reason this can make sense is data overhead. In a case where we are capturing a large number of small values, the KeyValue overhead is substantial. The original use case is one where I'm adding to a list of documents that contain a certain term (a search index). Let's say each document number is a four-byte int. Right now there are two options: use the existing Append, which means one becomes swamped with reads as the cell value grows over time (this would also wreak havoc on memstore flushes as the cell value becomes megabytes in size and we're just adding another four bytes once a day); or use separate columns, which creates a substantial amount of overhead for each value added. The utility of this functionality also extends to situations where people are capturing a long sequence of small values in monitoring applications. (Sematext is basically trying to create this functionality already with their HBaseHUT work.)

Yes, an additional KeyValue.Type is needed. When this type is read, the read path goes and gets all the appended values (and the last full value) and combines them on return. As compactions are done, the complete merged values are created. I'm swamped right now but am going to try to write up a short design doc in the next couple of weeks and get you guys to review my approach, since this will have to touch a number of components. I also need to make sure to manage edge cases, like what happens if you do a no-read append when no existing value exists (probably OK, even though read-back performance will be poor).
Add a no-read Append
Key: HBASE-5993 URL: https://issues.apache.org/jira/browse/HBASE-5993 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.0 Reporter: Jacques Priority: Critical

HBASE-4102 added an atomic append. For high-performance situations, it would be helpful to be able to do appends that don't actually require a read of the existing value. This would be useful in building a growing set of values. Our original use case was implementing a form of search in HBase, where a cell would contain a list of document IDs associated with a particular keyword. However, it seems like it would also provide substantial performance improvements for most Append scenarios. Within the client API, the simplest way to implement this would be to leverage the existing Append API: if the Append is marked with setReturnResults(false), use this code path; if result return is requested, use the existing Append implementation.
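The merge-on-read behavior sketched in the comment above (combine the last full value with subsequent appended fragments on return, and materialize the merged value at compaction time) can be modeled in a few lines. This is an illustrative sketch of the semantics only, not the actual regionserver read path; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class NoReadAppendMerge {
    /**
     * Concatenates the last complete value (if any) with the no-read append
     * fragments in arrival order. The null-base case models the edge case
     * mentioned above: a no-read append against a cell with no existing value.
     */
    public static byte[] merge(byte[] base, List<byte[]> fragments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (base != null) {
            out.write(base, 0, base.length);   // last complete value
        }
        for (byte[] f : fragments) {           // no-read appends, oldest first
            out.write(f, 0, f.length);
        }
        return out.toByteArray();
    }
}
```

A compaction would perform the same concatenation once and write the merged result back as a single complete value, so later reads no longer pay the fragment-gathering cost.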
[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding
[ https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285793#comment-13285793 ] Jacques commented on HBASE-4676:

Random thought on this. I was talking to Nicolas Spiegelberg, and he mentioned that they are exploring changing encoding schemes as data becomes more permanent. Don't quote me on this, but it sounded like they were considering using gzip compression on major compactions. I was wondering if something similar made sense here: basically, use less compression the shorter the likely lifetime of the files, and use the Trie compression only on major compactions. That way the performance difference could be less of a serious issue, since you're paying for it fewer times. I glanced around and didn't see any JIRAs about a two-tiered approach to data storage formats, but it seems like that would be a prerequisite for a hybrid/tiered approach.

Prefix Compression - Trie data block encoding
Key: HBASE-4676 URL: https://issues.apache.org/jira/browse/HBASE-4676 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Affects Versions: 0.90.6 Reporter: Matt Corgan Assignee: Matt Corgan Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, hbase-prefix-trie-0.1.jar

The HBase data block format has room for 2 significant improvements for applications that have high block cache hit ratios. First, there is no prefix compression, and the current KeyValue format is somewhat metadata heavy, so there can be tremendous memory bloat for many common data layouts, specifically those with long keys and short values. Second, there is no random access to KeyValues inside data blocks. This means that every time you double the data block size, average seek time (or average cpu consumption) goes up by a factor of 2.
The standard 64KB block size is ~10x slower for random seeks than a 4KB block size, but block sizes as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB or more may be more efficient from a disk access and block-cache perspective in many big-data applications, but doing so is infeasible from a random seek perspective. The PrefixTrie block encoding format attempts to solve both of these problems. Some features:
* trie format for row key encoding completely eliminates duplicate row keys and encodes similar row keys into a standard trie structure, which also saves a lot of space
* the column family is currently stored once at the beginning of each block; this could easily be modified to allow multiple family names per block
* all qualifiers in the block are stored in their own trie format, which caters nicely to wide rows; duplicate qualifiers between rows are eliminated; the size of this trie determines the width of the block's qualifier fixed-width-int
* the minimum timestamp is stored at the beginning of the block, and deltas are calculated from it; the maximum delta determines the width of the block's timestamp fixed-width-int
The block is structured with metadata at the beginning, then a section for the row trie, then the column trie, then the timestamp deltas, and then all the values. Most work is done in the row trie, where every leaf node (corresponding to a row) contains a list of offsets/references corresponding to the cells in that row. Each cell is fixed-width to enable binary searching and is represented by [1 byte operationType, X bytes qualifier offset, X bytes timestamp delta offset]. If all operation types are the same for a block, there will be zero per-cell overhead. Same for timestamps. Same for qualifiers, when I get a chance. So the compression aspect is very strong, but it makes a few small sacrifices on VarInt size to enable faster binary searches in trie fan-out nodes.
A more compressed but slower version might build on this by also applying further (suffix, etc.) compression to the trie nodes, at the cost of slower write speed. Even further compression could be obtained by using all VInts instead of FInts, with a sacrifice in random seek speed (though not a huge one). One current drawback is the write speed. While programmed with good constructs like TreeMaps, ByteBuffers, binary searches, etc., it's not programmed with the same level of optimization as the read path. Work will need to be done to optimize the data structures used for encoding, which could probably show a 10x improvement. It will still be slower than delta encoding, but with a much higher decode speed. I have not yet created a thorough benchmark for write speed or sequential read speed.
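The fixed-width-int mechanism the format description relies on (the largest timestamp delta or qualifier-trie size in a block determines a per-block field width, so every cell entry stays fixed-width and binary-searchable) can be sketched as follows. This is written from the description above, not from the actual PrefixTrie encoder, and the names are illustrative.

```java
public class FixedWidthInts {
    /** Minimum number of bytes needed to represent maxDelta (at least 1). */
    public static int widthFor(long maxDelta) {
        int bytes = 1;
        while (maxDelta >>> (8 * bytes) != 0) {
            bytes++;
        }
        return bytes;
    }

    /** Encodes value big-endian into exactly width bytes. */
    public static byte[] encode(long value, int width) {
        byte[] out = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (byte) value; // low byte last
            value >>>= 8;
        }
        return out;
    }
}
```

Because every delta in a block is padded to the same width, a reader can jump to the i-th cell by plain multiplication instead of walking variable-length ints, which is the random-seek advantage of FInts over VInts that the trade-off above describes.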
[jira] [Commented] (HBASE-5993) Add a no-read Append
[ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285800#comment-13285800 ] Jacques commented on HBASE-5993:

The implementation of HBASE-4218, HBASE-4676 and HBASE-6093 reduces the storage overhead of the multi-column approach to solving this problem, but the network bandwidth and processing overhead will still exist. Using encoding schemes to solve this problem is nice because the changes are constrained, as opposed to cross-cutting. That being said, it seems a bit like boiling the ocean to make a cup of tea. Let me put a design doc together and then we can reevaluate. My intuition is that this type of functionality could open up a new set of use cases for HBase.
[jira] [Commented] (HBASE-5993) Add a no-read Append
[ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284074#comment-13284074 ] Jacques commented on HBASE-5993:

Exactly. If you have five megs of values in a cell and then want to append another few bytes regularly, it would be best if HBase didn't have to read the existing value every time we wanted to add a few more bytes. Using multiple columns to pseudo-accomplish this functionality creates a lot of data overhead.
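The data-overhead argument above can be made concrete with back-of-the-envelope arithmetic. Assuming each KeyValue carries roughly two 4-byte length ints, a 2-byte row length, a 1-byte family length, an 8-byte timestamp and a 1-byte type (~20 bytes of fixed overhead plus the row, family, and qualifier bytes per cell), storing each small value in its own column pays that full cost per value, while appending into one cell pays it once. The constant below is an assumption for illustration, not the exact on-disk KeyValue layout.

```java
public class OverheadEstimate {
    // Assumed per-cell fixed overhead: keylen + valuelen ints, row length,
    // family length, timestamp, type. Illustrative, not the exact format.
    static final int KV_FIXED = 4 + 4 + 2 + 1 + 8 + 1; // = 20 bytes

    /** Bytes to store n values of valueSize bytes each, one cell per value. */
    public static long multiColumnBytes(int n, int valueSize,
                                        int rowLen, int famLen, int qualLen) {
        return (long) n * (KV_FIXED + rowLen + famLen + qualLen + valueSize);
    }

    /** Bytes to store the same n values appended into a single cell. */
    public static long singleCellBytes(int n, int valueSize,
                                       int rowLen, int famLen, int qualLen) {
        return KV_FIXED + rowLen + famLen + qualLen + (long) n * valueSize;
    }
}
```

With a 16-byte row key, 1-byte family, 8-byte qualifier and four-byte values, 1,000 values cost about 49,000 bytes as separate columns versus about 4,000 bytes as one appended cell under these assumptions, which is the roughly order-of-magnitude overhead the comment is pointing at.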
[jira] [Created] (HBASE-5993) Add a no-read Append
Jacques created HBASE-5993:
Summary: Add a no-read Append Key: HBASE-5993 URL: https://issues.apache.org/jira/browse/HBASE-5993 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.0 Reporter: Jacques