Hope this helps. Please let us know how it goes.
-- Lars
From: Kristoffer Sjögren sto...@gmail.com
To: user@hbase.apache.org
Sent: Wednesday, April 8, 2015 6:41 AM
Subject: Re: Rowkey design question
Yes, I think you're right. Adding one or more dimensions to the rowkey
would indeed make the table narrower.
And I guess it also makes sense to store actual values (bigger qualifiers) outside HBase.
From: Andrew Purtell apurt...@apache.org
To: user@hbase.apache.org
Sent: Thursday, April 9, 2015 4:53 PM
Subject: Re: Rowkey design question
On Thu, Apr 9, 2015 at 2:26 PM, Michael Segel michael_se...@hotmail.com
wrote:
Hint: You could have sandboxed the end user code which makes it a lot
easier to manage.
I filed the fucking JIRA for that. Look
Trying to figure out the best place to jump in here...
Kristoffer,
I would like to echo what Michael and Andrew have said. While a
pre-aggregation co-proc may work, in my experience co-procs are
typically more trouble than they are worth. I would first try this outside
the client.
-- Lars
Ok…
Coprocessors are poorly implemented in HBase.
If you work in a secure environment, outside of the system coprocessors… (ones
that you load from hbase-site.xml), you don’t want to use them. (The
coprocessor code runs on the same JVM as the RS.) This means that if you have
a poorly
An HBase coprocessor. My idea is to move as much pre-aggregation as
possible to where the data lives in the region servers, instead of doing it
in the client. If there is good data locality inside and across rows within
regions then I would expect aggregation to be faster in the coprocessor
Andrew,
In a nutshell, running end-user code within the RS JVM is a bad design.
To be clear, this is not just my opinion… I just happen to be more vocal about
it. ;-)
We’ve covered this ground before, and just because the code runs doesn’t mean
it’s good. Or that the design is good.
I would
This is one person's opinion, to which he is absolutely entitled, but
blanket black-and-white statements like "coprocessors are poorly
implemented" are obviously not an opinion shared by all those who have used
them successfully, nor by the HBase committers, or we would remove the
feature. On the
When you say coprocessor, do you mean HBase coprocessors or do you mean a
physical hardware coprocessor?
In terms of queries…
HBase can perform a single get() and return the result back quickly. (The size
of the data being returned will impact the overall timing.)
HBase also caches the
Ok…
First, I’d suggest you rethink your schema by adding an additional dimension.
You’ll end up with more rows, but a narrower table.
In terms of compaction… if the data is relatively static, you won’t have
compactions because nothing changed.
But if your data is that static… why not put
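The "additional dimension" suggestion above can be sketched in plain Java. This is an illustration only; the names (entity, dimension) are hypothetical and not from the original schema:

```java
import java.nio.charset.StandardCharsets;

// Wide layout: one row per entity, one qualifier per dimension value.
// Narrow layout: one row per (entity, dimension) pair, few qualifiers.
public class CompositeKey {
    static byte[] narrowRowKey(String entity, String dimension) {
        // A zero byte separates the parts so entity prefixes stay scan-friendly.
        return (entity + "\u0000" + dimension).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = narrowRowKey("user42", "2015-04-08");
        System.out.println(key.length);  // 6 + 1 + 10 = 17 bytes
    }
}
```

A prefix scan on the entity then retrieves all of its (now narrow) rows.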
I just read through HBase MOB design document and one thing that caught my
attention was the following statement.
When HBase deals with large numbers of values, 100KB and up to ~10MB of
data, it encounters performance degradations due to write amplification
caused by splits and compactions.
Is
A small set of qualifiers will be accessed frequently, so keeping them in
block cache would be very beneficial. Some are accessed very seldom. So this
sounds very promising!
The reason I'm considering a coprocessor is that I need to provide very
specific information in the query request. Same thing with
Yes, I think you're right. Adding one or more dimensions to the rowkey
would indeed make the table narrower.
And I guess it also makes sense to store actual values (bigger qualifiers)
outside HBase. Keeping them in Hadoop, why not? Pulling hot ones out onto SSD
caches would be an interesting
I think you misunderstood.
The suggestion was to put the data into HDFS sequence files and to use HBase
to store an index into the file. (URL to the file, then offset into the file
for the start of the record…)
The reason you want to do this is that you’re reading in large amounts of data
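The pointer cell described above (file URL plus offset) could be encoded like this. The record layout is my assumption for illustration; it is not an HBase or Hadoop API:

```java
import java.io.*;

// Big values live in HDFS sequence files; HBase stores only a small
// pointer cell naming the file and the record's byte offset inside it.
public class BlobPointer {
    static byte[] encode(String fileUrl, long offset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(fileUrl);  // which sequence file holds the record
        out.writeLong(offset);  // byte offset of the record inside it
        return bytes.toByteArray();
    }

    static long decodeOffset(byte[] cell) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(cell));
        in.readUTF();           // skip past the URL
        return in.readLong();
    }
}
```

A read then becomes: one HBase get for the pointer, one positioned read from HDFS.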
But if the coprocessor is omitted then CPU cycles from region servers are
lost, so where would the query execution go?
Queries need to be quick (sub-second rather than seconds) and HDFS is
quite latency-hungry, unless there are optimizations that I'm unaware of?
On Wed, Apr 8, 2015 at 7:43
Hi
I have a row with around 100,000 qualifiers with mostly small values around
1-5KB and maybe 5 larger ones around 1-5MB. A coprocessor does random
access of 1-10 qualifiers per row.
I would like to understand how HBase loads the data into memory. Will the
entire row be loaded or only the
how HBase loads the data into memory.
If you initialize a Get and specify columns with addColumn, it is likely that
only data for those columns is read and loaded into memory.
Rowkey is best kept short. So are column qualifiers.
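A toy model of that point, in plain Java: when the caller names the qualifiers it wants (as Get.addColumn does in the real HBase client), only those cells have to be materialized, not the whole 100,000-column row. This models the effect only; it is not the HBase API:

```java
import java.util.*;

public class SelectColumns {
    static Map<String, byte[]> get(Map<String, byte[]> row, List<String> wanted) {
        Map<String, byte[]> out = new TreeMap<>();
        for (String qualifier : wanted) {
            byte[] value = row.get(qualifier);  // touch only requested columns
            if (value != null) out.put(qualifier, value);
        }
        return out;
    }
}
```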
Sorry I should have explained my use case a bit more.
Yes, it's a pretty big row and it's close to the worst case. Normally there
would be fewer qualifiers and the largest qualifiers would be smaller.
The reason these rows get big is that they store aggregated data
in indexed, compressed
Sorry, but your initial problem statement doesn’t seem to parse …
Are you saying that you have a single row with approximately 100,000 elements,
where each element is roughly 1-5KB in size, and in addition there are ~5
elements which will be between one and five MB in size?
And you then mention a
Those rows are written out into HBase blocks on cell boundaries. Your
column family has a BLOCK_SIZE attribute, which you may or may not have
overridden (the default is 64KB). Cells are written into a block until it is
>= the target block size. So your single 500MB row will be broken down into
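Rough arithmetic for the paragraph above: with the default 64KB block size, a 500MB row spans on the order of 8,000 blocks (blocks end on cell boundaries, so real counts vary slightly):

```java
public class BlockMath {
    // How many ~64KB blocks a row of the given size spans, at minimum.
    static long blockCount(long rowBytes, long blockSize) {
        return (rowBytes + blockSize - 1) / blockSize;  // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(blockCount(500L * 1024 * 1024, 64 * 1024));  // 8000
    }
}
```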
An easier way is to place one byte before the time stamp, which is called a
bucket. You can calculate it by taking the time stamp modulo the
number of buckets. We are now in the process of field testing it.
On Tuesday, February 19, 2013, Paul van Hoven wrote:
Yeah it worked fine.
But as
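Asaf's bucketing scheme, as I read it, in plain Java: a single salt byte, computed as the timestamp modulo the bucket count, placed before the timestamp. The bucket count of 16 is an assumption; tune it to your cluster:

```java
import java.nio.ByteBuffer;

public class BucketedKey {
    static final int NUM_BUCKETS = 16;  // assumed value, not from the thread

    static byte[] rowKey(long timestampMillis) {
        byte bucket = (byte) (timestampMillis % NUM_BUCKETS);
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(bucket)              // spreads writes across 16 key ranges
                .putLong(timestampMillis) // keeps time ordering within a bucket
                .array();
    }
}
```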
Another good point.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Feb 22, 2013 at 3:45 AM, Asaf Mesika asaf.mes...@gmail.com wrote:
An easier way is to place one byte before the time stamp which is called a
bucket. You can calculate it by using modulo on the
Hi,
I'm currently playing with hbase. The design of the rowkey seems to be
critical.
The rowkey for a certain database table of mine is:
timestamp+ipaddress
It looks something like this when performing a scan on the table in the shell:
hbase(main):012:0> scan 'ToyDataTable'
ROW
Hello Paul,
Try this and see if it works :
scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
Just add a hash to your rowkeys so that data is
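The shape of Tariq's snippet, with only the JDK: start and stop rows derived from epoch millis, the stop made exclusive by adding 1. One caveat worth noting: string-encoded timestamps only sort correctly while they have the same number of digits; binary longs avoid that.

```java
import java.nio.charset.StandardCharsets;

public class TimeRangeRows {
    static byte[] startRow(long startMillis) {
        return String.valueOf(startMillis).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] stopRow(long endMillis) {
        // +1 makes the stop row exclusive of endMillis itself
        return String.valueOf(endMillis + 1).getBytes(StandardCharsets.UTF_8);
    }
}
```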
Hey Tariq,
thanks for your quick answer. I'm not sure if I got the idea in the
second part of your answer. You mean if I use a timestamp as a rowkey I
should append a hash like this:
135727920+MD5HASH
and then the data would be distributed more equally?
2013/2/19 Mohammad Tariq
No, before the timestamp. All the row keys which are identical go to the
same region. This is the default HBase behavior and is meant to make the
performance better. But sometimes a machine gets overloaded because the
reads and writes are concentrated on that particular machine. For
example
Yeah it worked fine.
But as I understand it: if I prefix my row key with something like
md5-hash + timestamp
then the rowkeys are probably evenly distributed, but how would I
then perform a scan restricted to a specific time range?
2013/2/19 Mohammad Tariq donta...@gmail.com:
No. before the
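One common answer to that question (my sketch, not from the thread): with N salt prefixes, a time-range query becomes N small scans, one per prefix, each bounded by [prefix+start, prefix+stop). The client then merges the results.

```java
import java.nio.ByteBuffer;
import java.util.*;

public class SaltedScanRanges {
    // One (startRow, stopRow) pair per salt bucket for the given time range.
    static List<byte[][]> ranges(int numBuckets, long startMillis, long stopMillis) {
        List<byte[][]> out = new ArrayList<>();
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            out.add(new byte[][] { key(bucket, startMillis), key(bucket, stopMillis) });
        }
        return out;
    }

    static byte[] key(int bucket, long ts) {
        return ByteBuffer.allocate(9).put((byte) bucket).putLong(ts).array();
    }
}
```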
You can use FuzzyRowFilter
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html)
to do that.
Have a look at this link:
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
You might find it helpful.
Warm
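The idea behind FuzzyRowFilter, re-implemented in plain Java to show the mechanics: in the mask, 0 means "this byte must match the pattern" and 1 means "any byte is fine", which lets you skip a salt or hash prefix while fixing the timestamp bytes. (HBase's real class lives in org.apache.hadoop.hbase.filter; this is only a sketch of its matching rule.)

```java
public class FuzzyMatch {
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (row.length < pattern.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            // mask[i] == 0: byte must equal the pattern; mask[i] == 1: wildcard
            if (mask[i] == 0 && row[i] != pattern[i]) return false;
        }
        return true;
    }
}
```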