Thank you all. This info is really helpful. I have a follow-up question
related to my use case.

I need to create a table called LOGS to log event info generated by the
multiple services that I am calling.

For each rowkey (a randomly generated UUID in this case), multiple services
are called and the success or failure status is logged into the LOGS table.
So, my data looks something like this:

Rowkey         Filename  Service   Message
312sdasd31244  file1     service1  success
312sdasd31244  file1     service2  success
312sdasd31244  file2     service1  failure: Reason for failure
...
789sdfsf34234  file1     service1  success
789sdfsf34234  file1     service2  success
789sdfsf34234  file2     service3  failure: Reason for failure
...


This log info will be accessed by a polling service, through the REST API,
to track the progress of a rowkey.

So, basically, the polling service will do a GET on the rowkey with a filter
on the filename, something like this:

get 'LOGS', '312sdasd31244', {FILTER => "ColumnPrefixFilter('file1')"}
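
A minimal sketch of the layout that the get above seems to assume, modelled
in plain Python rather than against a real cluster. ColumnPrefixFilter
matches on the column qualifier, so each filename/service pair would need to
be its own column whose qualifier starts with the filename. The
"filename:service" qualifier format and the names put_event and
get_with_column_prefix are assumptions for illustration, not HBase API.

```python
from collections import defaultdict

# In-memory stand-in for the LOGS table: rowkey -> {qualifier: message}.
# Assumption: one column per (filename, service) pair, with the qualifier
# written as "filename:service". The original post does not specify this.
logs = defaultdict(dict)


def put_event(rowkey, filename, service, message):
    """Record one service result as a column in the row for this rowkey."""
    logs[rowkey][f"{filename}:{service}"] = message


def get_with_column_prefix(rowkey, prefix):
    """Return the row's columns whose qualifier starts with prefix, in
    sorted qualifier order, mimicking a prefix filter on a single-row get."""
    row = logs.get(rowkey, {})
    return {q: row[q] for q in sorted(row) if q.startswith(prefix)}


put_event("312sdasd31244", "file1", "service1", "success")
put_event("312sdasd31244", "file1", "service2", "success")
put_event("312sdasd31244", "file2", "service1", "failure: Reason for failure")

file1_status = get_with_column_prefix("312sdasd31244", "file1")
print(file1_status)
```

In this model a bare prefix such as 'file1' would also match a qualifier
starting with 'file10'; including the separator in the prefix ('file1:')
avoids that.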

So, my question is: how should the rowkey be designed? Salting may not help
because the access pattern is not random. I will be scanning a range of
rows.
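
To make the salting concern concrete, here is a rough sketch of one common
salting scheme: a bucket prefix derived from a hash of the key. The bucket
count of 16, the two-digit prefix format, and the function names are all
assumptions for illustration. It shows that a single-row get can recompute
the prefix, while a range scan has to be repeated once per bucket.

```python
import hashlib

NUM_BUCKETS = 16  # assumed bucket count, chosen only for illustration


def salted_key(rowkey: str) -> str:
    """Prefix the key with a bucket derived from its MD5 hash.
    The same rowkey always yields the same prefix, so a single-row
    get can recompute it."""
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}-{rowkey}"


def scan_ranges(start: str, stop: str):
    """A range scan over salted keys has to be issued once per bucket,
    because neighbouring original keys land under different prefixes."""
    return [(f"{b:02d}-{start}", f"{b:02d}-{stop}") for b in range(NUM_BUCKETS)]


key = salted_key("312sdasd31244")
ranges = scan_ranges("312", "790")
print(key, len(ranges))
```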

What other factors do I need to consider to make this really effective
for this use case?

Regards,
Arun


On Tue, May 12, 2015 at 9:58 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Arun:
> See the following for details:
>
> http://hbase.apache.org/book.html#_determining_split_points
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html
>
> Cheers
>
> On Tue, May 12, 2015 at 6:11 AM, Talat Uyarer <ta...@uyarer.com> wrote:
>
> > Hi Arun,
> >
> > Row keys. HBase decides which region each row is stored in by its row key.
> > The RegionSplitter uses the MD5 algorithm to generate region starting keys
> > of MD5 checksum.
> >
> > Talat
> >
> >
> >
> > 2015-05-12 15:48 GMT+03:00 Arun Patel <arunp.bigd...@gmail.com>:
> > > Thank you.  This helps.
> > >
> > > So, when I pre-split regions with the command below, is SPLITALGO creating
> > > the rowkey boundaries for each region?
> > >
> > > create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
> > >
> > > I am failing to understand HexStringSplit. As per the documentation, the
> > > format of a HexStringSplit region boundary is the ASCII representation of
> > > an MD5 checksum, or any other uniformly distributed hexadecimal value.
> > >
> > > My question is: MD5 checksum of what?
> > >
> > > Regards,
> > > Arun
> > >
> > >
> > >
> > >
> > >
> > > On Mon, May 11, 2015 at 8:57 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> > >
> > >> On Mon, May 11, 2015 at 3:38 PM, Arun Patel <arunp.bigd...@gmail.com>
> > >> wrote:
> > >>
> > >> > 1) I have a 10 node HBase cluster.  When I create a table in HBase,
> > >> > how many regions will be allocated by default?
> > >>
> > >>
> > >> In HBase, the number of region servers is orthogonal to table partitions.
> > >> These two operational details are related but managed independently.
> > >>
> > >> > I looked at the HBase Master UI and it seems regions are not allocated to
> > >> > all the Regionservers by default. How can I allocate the regions in all
> > >> > Region Servers?
> > >>
> > >>
> > >> HBase will evenly balance the regions of all tables it's hosting across all
> > >> region servers in the cluster. If you have fewer regions than region
> > >> servers, some servers will have no regions to host.
> > >>
> > >> > Basically, this distributes the data in a better way if I am using a salted
> > >> > key. My requirement is to distribute the data across the cluster using
> > >> > salted keys. But is having few regions a constraint?
> > >> >
> > >>
> > >> You're moving in the right direction. The next step would be to split your
> > >> table according to some prefix value, presumably related to your "salting"
> > >> choice. This will depend on what value you're prepending to the row keys
> > >> and the cardinality of those values. Apache Phoenix does this, for example,
> > >> with a fixed byte prefix and one pre-split per salt-byte value (i.e., 0,
> > >> 1, 2, 3, ... 15).
> > >>
> > >> > 2) How does the rowkey to region mapping work? In Cassandra, we have a
> > >> > concept of assigning a token range to each node. A rowkey will be assigned
> > >> > to a node based on the token range. How does this work in HBase?
> > >>
> > >>
> > >> HBase is ordered and range-partitioned. Basically, your row keys are sorted
> > >> and region boundaries are determined at points within that range. So if you
> > >> have rows 'a' - 'z', HBase will define regions as contiguous segments of
> > >> this range, 'a' - 'f', and 'g' - 'k' for example. The range of a region is
> > >> dictated primarily by the amount of data contained therein. When a region
> > >> becomes too big, it will be split in half and two child regions are created
> > >> (i.e., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f'). Once a region splits,
> > >> the children are independent and can be moved to other region servers.
> > >>
> > >> I explain a bit of this and more in my talk "HBase for Architects". I link
> > >> to a video from my blog [0]. As Michael mentioned, there's more detail
> > >> published in both our book [1], as well as our other books [2], [3].
> > >>
> > >> Welcome to HBase ;)
> > >> -n
> > >>
> > >> [0]: http://www.n10k.com/blog/hbase-for-architects-redux/
> > >> [1]: https://hbase.apache.org/book.html#regions.arch
> > >> [2]: http://www.manning.com/dimidukkhurana/
> > >> [3]: http://shop.oreilly.com/product/0636920033943.do
> > >>
> >
> >
> >
> > --
> > Talat UYARER
> > Website: http://talat.uyarer.com
> > Twitter: http://twitter.com/talatuyarer
> > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
> >
>
