2010/6/19 Stack <[email protected]>

> On Thu, Jun 17, 2010 at 12:18 PM, Andrey Stepachev <[email protected]>
> wrote:
> > As i see in sources there no place, where kv sorted (except client
> > Result.sorted() method). So we can get keyvalues from store and from
> > memstore (and in this case we can get 1 3 5 from stores and 4 from
> > memstore) in incorrect order.
> >
> > Or I miss something?
> >
>
> Data is sorted in hbase.  Scanning, we'll be running a scanner against
> each data store element -- memstore and one for each store file -- and
> we'll pop off the elements in order.  Thats the general story.  There
> may once have been a legitimate reason for the client-side sort --
> perhaps when our Get and Scan code paths differed it was needed -- but
> as to whether it still required, I'm not sure.  I'd have to dig.  Any
> one else?
>

It would be very interesting to know whether HBase guarantees the
ordering of columns within a row. If someone uses very wide rows,
they are not very useful without sorting (and of course one should
be aware of the partitioning problem with wide rows).
Suppose we want to work with time-series data. In that case we can use
dates as qualifiers and expect the data back in sorted order. We can't
sort it somewhere else, because we would lose most of HBase's
advantage.
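
As a rough illustration of the ordering in question: if the memstore and
each store file are individually sorted, a heap-based merge over them
returns the values in order, including the "1 3 5 from stores and 4 from
memstore" case quoted above. This is only a sketch in plain Java under
that assumption, not HBase's actual scanner code; the class and method
names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: merges several individually sorted
// sources (think: memstore plus store files) into one sorted list.
public class SortedMergeSketch {

    // One heap entry: the current head of a source plus the rest of it.
    private static final class Head {
        final String value;
        final Iterator<String> rest;

        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    public static List<String> merge(List<List<String>> sortedSources) {
        // The heap always exposes the smallest head among all sources.
        PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (List<String> source : sortedSources) {
            Iterator<String> it = source.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.value);
            // Refill the heap from the source we just consumed from.
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> storeFile = Arrays.asList("1", "3", "5");
        List<String> memstore = Arrays.asList("4");
        System.out.println(merge(Arrays.asList(storeFile, memstore)));
    }
}
```

The merge only produces sorted output if every input is already sorted,
which is exactly the guarantee being asked about.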



>
> >
> >> > The rest of the data needs to be accessed occasionally. We want to
> >> > avoid getting it shipped to the client as it makes our map reduce
> >> > job go out of memory.
> >> >
> >>
> >> You are not using incremental get on a row?  You should be able to get
> >> your big rows piecemeal.
> >>
> > This scanner api changes was not included in 0.20.4 :( (intra-row
> > scanner).
> >
>
> Oh.
>
> Sorry about that Andrey.  Somehow we missed your backport of
> HBASE-1537.  I just applied it.  It'll appear in the 0.20.5RC4 I'm
> rolling now.  Please excuse our bungling.
>

Not a problem. I'll wait for 0.20.5. But I should warn you that with this
patch 0.20.5 will not be wire compatible with 0.20.4 (because the patch
adds an additional field to Scan, which makes Scan binary incompatible).
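
To make the compatibility concern concrete, here is a minimal sketch of
how one extra serialized field can desynchronize a reader that does not
know about it. The wire format, field names, and class below are
hypothetical and much simpler than the real Scan serialization; they
only illustrate the general problem.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wire format for illustration, not the real Scan class.
public class WireCompatSketch {

    // "New" writer: the start row plus one additional int field.
    public static void writeNew(DataOutputStream out, String startRow, int extraField)
            throws IOException {
        out.writeUTF(startRow);
        out.writeInt(extraField);
    }

    // "Old" reader: knows only about the start row.
    public static String readOld(DataInputStream in) throws IOException {
        return in.readUTF();
    }

    // A new-format message is read by an old-format reader; returns how
    // many bytes the old reader leaves unread on the stream.
    public static int bytesLeftAfterOldRead() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeNew(out, "row-1", 100);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            readOld(in);
            return in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The old reader stops before the extra int, so those four bytes would be
misread as the start of whatever comes next on the connection.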

Personally, I'm not using the intra-row scanner right now because of the
unknown ordering; I use compound keys instead.
Moreover, intra-row scanning should use a separate API (giving Result the
ability to fetch additional KVs for a given row) to be more usable.
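
For reference, the compound-key approach mentioned above can be sketched
as follows: the timestamp moves from the qualifier into the row key, so
ordering comes from the row-key sort. The key layout and all names here
are assumptions for illustration (fixed-length entity ids and
non-negative timestamps), not code from any real schema.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical compound row key: entity id followed by a timestamp.
public class CompoundKeySketch {

    // Row key = UTF-8 entity id bytes + 8-byte big-endian timestamp.
    // ByteBuffer writes longs big-endian by default, so non-negative
    // timestamps compare correctly byte by byte.
    public static byte[] rowKey(String entityId, long timestampMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + 8)
            .put(id)
            .putLong(timestampMillis)
            .array();
    }

    // Unsigned lexicographic byte comparison, the usual ordering for
    // raw byte-array keys.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout, all keys for one entity are adjacent and ordered by
time, so a plain range scan returns them chronologically.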
