There will be a development release sometime next week but that will not be recommended for production usage.
There is no release date for the full version but I think we're hoping
to have a release candidate before the end of July.

> -----Original Message-----
> From: Vaibhav Puranik [mailto:[email protected]]
> Sent: Monday, June 21, 2010 9:48 AM
> To: [email protected]
> Subject: Re: Sorting columns
>
> Jon, Stack,
>
> Is there a tentative date when this version (with column scanner) is
> coming out?
>
> Vaibhav
>
> On Mon, Jun 21, 2010 at 9:28 AM, Jonathan Gray <[email protected]> wrote:
>
> > Yes, when using Scan, even on 0.20, everything will be sorted.
> >
> > Re: OOM, you'll need more memory or you'll need to break stuff up
> > across rows. Not much else to be done about that :)
> >
> > > -----Original Message-----
> > > From: Andrey Stepachev [mailto:[email protected]]
> > > Sent: Monday, June 21, 2010 6:40 AM
> > > To: [email protected]
> > > Subject: Re: Sorting columns
> > >
> > > 2010/6/19 Jonathan Gray <[email protected]>
> > >
> > > > So there is no confusion, everything is sorted in HBase. All
> > > > columns in each family are sorted, always.
> > >
> > > That's good news! Thanks. I have no time (and not enough knowledge
> > > of hbase) to check this myself. Now it's clear (and I always use
> > > scan for now).
> > >
> > > > There are optimizations for Get queries (in 0.20 but gone in
> > > > trunk) that make it so that what gets returned to the client is
> > > > not completely sorted though it would be mostly sorted.
> > >
> > > Is it true that if I use Scan (even when the scan is really a get)
> > > in 0.20, I'll get everything sorted?
> > >
> > > > Are you returning millions of columns at once? Otherwise it
> > > > shouldn't be too expensive to do the sorted() call in the client.
> > >
> > > I got an OOM when I tried to build an index (I have 1 index key
> > > which points to 5 million other keys, so I got an OOM in the
> > > server). With intra-row scanning I can scan these columns (mostly
> > > in an MR job) to do some work.
> > > After I got the OOM, I changed the schema to use compound keys. It
> > > is a bit complicated to make such keys (instead of simple
> > > LongWritable and friends). Maybe avro can help, but I haven't tried
> > > it yet. With intra-row scanning I got a slightly complicated Result
> > > scan (I need to detect the real key change), but this way is less
> > > complicated than compound keys.
> > >
> > > > > -----Original Message-----
> > > > > From: Andrey Stepachev [mailto:[email protected]]
> > > > > Sent: Saturday, June 19, 2010 5:45 AM
> > > > > To: [email protected]
> > > > > Subject: Re: Sorting columns
> > > > >
> > > > > 2010/6/19 Stack <[email protected]>
> > > > >
> > > > > > On Thu, Jun 17, 2010 at 12:18 PM, Andrey Stepachev
> > > > > > <[email protected]> wrote:
> > > > > > > As I see in the sources, there is no place where kvs are
> > > > > > > sorted (except the client Result.sorted() method). So we
> > > > > > > can get keyvalues from the store and from the memstore (and
> > > > > > > in this case we can get 1 3 5 from stores and 4 from
> > > > > > > memstore) in incorrect order.
> > > > > > >
> > > > > > > Or am I missing something?
> > > > > >
> > > > > > Data is sorted in hbase. Scanning, we'll be running a scanner
> > > > > > against each data store element -- memstore and one for each
> > > > > > store file -- and we'll pop off the elements in order. That's
> > > > > > the general story. There may once have been a legitimate
> > > > > > reason for the client-side sort -- perhaps when our Get and
> > > > > > Scan code paths differed it was needed -- but as to whether
> > > > > > it is still required, I'm not sure. I'd have to dig. Anyone
> > > > > > else?
> > > > >
> > > > > It would be very interesting to know whether hbase guarantees
> > > > > the ordering of columns. If someone uses very wide rows, then
> > > > > without sorting they are not very useful (and of course one
> > > > > should know about the partitioning problem for wide rows).
> > > > > Suppose that we want to work with time data. In that case we
> > > > > can use dates as qualifiers and expect the data in sorted
> > > > > order, and we can't order it somewhere else, because we would
> > > > > lose most of the hbase advantage.
> > > > >
> > > > > > >> > The rest of the data needs to be accessed occasionally.
> > > > > > >> > We want to avoid getting it shipped to the client as it
> > > > > > >> > makes our map reduce job go out of memory.
> > > > > > >>
> > > > > > >> You are not using incremental get on a row? You should be
> > > > > > >> able to get your big rows piecemeal.
> > > > > > >
> > > > > > > These scanner api changes were not included in 0.20.4 :(
> > > > > > > (intra-row scanner).
> > > > > >
> > > > > > Oh.
> > > > > >
> > > > > > Sorry about that Andrey. Somehow we missed your backport of
> > > > > > HBASE-1537. I just applied it. It'll appear in the 0.20.5RC4
> > > > > > I'm rolling now. Please excuse our bungling.
> > > > >
> > > > > Not a problem. I'll wait for 0.20.5. But I should warn that
> > > > > with this patch 0.20.5 will not be wire compatible with 0.20.4
> > > > > (because this patch adds an additional field to Scan, and this
> > > > > makes Scan binary incompatible).
> > > > >
> > > > > Personally, I'm not using the intra-row scanner now because of
> > > > > the unknown ordering; I use compound keys.
> > > > > Moreover, intra-row scanning should use a separate api (giving
> > > > > Result the ability to fetch additional kvs for a given row) to
> > > > > be more usable and easy to use.
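
Stack's description of scanning earlier in this thread (one scanner per
memstore and store file, popping elements off in order) amounts to a
k-way merge of sources that are each already sorted. Below is a minimal
Java sketch of that idea, assuming plain integers stand in for KeyValues
and reusing Andrey's example of 1 3 5 in a store file and 4 in the
memstore. The class and method names are hypothetical and this is not
HBase's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: integers stand in for KeyValues, lists for store scanners.
public class SortedMergeSketch {

    // One already-sorted source (memstore or store file) plus its current head.
    private static final class Source implements Comparable<Source> {
        final Iterator<Integer> rest;
        Integer head;

        Source(Iterator<Integer> it) {
            this.rest = it;
            this.head = it.next(); // caller guarantees the source is non-empty
        }

        @Override
        public int compareTo(Source other) {
            return head.compareTo(other.head);
        }
    }

    // Merge several individually sorted lists into one sorted list.
    public static List<Integer> merge(List<List<Integer>> sources) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<Integer> s : sources) {
            if (!s.isEmpty()) {
                heap.add(new Source(s.iterator()));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source top = heap.poll();       // source with the smallest head
            out.add(top.head);
            if (top.rest.hasNext()) {       // advance it and put it back
                top.head = top.rest.next();
                heap.add(top);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> storeFile = Arrays.asList(1, 3, 5);
        List<Integer> memstore = Arrays.asList(4);
        System.out.println(merge(Arrays.asList(storeFile, memstore))); // [1, 3, 4, 5]
    }
}
```

Because each source is sorted on its own, always taking the smallest
head yields a fully sorted stream without any separate sort step.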
