Jon, Stack,

Is there a tentative date when this version (with column scanner) is coming out?

Vaibhav

On Mon, Jun 21, 2010 at 9:28 AM, Jonathan Gray <[email protected]> wrote:

> Yes, when using Scan, even on 0.20, everything will be sorted.
>
> Re: OOM, you'll need more memory or you'll need to break stuff up across
> rows. Not much else to be done about that :)
>
> > -----Original Message-----
> > From: Andrey Stepachev [mailto:[email protected]]
> > Sent: Monday, June 21, 2010 6:40 AM
> > To: [email protected]
> > Subject: Re: Sorting columns
> >
> > 2010/6/19 Jonathan Gray <[email protected]>
> >
> > > So there is no confusion, everything is sorted in HBase. All columns in
> > > each family are sorted, always.
> >
> > That's good news! Thanks. I have no time (and not enough knowledge of
> > HBase) to check this myself. Now it's clear (and I always use Scan for now).
> >
> > > There are optimizations for Get queries (in 0.20 but gone in trunk) that
> > > make it so that what gets returned to the client is not completely sorted,
> > > though it would be mostly sorted.
> >
> > Is it true that if I use Scan (even when the scan is really a get) in 0.20,
> > I'll get everything sorted?
> >
> > > Are you returning millions of columns at once? Otherwise it shouldn't be
> > > too expensive to do the sorted() call in the client.
> >
> > I got an OOM when I tried to build an index (I have 1 index key which points
> > to 5 million other keys, so I got an OOM in the server). With intra-row
> > scanning I can scan these columns (mostly in MR jobs) to do some work.
> > After I got the OOM, I changed the schema to use compound keys. It is a bit
> > complicated to make such keys (instead of simple LongWritable and friends).
> > Maybe Avro can help, but I haven't tried it yet. With intra-row scanning I
> > get a slightly more complicated Result scan (I need to detect the real key
> > change), but this way is less complicated than compound keys.
> > >
> > > > -----Original Message-----
> > > > From: Andrey Stepachev [mailto:[email protected]]
> > > > Sent: Saturday, June 19, 2010 5:45 AM
> > > > To: [email protected]
> > > > Subject: Re: Sorting columns
> > > >
> > > > 2010/6/19 Stack <[email protected]>
> > > >
> > > > > On Thu, Jun 17, 2010 at 12:18 PM, Andrey Stepachev <[email protected]>
> > > > > wrote:
> > > > > > As I see in the sources, there is no place where KVs are sorted
> > > > > > (except the client Result.sorted() method). So we can get KeyValues
> > > > > > from the store and from the memstore (and in this case we can get
> > > > > > 1 3 5 from the stores and 4 from the memstore) in incorrect order.
> > > > > >
> > > > > > Or am I missing something?
> > > > >
> > > > > Data is sorted in hbase. Scanning, we'll be running a scanner against
> > > > > each data store element -- memstore and one for each store file -- and
> > > > > we'll pop off the elements in order. That's the general story. There
> > > > > may once have been a legitimate reason for the client-side sort --
> > > > > perhaps when our Get and Scan code paths differed it was needed -- but
> > > > > as to whether it is still required, I'm not sure. I'd have to dig.
> > > > > Anyone else?
> > > >
> > > > It is very interesting to know whether HBase guarantees ordering in
> > > > columns, because if someone uses very wide rows, in the absence of
> > > > sorting they are not very useful (and of course one should know about
> > > > the partitioning problem for wide rows).
> > > > Suppose we want to work with time data. In that case we can use
> > > > qualifiers as dates and expect data in sorted order, and we can't order
> > > > it somewhere else, because we would lose most of HBase's advantage.
> > > >
> > > > > >> > The rest of the data needs to be accessed occasionally. We want to
> > > > > >> > avoid getting it shipped to the client as it makes our map reduce
> > > > > >> > job go out of memory.
> > > > > >>
> > > > > >> You are not using incremental get on a row? You should be able to get
> > > > > >> your big rows piecemeal.
> > > > > >
> > > > > > These scanner API changes were not included in 0.20.4 :( (intra-row
> > > > > > scanner).
> > > > >
> > > > > Oh.
> > > > >
> > > > > Sorry about that Andrey. Somehow we missed your backport of
> > > > > HBASE-1537. I just applied it. It'll appear in the 0.20.5RC4 I'm
> > > > > rolling now. Please excuse our bungling.
> > > >
> > > > Not a problem. I'll wait for 0.20.5. But I should warn that with this
> > > > patch 0.20.5 will not be wire compatible with 0.20.4 (because this patch
> > > > adds an additional field in Scan, and this makes Scan binary
> > > > incompatible).
> > > >
> > > > Personally, I'm not using the intra-row scanner now because of the
> > > > unknown ordering; I use compound keys.
> > > > Moreover, intra-row scanning should use a separate API (giving Result
> > > > the ability to fetch additional KVs for a given row) to be more usable
> > > > and easy to use.
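
Stack's description of scanning (one scanner per memstore and per store file, popping elements off in order) amounts to a k-way merge of already-sorted sources. The following is only a minimal Python sketch of that idea, not HBase's actual code: the function name `merged_scan` and the `(qualifier, value)` tuples are hypothetical, and the sample data reuses the "1 3 5 from stores and 4 from memstore" case from the thread.

```python
import heapq


def merged_scan(sources):
    """Yield (qualifier, value) pairs from several sorted sources in qualifier order.

    Each source is assumed to be sorted by qualifier already, as a memstore
    or store file would be. This is an illustrative sketch only.
    """
    iterators = [iter(source) for source in sources]
    heap = []
    for idx, it in enumerate(iterators):
        head = next(it, None)
        if head is not None:
            # idx breaks ties, so the heap never has to compare values.
            heap.append((head[0], idx, head[1]))
    heapq.heapify(heap)

    while heap:
        qualifier, idx, value = heapq.heappop(heap)
        yield qualifier, value
        # Refill the heap from the source that just produced an element.
        following = next(iterators[idx], None)
        if following is not None:
            heapq.heappush(heap, (following[0], idx, following[1]))


# Hypothetical data: qualifiers 1, 3, 5 in a store file and 4 in the memstore.
store_file = [(1, "a"), (3, "c"), (5, "e")]
memstore = [(4, "d")]
merged = list(merged_scan([store_file, memstore]))
print([qualifier for qualifier, _ in merged])
```

Because each source is consumed lazily, a merge like this never needs the whole row in memory at once, which is why no separate client-side sort should be required when the server merges this way.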
