Yes, when using Scan, even on 0.20, everything will be sorted. Re: OOM, you'll need more memory or you'll need to break stuff up across rows. Not much else to be done about that :)
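To make "break stuff up across rows" a bit more concrete, here is a minimal sketch of the compound-key idea in Python (the HBase client itself is Java; this only illustrates the byte layout). The function names and the choice of two unsigned 64-bit fields are assumptions for illustration, not anything from HBase. The point is that big-endian fixed-width encoding keeps byte-wise row key order identical to numeric order, so one wide row becomes a contiguous range of narrow rows.

```python
import struct
from typing import Tuple


def compound_key(index_key: int, target_key: int) -> bytes:
    """Pack two non-negative 64-bit ints into one 16-byte row key.

    Big-endian unsigned encoding makes lexicographic byte order match
    numeric order, which is the order row keys are compared in.
    """
    return struct.pack(">QQ", index_key, target_key)


def split_key(row_key: bytes) -> Tuple[int, int]:
    """Recover (index_key, target_key) from a 16-byte compound row key."""
    return struct.unpack(">QQ", row_key)


def scan_range(index_key: int) -> Tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) keys covering one index key.

    Assumes index_key is below 2**64 - 1 so that index_key + 1 still fits.
    """
    return struct.pack(">Q", index_key), struct.pack(">Q", index_key + 1)
```

A scan bounded by `scan_range(k)` would then visit every row for index key `k` in target-key order, one small row at a time, instead of pulling millions of columns from a single row.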
> -----Original Message-----
> From: Andrey Stepachev [mailto:[email protected]]
> Sent: Monday, June 21, 2010 6:40 AM
> To: [email protected]
> Subject: Re: Sorting columns
>
> 2010/6/19 Jonathan Gray <[email protected]>
>
> > So there is no confusion, everything is sorted in HBase. All columns in
> > each family are sorted, always.
>
> That's good news! Thanks. I have no time (and not enough knowledge of
> HBase) to check this myself. Now it's clear (and I always use Scan for now).
>
> > There are optimizations for Get queries (in 0.20 but gone in trunk) that
> > make it so that what gets returned to the client is not completely sorted
> > though it would be mostly sorted.
>
> Is it true that if I use Scan (even when the scan is really a get) in 0.20,
> I'll get everything sorted?
>
> > Are you returning millions of columns at once? Otherwise it shouldn't be
> > too expensive to do the sorted() call in the client.
>
> I got an OOM when I tried to build an index (I have 1 index key which points
> to 5 million other keys, so I got the OOM in the server). With intra-row
> scanning I can scan these columns (mostly in an MR job) to do some work.
> After I got the OOM, I changed the schema to use compound keys. It is a bit
> complicated to make such keys (instead of simple LongWritable and friends).
> Maybe Avro can help, but I haven't tried it yet. With intra-row scanning I
> get a slightly complicated Result scan (I need to detect the real key
> change), but this way is less complicated than compound keys.
>
> > > -----Original Message-----
> > > From: Andrey Stepachev [mailto:[email protected]]
> > > Sent: Saturday, June 19, 2010 5:45 AM
> > > To: [email protected]
> > > Subject: Re: Sorting columns
> > >
> > > 2010/6/19 Stack <[email protected]>
> > >
> > > > On Thu, Jun 17, 2010 at 12:18 PM, Andrey Stepachev <[email protected]>
> > > > wrote:
> > > > > As I see in the sources, there is no place where KVs are sorted
> > > > > (except the client Result.sorted() method). So we can get keyvalues
> > > > > from store and from memstore (and in this case we can get 1 3 5
> > > > > from stores and 4 from memstore) in incorrect order.
> > > > >
> > > > > Or am I missing something?
> > > > >
> > > >
> > > > Data is sorted in hbase. Scanning, we'll be running a scanner against
> > > > each data store element -- memstore and one for each store file --
> > > > and we'll pop off the elements in order. That's the general story.
> > > > There may once have been a legitimate reason for the client-side
> > > > sort -- perhaps when our Get and Scan code paths differed it was
> > > > needed -- but as to whether it is still required, I'm not sure. I'd
> > > > have to dig. Anyone else?
> > > >
> > >
> > > It would be very interesting to know whether HBase guarantees ordering
> > > of columns. If someone uses very wide rows, then in the absence of
> > > sorting they are not very useful (and of course one should know about
> > > the partitioning problem for wide rows).
> > > Suppose we want to work with time data. In that case we can use dates
> > > as qualifiers and expect the data in sorted order, and we can't order
> > > it somewhere else, because we would lose most of HBase's advantage.
> > >
> > > > >> > The rest of the data needs to be accessed occasionally. We want
> > > > >> > to avoid getting it shipped to the client as it makes our map
> > > > >> > reduce job go out of memory.
> > > > >> >
> > > > >>
> > > > >> You are not using incremental get on a row? You should be able to
> > > > >> get your big rows piecemeal.
> > > > >>
> > > > > These scanner API changes were not included in 0.20.4 :( (intra-row
> > > > > scanner).
> > > > >
> > > >
> > > > Oh.
> > > >
> > > > Sorry about that Andrey. Somehow we missed your backport of
> > > > HBASE-1537. I just applied it. It'll appear in the 0.20.5RC4 I'm
> > > > rolling now. Please excuse our bungling.
> > > >
> > >
> > > Not a problem. I'll wait for 0.20.5. But I should warn that with this
> > > patch 0.20.5 will not be wire compatible with 0.20.4 (because this
> > > patch adds an additional field to Scan, and this makes Scan binary
> > > incompatible).
> > >
> > > Personally, I'm not using the intra-row scanner now because of the
> > > unknown ordering; I use compound keys.
> > > Moreover, intra-row scanning should use a separate API (giving Result
> > > the ability to fetch additional KVs for a given row) to be more usable
> > > and easy to use.
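
For anyone following the ordering question quoted above: Stack's description of running one scanner per store element and popping elements off in order is a k-way merge of already-sorted streams. Below is a rough sketch in Python of that idea only, using the "1 3 5 from stores and 4 from memstore" case from the thread. It is not HBase's code; `merged_scan` and the sample lists are made up for illustration, and plain byte strings stand in for KeyValues.

```python
import heapq


def merged_scan(*sorted_sources):
    """Lazily merge several individually sorted sources into one sorted stream.

    Each source stands in for one scanner (the memstore, or one store file).
    The smallest head element across all sources is always yielded next.
    """
    return heapq.merge(*sorted_sources)


store_file = [b"1", b"3", b"5"]  # already sorted on disk
memstore = [b"4"]                # already sorted in memory

print(list(merged_scan(store_file, memstore)))  # [b'1', b'3', b'4', b'5']
```

Because every source is sorted on its own, the merge never needs to hold or re-sort the whole row, which is why no separate sort step shows up on the scan path.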
