RE: Sorting columns

Jonathan Gray Mon, 21 Jun 2010 09:29:02 -0700

Yes, when using Scan, even on 0.20, everything will be sorted.
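
As a rough illustration of why a scan comes back sorted (this is a simplified sketch, not HBase's actual scanner code): each source, the memstore and every store file, is already sorted on its own, so merging them by always taking the smallest pending element gives one sorted stream. The names `merged_scan`, `store_file` and `memstore` below are hypothetical, and integer qualifiers stand in for real KeyValue ordering. The data is the "1 3 5 from stores and 4 from memstore" case raised further down the thread.

```python
import heapq


def merged_scan(*sources):
    """Merge several individually sorted (qualifier, value) lists into one sorted stream."""
    # heapq.merge assumes each input is already sorted. It keeps only one
    # pending element per source, so nothing is re-sorted or fully buffered.
    return heapq.merge(*sources, key=lambda kv: kv[0])


# Hypothetical data: a flushed store file and newer edits still in memory.
store_file = [(1, "a"), (3, "c"), (5, "e")]
memstore = [(4, "d")]

result = list(merged_scan(store_file, memstore))
print([qualifier for qualifier, _ in result])  # [1, 3, 4, 5]
```

The merge only works because every input is sorted to begin with; it never needs a separate sort step over the combined data.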

Re: OOM, you'll need more memory or you'll need to break stuff up across rows.  
Not much else to be done about that :)
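
One way to picture "breaking stuff up across rows" is the compound-key schema mentioned below: instead of one index row with millions of columns, write one small row per (index key, target) pair. This is a minimal sketch under assumptions, not a recommended encoding: the separator byte, the 8-byte big-endian id, and the helper names `compound_key`, `split_key` and `scan_range` are all made up for illustration.

```python
import struct

SEP = b"\x00"  # assumed separator; index keys must not contain this byte


def compound_key(index_key: bytes, target_id: int) -> bytes:
    """Build a row key of the form <index_key><SEP><8-byte big-endian id>."""
    # Fixed-width big-endian packing keeps byte-wise order equal to numeric
    # order for non-negative ids, so rows stay sorted by target id.
    return index_key + SEP + struct.pack(">Q", target_id)


def split_key(row_key: bytes):
    """Recover (index_key, target_id) from a compound row key."""
    index_key, packed = row_key[:-9], row_key[-8:]
    return index_key, struct.unpack(">Q", packed)[0]


def scan_range(index_key: bytes):
    """Return (start, stop) row keys covering every row of one index key.

    start is inclusive and stop is exclusive, as in a typical range scan.
    """
    return index_key + SEP, index_key + b"\x01"


keys = sorted(compound_key(b"idx", n) for n in (300, 2, 70000))
print([split_key(k)[1] for k in keys])  # [2, 300, 70000]
```

A range scan between the two keys from `scan_range` then walks the targets of one index key a row at a time, so no single row has to fit in memory.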


> -----Original Message-----
> From: Andrey Stepachev [mailto:[email protected]]
> Sent: Monday, June 21, 2010 6:40 AM
> To: [email protected]
> Subject: Re: Sorting columns
> 
> 2010/6/19 Jonathan Gray <[email protected]>
> 
> > So there is no confusion, everything is sorted in HBase.  All columns in
> > each family are sorted, always.
> >
> 
> That's good news! Thanks. I had no time (and not enough knowledge of
> HBase) to check this myself. Now it's clear (and I always use Scan for now).
> 
> 
> >
> > There are optimizations for Get queries (in 0.20 but gone in trunk) that
> > make it so that what gets returned to the client is not completely sorted,
> > though it would be mostly sorted.
> 
> Is it true that if I use Scan (even when the scan is really a get) in 0.20,
> I'll get everything sorted?
> 
> 
> > Are you returning millions of columns at once?  Otherwise it shouldn't be
> > too expensive to do the sorted() call in the client.
> >
> I got an OOM when I tried to build an index (I have one index key that
> points to 5 million other keys, so I got the OOM on the server). With
> intra-row scanning I can scan these columns (mostly in MR jobs) to do
> some work.
> After I got the OOM, I changed the schema to use compound keys. It is a
> bit complicated to build such keys (instead of a simple LongWritable and
> friends).
> Maybe Avro can help, but I haven't tried it yet. With intra-row scanning
> I get a slightly more complicated Result scan (I need to detect the real
> key change), but this way is less complicated than compound keys.
> 
> 
> 
> >
> > > -----Original Message-----
> > > From: Andrey Stepachev [mailto:[email protected]]
> > > Sent: Saturday, June 19, 2010 5:45 AM
> > > To: [email protected]
> > > Subject: Re: Sorting columns
> > >
> > > 2010/6/19 Stack <[email protected]>
> > >
> > > > On Thu, Jun 17, 2010 at 12:18 PM, Andrey Stepachev <[email protected]>
> > > > wrote:
> > > > > As far as I can see in the sources, there is no place where KVs
> > > > > are sorted (except the client Result.sorted() method). So we can
> > > > > get KeyValues from the stores and from the memstore (and in this
> > > > > case we can get 1 3 5 from the stores and 4 from the memstore)
> > > > > in incorrect order.
> > > > >
> > > > > Or am I missing something?
> > > > >
> > > >
> > > > Data is sorted in hbase.  Scanning, we'll be running a scanner against
> > > > each data store element -- memstore and one for each store file -- and
> > > > we'll pop off the elements in order.  That's the general story.  There
> > > > may once have been a legitimate reason for the client-side sort --
> > > > perhaps when our Get and Scan code paths differed it was needed -- but
> > > > as to whether it is still required, I'm not sure.  I'd have to dig.
> > > > Anyone else?
> > > >
> > >
> > > It would be very interesting to know whether HBase guarantees the
> > > ordering of columns. If someone uses very wide rows, then without
> > > sorting they are not very useful (and of course one should know about
> > > the partitioning problem for wide rows).
> > > Suppose we want to work with time data: in that case we can use dates
> > > as qualifiers and expect the data in sorted order, and we can't sort
> > > it somewhere else, because we would lose most of HBase's advantage.
> > >
> > >
> > >
> > > >
> > > > >
> > > > >> > The rest of the data needs to be accessed occasionally. We want to
> > > > >> > avoid getting it shipped to the client as it makes our map reduce
> > > > >> > job go out of memory.
> > > > >> >
> > > > >>
> > > > >> You are not using incremental get on a row?  You should be able to
> > > > >> get your big rows piecemeal.
> > > > >>
> > > > > These scanner API changes (the intra-row scanner) were not
> > > > > included in 0.20.4 :(
> > > > >
> > > >
> > > > Oh.
> > > >
> > > > Sorry about that Andrey.  Somehow we missed your backport of
> > > > HBASE-1537.  I just applied it.  It'll appear in the 0.20.5RC4 I'm
> > > > rolling now.  Please excuse our bungling.
> > > >
> > >
> > > Not a problem. I'll wait for 0.20.5. But I should warn that with this
> > > patch 0.20.5 will not be wire compatible with 0.20.4 (because this
> > > patch adds an additional field to Scan, which makes Scan binary
> > > incompatible).
> > >
> > > Personally, I'm not using the intra-row scanner now because of the
> > > unknown ordering; I use compound keys.
> > > Moreover, intra-row scanning should use a separate API (giving Result
> > > the ability to fetch additional KVs for a given row) to be more usable
> > > and easier to use.
> >
