Re: One-table w/ multi-CF or multi-table w/ one-CF?

Jianshi Huang Sat, 06 Sep 2014 11:19:44 -0700

Thanks Ted for the reference.

That's right: extend hbase.mapreduce.scan.row.start and
hbase.mapreduce.scan.row.stop so that multiple ranges can be specified, and
extend getSplits accordingly.
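A minimal sketch of what "multiple ranges plus getSplits" could look like, assuming each requested range is a half-open [start, stop) interval of row keys and that an empty byte array means "unbounded" (the convention HBase uses for region start/end keys). `RangeSplits` and its methods are hypothetical names, not part of TableInputFormat. The idea is that an extended getSplits would emit one split per non-empty (range ∩ region) overlap; in a real InputFormat each pair would presumably become a TableSplit carrying that region's location.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of HBase): intersects requested row ranges with
// region boundaries, producing one [start, stop) split per non-empty overlap.
// As with HBase region boundaries, an empty byte[] means "unbounded".
final class RangeSplits {

    // Unsigned lexicographic byte comparison, the order HBase sorts row keys in.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Later of two start keys; an empty start key means "from the first row".
    static byte[] maxStart(byte[] a, byte[] b) {
        if (a.length == 0) return b;
        if (b.length == 0) return a;
        return compare(a, b) >= 0 ? a : b;
    }

    // Earlier of two stop keys; an empty stop key means "through the last row".
    static byte[] minStop(byte[] a, byte[] b) {
        if (a.length == 0) return b;
        if (b.length == 0) return a;
        return compare(a, b) <= 0 ? a : b;
    }

    // Both ranges and regions hold {start, stop} pairs.
    static List<byte[][]> intersect(List<byte[][]> ranges, List<byte[][]> regions) {
        List<byte[][]> splits = new ArrayList<>();
        for (byte[][] range : ranges) {
            for (byte[][] region : regions) {
                byte[] start = maxStart(range[0], region[0]);
                byte[] stop = minStop(range[1], region[1]);
                // Keep the overlap only if it is non-empty.
                if (stop.length == 0 || compare(start, stop) < 0) {
                    splits.add(new byte[][] {start, stop});
                }
            }
        }
        return splits;
    }
}
```

The nested loop is O(ranges × regions), which stays small for 16 to 256 ranges. If your HBase version ships MultiTableInputFormat (which takes a list of Scan objects), it may already cover part of this; check the version you run.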


I would probably bin the event sequence CF into 16 to 256 bins, so 16 to
256 ranges.
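One way to read "N bins, so N ranges": a sketch assuming a hypothetical one-byte salt prefix (bin number, 1 to 256 bins) in front of the original monotonic key. A single logical range over original keys then becomes one physical range per bin. `SaltedKeys` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical salting scheme: a one-byte bin prefix in front of the original
// row key, with numBins between 1 and 256.
final class SaltedKeys {

    // Deterministic bin in [0, numBins) derived from the original key.
    static int binOf(byte[] key, int numBins) {
        return Math.floorMod(Arrays.hashCode(key), numBins);
    }

    // Physical row key actually written: bin byte + original key.
    static byte[] saltedKey(byte[] key, int numBins) {
        return prefix(binOf(key, numBins), key);
    }

    // Maps one logical range [start, stop) over original keys to one physical
    // {start, stop} pair per bin; empty arrays mean "unbounded".
    static List<byte[][]> binRanges(byte[] start, byte[] stop, int numBins) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < numBins; b++) {
            byte[] binStop;
            if (stop.length > 0) {
                binStop = prefix(b, stop);
            } else if (b < 255) {
                binStop = new byte[] {(byte) (b + 1)};  // up to the next bin's prefix
            } else {
                binStop = new byte[0];  // last possible bin: scan to the end of the table
            }
            ranges.add(new byte[][] {prefix(b, start), binStop});
        }
        return ranges;
    }

    private static byte[] prefix(int bin, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = (byte) bin;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }
}
```

Each pair could drive its own scan or split. The trade-off is the one in the message: more bins spread writes over more regions, but every read of a logical range has to fan out over more physical ranges.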

Jianshi



On Sun, Sep 7, 2014 at 2:09 AM, Ted Yu <[email protected]> wrote:

> Please refer to HBASE-5416 ("Filter on one CF and if a match, then load
> and return full row").
>
> bq. to extend TableInputFormat to accept multiple row ranges
>
> Do you mean extending hbase.mapreduce.scan.row.start and
> hbase.mapreduce.scan.row.stop so that multiple ranges can be specified?
> How many such ranges do you normally need?
>
> Cheers
>
>
> On Sat, Sep 6, 2014 at 11:01 AM, Jianshi Huang <[email protected]>
> wrote:
>
> > Thanks Ted,
> >
> > I'll pre-split the table during ingestion. The reason for keeping the
> > rowkey monotonic is to make it easier to work with TableInputFormat;
> > otherwise I would have binned it into 256 splits. I think a good way
> > around this is to extend TableInputFormat to accept multiple row ranges;
> > if there's an existing efficient implementation, please let me know :)
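Why binning matters for writes: a small simulation, under assumed key layouts, of which pre-split region each key lands in. The split keys are the one-byte bin prefixes {1} … {N-1}, the shape of array that HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys) accepts. With these split points, a window of consecutive monotonic keys (e.g. timestamps) all lands in one region, while salted keys spread across every bin. `SplitSimulation` is a hypothetical name and is not HBase code.

```java
// Simulates which pre-split region each row key would land in.
final class SplitSimulation {

    // Split keys for numBins one-byte bins: {1}, {2}, ..., {numBins - 1}.
    // Region 0 covers [empty, {1}), region i covers [{i}, {i + 1}), and so on.
    static byte[][] splitKeys(int numBins) {
        byte[][] splits = new byte[numBins - 1][];
        for (int i = 1; i < numBins; i++) {
            splits[i - 1] = new byte[] {(byte) i};
        }
        return splits;
    }

    // Unsigned lexicographic comparison (HBase row-key order).
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Region index = number of split keys <= key (binary search over sorted splits).
    static int regionFor(byte[] key, byte[][] splits) {
        int lo = 0;
        int hi = splits.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(splits[mid], key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // How many of the given keys fall into each region.
    static int[] histogram(byte[][] keys, byte[][] splits) {
        int[] counts = new int[splits.length + 1];
        for (byte[] key : keys) {
            counts[regionFor(key, splits)]++;
        }
        return counts;
    }

    // Big-endian encoding so numeric order matches byte order (e.g. timestamps).
    static byte[] longKey(long v) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        return out;
    }
}
```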
> >
> > Could you elaborate a little more on heap memory usage during scans?
> > Is there any reference for that?
> >
> > Jianshi
> >
> >
> >
> > On Sun, Sep 7, 2014 at 1:20 AM, Ted Yu <[email protected]> wrote:
> >
> > > If you use monotonically increasing rowkeys, separating out the column
> > > family into a new table would give you the same issue you're facing
> > > today.
> > >
> > > With a single table, the essential column family feature would reduce
> > > the amount of heap memory used during a scan. With two tables, there is
> > > no such facility.
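The "essential column family" feature is HBASE-5416: when a scan enables Scan.setLoadColumnFamiliesOnDemand(true) and its filter reports only some families as essential (via Filter.isFamilyEssential(byte[])), the region server reads the essential families first and loads the rest only for rows the filter accepts. The toy model below is plain Java, not HBase code; it only counts how many cells each strategy would materialize, using a made-up layout of a small "meta" family and a wider "events" family. Across two tables there is nothing equivalent, because a filter on one table cannot skip reads in the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model (not HBase code): a row is a map from column-family name to its cells.
final class OnDemandScanSim {

    // Cells read when every family of every row is loaded before filtering.
    static int eagerCells(List<Map<String, List<String>>> rows) {
        int cells = 0;
        for (Map<String, List<String>> row : rows) {
            for (List<String> family : row.values()) {
                cells += family.size();
            }
        }
        return cells;
    }

    // Cells read when only the essential family is read first and the other
    // families are loaded just for rows whose essential cells pass the filter.
    static int onDemandCells(List<Map<String, List<String>>> rows, String essential,
                             Predicate<List<String>> filter) {
        int cells = 0;
        for (Map<String, List<String>> row : rows) {
            List<String> ess = row.getOrDefault(essential, Collections.emptyList());
            cells += ess.size();
            if (!filter.test(ess)) {
                continue;  // rejected: the other families are never read
            }
            for (Map.Entry<String, List<String>> e : row.entrySet()) {
                if (!e.getKey().equals(essential)) {
                    cells += e.getValue().size();
                }
            }
        }
        return cells;
    }

    // Hypothetical sample: 10 rows, one "meta" cell and four "events" cells each;
    // only row 3 has a matching "meta" cell.
    static List<Map<String, List<String>>> sampleRows() {
        List<Map<String, List<String>>> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Map<String, List<String>> row = new HashMap<>();
            row.put("meta", Collections.singletonList(i == 3 ? "match" : "other"));
            row.put("events", Arrays.asList("e1", "e2", "e3", "e4"));
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> rows = sampleRows();
        System.out.println(eagerCells(rows));  // 50
        System.out.println(onDemandCells(rows, "meta", fam -> fam.contains("match")));  // 14
    }
}
```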
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Sep 6, 2014 at 10:11 AM, Jianshi Huang <[email protected]>
> > > wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > Yes, that's the table hitting RegionTooBusyExceptions :) But the
> > > > performance I care about most is scan performance.
> > > >
> > > > It's mostly for analytics, so I don't care much about atomicity
> > > > for now.
> > > >
> > > > What's your suggestion?
> > > >
> > > > Jianshi
> > > >
> > > >
> > > > On Sun, Sep 7, 2014 at 1:08 AM, Ted Yu <[email protected]> wrote:
> > > >
> > > > > Is this the same table you mentioned in the thread about
> > > > > RegionTooBusyException?
> > > > >
> > > > > If you move the column family to another table, you may have to
> > > > > handle atomicity yourself; currently, atomic operations are limited
> > > > > to region boundaries.
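Since atomicity stops at region boundaries, a write that must touch two tables has to be coordinated by the client. A minimal sketch of one common best-effort pattern (compensation), assuming a hypothetical `KeyValueStore` interface standing in for each table; it is not the HBase client API, which throws IOException rather than the unchecked exceptions used here. Write the primary table first, and if the second write fails, try to undo the first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal interface standing in for one table (not the HBase client API).
interface KeyValueStore {
    void put(String row, String value);
    void delete(String row);
}

final class TwoTableWriter {

    // Writes the primary table, then the secondary one; if the second write fails,
    // tries to undo the first. Not atomic: readers may see the intermediate state,
    // the undo can itself fail, and deleting assumes the row did not exist before
    // (a real undo would restore the previous value).
    static void writeBoth(KeyValueStore primary, KeyValueStore secondary,
                          String row, String v1, String v2) {
        primary.put(row, v1);
        try {
            secondary.put(row, v2);
        } catch (RuntimeException e) {
            try {
                primary.delete(row);
            } catch (RuntimeException undo) {
                e.addSuppressed(undo);  // keep a trace for later repair
            }
            throw e;
        }
    }
}

// In-memory store for trying the pattern; failPuts simulates an unavailable table.
final class MapStore implements KeyValueStore {
    final Map<String, String> data = new HashMap<>();
    private final boolean failPuts;

    MapStore(boolean failPuts) {
        this.failPuts = failPuts;
    }

    public void put(String row, String value) {
        if (failPuts) {
            throw new IllegalStateException("simulated region failure");
        }
        data.put(row, value);
    }

    public void delete(String row) {
        data.remove(row);
    }
}
```

For analytics workloads that tolerate brief inconsistency, this may be enough; anything stricter needs something like idempotent retries or a reconciliation job.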
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Sat, Sep 6, 2014 at 9:49 AM, Jianshi Huang <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm currently putting everything into one table (to make
> > > > > > cross-reference queries easier), and there's one CF whose rowkeys
> > > > > > are very different from the rest. It works well for now, but I'm
> > > > > > wondering whether it will cause performance issues in the future.
> > > > > >
> > > > > > So my questions are:
> > > > > >
> > > > > > 1) Will the way I'm doing this incur performance penalties?
> > > > > > 2) Should I move that CF to a separate table?
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > Jianshi Huang
> > > > > >
> > > > > > LinkedIn: jianshi
> > > > > > Twitter: @jshuang
> > > > > > Github & Blog: http://huangjs.github.com/
> > > > > >
> > > > >
> > > >
> > >
> >
>



