bq. 16 to 256 ranges

Would each range fall within a single region, or may a range span regions?
Are the ranges dynamic?
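
To make the first question concrete, here is a minimal Python sketch (not
HBase code) that reports which regions a row range touches. It assumes regions
are half-open [start, end) intervals ordered by unsigned byte comparison; the
names regions_for_range and split_keys, and the sample split keys, are made up
for illustration.

```python
from bisect import bisect_left, bisect_right

def regions_for_range(split_keys, start, stop):
    """Return the indices of the regions overlapped by the row range [start, stop).

    split_keys: sorted region boundary keys (bytes). N keys define N + 1 regions;
    region 0 starts at the empty key and the last region is unbounded.
    stop == b"" is treated as "scan to the end of the table" (an assumption of
    this sketch).
    """
    # A boundary key belongs to the region that starts at it, hence bisect_right.
    first = bisect_right(split_keys, start)
    # stop is exclusive, so a stop equal to a boundary does not reach the next region.
    last = len(split_keys) if stop == b"" else bisect_left(split_keys, stop)
    return list(range(first, last + 1))

# Hypothetical table pre-split into four regions on the first rowkey byte.
splits = [b"\x40", b"\x80", b"\xc0"]

print(regions_for_range(splits, b"\x40", b"\x80"))  # [1]: stays inside one region
print(regions_for_range(splits, b"\x30", b"\x90"))  # [0, 1, 2]: spans three regions
```

A range that maps to a single index can be served by one region; anything
longer would have to be cut at the region boundaries when splits are computed.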

Passing multiple ranges on the command line would be out of the question. A
file listing the ranges would be needed.
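
As a rough illustration of what such a file could look like, the Python sketch
below writes and reads one range per line as hex-encoded start and stop keys
separated by a tab. The file layout, the one-byte bin prefix on the rowkey, and
the helper names (bin_ranges, write_ranges, read_ranges) are all assumptions
made for this example; none of them is an existing HBase format or API.

```python
import io

def bin_ranges(n_bins):
    """Build one (start, stop) row range per bin, assuming a one-byte bin prefix."""
    if not 1 <= n_bins <= 256:
        raise ValueError("n_bins must be between 1 and 256")
    ranges = []
    for b in range(n_bins):
        start = bytes([b])
        # No single byte follows 0xff, so the last possible bin is left unbounded.
        stop = bytes([b + 1]) if b < 255 else b""
        ranges.append((start, stop))
    return ranges

def write_ranges(ranges, fh):
    """Write one range per line: hex(start), a tab, hex(stop)."""
    for start, stop in ranges:
        fh.write(f"{start.hex()}\t{stop.hex()}\n")

def read_ranges(fh):
    """Parse the file written by write_ranges back into (start, stop) byte pairs."""
    ranges = []
    for line in fh:
        line = line.rstrip("\n")
        if not line:
            continue  # skip completely empty lines
        start_hex, stop_hex = line.split("\t")
        ranges.append((bytes.fromhex(start_hex), bytes.fromhex(stop_hex)))
    return ranges

# Round trip through an in-memory file for the smallest case mentioned (16 bins).
buf = io.StringIO()
write_ranges(bin_ranges(16), buf)
buf.seek(0)
loaded = read_ranges(buf)
print(len(loaded))  # 16
```

Hex encoding keeps binary rowkeys safe in a text file, and the same file can
be regenerated if the number of bins changes.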

Cheers


On Sat, Sep 6, 2014 at 11:18 AM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Thanks Ted for the reference.
>
> That's right: extend row.start and row.stop to accept multiple ranges, and
> extend getSplits accordingly.
>
> I would probably bin the event sequence CF into 16 to 256 bins. So 16 to
> 256 ranges.
>
> Jianshi
>
>
>
> On Sun, Sep 7, 2014 at 2:09 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Please refer to HBASE-5416 ("Filter on one CF and if a match, then load and
> > return full row").
> >
> > bq. to extend TableInputFormat to accept multiple row ranges
> >
> > You mean extending hbase.mapreduce.scan.row.start and
> > hbase.mapreduce.scan.row.stop so that multiple ranges can be specified?
> > How many such ranges do you normally need?
> >
> > Cheers
> >
> >
> > On Sat, Sep 6, 2014 at 11:01 AM, Jianshi Huang <jianshi.hu...@gmail.com>
> > wrote:
> >
> > > Thanks Ted,
> > >
> > > I'll pre-split the table during ingestion. The reason to keep the rowkey
> > > monotonic is to make it easier to work with TableInputFormat; otherwise I
> > > would've binned it into 256 splits. (Well, I think a good way is to extend
> > > TableInputFormat to accept multiple row ranges. If there's an existing
> > > efficient implementation, please let me know :)
> > >
> > > Would you elaborate a little more on the heap memory usage during scan?
> > > Is there any reference for that?
> > >
> > > Jianshi
> > >
> > >
> > >
> > > On Sun, Sep 7, 2014 at 1:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > If you use monotonically increasing rowkeys, separating out the column
> > > > family into a new table would give you the same issue you're facing
> > > > today.
> > > >
> > > > With a single table, the essential column family feature would reduce
> > > > the amount of heap memory used during a scan. With two tables, there is
> > > > no such facility.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Sat, Sep 6, 2014 at 10:11 AM, Jianshi Huang <jianshi.hu...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > Yes, that's the table having RegionTooBusyExceptions :) But what I
> > > > > care about most is scan performance.
> > > > >
> > > > > It's mostly for analytics, so I don't care much about atomicity
> > > > > currently.
> > > > >
> > > > > What's your suggestion?
> > > > >
> > > > > Jianshi
> > > > >
> > > > >
> > > > > On Sun, Sep 7, 2014 at 1:08 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > >
> > > > > > Is this the same table you mentioned in the thread about
> > > > > > RegionTooBusyException?
> > > > > >
> > > > > > If you move the column family to another table, you may have to
> > > > > > handle atomicity yourself - currently atomic operations are within
> > > > > > region boundaries.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > > On Sat, Sep 6, 2014 at 9:49 AM, Jianshi Huang
> > > > > > <jianshi.hu...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm currently putting everything into one table (to make
> > > > > > > cross-reference queries easier), and there's one CF whose rowkeys
> > > > > > > are very different from the rest. Currently it works well, but I'm
> > > > > > > wondering if it will cause performance issues in the future.
> > > > > > >
> > > > > > > So my questions are
> > > > > > >
> > > > > > > 1) Will there be performance penalties with the way I'm doing it?
> > > > > > > 2) Should I move that CF to a separate table?
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > --
> > > > > > > Jianshi Huang
> > > > > > >
> > > > > > > LinkedIn: jianshi
> > > > > > > Twitter: @jshuang
> > > > > > > Github & Blog: http://huangjs.github.com/
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jianshi Huang
> > > > >
> > > > > LinkedIn: jianshi
> > > > > Twitter: @jshuang
> > > > > Github & Blog: http://huangjs.github.com/
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Jianshi Huang
> > >
> > > LinkedIn: jianshi
> > > Twitter: @jshuang
> > > Github & Blog: http://huangjs.github.com/
> > >
> >
>
>
>
>
