Thanks Ted, I'll pre-split the table during ingestion. The reason I keep the rowkey monotonic is that it makes working with TableInputFormat easier; otherwise I would have binned it into 256 splits. That said, I think a good approach would be to extend TableInputFormat to accept multiple row ranges. If there's an existing efficient implementation, please let me know :)
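To make the 256-bin idea concrete, here is a rough sketch of what I have in mind. It is plain Python with no HBase client calls, and every name in it (`salted_key`, `split_keys`, `bucket_ranges`) is hypothetical; the MD5-based one-byte prefix is just one possible choice of bucket function, not something HBase prescribes.

```python
import hashlib
from typing import List, Tuple

NUM_BUCKETS = 256  # one-byte prefix, i.e. the "256 splits" mentioned above


def salted_key(rowkey: bytes) -> bytes:
    """Prefix the rowkey with a one-byte bucket taken from its MD5 hash.

    The bucket is deterministic, so the same rowkey always lands in the
    same bucket and can still be fetched with a point lookup.
    """
    bucket = hashlib.md5(rowkey).digest()[0]  # integer in 0..255
    return bytes([bucket]) + rowkey


def split_keys() -> List[bytes]:
    """Split points for pre-splitting: 0x01..0xFF (255 keys give 256 regions)."""
    return [bytes([b]) for b in range(1, NUM_BUCKETS)]


def bucket_ranges(start: bytes, stop: bytes) -> List[Tuple[bytes, bytes]]:
    """One (start_row, stop_row) pair per bucket for the logical range [start, stop)."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(NUM_BUCKETS)]


if __name__ == "__main__":
    print(salted_key(b"20140907-000001"))
    print(len(split_keys()), len(bucket_ranges(b"20140906", b"20140907")))
```

The catch is visible in `bucket_ranges`: a scan over one logical range turns into 256 physical ranges, which is exactly why an input format that accepts multiple row ranges would be needed.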
Would you elaborate a little more on the heap memory usage during scan? Is there any reference to that?

Jianshi

On Sun, Sep 7, 2014 at 1:20 AM, Ted Yu <[email protected]> wrote:

> If you use monotonically increasing rowkeys, separating out the column
> family into a new table would give you same issue you're facing today.
>
> Using a single table, essential column family feature would reduce the
> amount of heap memory used during scan. With two tables, there is no such
> facility.
>
> Cheers
>
>
> On Sat, Sep 6, 2014 at 10:11 AM, Jianshi Huang <[email protected]>
> wrote:
>
> > Hi Ted,
> >
> > Yes, that's the table having RegionTooBusyExceptions :) But the performance
> > I care most are scan performance.
> >
> > It's mostly for analytics, so I don't care much about atomicity currently.
> >
> > What's your suggestion?
> >
> > Jianshi
> >
> >
> > On Sun, Sep 7, 2014 at 1:08 AM, Ted Yu <[email protected]> wrote:
> >
> > > Is this the same table you mentioned in the thread about
> > > RegionTooBusyException?
> > >
> > > If you move the column family to another table, you may have to handle
> > > atomicity yourself - currently atomic operations are within region
> > > boundaries.
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Sep 6, 2014 at 9:49 AM, Jianshi Huang <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm currently putting everything into one table (to make cross reference
> > > > queries easier) and there's one CF which contains rowkeys very different to
> > > > the rest. Currently it works well, but I'm wondering if it will cause
> > > > performance issues in the future.
> > > >
> > > > So my questions are
> > > >
> > > > 1) will there be performance penalties in the way I'm doing?
> > > > 2) should I move that CF to a separate table?
> > > >
> > > >
> > > > Thanks,
> > > > --
> > > > Jianshi Huang
> > > >
> > > > LinkedIn: jianshi
> > > > Twitter: @jshuang
> > > > Github & Blog: http://huangjs.github.com/
> > > >
> > >
> >
> >
> > --
> > Jianshi Huang
> >
> > LinkedIn: jianshi
> > Twitter: @jshuang
> > Github & Blog: http://huangjs.github.com/
> >
>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
