Hi Jeremy,

I don't see any issue with HBase handling 4000 tables. However, I don't think it's the best solution for your use case.
JM

2013/9/24 jeremy p

> Short description: I'd like to have 4000 tables in my HBase cluster. Will
> this be a problem? In general, what problems do you run into when you try
> to host thousands of tables in a cluster?
>
> Long description: I'd like the performance advantage of pre-split tables,
> and I'd also like to do filtered range scans. Imagine a keyspace where the
> key consists of [POSITION]_[WORD], where POSITION is a number from 1 to
> 4000, and WORD is a string consisting of 96 characters. The value in the
> cell would be a single integer. My app will examine a 'document', where
> each 'line' consists of 4000 WORDs. For each WORD, it'll do a filtered
> regex lookup. Only problem? Say I have 200 mappers and they all start at
> POSITION 1: my region servers would get hotspotted like crazy. So my idea
> is to break it into 4000 tables (one for each POSITION), and then pre-split
> the tables such that each region gets an equal amount of the traffic. In
> this scenario, the key would just be WORD. Dunno if this is a bad idea;
> I'd be open to suggestions.
>
> Thanks!
>
> --J
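The hotspotting concern in the quoted question can be modelled without a cluster. Below is a minimal sketch in Python, assuming uniformly random lowercase WORDs (the real WORD alphabet and traffic distribution are unknown). It does not talk to HBase; it only models the fact that regions are contiguous, sorted row-key ranges, with each split key becoming the inclusive start key of the next region. Split keys are picked from a sample of expected keys so each region gets roughly an equal share. The helper names `random_word`, `split_keys_from_sample` and `region_for`, plus the sample sizes and region counts, are hypothetical. Keys are compared as Python strings, which for ASCII matches HBase's byte-wise ordering.

```python
import bisect
import random
import string

# Hypothetical parameters for illustration only; real WORDs and traffic may
# look nothing like uniform random lowercase strings.
NUM_POSITIONS = 4000
WORD_LEN = 96


def random_word(rng: random.Random) -> str:
    """Hypothetical stand-in for a 96-character WORD."""
    return "".join(rng.choices(string.ascii_lowercase, k=WORD_LEN))


def split_keys_from_sample(sample, num_regions):
    """Return num_regions - 1 split keys so each region gets ~equal share of sample.

    Each split key acts as the (inclusive) start key of the next region.
    """
    keys = sorted(set(sample))
    if len(keys) < num_regions:
        raise ValueError("need at least num_regions distinct sample keys")
    step = len(keys) / num_regions
    return [keys[int(i * step)] for i in range(1, num_regions)]


def region_for(key, splits):
    """Index of the region whose [start, end) key range contains key."""
    return bisect.bisect_right(splits, key)


rng = random.Random(42)

# Design 1: one table keyed by [POSITION]_[WORD], pre-split from a sample
# spread evenly over all positions.
composite_sample = [
    f"{rng.randint(1, NUM_POSITIONS)}_{random_word(rng)}" for _ in range(20_000)
]
composite_splits = split_keys_from_sample(composite_sample, 20)
# 200 mappers all starting at POSITION 1: every lookup key shares the "1_"
# prefix, so they all fall in one contiguous slice of the key space.
first_lookups = [f"1_{random_word(rng)}" for _ in range(200)]
hot_regions = {region_for(k, composite_splits) for k in first_lookups}

# Design 2: one table per POSITION keyed by WORD alone, pre-split from a
# sample of WORDs.
word_sample = [random_word(rng) for _ in range(2_000)]
word_splits = split_keys_from_sample(word_sample, 10)
spread_regions = {region_for(random_word(rng), word_splits) for _ in range(200)}

print(f"composite key: 200 lookups hit {len(hot_regions)} of 20 regions")
print(f"per-POSITION table: 200 lookups hit {len(spread_regions)} of 10 regions")
```

The exact counts depend on the random sample. Under these assumptions, though, the POSITION 1 lookups in the composite-key table should land in at most two of its 20 regions, because all keys starting with `1_` form a single contiguous range. The per-POSITION table spreads the same number of lookups across nearly all of its regions. In HBase's Java client, split keys like these could be passed to the `createTable` overload that accepts split keys.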
