On Thu, Oct 25, 2012 at 9:00 AM, Nick maillard <[email protected]> wrote:
> Hi Jean-Daniel
>
> Again thanks for the quick reply and for the env detail I'll get to it.
>
> Of course select count (*) is not what I want to optimize.
> My more regular queries will have an Hbase schema designed for them using the
> rowkeys and potentially column families etc...
> I'm guessing Hive uses the rowkey hash aspect when in the sql query.

HBase row keys aren't hashes; HBase relies completely on their
lexicographically sorted order. Hive really just uses HBase's input format,
which creates 1 map per region. Each mapper then scans from the start key to
the end key of its region, in parallel with the other mappers.

> My question on a more general note. When querying hbase through hive on tables
> that have not been designed specifically with that type of query in mind I
> wanted to keep query time low. I'm trying to get a feel of when I should make
> table with a thought out rowkey, family etc.. and to what extent I can have a
> decent query time on more exotic queries.

What kind of "decent query time" are you looking for?

> I am trying to decide If I make several tables on a dataset for the very
> common queries and for other more rare queries If Hive can give me good
> resolve time or If I should use to extract a good view to feed to other
> querying systems, like big query or Mysql or anything.

It really depends on what your use case is.
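
To make the one-map-per-region point above a bit more concrete, here is a rough sketch in plain Python. It is not HBase or Hive code: the row keys, the region boundaries and the function names (scan_region, plan_scans) are all made up for illustration. It only shows that keys are kept in byte-wise sorted order and that each "mapper" reads the contiguous slice between its region's start key (inclusive) and end key (exclusive).

```python
from bisect import bisect_left

# Hypothetical row keys. Sorting bytes in Python compares them byte by byte,
# which mirrors the lexicographic ordering HBase uses for row keys.
row_keys = sorted([b"user10", b"user2", b"user1", b"apple", b"zebra", b"user3"])

# Hypothetical region boundaries: (start key inclusive, end key exclusive).
# An empty key stands for "unbounded" on that side.
regions = [(b"", b"m"), (b"m", b"user2"), (b"user2", b"")]


def scan_region(keys, start, end):
    """Return the sorted slice of keys that falls inside one region."""
    lo = bisect_left(keys, start) if start else 0
    hi = bisect_left(keys, end) if end else len(keys)
    return keys[lo:hi]


def plan_scans(keys, regions):
    """One scan per region, the way one map task is created per region."""
    return [scan_region(keys, start, end) for start, end in regions]


splits = plan_scans(row_keys, regions)

for (start, end), keys in zip(regions, splits):
    print(start, end, keys)
```

Note that b"user10" sorts before b"user2" because the comparison is on bytes, not numbers. With a query that can't be turned into a row key range, every region still gets scanned in full, so the row key design only helps when the filter lines up with that sort order.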
