So the difference between a pool and an HTable would be negligible in a typical map-reduce environment, right, as long as I am not creating any new HTable instances in the map and reduce phases? Perhaps creating a pool could even have a negative impact in this case?
E.g., what performance impact can I expect in my bulk-uploading MapReduce job? I create an HTable connection in the run() method, and each map converts a line from a text file to a Put instance.

Also, it would be great if any of you could point me to an example usage of HTablePool.

hari

On Tue, Nov 9, 2010 at 9:44 PM, Michael Segel <[email protected]> wrote:

> > Date: Tue, 9 Nov 2010 09:57:42 -0600
> > Subject: Re: Why and When to use HTablePool?
> > From: [email protected]
> > To: [email protected]
> >
> > Two differences that I know of :)
> >
> > With htable you bear the overhead of instantiating the htable for each time
> > you need access to it. The overhead can be substantial if response time is
> > your biggest concern.
> > Example: contact = *new* HTable(config, "contact");
>
> Huh?
>
> Sorry, but that's a bit of an overly broad statement.
>
> When you're using hbase in a map/reduce environment, you set up a single
> htable instance in setup()
> then reference it in your map() method. So you incur the cost of setting up
> the htable once.
>
> If you're working in a single node, and a multi-threaded application like a
> web service reading from HBase, then you may want to have a pool
> of connections. Totally different design.
>
> The use case for the HTablePool is pretty much the same as any application
> where you need to fetch a resource from a pool rather than constantly
> instantiate them.
>
> Really the driving factor on which to use (HTable or HTablePool) is going
> to be your use case, or rather what it is you hope to achieve.
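
For illustration only, the "fetch a resource from a pool rather than constantly instantiate them" idea from the quoted reply can be sketched in plain Java. This is not the HBase API: FakeTable, SimplePool, getTable, putTable and createdCount are hypothetical names made up for this sketch, and a real HTablePool may behave differently. It only shows the borrow/return pattern and why repeated use does not keep paying the instantiation cost.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an expensive-to-create table handle (NOT an HBase class).
class FakeTable {
    final String name;
    FakeTable(String name) { this.name = name; }
}

// Hypothetical pool keyed by table name (NOT HTablePool); illustrates the idea only.
class SimplePool {
    private final int maxSize;
    private final Map<String, Deque<FakeTable>> idle = new HashMap<>();
    private int created = 0;

    SimplePool(int maxSize) { this.maxSize = maxSize; }

    // Reuse an idle handle if one exists, otherwise create a new one.
    synchronized FakeTable getTable(String name) {
        Deque<FakeTable> q = idle.get(name);
        if (q != null && !q.isEmpty()) {
            return q.pop();
        }
        created++;
        return new FakeTable(name);
    }

    // Return a handle; keep at most maxSize idle handles per table name.
    synchronized void putTable(FakeTable t) {
        Deque<FakeTable> q = idle.computeIfAbsent(t.name, k -> new ArrayDeque<>());
        if (q.size() < maxSize) {
            q.push(t);
        }
    }

    // Number of handles this pool has had to instantiate so far.
    synchronized int createdCount() { return created; }
}

public class PoolDemo {
    public static void main(String[] args) {
        SimplePool pool = new SimplePool(2);
        for (int i = 0; i < 5; i++) {
            FakeTable t = pool.getTable("contact");
            try {
                // ... use the handle here ...
            } finally {
                pool.putTable(t); // always hand the handle back
            }
        }
        System.out.println("handles created: " + pool.createdCount());
    }
}
```

In this sketch, five sequential borrow/return cycles instantiate only one handle. A single-threaded map task that builds one table instance up front gets the same effect without a pool, which matches the point made in the quoted reply.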
