So the difference between a pool and a single HTable would be negligible in a
typical map-reduce environment, right, as long as I am not creating any new
HTable instances in the map and reduce phases? Could creating a pool even have
a negative impact in this case?

For example, what performance impact can I expect in my bulk-upload MapReduce
job? I create an HTable connection in the run() method, and each map call
converts a line from a text file into a Put instance. Also, it would be great
if any of you could point me to an example usage of HTablePool.

hari

On Tue, Nov 9, 2010 at 9:44 PM, Michael Segel <[email protected]> wrote:

>
>
>
> > Date: Tue, 9 Nov 2010 09:57:42 -0600
> > Subject: Re: Why and When to use HTablePool?
> > From: [email protected]
> > To: [email protected]
> >
> > Two differences that I know of:)
> >
> > With HTable you bear the overhead of instantiating the table each time
> > you need access to it.  The overhead can be substantial if response time
> > is your biggest concern.
> > Example:  contact = new HTable(config, "contact");
> >
> Huh?
>
> Sorry, but that's a bit of an overly broad statement.
>
> When you're using HBase in a map/reduce environment, you set up a single
> HTable instance in setup() and then reference it in your map() method. So
> you incur the cost of setting up the HTable only once per task.
>
>
> If you're working on a single node, in a multi-threaded application like a
> web service reading from HBase, then you may want to have a pool
> of connections. Totally different design.
>
> The use case for the HTablePool is pretty much the same as any application
> where you need to fetch a resource from a pool rather than constantly
> instantiate a new one.
>
> Really the driving factor on which to use (HTable or HTablePool) is going
> to be your use case, or rather what it is you hope to achieve.
>
>
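
Since a pool example was requested at the top of the thread, here is a minimal
sketch of the pooling idea described above. It is not HBase code: FakeTable,
SimpleTablePool and PoolDemo are hypothetical stand-ins that use only the JDK,
so the sketch runs without a cluster. The getTable/putTable names are chosen to
resemble what I recall of HTablePool in HBase releases from around this time;
please check the javadoc for your version before relying on them. The sketch
only counts how often the expensive constructor runs with and without a pool.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a handle that is expensive to create (e.g. an HTable).
class FakeTable {
    static final AtomicInteger CREATED = new AtomicInteger();
    final String name;

    FakeTable(String name) {
        this.name = name;
        CREATED.incrementAndGet(); // count every time the setup cost is paid
    }
}

// Minimal bounded pool: hand out an idle handle if one exists, else create one.
class SimpleTablePool {
    private final BlockingQueue<FakeTable> idle;
    private final String tableName;

    SimpleTablePool(String tableName, int maxSize) {
        this.tableName = tableName;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    FakeTable getTable() {
        FakeTable t = idle.poll(); // non-blocking; null when nothing is idle
        return (t != null) ? t : new FakeTable(tableName);
    }

    void putTable(FakeTable t) {
        idle.offer(t); // the handle is simply dropped if the pool is already full
    }
}

class PoolDemo {
    // Sequential requests that each borrow a handle and return it afterwards.
    static int creationsWithPool(int requests) {
        FakeTable.CREATED.set(0);
        SimpleTablePool pool = new SimpleTablePool("contact", 4);
        for (int i = 0; i < requests; i++) {
            FakeTable t = pool.getTable();
            try {
                // ... read from or write to the table here ...
            } finally {
                pool.putTable(t); // always return the handle, even on failure
            }
        }
        return FakeTable.CREATED.get();
    }

    // The same requests, but constructing a fresh handle every time.
    static int creationsWithoutPool(int requests) {
        FakeTable.CREATED.set(0);
        for (int i = 0; i < requests; i++) {
            new FakeTable("contact");
        }
        return FakeTable.CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("with pool:    " + creationsWithPool(100));
        System.out.println("without pool: " + creationsWithoutPool(100));
    }
}
```

In a map task that builds one table handle in setup() and reuses it in every
map() call, the constructor already runs only once per task, so a pool has
nothing left to save there. The pool pays off in the web-service style case,
where many short-lived requests would otherwise each construct their own handle.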
