I am trying to understand this concept a little better and could use some help 
from the larger community. I jotted down a couple of quick notes as I was 
reading through the material on local vs. remote metastore modes:


1.       In local mode, each Hive client opens its own connection to the 
metastore database. If many clients are connected at once, this could 
overwhelm the instance, depending on the max_connections setting. By 
default, this value is 151 in MySQL, and it can be raised to a larger 
value depending on how much RAM the box has.

2.       In remote mode, each client goes through the metastore service 
instead of connecting to the database directly.
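
For reference, this is roughly how I understand the two modes differ in the 
client-side hive-site.xml. It is only a sketch: the host names and the 
database name are placeholders I made up, and only the property names and the 
port 9083 (the usual metastore port) come from the Hive documentation.

```xml
<!-- Local mode: the client connects to the metastore database directly. -->
<!-- db-host.example.com and the database name "metastore" are placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://db-host.example.com:3306/metastore</value>
</property>

<!-- Remote mode: the client connects to the metastore service over Thrift. -->
<!-- metastore-host.example.com is a placeholder. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>
```

In a real setup only one of the two fragments would appear on a given client.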


My questions are:


1.       Can each node in the cluster run its own metastore service when 
using the remote metastore configuration?

a.       If so, managing this seems like a nightmare in terms of keeping the 
logs in sync.

b.      Otherwise, a single metastore service seems to be a single point of 
failure, since all connections are routed through it.

2.       What is the preferred approach here with respect to local vs. remote?

3.       In order to avoid overwhelming the database, should the following 
parameters be tuned:

a.       hive.metastore.server.min.threads

b.      hive.metastore.server.max.threads
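
To make question 3 concrete, this is the kind of change I have in mind in 
the hive-site.xml of the metastore service. As I understand it, these two 
properties bound the number of Thrift worker threads in the metastore server. 
The values below are purely illustrative and are not recommendations.

```xml
<!-- Illustrative values only; chosen to show the shape of the setting. -->
<property>
  <name>hive.metastore.server.min.threads</name>
  <value>50</value>
</property>
<property>
  <name>hive.metastore.server.max.threads</name>
  <value>500</value>
</property>
```

What I am unsure about is whether capping the worker threads this way 
actually limits the number of connections the service opens to MySQL.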

Thanks,
Ranjith
