Hi,

The situation is fairly complex and there is no super clear answer. Here are some facts:

- Thrift requires a shared-infrastructure client/server app, which is another scaling factor.
- Thrift servers live a long time and can therefore amortize the HTable cache more effectively across multiple short client runs.
- Thrift servers don't have as advanced a batch put, so you could run into scaling issues.
- The Java API parallelizes with the number of client JVMs you have - there is no real limit (other than your cluster's ability to handle the requests).
- The Java API is probably the way to go if you are in Java.
- StumbleUpon uses the Thrift API with PHP and it works like a charm. The overhead it adds is surprisingly negligible. We deploy Thrift servers on all the regionservers.

On Thu, Jul 22, 2010 at 2:18 PM, Sylvain Hellegouarch <[email protected]> wrote:

> On Thu, Jul 22, 2010 at 8:22 PM, S Ahmed <[email protected]> wrote:
>
>> Can someone explain, at a high level, how the hbase service is exposed?
>>
>> Is it a Java socket or? (sorry, not that well versed in this)
>>
>> Does anyone have any numbers on the performance differences between using
>> the native java driver (that presumably connects 'directly') versus the
>> Thrift route?
>
> Basically, when you use the Thrift API, you're conversing with a Thrift
> server that is itself written using the Java API. In other words, it's
> similar to an RPC mechanism, which means you'll introduce some overhead
> over using the Java API directly.
>
> I don't have numbers at hand, unfortunately.
>
> --
> - Sylvain
> http://www.defuze.org
> http://twitter.com/lawouach
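For reference, the "direct" Java path the thread describes looks roughly like this - a minimal write sketch against the 0.20-era client API; it needs a running cluster and an hbase-site.xml on the classpath, and the table name "mytable", column family "cf", and other identifiers here are made-up placeholders:

```java
// Minimal sketch of a write through the native Java API (HBase 0.20-era
// client). Assumes a reachable cluster; "mytable" and "cf" are hypothetical.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath to locate ZooKeeper.
        HTable table = new HTable(new HBaseConfiguration(), "mytable");

        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"),
                Bytes.toBytes("value"));

        // Goes straight to the owning regionserver - no Thrift hop in between.
        table.put(put);
        table.close();
    }
}
```

Going through Thrift instead inserts one extra process between this client code and the regionserver, which is the overhead being discussed.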
