Is it because of the extra 'hop'? The Java client goes straight to the RS, whereas the Thrift C++ client goes to a Thrift server, which hosts a Java client, which then goes to the RS.

St.Ack
On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <[email protected]> wrote:

> hi, guys,
>
> I am trying to get a rough idea of how the C++ and Java clients compare in
> performance when accessing an HBase table, and was surprised to find that
> Thrift (C++) is 4X slower.
>
> The performance result is:
> C++: real *16m11.313s*; user 5m3.642s; sys 2m21.388s
> Java: real *4m6.012s*; user 0m31.228s; sys 0m8.018s
>
> I have a single-node HBase (98.6) cluster with 1X TPCH loaded, and I use the
> largest table, lineitem, which has 6M rows and roughly 600MB of data.
>
> For the C++ client, I used the Thrift example provided by hbase-examples. The
> C++ code looks like:
>
> > std::string t("lineitem");
> > int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > int count = 0;
> > ..
> > while (true) {
> >   std::vector<TRowResult> value;
> >   client.scannerGet(value, scanner);
> >   if (value.size() == 0) break;
> >   count++;
> > }
> >
> > std::cout << count << " rows scanned" << std::endl;
>
> The Java client is the simplest possible one:
>
> > HTable table = new HTable(conf, "lineitem");
> >
> > Scan scan = new Scan();
> > ResultScanner resScanner;
> > resScanner = table.getScanner(scan);
> > int count = 0;
> > for (Result res : resScanner) {
> >   count++;
> > }
>
> Since most of the time should be spent on I/O, I don't expect any significant
> difference between Thrift (C++) and Java. Any ideas? Many thanks
>
> Demai
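Besides the extra hop, the number of client round trips may also matter. The quoted C++ loop calls scannerGet once per row, so every row costs one call to the Thrift server, while the Java ResultScanner usually fetches rows in batches (scanner caching). The sketch below is only back-of-the-envelope arithmetic, not a measurement: scannerCalls is a hypothetical helper, the 6,000,000 row count is the approximate figure from the message above, and a batch size of 100 is an assumption about the Java client's configuration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of scanner calls needed to drain `rows` rows
// when each call returns at most `rowsPerCall` rows. The extra +1 is the
// final call that returns an empty batch and ends a loop such as
// `if (value.size() == 0) break;`.
std::uint64_t scannerCalls(std::uint64_t rows, std::uint64_t rowsPerCall) {
    assert(rowsPerCall > 0);
    // Ceiling division: calls that actually return rows.
    const std::uint64_t callsWithData = (rows + rowsPerCall - 1) / rowsPerCall;
    return callsWithData + 1;
}
```

Under these assumptions, scannerCalls(6000000, 1) gives 6000001 calls for the row-at-a-time loop, against scannerCalls(6000000, 100) = 60001 for batches of 100. If this turns out to be the bottleneck, the HBase Thrift interface also offers scannerGetList(value, scanner, nbRows), which returns up to nbRows rows per call and might be worth trying in place of scannerGet.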
