Sorry Demai, I have no access to that code currently. But from what you described, it seems that you are using the Thrift v1 API. I'd recommend using thrift2.
Also, it is a good idea to check the Thrift server configuration:
1. blocking/nonblocking/hsha, and framed or not
2. size of the thread pool

On Mon, Mar 9, 2015 at 9:26 PM, Demai Ni <[email protected]> wrote:

> Andrey and all,
>
> thanks for the input. Andrey, if possible, do you mind sharing your code
> segment so I can follow the settings on your side?
>
> I had exactly the same thought when I first saw the result. I was
> expecting a small performance penalty (10~20%) when using Thrift (C++),
> not this much.
>
> Now I am looking into the C++ API call. Originally, I used
> "client.scannerGet(value, scanner)", which does a lot of preparation
> work (like flush) for each call. I just changed the code to use
> "client.scannerGetList(value, scanner, 10000);". Sure enough, the
> performance improved. However, for a similar comparison, I also set the
> Java client to 10000 batch/cache. Here is the new code:
>
> *C++*
>   TScan tscan;
>   int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
>   int count = 0;
>   try {
>     while (true) {
>       std::vector<TRowResult> value;
>
>       client.scannerGetList(value, scanner, 10000);
>       if (value.size() == 0) {
>         break;
>       } else count += value.size();
>     }
>
> *Java*
>   int total = 0;
>   scan = new Scan();
>   scan.setCaching(10000); scan.setBatch(10000);
>   resScanner = table.getScanner(scan);
>   int count = 0;
>   for (Result res: resScanner) {
>     count ++;
>   }
>
> so both clients improved as expected, and Thrift C++ still takes about 3X
> the time of Java:
> C++ : real 6m46.845s, user 1m59.636s, sys 0m11.984s
> Java: real 2m27.245s, user 0m17.624s, sys 0m4.779s
>
> To be fair, I am able to setCaching on the Java client, but didn't find a
> way to do the same through the C++ API, which also makes some difference.
>
> Demai
>
>
> On Sun, Mar 8, 2015 at 1:40 PM, Andrey Stepachev <[email protected]> wrote:
>
> > Hi Demai.
> >
> > That seems odd to me; in my tests I got very similar performance.
> > I'd like to suggest checking that the scans have identical parameters
> > (cache size in particular). That can make a big difference in
> > performance in your case.
> >
> > Thanks.
> >
> > On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak <[email protected]> wrote:
> >
> > > If you're going the JNI route, the best bet is to embed a VM in your C
> > > project. You use "java -s -p" to create the required header files and
> > > compile linking against the java library. This article talks about
> > > how to talk from C to Java:
> > >
> > > http://www.codeproject.com/Articles/22881/How-to-Call-Java-Functions-from-C-Using-JNI
> > >
> > > Best,
> > > Mike
> > >
> > > On Sun, Mar 8, 2015 at 10:29 AM, Michael Segel
> > > <[email protected]> wrote:
> > > > JNI example?
> > > >
> > > > I don’t have one… my clients own the code so I can’t take it with me
> > > > and share.
> > > > (The joys of being a consultant mean you can’t take it with you and
> > > > you need to make sure you don’t xfer IP accidentally.)
> > > >
> > > > Maybe in one of the HBase books? Or just google for a JNI example on
> > > > the web, since it's straightforward Java code to connect to HBase and
> > > > then straight JNI to talk to C/C++
> > > >
> > > >> On Mar 7, 2015, at 5:56 PM, Demai Ni <[email protected]> wrote:
> > > >>
> > > >> Nick, thanks. I will give REST a try. However, if it uses the same
> > > >> design, the result probably will be the same.
> > > >>
> > > >> Michael, I was thinking about the same thing through JNI. Is there an
> > > >> example I can follow?
> > > >>
> > > >> Mike (Axiak), I run the C++ client on the same Linux machine as HBase
> > > >> and Thrift. HBase uses IP 127.0.0.1 and Thrift uses 0.0.0.0. It
> > > >> doesn't make a difference, does it?
> > > >>
> > > >> Anyway, considering Thrift will get the scan result from HBase first,
> > > >> then my C++ client gets the same data from Thrift.
> > > >> It definitely (probably) costs double the time/CPU. So JNI may be the
> > > >> right way to go. Is there an example I can use? thanks
> > > >>
> > > >> Demai
> > > >>
> > > >> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <[email protected]> wrote:
> > > >>
> > > >>> What if you install the thrift server locally on every C++ client
> > > >>> machine? I'd imagine performance should be similar to native java
> > > >>> performance at that point.
> > > >>>
> > > >>> -Mike
> > > >>>
> > > >>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <[email protected]>
> > > >>> wrote:
> > > >>>> Or you could try a java connection wrapped by JNI so you can call it
> > > >>>> from your C++ app.
> > > >>>>
> > > >>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <[email protected]> wrote:
> > > >>>>>
> > > >>>>> You can try the REST gateway, though it has the same basic
> > > >>>>> architecture as the thrift gateway. Maybe the details work out in
> > > >>>>> your favor over REST.
> > > >>>>>
> > > >>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <[email protected]> wrote:
> > > >>>>>
> > > >>>>>> Stack,
> > > >>>>>>
> > > >>>>>> Thanks for the quick response. Well, the extra layer really kills
> > > >>>>>> the performance. The 'hop' is so expensive.
> > > >>>>>>
> > > >>>>>> Is there another C/C++ API to try out? I saw there is a jira,
> > > >>>>>> Hbase-1015, but it has been inactive for a while.
> > > >>>>>>
> > > >>>>>> Demai
> > > >>>>>>
> > > >>>>>> Stack <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>> Is it because of the 'hop'? Java goes against RS. The thrift C++
> > > >>>>>>> goes to a thriftserver which hosts a java client and then it goes
> > > >>>>>>> to the RS?
> > > >>>>>>> St.Ack
> > > >>>>>>>
> > > >>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <[email protected]> wrote:
> > > >>>>>>>
> > > >>>>>>>> hi, guys,
> > > >>>>>>>>
> > > >>>>>>>> I am trying to get a rough idea of the performance comparison
> > > >>>>>>>> between the C++ and Java clients when accessing an HBase table,
> > > >>>>>>>> and am surprised to find out that Thrift (C++) is 4X slower.
> > > >>>>>>>>
> > > >>>>>>>> The performance result is:
> > > >>>>>>>> C++: real *16m11.313s*; user 5m3.642s; sys 2m21.388s
> > > >>>>>>>> Java: real *4m6.012s*; user 0m31.228s; sys 0m8.018s
> > > >>>>>>>>
> > > >>>>>>>> I have a single-node HBase(98.6) cluster, with 1X TPCH loaded, and
> > > >>>>>>>> use the largest table, lineitem, which has 6M rows, roughly 600MB
> > > >>>>>>>> of data.
> > > >>>>>>>>
> > > >>>>>>>> For the C++ client, I used the thrift example provided by
> > > >>>>>>>> hbase-examples; the C++ code looks like:
> > > >>>>>>>>
> > > >>>>>>>>> std::string t("lineitem");
> > > >>>>>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > > >>>>>>>>> int count = 0;
> > > >>>>>>>>> ..
> > > >>>>>>>>> while (true) {
> > > >>>>>>>>>   std::vector<TRowResult> value;
> > > >>>>>>>>>   client.scannerGet(value, scanner);
> > > >>>>>>>>>   if (value.size() == 0) break;
> > > >>>>>>>>>   count ++;
> > > >>>>>>>>> }
> > > >>>>>>>>>
> > > >>>>>>>>> std::cout << count << " rows scanned" << std::endl;
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> The Java client is the simplest possible one:
> > > >>>>>>>>
> > > >>>>>>>>> HTable table = new HTable(conf, "lineitem");
> > > >>>>>>>>>
> > > >>>>>>>>> Scan scan = new Scan();
> > > >>>>>>>>> ResultScanner resScanner;
> > > >>>>>>>>> resScanner = table.getScanner(scan);
> > > >>>>>>>>> int count = 0;
> > > >>>>>>>>> for (Result res: resScanner) {
> > > >>>>>>>>>   count ++;
> > > >>>>>>>>> }
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Since most of the time should be spent on I/O, I don't expect any
> > > >>>>>>>> significant difference between Thrift (C++) and Java. Any ideas?
> > > >>>>>>>> Many thanks
> > > >>>>>>>>
> > > >>>>>>>> Demai
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>> The opinions expressed here are mine, while they may reflect a
> > > >>>> cognitive thought, that is purely accidental.
> > > >>>> Use at your own risk.
> > > >>>> Michael Segel
> > > >>>> michael_segel (AT) hotmail.com
> > > >>>>
> > > >>>
> > > >
> > > > The opinions expressed here are mine, while they may reflect a
> > > > cognitive thought, that is purely accidental.
> > > > Use at your own risk.
> > > > Michael Segel
> > > > michael_segel (AT) hotmail.com
> > > >
> >
> > --
> > Andrey.
>

--
Andrey.

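For reference, the effect of the batch size discussed in this thread (one row per scannerGet call versus up to 10000 rows per scannerGetList call) can be sketched with a mock scanner. This is not HBase or Thrift code: MockScanner, getList, ScanStats and scanAll are hypothetical names made up for illustration, and the sketch only counts client calls (round trips). It says nothing about the actual per-call cost on a real Thrift server.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Thrift scanner (NOT the HBase API): it serves
// a fixed number of rows and counts how many calls the client makes.
class MockScanner {
 public:
  explicit MockScanner(std::size_t totalRows) : remaining_(totalRows) {}

  // Mimics the shape of scannerGetList: fills `out` with up to `nbRows` rows.
  void getList(std::vector<int>& out, std::size_t nbRows) {
    ++calls_;
    const std::size_t n = std::min(nbRows, remaining_);
    out.assign(n, 0);  // row contents do not matter for this sketch
    remaining_ -= n;
  }

  std::size_t calls() const { return calls_; }

 private:
  std::size_t remaining_;
  std::size_t calls_ = 0;
};

struct ScanStats {
  std::size_t rows;   // rows seen by the client
  std::size_t calls;  // calls made, including the final empty one
};

// Same loop structure as the client code quoted in the thread:
// keep fetching batches and stop on the first empty one.
ScanStats scanAll(std::size_t totalRows, std::size_t batch) {
  assert(batch > 0);
  MockScanner scanner(totalRows);
  std::size_t count = 0;
  while (true) {
    std::vector<int> value;
    scanner.getList(value, batch);
    if (value.empty()) break;
    count += value.size();
  }
  return {count, scanner.calls()};
}
```

With the roughly 6M rows mentioned for lineitem, scanAll(6000000, 1) makes 6000001 calls while scanAll(6000000, 10000) makes 601, so any fixed per-call overhead is paid about 10000 times less often with the larger batch.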