Hey Christopher,

Thanks for reporting back. One thing about this: unless you have contention at your top-of-rack switches, issuing a Get on the local node or on a remote one shouldn't be very different. What is going to make a big difference is whether or not you have to hit disk.
J-D

On Tue, Feb 14, 2012 at 7:27 AM, Christopher Dorner <[email protected]> wrote:
> Hi,
> sorry for a very late reply on this topic, but I was busy, and I had
> promised to report back.
>
> I implemented your suggested "hack" :) It is actually only a few lines of
> code: one for getting the machine's hostname and one for retrieving the
> destination of the Get request. Then I set up two counters, one for the
> data-local Get requests and one for the others.
>
> It gives us some idea of the network I/O when issuing Get requests
> inside mappers, but the result is kind of obvious:
>
> Since data locality only kicks in for the input to the mapper (an HBase
> table scan or reading straight from HDFS, both of which work very well),
> it is unpredictable which machine a Get request will be sent to.
> I am not sure what exactly to do with this information. As I said, it can
> only help to estimate network traffic; it does not help to tune this
> aspect in any way.
>
> But in conclusion, it is possible to retrieve this sort of information
> with dirty hacks :)
>
> Regards,
> Christopher
>
> On Mon, Jan 9, 2012 at 11:36 PM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> It would definitely be interesting, please do report back.
>>
>> Thx,
>>
>> J-D
>>
>> On Mon, Jan 9, 2012 at 2:33 PM, Christopher Dorner <[email protected]> wrote:
>> > Thank you for the reply.
>> > Though that sounds a bit like dirty hacking, it seems doable. I
>> > think I will give it a try.
>> > I can report back when I get some usable results. Maybe more people
>> > are interested in that.
>> >
>> > Christopher
>> >
>> > On 09.01.2012 23:15, Jean-Daniel Cryans wrote:
>> >
>> >> Short answer: no.
>> >>
>> >> Painful way to get around the problem:
>> >>
>> >> You *could* do it by looking up the machine's hostname when the job
>> >> starts and then, from the HConnection that HTables can give you
>> >> through getConnection(), calling getRegionLocation for the row you are
>> >> going to Get and then getting the hostname via
>> >> getServerAddress().getHostname()
>> >>
>> >> J-D
>> >>
>> >> On Mon, Jan 9, 2012 at 1:19 PM, Christopher Dorner <[email protected]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am using the input of a mapper as a rowkey to make a Get request
>> >>> to a table.
>> >>>
>> >>> Is it somehow possible to retrieve information about how much data
>> >>> had to be transferred over the network, or how many of the requests
>> >>> were data-local (the namenodes are also regionservers) versus not on
>> >>> the same node?
>> >>>
>> >>> That would be some really cool and useful statistics for us :)
>> >>>
>> >>> Thank you,
>> >>>
>> >>> Christopher Dorner
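A minimal Java sketch of the two-counter tally Christopher describes, built on the lookup J-D outlines: compare this machine's hostname with the hostname of the region server that holds the row, then bump one of two counters. To keep it self-contained and JDK-only, the row-to-hostname resolver is a hypothetical `Function<byte[], String>` hook rather than a real HBase call, and the class name `LocalityCounter` is made up for illustration. As J-D points out, a "remote" Get is not necessarily slower; these counts only estimate how many Gets cross the network.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Objects;
import java.util.function.Function;

/**
 * Tallies Gets whose target region is served by this machine ("local")
 * versus by any other host ("remote"). Sketch only: the row -> hostname
 * resolver is a hypothetical hook you would back with an HBase region lookup.
 */
final class LocalityCounter {
    private final String localHost;
    private final Function<byte[], String> regionHostForRow;
    private long localGets;
    private long remoteGets;

    LocalityCounter(String localHost, Function<byte[], String> regionHostForRow) {
        this.localHost = normalize(Objects.requireNonNull(localHost));
        this.regionHostForRow = Objects.requireNonNull(regionHostForRow);
    }

    /** Hostname of the current machine, e.g. looked up once when the task starts. */
    static String currentHostname() throws UnknownHostException {
        return InetAddress.getLocalHost().getHostName();
    }

    /** Call once per Get on {@code row}; returns true if the region is on this host. */
    boolean record(byte[] row) {
        String target = regionHostForRow.apply(row);
        // An unresolved location (null) is counted as remote.
        boolean local = target != null && localHost.equals(normalize(target));
        if (local) {
            localGets++;
        } else {
            remoteGets++;
        }
        return local;
    }

    long localGets() { return localGets; }

    long remoteGets() { return remoteGets; }

    // Hostnames are case-insensitive. A short name ("node1") and an FQDN
    // ("node1.example.com") still compare as different hosts here.
    private static String normalize(String host) {
        return host.toLowerCase(Locale.ROOT);
    }
}
```

In a real mapper, the resolver would wrap the client's region lookup. On the 0.90/0.92-era client J-D describes, that is roughly `table.getConnection().getRegionLocation(tableName, row, false).getServerAddress().getHostname()`; on newer clients, `connection.getRegionLocator(tableName).getRegionLocation(row).getHostname()`. Both throw `IOException`, so a lambda would need to wrap them. Check the javadoc for your HBase version. The totals could then be published as job counters in `cleanup()`, e.g. `context.getCounter("locality", "local").increment(counter.localGets())`.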
