This isn't called a 'shuffle' (it is a plain remote read), so your original question was confusing. Thanks for clarifying!
In that case, you could count the bytes coming in from the record reader you are using. For example, a text record reader (LineRecordReader, as used by TextInputFormat) emits a LongWritable key that denotes the current byte offset in the file, which you could use as a simple, progressing counter of the bytes read thus far.

On Wed, Dec 26, 2012 at 5:16 PM, Eduard Skaley <[email protected]> wrote:
> Hi,
>
> I mean TO the mappers. I'm using the CompositeInputFormat for my application
> to compute map-side joins.
> I want to join two datasets A and B one is stored on node 1 and the other
> one on node 2.
> For example if the join will be computed on node 2 then the inputsplit of
> the dataset which is stored on node 1 has to be transferred to node 2.
> I want to count the bytes which are shuffled (transferred) TO the mapper of
> node 2.
>
>> Hi,
>>
>> What do you mean by "shuffled bytes [to] the mappers"? If you mean
>> "from", it is "Reduce shuffle bytes" you look for; otherwise, you may
>> be looking for the per-map counter of "Map output bytes".
>>
>> Per-partition counters can be constructed on the user side if needed,
>> by pre-computing the partition before emit (using the same
>> partitioner) and counting up the bytes of your objects for its
>> counter.
>>
>> On Tue, Dec 25, 2012 at 6:03 PM, Eduard Skaley <[email protected]>
>> wrote:
>>>
>>> Hello guys,
>>>
>>> I need a counter for shuffled bytes to the mappers.
>>> Is there existing one or should I define one myself ?
>>> How can I implement such a counter?
>>>
>>> Thank you and happy Christmas time,
>>> Eduard
>>
>>
>>
>

--
Harsh J
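
P.S. A rough, Hadoop-free sketch of the offset-as-counter idea from the reply above, in case it helps. Everything named here (OffsetByteCounter, Record, readRecords, bytesReadThrough) is hypothetical and only simulates what a line-oriented record reader hands to map(): a byte-offset key plus the line. It also assumes single-byte '\n' line endings and uncompressed input. In a real job the offset would arrive as the LongWritable key, and you would likely publish the figure through a user-defined counter rather than a local variable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation; none of these names are Hadoop API.
class OffsetByteCounter {

    /** One (key, value) pair as a line-oriented reader would present it. */
    static final class Record {
        final long offset;  // byte offset of the line start within the file
        final byte[] line;  // line content without the trailing '\n'

        Record(long offset, byte[] line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /**
     * Splits the bytes of one input split on '\n' and tags each line with its
     * file offset. splitStart is where this split begins in the file.
     */
    static List<Record> readRecords(byte[] data, long splitStart) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                records.add(new Record(splitStart + start,
                        Arrays.copyOfRange(data, start, i)));
                start = i + 1;
            }
        }
        if (start < data.length) {
            // Final line with no trailing newline.
            records.add(new Record(splitStart + start,
                    Arrays.copyOfRange(data, start, data.length)));
        }
        return records;
    }

    /**
     * Bytes of the split consumed once this record has been read.
     * Assumes a 1-byte newline after the line, so it over-counts by one
     * for a final line that has no trailing newline.
     */
    static long bytesReadThrough(Record r, long splitStart) {
        return (r.offset - splitStart) + r.line.length + 1;
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncde\nf\n".getBytes(StandardCharsets.UTF_8);
        long splitStart = 100L; // pretend the split starts 100 bytes into the file
        long bytesRead = 0L;
        for (Record r : readRecords(data, splitStart)) {
            // The offset key alone is enough to keep a running total.
            bytesRead = bytesReadThrough(r, splitStart);
            System.out.println("offset=" + r.offset + " bytesRead=" + bytesRead);
        }
    }
}
```

Note that this counts all bytes read by the task, whether the block was local or remote; telling the two apart would need extra information about where the split lives.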
