Hi Dave, thanks for your reply. It's clearer now; in fact, the code I wrote was modelled on the old API, which behaves differently. So how can I get the same behaviour as the old API when using the new one? I need the second field of the first key object to stay the same across iterations, so that I can compare it with the other objects. Do I have to clone the object?
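To make the question concrete, here is a minimal sketch of what I think is going on and of the kind of copy I have in mind. It does not use Hadoop at all: PairKey, MutableTag, Record and the helper methods are hypothetical stand-ins, and I am assuming the second field lives in a mutable object (much like an IntWritable) that gets overwritten each time the value iterator advances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class KeyReuseSketch {

    /** Hypothetical mutable holder for the tag, standing in for something like IntWritable. */
    static final class MutableTag {
        int value;
    }

    /** Hypothetical composite key; the real one would implement WritableComparable. */
    static final class PairKey {
        String first;
        final MutableTag tag = new MutableTag();
        MutableTag getSecondField() { return tag; }
    }

    /** One record of the sorted input: both key fields plus a value. */
    static final class Record {
        final String first;
        final int tag;
        final String value;
        Record(String first, int tag, String value) {
            this.first = first;
            this.tag = tag;
            this.value = value;
        }
    }

    /** Imitates the described behaviour: every next() overwrites the single shared key. */
    static Iterator<String> values(final PairKey shared, List<Record> group) {
        final Iterator<Record> it = group.iterator();
        return new Iterator<String>() {
            public boolean hasNext() { return it.hasNext(); }
            public String next() {
                Record r = it.next();
                shared.first = r.first;
                shared.tag.value = r.tag;
                return r.value;
            }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Keeps a reference to the key's tag object, so the observed tag drifts. */
    static List<Integer> tagsViaSharedReference(List<Record> group) {
        PairKey key = new PairKey();
        key.tag.value = group.get(0).tag;
        MutableTag tag = key.getSecondField(); // same object the iterator mutates
        List<Integer> seen = new ArrayList<Integer>();
        Iterator<String> it = values(key, group);
        while (it.hasNext()) {
            it.next();
            seen.add(tag.value);
        }
        return seen;
    }

    /** Copies the tag's value before iterating, so the observed tag stays fixed. */
    static List<Integer> tagsViaCopy(List<Record> group) {
        PairKey key = new PairKey();
        key.tag.value = group.get(0).tag;
        int firstTag = key.getSecondField().value; // defensive copy of the value
        List<Integer> seen = new ArrayList<Integer>();
        Iterator<String> it = values(key, group);
        while (it.hasNext()) {
            it.next();
            seen.add(firstTag);
        }
        return seen;
    }

    /** Records grouped on the first field only; the tag differs inside the group. */
    static List<Record> sampleGroup() {
        return Arrays.asList(
            new Record("A", 1, "v1"), new Record("A", 1, "v2"),
            new Record("A", 2, "v3"), new Record("A", 2, "v4"));
    }

    public static void main(String[] args) {
        System.out.println(tagsViaSharedReference(sampleGroup())); // [1, 1, 2, 2]
        System.out.println(tagsViaCopy(sampleGroup()));            // [1, 1, 1, 1]
    }
}
```

If that picture is right, copying just the second field before the loop would be enough for my case, and I would not need to clone the whole key.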
Thanks.

On 15 October 2012 21:27, Dave Beech <[email protected]> wrote:
> Hi Alberto
>
> The iterator you are looping over in your reduce method isn't a
> self-contained list of values. What's actually happening is that
> you're iterating through *part* of the sorted key/value set that was
> sent to that reduce node, and it is the grouping comparator that
> decides when to break that loop and call reduce again on the next key.
>
> Moreover, the "key" object is re-used. So, as you're iterating through
> the values, what's actually happening is this pointer to the
> associated key data moves with it - and you're seeing it change.
>
> This only happens in the new "mapreduce" API - in the older "mapred"
> API you get the first key, and it appears to stay the same during the
> loop.
>
> It's sometimes useful behaviour, but it's confusing how the two APIs
> don't act the same.
>
> Hope that helps,
> Dave
>
> On 15 October 2012 20:11, Alberto Cordioli <[email protected]> wrote:
>> Hi all,
>>
>> a very strange thing is happening with my hadoop program.
>> My map simply emits tuples with a custom object as key (which
>> implement WritableComparable).
>> The object is made of 2 fields, and I implement my partitioner and
>> groupingclass in such a way that only the first field is taken into
>> account.
>> The second field is just a tag and could be 1 or 2.
>>
>> This is the reducer's snippet:
>>
>> tag = key.getSecondField();
>> Iterator it1 = values.iterator();
>> while(it1.hasNext()){
>>     it1.next();
>>     collector.emit(new Text("dummy"), tag);
>> }
>>
>> I would expect in my output all the lines with:
>> dummy 1
>> ...
>> dummy 1
>>
>> but actually the value of tag changes in time and I obtain this type of
>> output:
>>
>> dummy 1
>> ...
>> dummy 1
>> dummy 2
>> ...
>> dummy 2
>>
>>
>> Someone could explain me way, please?
>>
>>
>> Thanks.
>>
>>
>> --
>> Alberto Cordioli

--
Alberto Cordioli
