Thanks, Dave. You solved my problem. Just a quick question about your tip: I suppose the value returned by iterator.next() is also re-used. So if I want to store some of the values from the Iterable in the reducer, I should create a List and put cloned objects in it. In that case there is no way to avoid the "new" operator, right?
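A minimal sketch of the situation described above, written without any Hadoop dependency so it can be run on its own. MutableValue and ReusingIterator are hypothetical stand-ins (not Hadoop classes) for a Writable and for a values iterator that returns the same instance on every call; the assumption is that the real iterator behaves this way, as the question supposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a Writable: a mutable holder for one value. */
final class MutableValue {
    private String data = "";
    void set(String data) { this.data = data; }
    String get() { return data; }
    /** Returns an independent copy; this is where "new" is unavoidable. */
    MutableValue copy() {
        MutableValue c = new MutableValue();
        c.set(data);
        return c;
    }
}

/** Hypothetical iterator that overwrites and returns one shared instance. */
final class ReusingIterator implements Iterator<MutableValue> {
    private final Iterator<String> source;
    private final MutableValue shared = new MutableValue();
    ReusingIterator(List<String> raw) { this.source = raw.iterator(); }
    @Override public boolean hasNext() { return source.hasNext(); }
    @Override public MutableValue next() {
        shared.set(source.next()); // same object, new contents
        return shared;
    }
}

final class ReusedValueDemo {
    /** Stores the returned references: every entry is the same object. */
    static List<String> storeReferences(List<String> raw) {
        List<MutableValue> kept = new ArrayList<>();
        Iterator<MutableValue> it = new ReusingIterator(raw);
        while (it.hasNext()) {
            kept.add(it.next());
        }
        return contents(kept);
    }

    /** Stores a copy of each value: every entry keeps its own contents. */
    static List<String> storeCopies(List<String> raw) {
        List<MutableValue> kept = new ArrayList<>();
        Iterator<MutableValue> it = new ReusingIterator(raw);
        while (it.hasNext()) {
            kept.add(it.next().copy());
        }
        return contents(kept);
    }

    private static List<String> contents(List<MutableValue> kept) {
        List<String> out = new ArrayList<>();
        for (MutableValue v : kept) {
            out.add(v.get());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a", "b", "c");
        System.out.println(storeReferences(raw)); // [c, c, c]
        System.out.println(storeCopies(raw));     // [a, b, c]
    }
}
```

So one new object per stored value seems unavoidable if the values must outlive the loop. In a real reducer the copy would probably be made with something like new Text(value) or WritableUtils.clone(value, conf), depending on the value type.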
On 15 October 2012 22:49, Dave Beech <[email protected]> wrote:
> Well, if all you need is the tag (the 1 or 2), why not just use a Text
> or IntWritable instance variable? You wouldn't need to clone the whole
> key.
>
> Then, instead of tag = key.getSecondField() you'd say
> tag.set(key.getSecondField().get());
> I don't know what type of object tag is (if it's Text you'll say
> toString() rather than get()), but you see what I mean.
>
> Also - just a tip - try to avoid creating new objects wherever
> possible. You'll get better performance if you create one Text object
> as an instance variable and re-use it by setting the value instead of
> calling new Text("") on every output.
>
> Thanks,
> Dave
>
> On 15 October 2012 21:39, Alberto Cordioli <[email protected]> wrote:
>> Hi Dave,
>>
>> thanks for your reply. Now it's clearer; in fact the code that I
>> wrote is inspired by the old API, where the behavior is different.
>> So, how can I achieve the same behavior as the old API? I need the
>> second field of the first key object to stay the same across the
>> iterations, in order to compare it with other objects. Do I have to
>> clone the object?
>>
>> Thanks.
>>
>> On 15 October 2012 21:27, Dave Beech <[email protected]> wrote:
>>> Hi Alberto
>>>
>>> The iterator you are looping over in your reduce method isn't a
>>> self-contained list of values. What's actually happening is that
>>> you're iterating through *part* of the sorted key/value set that was
>>> sent to that reduce node, and it is the grouping comparator that
>>> decides when to break that loop and call reduce again on the next key.
>>>
>>> Moreover, the "key" object is re-used. So, as you're iterating through
>>> the values, what's actually happening is this pointer to the
>>> associated key data moves with it - and you're seeing it change.
>>>
>>> This only happens in the new "mapreduce" API - in the older "mapred"
>>> API you get the first key, and it appears to stay the same during the
>>> loop.
>>>
>>> It's sometimes useful behaviour, but it's confusing how the two APIs
>>> don't act the same.
>>>
>>> Hope that helps,
>>> Dave
>>>
>>> On 15 October 2012 20:11, Alberto Cordioli <[email protected]>
>>> wrote:
>>>> Hi all,
>>>>
>>>> a very strange thing is happening with my Hadoop program.
>>>> My map simply emits tuples with a custom object as key (which
>>>> implements WritableComparable).
>>>> The object is made of 2 fields, and I implement my partitioner and
>>>> grouping class in such a way that only the first field is taken into
>>>> account.
>>>> The second field is just a tag and could be 1 or 2.
>>>>
>>>> This is the reducer's snippet:
>>>>
>>>> tag = key.getSecondField();
>>>> Iterator it1 = values.iterator();
>>>> while(it1.hasNext()){
>>>>     it1.next();
>>>>     collector.emit(new Text("dummy"), tag);
>>>> }
>>>>
>>>> I would expect in my output all the lines with:
>>>> dummy 1
>>>> ...
>>>> dummy 1
>>>>
>>>> but actually the value of tag changes over time and I obtain this type of
>>>> output:
>>>>
>>>> dummy 1
>>>> ...
>>>> dummy 1
>>>> dummy 2
>>>> ...
>>>> dummy 2
>>>>
>>>>
>>>> Could someone explain why, please?
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> --
>>>> Alberto Cordioli
>>
>>
>> --
>> Alberto Cordioli

--
Alberto Cordioli
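
For reference, the difference between tag = key.getSecondField() and tag.set(key.getSecondField().get()) discussed in the thread can be sketched without Hadoop as follows. IntBox and TaggedKey are hypothetical stand-ins (not Hadoop classes) for IntWritable and the custom key, and the overwrite() call only simulates the framework changing the key in place as the values iterator advances; treat it as an illustration under those assumptions, not as the framework's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for IntWritable: a mutable int box. */
final class IntBox {
    private int value;
    void set(int value) { this.value = value; }
    int get() { return value; }
}

/** Hypothetical composite key whose second field is overwritten in place. */
final class TaggedKey {
    private final IntBox secondField = new IntBox();
    IntBox getSecondField() { return secondField; }
    void overwrite(int tag) { secondField.set(tag); }
}

final class TagSnapshotDemo {
    // Created once and re-used, like an instance variable in a reducer.
    private final IntBox tag = new IntBox();

    /** tag = key.getSecondField(): the reference follows the key as it changes. */
    List<Integer> emitWithAlias(List<Integer> tagsInGroup) {
        TaggedKey key = new TaggedKey();
        key.overwrite(tagsInGroup.get(0));
        IntBox alias = key.getSecondField();
        List<Integer> emitted = new ArrayList<>();
        for (int next : tagsInGroup) {
            key.overwrite(next); // simulates it1.next() moving the key along
            emitted.add(alias.get());
        }
        return emitted;
    }

    /** tag.set(key.getSecondField().get()): the value is copied before the loop. */
    List<Integer> emitWithSnapshot(List<Integer> tagsInGroup) {
        TaggedKey key = new TaggedKey();
        key.overwrite(tagsInGroup.get(0));
        tag.set(key.getSecondField().get());
        List<Integer> emitted = new ArrayList<>();
        for (int next : tagsInGroup) {
            key.overwrite(next); // the key still changes, the snapshot does not
            emitted.add(tag.get());
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(1, 1, 2, 2);
        TagSnapshotDemo demo = new TagSnapshotDemo();
        System.out.println(demo.emitWithAlias(group));    // [1, 1, 2, 2]
        System.out.println(demo.emitWithSnapshot(group)); // [1, 1, 1, 1]
    }
}
```

The aliased version reproduces the "dummy 1 ... dummy 2" output from the original question, while the snapshot version keeps the first tag for the whole group without allocating a new object per call.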
