Yes, I know that keeping an in-memory collection isn't a good idea. The problem is that I need to perform a join, so there are no other possibilities! :(
Cheers,
Alberto

On 16 October 2012 11:08, Dave Beech <[email protected]> wrote:
> Great! Glad the problem is solved.
>
> You're right - the object returned by iterator.next() is re-used too.
> So yes, you would need to clone in this case and you'd have no choice
> but to create new objects.
>
> Please be sure though that you really do need to store values in a
> list to do what you're trying to do. Keeping an in-memory collection
> might not be very scalable. Obviously, if you've got loads of RAM or
> not a lot of data (or both), then that's fine! Just something else to
> think about...
>
> Cheers,
> Dave
>
> On 16 October 2012 09:42, Alberto Cordioli <[email protected]> wrote:
>> Thanks Dave.
>> You solved my problem. Just a little question about your tip:
>> I suppose the value returned by iterator.next() is also re-used.
>> So if I want to store some values of the Iterable list in the reducer, I
>> should create a List and put cloned objects inside it.
>> In this case there is no possibility to avoid the "new" operator, right?
>>
>> On 15 October 2012 22:49, Dave Beech <[email protected]> wrote:
>>> Well, if all you need is the tag (the 1 or 2), why not just use a Text
>>> or IntWritable instance variable. You wouldn't need to clone the whole
>>> key.
>>>
>>> Then, instead of tag = key.getSecondField() you'd say
>>> tag.set(key.getSecondField().get());
>>> I don't know what type of object tag is (if it's Text you'll say
>>> toString() rather than get()), but you see what I mean.
>>>
>>> Also - just a tip - try to avoid creating new objects wherever
>>> possible. You'll get better performance if you create one Text object
>>> as an instance variable and re-use it by setting the value instead of
>>> calling new Text("") on every output.
>>>
>>> Thanks,
>>> Dave
>>>
>>> On 15 October 2012 21:39, Alberto Cordioli <[email protected]>
>>> wrote:
>>>> Hi Dave,
>>>>
>>>> thanks for your reply. Now it's more clear; in fact the code that I
>>>> wrote is inspired by the old API, where the behavior is different.
>>>> So, how can I achieve the same behavior as the old API? I need the
>>>> second field of the first key object to stay the same across the
>>>> iterations, in order to compare it with other objects. Do I have to
>>>> clone the object?
>>>>
>>>> Thanks.
>>>>
>>>> On 15 October 2012 21:27, Dave Beech <[email protected]> wrote:
>>>>> Hi Alberto
>>>>>
>>>>> The iterator you are looping over in your reduce method isn't a
>>>>> self-contained list of values. What's actually happening is that
>>>>> you're iterating through *part* of the sorted key/value set that was
>>>>> sent to that reduce node, and it is the grouping comparator that
>>>>> decides when to break that loop and call reduce again on the next key.
>>>>>
>>>>> Moreover, the "key" object is re-used. So, as you're iterating through
>>>>> the values, what's actually happening is this pointer to the
>>>>> associated key data moves with it - and you're seeing it change.
>>>>>
>>>>> This only happens in the new "mapreduce" API - in the older "mapred"
>>>>> API you get the first key, and it appears to stay the same during the
>>>>> loop.
>>>>>
>>>>> It's sometimes useful behaviour, but it's confusing how the two APIs
>>>>> don't act the same.
>>>>>
>>>>> Hope that helps,
>>>>> Dave
>>>>>
>>>>> On 15 October 2012 20:11, Alberto Cordioli <[email protected]>
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> a very strange thing is happening with my hadoop program.
>>>>>> My map simply emits tuples with a custom object as key (which
>>>>>> implements WritableComparable).
>>>>>> The object is made of 2 fields, and I implement my partitioner and
>>>>>> grouping class in such a way that only the first field is taken into
>>>>>> account.
>>>>>> The second field is just a tag and could be 1 or 2.
>>>>>>
>>>>>> This is the reducer's snippet:
>>>>>>
>>>>>> tag = key.getSecondField();
>>>>>> Iterator it1 = values.iterator();
>>>>>> while(it1.hasNext()){
>>>>>>     it1.next();
>>>>>>     collector.emit(new Text("dummy"), tag);
>>>>>> }
>>>>>>
>>>>>> I would expect in my output all the lines with:
>>>>>> dummy 1
>>>>>> ...
>>>>>> dummy 1
>>>>>>
>>>>>> but actually the value of tag changes over time and I obtain this type of
>>>>>> output:
>>>>>>
>>>>>> dummy 1
>>>>>> ...
>>>>>> dummy 1
>>>>>> dummy 2
>>>>>> ...
>>>>>> dummy 2
>>>>>>
>>>>>> Could someone explain why, please?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> Alberto Cordioli
>>>>
>>>> --
>>>> Alberto Cordioli
>>
>> --
>> Alberto Cordioli

--
Alberto Cordioli
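For anyone finding this thread later: the reuse behaviour Dave describes can be reproduced without a cluster. The sketch below is plain Java with no Hadoop dependency. MutableInt, ReusingIterator and ReuseSketch are hypothetical stand-ins (for a Writable such as IntWritable and for the reducer's value iterator), not Hadoop classes, and the real framework internals may differ. It only illustrates why storing the returned reference goes wrong, why copying works, and how copying the value into a field allocated once (as in tag.set(key.getSecondField().get())) pins the first value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable Writable such as IntWritable.
final class MutableInt {
    private int value;
    MutableInt(int value) { this.value = value; }
    int get() { return value; }
    void set(int value) { this.value = value; }
    MutableInt copy() { return new MutableInt(value); }
}

// Hypothetical iterator mimicking the reuse described in the thread:
// every call to next() overwrites and returns the SAME instance.
final class ReusingIterator implements Iterator<MutableInt> {
    private final int[] data;
    private final MutableInt shared = new MutableInt(0);
    private int pos = 0;

    ReusingIterator(int... data) { this.data = data; }

    @Override public boolean hasNext() { return pos < data.length; }

    @Override public MutableInt next() {
        if (!hasNext()) throw new NoSuchElementException();
        shared.set(data[pos++]);
        return shared;
    }
}

final class ReuseSketch {
    // Buggy: keeps the shared reference, so every entry shows the last value.
    static List<Integer> keepReferences(int... data) {
        List<MutableInt> kept = new ArrayList<>();
        Iterator<MutableInt> it = new ReusingIterator(data);
        while (it.hasNext()) kept.add(it.next());
        return unwrap(kept);
    }

    // Fixed: keeps a fresh copy per element (this is where "new" is unavoidable).
    static List<Integer> keepCopies(int... data) {
        List<MutableInt> kept = new ArrayList<>();
        Iterator<MutableInt> it = new ReusingIterator(data);
        while (it.hasNext()) kept.add(it.next().copy());
        return unwrap(kept);
    }

    // Copies the first value into a holder allocated once, then keeps iterating.
    // Assumes data is non-empty.
    static int firstValueCopied(int... data) {
        Iterator<MutableInt> it = new ReusingIterator(data);
        MutableInt tag = new MutableInt(0);  // allocated once, reusable
        tag.set(it.next().get());            // copy the value, not the reference
        while (it.hasNext()) it.next();      // the shared instance keeps changing
        return tag.get();
    }

    private static List<Integer> unwrap(List<MutableInt> boxes) {
        List<Integer> out = new ArrayList<>();
        for (MutableInt b : boxes) out.add(b.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keepReferences(1, 2, 3));      // [3, 3, 3]
        System.out.println(keepCopies(1, 2, 3));          // [1, 2, 3]
        System.out.println(firstValueCopied(1, 1, 2, 2)); // 1
    }
}
```

In a real reducer the copy step depends on the value type, and the list in keepCopies grows with the number of values per key, which is the scalability concern raised above.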
