> -----Original Message-----
> From: Alan Gates [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 22, 2008 2:01 PM
> To: Shravan Narayanamurthy
> Cc: [email protected]
> Subject: Re: Comparison between Tuple compare & WritableComparable compare
>
> Clearly we should be thinking about exec time. And having to load one
> less bag into memory should greatly reduce exec time, at least in the
> case where we can't fit that bag into memory and we have to spill. I
> have no idea how to compare the two and say which is the better
> performance gain.
In addition to performance, this can mean failing or succeeding on some
joins. If we can't bring a key's bag into memory, we can still process
the query by streaming through the data.

> A few thoughts:
>
> 1) We're in the boat of using tuples anytime a user groups, cogroups,
> or sorts on more than one column, and for all distincts, correct? So
> we have this problem at least in some cases, no matter what.
>
> 2) In the previous code, we had switched from using the tuple object
> comparator to using a binary comparator provided by hadoop. This gave
> us a large speed up. Are we still using that binary comparator?

There is no reason why we should not use the binary comparator!

> 3) We need to take a look at the tuple and see what is taking so long.
> Are we spending time constructing the tuples vs hadoop's
> WritableComparable types, time comparing them, etc.
>
> Alan.
>
> Shravan Narayanamurthy wrote:
> > I completely messed up the calculation of speed reduction. Sorry. The
> > 30 to 40 times speed reduction in comparison time leads to the same
> > reduction in speed even when we do n log n comparisons :)
> >
> > Still, don't you think it's a high price to pay just to go from n to
> > n-1 bags? I agree that the memory savings can be huge, but shouldn't
> > we also be thinking about exec time?
> >
> > Thanks,
> > --Shravan
> >
> > ________________________________
> >
> > From: Shravan Narayanamurthy
> > Sent: Thu 5/22/2008 11:35 PM
> > To: Alan Gates
> > Subject: Comparison between Tuple compare & WritableComparable
> > compare
> >
> > Hi Alan,
> > I compared the time taken for a million comparisons of two
> > WritableComparables with the time for the same objects embedded in a
> > Tuple. The Tuple has two elements.
> > The first element is the index and the second is the actual object:
> >
> > BOOLEAN   : Tuple :: 14.16 : 602.76
> > BYTEARRAY : Tuple :: 53.94 : 414.06
> > CHARARRAY : Tuple :: 50.9  : 417.86
> > FLOAT     : Tuple :: 20.2  : 655.4
> > INTEGER   : Tuple :: 14.24 : 539.3
> > LONG      : Tuple :: 16.08 : 578.6
> >
> > The numbers surely look depressing. I was wondering if it's a good
> > idea to do the (n-1) bag optimization at all, because adding just two
> > inputs to the cogroup would make us send tuples as keys, incurring a
> > nearly 30 to 40 times slowdown in comparison alone. Since we are
> > sorting, we will do n log n comparisons, thus incurring a 150 to 200
> > times reduction in speed. Joins being pretty commonly used, I feel we
> > should avoid this optimization.
> >
> > Thanks,
> > --Shravan
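Alan's point (2) above refers to Hadoop's binary (raw) comparators, which sort keys by comparing their serialized bytes directly instead of deserializing each key into an object first. The following is a minimal, self-contained sketch of that idea for int keys, assuming big-endian serialization as produced by DataOutput.writeInt; the class and method names are illustrative only, not Pig's or Hadoop's actual API:

```java
// Sketch of a "binary comparator": compare keys in serialized form,
// avoiding per-comparison object construction. Illustrative only; the
// real mechanism in Hadoop is RawComparator/WritableComparator.
public class RawIntComparator {

    // Serialize an int big-endian, matching DataOutput.writeInt.
    public static byte[] serialize(int value) {
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16),
            (byte) (value >>> 8),  (byte) value
        };
    }

    // Compare two serialized ints byte-by-byte. Flipping the sign bit of
    // the most significant byte makes two's-complement ints order
    // correctly under unsigned lexicographic byte comparison.
    public static int compareRaw(byte[] b1, byte[] b2) {
        for (int i = 0; i < 4; i++) {
            int x = b1[i] & 0xff;
            int y = b2[i] & 0xff;
            if (i == 0) {   // undo the sign bit on the high byte
                x ^= 0x80;
                y ^= 0x80;
            }
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareRaw(serialize(-5), serialize(7)));  // prints -1
        System.out.println(compareRaw(serialize(42), serialize(42))); // prints 0
    }
}
```

This is why the binary comparator gave a large speedup: the sort never pays for tuple construction or field-by-field object comparison, which is exactly the overhead the numbers above are measuring.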
