Re: Confused by FOREACH .. GENERATE .. TOP semantics

2011-07-22 Thread Andrew Clegg
Dmitriy -- my requirements have changed slightly in this particular instance: I now need to order by several columns, so I think that means I have to use an inner order-by rather than TOP. Thankfully the bags are small.

Daniel -- I'm working on extracting out a small test case that demon

Re: Confused by FOREACH .. GENERATE .. TOP semantics

2011-07-22 Thread Dmitriy Ryaboy
On the subject of TOP -- the reason you would use it instead of an inner order + limit is that it's much more efficient for large bags. It is algebraic, so the computation can be well optimized. On top of that, it does not require a full sort of the bag.

-D

On Thu, Jul 21, 2011 at 9:41 PM, Daniel
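
To illustrate Dmitriy's two points about TOP (no full sort, and algebraic evaluation), here is a rough sketch in Python rather than Pig. It is not Pig's actual implementation; the function names `top_n` and `top_n_algebraic` and the sample rows are made up for illustration. The idea is that a top-n can be found with a small heap instead of sorting the whole bag, and that partial top-n results computed on separate chunks can be merged into the same final answer.

```python
import heapq
from operator import itemgetter


def top_n(n, column, rows):
    """Return the n rows with the largest value in `column`.

    heapq.nlargest avoids sorting the whole input; it gives the same
    result as sorted(rows, key=..., reverse=True)[:n].
    """
    return heapq.nlargest(n, rows, key=itemgetter(column))


def top_n_algebraic(n, column, partitions):
    """Compute the top n over several chunks of one bag (hypothetical helper)."""
    # Each chunk produces its own partial top-n, independently of the others.
    partials = [top_n(n, column, part) for part in partitions]
    # The final step only has to look at the small partial results.
    merged = [row for partial in partials for row in partial]
    return top_n(n, column, merged)


# Made-up rows: (name, score); column 1 is the score.
rows = [("a", 3), ("b", 9), ("c", 5), ("d", 7)]
partitions = [rows[:2], rows[2:]]

# Both routes agree with the plain "order + limit" formulation.
order_then_limit = sorted(rows, key=itemgetter(1), reverse=True)[:1]
assert top_n(1, 1, rows) == order_then_limit
assert top_n_algebraic(1, 1, partitions) == order_then_limit
```

The last two assertions check that the heap-based and chunk-then-merge versions return the same row as sorting everything and taking the first element, which is the equivalence the inner order + limit versus TOP discussion relies on.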

Re: Confused by FOREACH .. GENERATE .. TOP semantics

2011-07-21 Thread Daniel Dai
The syntax looks legal. Can you do an explain?

Daniel

On Thu, Jul 21, 2011 at 5:15 AM, Andrew Clegg wrote:
> Hi,
>
> I have some code that looks like this:
>
> top_hits = foreach regrouped {
>     result = TOP(1, 6, projected_joined_albums); -- field 6 = score
>     generate flatten(result);
> };

Confused by FOREACH .. GENERATE .. TOP semantics

2011-07-21 Thread Andrew Clegg
Hi,

I have some code that looks like this:

top_hits = foreach regrouped {
    result = TOP(1, 6, projected_joined_albums); -- field 6 = score
    generate flatten(result);
};

I'm not too keen on the TOP syntax because it's opaque and you need the comment there to explain what's going on. I've s