Thanks Alan,

It was indeed a purely academic question. I've had no issues at all with
LIMIT or ORDER BY in Pig. I'm a happy Pig user ;)

Cheers,

Josh


On 15 November 2010 21:56, Alan Gates <[email protected]> wrote:

> POSort is only used for in-memory sorts of bags (such as a sort inside a
> foreach), not for top-level sorts.  In both cases the physical operators only
> capture part of the actual operations, since much of the work is done by the
> Hadoop framework.
>
> Very briefly, order by works by taking a sample of the input, building a
> partitioner that will produce a balanced total ordering of the data (that
> is, each part file will be approximately the same size) and then running an
> MR job that uses the order by key as the grouping key along with the just
> built partitioner.  Limit works by applying the limit to each mapper and
> then running a reduce pass in a single reducer, again applying the limit.
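The two strategies described above can be sketched as a small in-memory Java simulation. This is not Pig's actual implementation and uses no Hadoop or Pig API: the class `OrderLimitSketch` and its methods are hypothetical, the sample size of 100 is an arbitrary assumption, and each inner `List` stands in for one reducer's part file. ORDER BY samples the input, picks split points so each key range gets a roughly equal share, routes records with a range partitioner and relies on the per-reducer sort; LIMIT truncates each mapper's output, then truncates again in a single reducer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

/**
 * In-memory simulation (hypothetical names, no Hadoop/Pig APIs) of how
 * ORDER BY and LIMIT can be planned as MapReduce jobs.
 * Each inner List stands in for one reducer's part file.
 */
public class OrderLimitSketch {

    // ORDER BY, sampling step: draw a random sample (with replacement) and pick
    // numReducers - 1 evenly spaced split points from it, so each key range
    // should receive roughly the same number of records.
    static List<Integer> chooseSplitPoints(List<Integer> input, int sampleSize,
                                           int numReducers, Random rnd) {
        List<Integer> sample = new ArrayList<>();
        for (int i = 0; i < sampleSize; i++) {
            sample.add(input.get(rnd.nextInt(input.size())));
        }
        Collections.sort(sample);
        List<Integer> splits = new ArrayList<>();
        for (int r = 1; r < numReducers; r++) {
            splits.add(sample.get(r * sample.size() / numReducers));
        }
        return splits;
    }

    // Range partitioner: keys below splits[0] go to reducer 0, keys in
    // [splits[i-1], splits[i]) go to reducer i, the rest to the last reducer.
    static int partition(int key, List<Integer> splits) {
        int pos = Collections.binarySearch(splits, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // ORDER BY job: "map" routes each key via the partitioner, the "shuffle"
    // sorts each partition, and the parts concatenated in reducer order are
    // a total ordering of the input.
    static List<List<Integer>> orderBy(List<Integer> input, int numReducers, Random rnd) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            parts.add(new ArrayList<>());
        }
        if (input.isEmpty()) {
            return parts; // nothing to sample or sort
        }
        List<Integer> splits =
            chooseSplitPoints(input, Math.min(100, input.size()), numReducers, rnd);
        for (int key : input) {
            parts.get(partition(key, splits)).add(key);
        }
        for (List<Integer> part : parts) {
            Collections.sort(part); // stands in for the framework's sort by key
        }
        return parts;
    }

    // LIMIT: each mapper emits at most n records, then a single reducer
    // applies the limit again to everything it receives.
    static List<Integer> limit(List<List<Integer>> mapperInputs, int n) {
        List<Integer> reducerInput = new ArrayList<>();
        for (List<Integer> mapperInput : mapperInputs) {
            reducerInput.addAll(mapperInput.subList(0, Math.min(n, mapperInput.size())));
        }
        return new ArrayList<>(reducerInput.subList(0, Math.min(n, reducerInput.size())));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(rnd.nextInt(10_000));
        }

        List<List<Integer>> parts = orderBy(input, 4, rnd);
        List<Integer> merged = parts.stream().flatMap(List::stream).collect(Collectors.toList());
        List<Integer> expected = new ArrayList<>(input);
        Collections.sort(expected);
        System.out.println("totally ordered: " + merged.equals(expected));
        for (List<Integer> part : parts) {
            System.out.println("part size: " + part.size());
        }

        List<List<Integer>> mapperInputs = List.of(input.subList(0, 400), input.subList(400, 1000));
        System.out.println("limit 5 -> " + limit(mapperInputs, 5).size() + " records");
    }
}
```

Running `main` prints `totally ordered: true` and `limit 5 -> 5 records`. Because the split points come from a sample, the part sizes are only approximately equal; the per-mapper limit keeps the single reducer's input small (at most n records per mapper).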
>
> Are these questions purely academic, or are there applications where you'd
> like to use Pig's ORDER BY and LIMIT but can't do the other processing in
> Pig?  If the latter, I'd recommend checking out the new MAPREDUCE command
> introduced in 0.8 (which I hope we'll release in a week or two), which
> allows you to invoke MR jobs from Pig.  You can learn more about this at
> https://issues.apache.org/jira/browse/PIG-506.  You can also see the
> documentation for this feature in
> http://svn.apache.org/viewvc/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_ref2.xml?view=markup
> (search on MAPREDUCE).  Sorry, this is the Forrest version.  You can also
> see it in HTML by checking out the code and building it yourself.
>
> Alan.
>
>
> On Nov 15, 2010, at 12:50 AM, Rekha Joshi wrote:
>
>> Hi Josh,
>>
>> AFAIR, all the relational operators are implemented in the PO*.java source
>> files under
>> o.a.p.backend.hadoop.executionengine.physicalLayer.relationalOperators.
>> Alternatively, look at POLimit and POSort in the API docs at
>> http://pig.apache.org/docs/r0.7.0/api/
>>
>> PigServer is the starting point; internally it builds the logical and
>> physical plans for the jobs, and the execution engine executes them. Refer
>> to the files under o.a.p.backend.hadoop.executionengine.
>> More details are at http://wiki.apache.org/pig/PigExecutionModel
>>
>> Thanks & Regards,
>> /Rekha.
>>
>> On 11/14/10 7:59 PM, "Josh Devins" <[email protected]> wrote:
>>
>> Hi all,
>>
>> I'm happily using Pig to ORDER BY and LIMIT some large relations quite
>> effectively. However, I'm curious about how these are (or would be)
>> implemented in "raw" MapReduce. Can anyone shed some light on this, or
>> point me to some details, examples or pseudo-code?
>>
>> Cheers,
>>
>> Josh
>>
>>
>
