Rekha commented on PIG-420:

Although the long/int issue with LIMIT is taken care of by ticket 3201952, this 
use case is a subset of the broader need for a debug mode in Pig scripts.

I faced a similar concern and wanted a debug mode for use cases such as storing 
intermediate data only in debug mode (not QA/prod), limiting the dataset, and 
applying filters only in debug mode.

I worked around the issue by loading a dummy dataset whose only record has a 
single column populated with the passed parameter value $DEBUG. Depending on the 
value of this column, processing was controlled.
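
A rough sketch of that kind of workaround, for illustration only. The file 
name 'one_row.txt', the input path 'input_data', the relation names, and the 
convention DEBUG=1 for debug and DEBUG=0 for QA/prod are all assumptions, not 
taken from the actual script:

```pig
-- Hypothetical invocation: pig -param DEBUG=1 script.pig  (DEBUG=0 for QA/prod)

-- 'one_row.txt' is an assumed helper file containing exactly one line.
flag_src = LOAD 'one_row.txt' AS (ignored:chararray);

-- Replace the single record with the substituted parameter value.
flag = FOREACH flag_src GENERATE $DEBUG AS debug;

-- 'input_data' is a placeholder path for the real input.
A = LOAD 'input_data' AS (line:chararray);

-- Attach the flag to every record; 'flag' has one row, so the row count of A is unchanged.
flagged = CROSS A, flag;

-- Keep records only when the flag is set, then store them as intermediate output.
debug_only = FILTER flagged BY flag::debug == 1;
STORE debug_only INTO 'debug_out';
```

With DEBUG=0 the filter drops every record, so the STORE still runs but writes 
an empty output, which is part of why this approach is roundabout.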

Since that was a roundabout way, I agree that built-in support for a debug mode in 
Pig scripts would help. Thanks!

> Limit on nothing functionality
> ------------------------------
>                 Key: PIG-420
>                 URL: https://issues.apache.org/jira/browse/PIG-420
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Anand Murugappan
> Pig 2.0 implements the limit feature but as a standalone statement. 
> Limit is very useful in debug mode where we could run queries on smaller 
> amount of data (faster and on fewer nodes) to iron out issues but in the 
> production mode we would like to run through all the data. It would be good 
> to have an easy "switch" between debug and prod mode using the limit statement 
> without having to change the underlying code templates. Given that LIMIT is a 
> separate standalone statement it gets hard to parametrize the code. 
> For instance a query template might look like, 
> A = LOAD '...';
> B = LIMIT A $N;
> C = FOREACH B .... 
> In debug mode, we would like to set the variable $N to 100 but in prod mode 
> we would like to set it to a 'special value' that would not apply LIMIT and 
> would let us run it on all the data. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
