[
https://issues.apache.org/jira/browse/PIG-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779819#action_12779819
]
Rekha commented on PIG-420:
---------------------------
Although the long/int issue with LIMIT is taken care of by ticket 3201952, this
use case is a subset of a broader need: a debug mode in Pig scripts.
I faced a similar concern and wanted a debug mode for use cases such as storing
intermediate data only in debug (not QA/prod), limiting the dataset, applying
filters only in debug mode, etc.
I worked around the issue by loading a dummy dataset whose single record has
one column populated with the passed parameter value $DEBUG. Depending on the
value of this column, processing was controlled.
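A rough sketch of what such a workaround could look like is below. It is only
an illustration under assumptions, not the exact script I used: the file name
one_row.txt, the parameters $INPUT and $OUTPUT, the schema (id, value) and the
filter condition id < 100 are all hypothetical, and $DEBUG is assumed to be
passed as 0 or 1 (e.g. -param DEBUG=1).

```pig
-- Hypothetical one-row file; its contents are ignored.
dummy = LOAD 'one_row.txt' AS (ignored:chararray);

-- Build a single-record relation whose only column holds the $DEBUG value.
flag = FOREACH dummy GENERATE $DEBUG AS debug;

-- Hypothetical input and schema.
A = LOAD '$INPUT' AS (id:int, value:chararray);

-- Attach the flag to every input record (flag has one row, so the row
-- count of A is unchanged).
AF = CROSS A, flag;

-- With debug == 1 only the debug filter (id < 100) passes records;
-- with debug == 0 every record passes.
B = FILTER AF BY (debug == 0) OR (id < 100);

C = FOREACH B GENERATE id, value;
STORE C INTO '$OUTPUT';
```

The CROSS adds an extra pass over the data just to carry a flag, which is part
of why this feels roundabout.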
Since this was a roundabout way, I agree that built-in support for a debug
mode in Pig scripts would help. Thanks!
> Limit on nothing functionality
> ------------------------------
>
> Key: PIG-420
> URL: https://issues.apache.org/jira/browse/PIG-420
> Project: Pig
> Issue Type: Improvement
> Reporter: Anand Murugappan
>
> Pig 2.0 implements the limit feature but as a standalone statement.
> Limit is very useful in debug mode where we could run queries on smaller
> amount of data (faster and on fewer nodes) to iron out issues but in the
> production mode we would like to run through all the data. It would be good
> to have an easy "switch" between debug and prod mode using the limit statement
> without having to change the underlying code templates. Given that LIMIT is a
> separate standalone statement it gets hard to parametrize the code.
> For instance a query template might look like,
> A = LOAD '...';
> B = LIMIT A $N;
> C = FOREACH B ....
> In debug mode, we would like to set the variable $N to 100 but in prod mode
> we would like to set it to a 'special value' that would not apply LIMIT,
> letting us run it on all the data.