[ https://issues.apache.org/jira/browse/PIG-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779819#action_12779819 ]
Rekha commented on PIG-420:
---------------------------

Although the long/int issue with LIMIT is taken care of by ticket 3201952, this use case is a subset of the broader need for a debug mode in Pig scripts.

I faced a similar concern and wanted a debug mode for use cases such as storing intermediate data only in debug mode (not in QA/prod), limiting the dataset, applying filters only in debug mode, etc.

I worked around the issue by loading a dummy dataset whose only record has a single column populated with the passed parameter value $DEBUG. Processing was then controlled depending on the value of this column.

Since that is a roundabout approach, I agree that built-in support for a debug mode in Pig scripts would help.

Thanks!

> Limit on nothing functionality
> ------------------------------
>
>                 Key: PIG-420
>                 URL: https://issues.apache.org/jira/browse/PIG-420
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Anand Murugappan
>
> Pig 2.0 implements the limit feature, but as a standalone statement.
> Limit is very useful in debug mode, where we could run queries on a smaller
> amount of data (faster and on fewer nodes) to iron out issues, but in
> production mode we would like to run through all the data. It would be good
> to have an easy "switch" between debug and prod mode using the limit statement
> without having to change the underlying code templates. Given that LIMIT is a
> separate standalone statement, it is hard to parametrize the code.
> For instance, a query template might look like:
> A = LOAD '...';
> B = LIMIT A $N;
> C = FOREACH B ....
> In debug mode, we would like to set the variable $N to 100, but in prod mode
> we would like to set it to a 'special value' that would not apply LIMIT,
> letting us run it on all the data.
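
Until Pig has such a switch, one possible stopgap is to generate the `B = ...` line outside of Pig before the script is submitted. The following is only a minimal sketch of that idea, not something proposed in this issue: the helper name `limit_statement` and the choice of `None` as the "no limit" value are assumptions made for illustration. In prod mode it emits an identity projection (`FOREACH ... GENERATE *`) so that the alias `B` still exists for the statements that follow.

```python
from typing import Optional


def limit_statement(out_alias: str, in_alias: str, n: Optional[int] = None) -> str:
    """Return a Pig Latin statement that defines out_alias from in_alias.

    Hypothetical helper: n=None stands in for the 'special value' that
    means "do not apply LIMIT" (prod mode).
    """
    if n is None:
        # Prod mode: pass every record through unchanged.
        return f"{out_alias} = FOREACH {in_alias} GENERATE *;"
    if n <= 0:
        raise ValueError("LIMIT count must be a positive integer")
    # Debug mode: keep at most n records.
    return f"{out_alias} = LIMIT {in_alias} {n};"


if __name__ == "__main__":
    print(limit_statement("B", "A", 100))  # debug mode
    print(limit_statement("B", "A"))       # prod mode
```

The generated line would replace `B = LIMIT A $N;` in the template, so the rest of the script stays the same in both modes.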