Command that selects a random sample of the rows, similar to LIMIT
------------------------------------------------------------------

                 Key: PIG-795
                 URL: https://issues.apache.org/jira/browse/PIG-795
             Project: Pig
          Issue Type: New Feature
          Components: impl
            Reporter: Eric Gaudet
            Priority: Trivial

When working with very large data sets (imagine that!), running a Pig script can take a long time. In some situations it may be useful to run on a small subset of the data (e.g. for debugging or testing, or to get fast results even if they are less accurate).

The command "LIMIT N" selects the first N rows of the data, but these rows are not necessarily a random sample. A command "SAMPLE X" would instead retain each row with probability X%.

Note: it is possible to implement this feature with FILTER BY and a UDF, but the same is true of LIMIT, and LIMIT is built in.
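
To make the proposed semantics concrete, here is a minimal sketch in Python (not Pig) of what "SAMPLE X" would do: each row is kept independently with probability X%. The function name sample_rows and its percent and seed parameters are hypothetical, chosen only for illustration; this is not how Pig implements or would implement the command.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_rows(
    rows: Iterable[T], percent: float, seed: Optional[int] = None
) -> Iterator[T]:
    """Yield each row independently with probability percent / 100.

    Hypothetical helper illustrating the proposed "SAMPLE X" semantics.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")

    # A private generator so that passing a seed gives a reproducible sample.
    rng = random.Random(seed)
    threshold = percent / 100.0

    for row in rows:
        # rng.random() is uniform on [0.0, 1.0), so percent=100 keeps every
        # row and percent=0 keeps none.
        if rng.random() < threshold:
            yield row


if __name__ == "__main__":
    kept = list(sample_rows(range(100_000), percent=10, seed=42))
    # The sample size is itself random: about 10% of the input, not exactly.
    print(len(kept))
```

Unlike LIMIT N, the number of rows returned is not fixed: it varies around X% of the input size, and the rows are drawn from the whole data set rather than from its beginning. The relative order of the retained rows is preserved.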