Load the regex patterns from a file (one pattern per line) into their own relation, CROSS that relation with BagName, and then use the SelectFieldByName UDF to pull the regex pattern from the regex relation.
https://issues.apache.org/jira/plugins/servlet/mobile#issue/DATAFU-69

I believe you can use a field name against matches; if not, write a simple UDF or streaming job.

On Monday, October 6, 2014, Ankur Kasliwal <ankur.kasliwal...@gmail.com> wrote:

> Hi,
>
> I have written a Pig script that processes sequence files given as
> input.
>
> It is working fine, but there is one problem, described below.
>
> I have repetitive statements in my Pig script, as shown below:
>
>    - Filtered_Data_1 = FILTER BagName BY ($0 matches 'RegEx-1');
>    - Filtered_Data_2 = FILTER BagName BY ($0 matches 'RegEx-2');
>    - Filtered_Data_3 = FILTER BagName BY ($0 matches 'RegEx-3');
>    - So on…
>
> Question:
>
> Is there any way I can write the above statement once and then loop
> through all possible "RegEx" values, substituting each one into the
> Pig script?
>
> For example:
>
> Filtered_Data_X = FILTER BagName BY ($0 matches 'RegEx');
> ( have this statement once )
>
> ( loop through all possible RegEx and substitute value in the statement )
>
> Right now I am calling the Pig script from a shell script, so any
> approach from the shell script would also be welcome.
>
> Thanks in advance.
>
> Happy Pigging!!!!

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
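
Since the original question also asks for a shell-side approach, here is a minimal sketch of that route: keep the FILTER statement once in the Pig script and let the shell loop over the patterns, passing each one in with Pig's -param option. The names filter.pig, patterns.txt, REGEX, OUT, and PIG_CMD are placeholders invented for this example and are not from the thread. It assumes the Pig script refers to the pattern as '$REGEX' and to the output location as '$OUT'. Patterns containing backslashes, quotes, or dollar signs may need extra escaping for Pig's parameter substitution, so treat this as a starting point only.

```shell
#!/bin/sh
# Sketch only: all file and parameter names here are hypothetical.
# The Pig script (filter.pig in this example) is assumed to contain,
# written once:
#   Filtered_Data = FILTER BagName BY ($0 matches '$REGEX');
#   STORE Filtered_Data INTO '$OUT';

# Run the Pig script once per pattern, reading one pattern per line.
# $1 = patterns file, $2 = Pig script
run_filters() {
    patterns_file=$1
    script=$2
    n=0
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r regex || [ -n "$regex" ]; do
        # Skip blank lines.
        [ -z "$regex" ] && continue
        n=$((n + 1))
        # PIG_CMD defaults to "pig"; set it to "echo" for a dry run.
        # stdin is redirected so the command cannot consume the patterns file.
        ${PIG_CMD:-pig} -param "REGEX=$regex" -param "OUT=filtered_$n" \
            -f "$script" < /dev/null || return 1
    done < "$patterns_file"
}
```

To use it, source the function and call, for example, run_filters patterns.txt filter.pig. Each pattern starts a separate Pig run, so the input is re-read once per pattern; the CROSS approach suggested earlier in the thread keeps everything in a single job instead.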