Load the regex patterns from a file (one pattern per line), CROSS that
relation with BagName, and then use the SelectFieldByName UDF to pull the
regex pattern out of the regex relation.

https://issues.apache.org/jira/plugins/servlet/mobile#issue/DATAFU-69

I believe you can use a field name against matches; if not, write a simple
UDF or streaming job.
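
If neither of those works out, a cruder fallback is to generate the
repeated statements before Pig ever runs. The following is only a sketch of
that idea in Python, not something I have run against your job: the helper
names (load_patterns, generate_filters) are made up for illustration, and it
assumes each pattern can be pasted verbatim between the single quotes, so the
file should hold the patterns exactly as you would type them in the script.

```python
def load_patterns(lines):
    """Return the non-empty patterns, one per input line, whitespace trimmed."""
    return [line.strip() for line in lines if line.strip()]


def generate_filters(patterns, relation="BagName", prefix="Filtered_Data_"):
    """Build one FILTER statement per pattern, numbered from 1.

    Assumption: each pattern is inserted verbatim, with no extra escaping.
    """
    statements = []
    for i, pattern in enumerate(patterns, start=1):
        statements.append(
            "{0}{1} = FILTER {2} BY ($0 matches '{3}');".format(
                prefix, i, relation, pattern
            )
        )
    return "\n".join(statements)
```

Your shell script could call this with the lines of the pattern file, write
the returned text into the Pig script after the LOAD statement, and then
launch Pig the same way it does now.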

On Monday, October 6, 2014, Ankur Kasliwal <ankur.kasliwal...@gmail.com>
wrote:

> Hi,
>
>
>
> I have written a ‘Pig Script’ which is processing Sequence files given as
> input.
>
> It is working fine but there is one problem mentioned below.
>
>
>
> I have repetitive statements in my Pig script, as shown below:
>
>    -  Filtered_Data_1 = FILTER BagName BY ($0 matches 'RegEx-1');
>    -  Filtered_Data_2 = FILTER BagName BY ($0 matches 'RegEx-2');
>    -  Filtered_Data_3 = FILTER BagName BY ($0 matches 'RegEx-3');
>    -  So on…
>
>
>
> Question:
>
> Is there any way I can write the above statement once and then loop
> through all possible “RegEx” values, substituting each one into the Pig
> script?
>
>
>
> For Example:
>
>
> Filtered_Data_X = FILTER BagName BY ($0 matches 'RegEx');  (have this
> statement once)
>
> (loop through all possible RegEx values and substitute each into the statement)
>
>
>
> Right now I am calling the Pig script from a shell script, so any approach
> from the shell script would also be welcome.
>
>
>
> Thanks in advance.
>
> Happy Pigging!!!!
>


-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
