Hi Ankur,

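Is the list of regular expressions static or dynamic? If it's a static
list, you can collapse all the FILTER statements into a single statement
and combine the conditions with the AND keyword. Note that AND keeps only
the records that match every pattern; use OR instead if a record only needs
to match one of them.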

E.g.

 Filtered_Data = FILTER BagName BY ($0 matches 'RegEx-1')
     AND ($0 matches 'RegEx-2')
     AND ($0 matches 'RegEx-3');
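
Since you already drive Pig from a wrapper script, the combined statement
could also be generated from the list rather than typed by hand. Below is a
rough sketch in Python. The function name build_filter and its parameters
are made up for illustration, and it assumes the patterns contain no single
quotes and are already written exactly as they should appear in the Pig
script.

```python
def build_filter(patterns, source="BagName", target="Filtered_Data", joiner="AND"):
    """Return one Pig FILTER statement that tests $0 against every pattern.

    Hypothetical helper: the relation names default to the ones used in
    this thread, and patterns are pasted into the statement verbatim.
    """
    if not patterns:
        raise ValueError("need at least one pattern")
    for pattern in patterns:
        # Assumption: no quote escaping is attempted, so reject such patterns.
        if "'" in pattern:
            raise ValueError("pattern contains a single quote: %r" % pattern)
    clauses = ["($0 matches '%s')" % pattern for pattern in patterns]
    condition = (" %s " % joiner).join(clauses)
    return "%s = FILTER %s BY %s;" % (target, source, condition)


if __name__ == "__main__":
    print(build_filter(["RegEx-1", "RegEx-2", "RegEx-3"]))
```

Passing joiner="OR" produces the any-pattern variant of the same statement.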

If it's dynamic, you can use the option that Russell and Prashant
suggested. Write a UDF that loads a list of regular expressions and
processes them in sequence.
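
To make the idea concrete, here is a minimal sketch of the matching logic
such a UDF would carry. It is plain Python and is not wired into Pig; the
function names are hypothetical, and I am assuming one pattern per line in
whatever file the list comes from. Pig's matches operator requires the whole
string to match the pattern, so the sketch uses a full match as well.

```python
import re


def load_patterns(lines):
    """Compile one regular expression per non-blank line (assumed file layout)."""
    stripped = (line.strip() for line in lines)
    return [re.compile(text) for text in stripped if text]


def matches_all(value, patterns):
    """True when value fully matches every pattern, checked in list order."""
    return all(pattern.fullmatch(value) for pattern in patterns)


if __name__ == "__main__":
    # Inline list standing in for the lines of a pattern file.
    patterns = load_patterns(["ab.*", "", ".*cd"])
    for value in ["abxcd", "abx"]:
        print(value, matches_all(value, patterns))
```

Swapping all() for any() gives the version that keeps a record when at least
one pattern matches. The real UDF would be written in a language Pig can
load, and it would compile the patterns once rather than once per record.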

On Mon, Oct 6, 2014 at 12:44 PM, Ankur Kasliwal <ankur.kasliwal...@gmail.com> wrote:

> Hi,
>
> I have written a Pig script that processes sequence files given as
> input.
>
> It is working fine, but there is one problem, described below.
>
> I have repetitive statements in my Pig script, as shown below:
>
>    -  Filtered_Data_1 = FILTER BagName BY ($0 matches 'RegEx-1');
>    -  Filtered_Data_2 = FILTER BagName BY ($0 matches 'RegEx-2');
>    -  Filtered_Data_3 = FILTER BagName BY ($0 matches 'RegEx-3');
>    -  So on…
>
> Question:
>
> Is there any way I can write the above statement once and then loop
> through all possible “RegEx” values, substituting each into the Pig script?
>
> For example:
>
> Filtered_Data_X = FILTER BagName BY ($0 matches 'RegEx');  (have this
> statement once)
>
> (loop through all possible RegEx values and substitute each into the
> statement)
>
> Right now I am calling the Pig script from a shell script, so any approach
> from the shell script would also be welcome.
>
>
>
> Thanks in advance.
>
> Happy Pigging!!!!
>
