Thanks for replying, everyone. A few comments on the suggestions so far.

1>  I am processing a sequence file that consists of many CSV files. I need
to extract only a few of those CSVs. That is why I am using
'SelectFieldByValue': the value I select on is the file name in my case,
not a field directly.

2>  Each selected file (a different RegEx for each) is stored in HDFS
separately, so there is one STORE statement for each extracted file's bag.

3>  I cannot do a cross join, because all the input files would get
combined, and I do not want that.

4>  I cannot use the AND/OR operators, as I need a separate bag for each
selected file (RegEx).
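
For what it is worth, one way that might satisfy points 2 and 4 without
writing each statement by hand is to generate the repeated FILTER/STORE
pairs from a list of patterns and paste (or write) the result into the
script before calling pig from the shell script. This is only a rough
sketch: the helper name build_pig_statements, the output directory
/output, and the alias naming are placeholders I made up, and the
patterns are the same dummy 'RegEx-N' values from my original mail.

```python
# Sketch only: generates Pig Latin text, it does not run Pig itself.
# build_pig_statements, "/output", and the alias names are hypothetical.

def build_pig_statements(patterns, source_alias="BagName", out_dir="/output"):
    """Return Pig Latin text with one FILTER and one STORE per pattern."""
    lines = []
    for i, regex in enumerate(patterns, start=1):
        alias = f"Filtered_Data_{i}"
        # One bag per pattern, so the selected files stay separate.
        lines.append(
            f"{alias} = FILTER {source_alias} BY ($0 matches '{regex}');"
        )
        # One STORE per bag, each into its own HDFS directory.
        lines.append(f"STORE {alias} INTO '{out_dir}/{alias}';")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder patterns; patterns containing a single quote or a
    # backslash would need escaping for Pig before being used here.
    print(build_pig_statements(["RegEx-1", "RegEx-2", "RegEx-3"]))
```

The LOAD statement that defines BagName would still have to come before
the generated lines in the final script.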



Let me know if anyone has any other suggestions.
Sorry for not being clear about the specification in the first place.

Thanks.

On Mon, Oct 6, 2014 at 4:12 PM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:

> In case you haven't seen this already, take a look at
> http://pig.apache.org/docs/r0.13.0/perf.html for some basic strategies on
> optimizing your pig scripts.
>
> On Mon, Oct 6, 2014 at 1:08 PM, Russell Jurney <russell.jur...@gmail.com>
> wrote:
>
> > Actually, I don't think you need SelectFieldByValue. Just use the name of
> > the field directly.
> >
> > On Monday, October 6, 2014, Prashant Kommireddi <prash1...@gmail.com>
> > wrote:
> >
> > > > Are these regexes static? If yes, this is easily achieved by embedding
> > > > your script in Java or any other language that Pig supports:
> > > > http://pig.apache.org/docs/r0.13.0/cont.html
> > >
> > > > You could also possibly write a UDF that loops through all the regexes
> > > > and returns the result.
> > >
> > >
> > >
> > > > On Mon, Oct 6, 2014 at 12:44 PM, Ankur Kasliwal
> > > > <ankur.kasliwal...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > >
> > > >
> > > > > I have written a ‘Pig Script’ which processes Sequence files given
> > > > > as input.
> > > >
> > > > It is working fine but there is one problem mentioned below.
> > > >
> > > >
> > > >
> > > > I have repetitive statements in my pig script,  as shown below:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >    -  Filtered_Data_1 = FILTER BagName BY ($0 matches 'RegEx-1');
> > > >    -  Filtered_Data_2 = FILTER BagName BY ($0 matches 'RegEx-2');
> > > >    -  Filtered_Data_3 = FILTER BagName BY ($0 matches 'RegEx-3');
> > > >    - So on…
> > > >
> > > >
> > > >
> > > > Question :
> > > >
> > > > > So is there any way I can write the above statement once and then
> > > > > loop through all possible “RegEx” values, substituting each one into
> > > > > the Pig script?
> > > >
> > > >
> > > >
> > > > For Example:
> > > >
> > > >
> > > > > Filtered_Data_X  =   FILTER BagName BY ($0 matches 'RegEx');
> > > > > ( have this statement once )
> > > > >
> > > > > ( loop through all possible RegEx and substitute the value in the
> > > > > statement )
> > > >
> > > >
> > > >
> > > > > Right now I am calling the Pig script from a shell script, so any
> > > > > way to do this from the shell script would also be welcome.
> > > >
> > > >
> > > >
> > > > Thanks in advance.
> > > >
> > > > Happy Pigging!!!!
> > > >
> > >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> > datasyndrome.com
> >
>
