If you can describe the layout of your input files more thoroughly, it
would help.

On Monday, October 6, 2014, Pradeep Gollakota <[email protected]> wrote:

> It looks like the best option at this point is to write a custom UDF that
> loads a set of regular expressions from a file and runs the data
> through all of them.
>
> On Mon, Oct 6, 2014 at 1:44 PM, Ankur Kasliwal <[email protected]>
> wrote:
>
> > Thanks for replying, everyone. A few comments on the suggestions:
> >
> > 1>  I am processing a sequence file which consists of many CSV files. I need
> > to extract only a few of those CSVs. That is why I am using
> > 'SelectFieldByValue', which in my case selects by file name, not by field
> > directly.
> >
> > 2>  All selected files (one per RegEx) are stored in HDFS separately.
> > So there is one STORE statement for each extracted file in a bag.
> >
> > 3>  I cannot do a cross join, as all input files would get combined, and I do
> > not want that.
> >
> > 4>  I cannot use AND/OR operators, as I need a different bag for each selected
> > file (RegEx).
> >
> >
> >
> > Let me know if anyone has any other suggestions.
> > Sorry for not being clear about the specification in the first place.
> >
> > Thanks.
> >
> > On Mon, Oct 6, 2014 at 4:12 PM, Pradeep Gollakota <[email protected]>
> > wrote:
> >
> >> In case you haven't seen this already, take a look at
> >> http://pig.apache.org/docs/r0.13.0/perf.html for some basic strategies on
> >> optimizing your Pig scripts.
> >>
> >> On Mon, Oct 6, 2014 at 1:08 PM, Russell Jurney <[email protected]>
> >> wrote:
> >>
> >> > Actually, I don't think you need SelectFieldByValue. Just use the name of
> >> > the field directly.
> >> >
> >> > On Monday, October 6, 2014, Prashant Kommireddi <[email protected]>
> >> > wrote:
> >> >
> >> > > Are these regexes static? If yes, this is easily achieved by embedding your
> >> > > script in Java or any other language that Pig supports:
> >> > > http://pig.apache.org/docs/r0.13.0/cont.html
> >> > >
> >> > > You could also possibly write a UDF that loops through all the regexes and
> >> > > returns the result.
> >> > >
> >> > >
> >> > >
> >> > > On Mon, Oct 6, 2014 at 12:44 PM, Ankur Kasliwal <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > >
> >> > > >
> >> > > > I have written a Pig script which processes sequence files given as
> >> > > > input.
> >> > > >
> >> > > > It is working fine but there is one problem mentioned below.
> >> > > >
> >> > > >
> >> > > >
> >> > > > I have repetitive statements in my pig script,  as shown below:
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >    -  Filtered_Data_1 = FILTER BagName BY ($0 matches 'RegEx-1');
> >> > > >    -  Filtered_Data_2 = FILTER BagName BY ($0 matches 'RegEx-2');
> >> > > >    -  Filtered_Data_3 = FILTER BagName BY ($0 matches 'RegEx-3');
> >> > > >    - So on…
> >> > > >
> >> > > >
> >> > > >
> >> > > > Question :
> >> > > >
> >> > > > So is there any way I can write the above statement once and
> >> > > > then loop through all possible “RegEx” values, substituting each one
> >> > > > into the Pig script?
> >> > > >
> >> > > >
> >> > > >
> >> > > > For Example:
> >> > > >
> >> > > >
> >> > > > Filtered_Data_X = FILTER BagName BY ($0 matches 'RegEx');
> >> > > > (have this statement once)
> >> > > >
> >> > > > (loop through all possible RegEx values and substitute each one into
> >> > > > the statement)
> >> > > >
> >> > > >
> >> > > >
> >> > > > Right now I am calling the Pig script from a shell script, so any approach
> >> > > > from the shell script would also be welcome.
> >> > > >
> >> > > >
> >> > > >
> >> > > > Thanks in advance.
> >> > > >
> >> > > > Happy Pigging!!!!
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Russell Jurney twitter.com/rjurney [email protected]
> >> > datasyndrome.com
> >> >
> >>
> >
> >
>
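
One way to act on the "write the statement once and loop" request quoted above is to generate the repetitive FILTER/STORE pairs from a list of patterns and append them to the script before running it. The following is only a rough sketch in Python, not anything proposed in the thread: the function name build_pig_script, the alias prefix Filtered_Data_, and the output directory are hypothetical, and it assumes each regex is already written the way it would appear inside a Pig single-quoted literal.

```python
def build_pig_script(input_alias, patterns, output_dir):
    """Return Pig Latin text with one FILTER/STORE pair per (name, regex).

    Hypothetical helper: input_alias is the relation to filter (BagName in
    the original question), patterns is a list of (name, regex) tuples, and
    output_dir is an assumed HDFS directory for the per-pattern outputs.
    """
    lines = []
    for name, regex in patterns:
        # Keep the sketch simple: refuse anything that would break the
        # single-quoted Pig string literal instead of trying to escape it.
        if "'" in regex:
            raise ValueError("regex must not contain a single quote: %r" % regex)
        alias = "Filtered_Data_%s" % name
        lines.append("%s = FILTER %s BY ($0 matches '%s');"
                     % (alias, input_alias, regex))
        lines.append("STORE %s INTO '%s/%s';" % (alias, output_dir, name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder patterns taken from the question; real ones could be read
    # from a file, one per line.
    patterns = [("1", "RegEx-1"), ("2", "RegEx-2"), ("3", "RegEx-3")]
    print(build_pig_script("BagName", patterns, "/hypothetical/output"))
```

The generated text still needs the LOAD statement that defines BagName in front of it; the calling shell script could concatenate the two parts into a temporary file and pass that file to pig. Each pattern keeps its own relation and its own STORE, which matches the requirement for separate outputs per RegEx.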


-- 
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
