Thanks for replying, everyone. A few comments on everyone's suggestions:

1> I am processing a sequence file that consists of many CSV files, and I need to extract only a few of them. That is why I use 'SelectFieldByValue': in my case the value is the file name, so I cannot simply select a field directly.
2> Each selected file (one per RegEx) is stored in HDFS separately, so there is one STORE statement for each extracted bag.
3> I cannot use a cross join, because all of the input files would get combined, and I do not want that.
4> I cannot use AND/OR operators, because I need a separate bag for each selected file (RegEx).

Let me know if anyone has any other suggestions. Sorry for not being clear about the requirements in the first place.

Thanks.

On Mon, Oct 6, 2014 at 4:12 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:

> In case you haven't seen this already, take a look at
> http://pig.apache.org/docs/r0.13.0/perf.html for some basic strategies on
> optimizing your Pig scripts.
>
> On Mon, Oct 6, 2014 at 1:08 PM, Russell Jurney <russell.jur...@gmail.com>
> wrote:
>
> > Actually, I don't think you need SelectFieldByValue. Just use the name of
> > the field directly.
> >
> > On Monday, October 6, 2014, Prashant Kommireddi <prash1...@gmail.com>
> > wrote:
> >
> > > Are these regexes static? If so, this is easily achieved by embedding
> > > your script in Java or any other language that Pig supports:
> > > http://pig.apache.org/docs/r0.13.0/cont.html
> > >
> > > You could also possibly write a UDF that loops through all the regexes
> > > and returns the result.
> > >
> > > On Mon, Oct 6, 2014 at 12:44 PM, Ankur Kasliwal <
> > > ankur.kasliwal...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have written a Pig script that processes Sequence files given as
> > > > input. It is working fine, but there is one problem, described below.
> > > >
> > > > I have repetitive statements in my Pig script, as shown below:
> > > >
> > > > - Filtered_Data_1 = FILTER BagName BY ($0 matches 'RegEx-1');
> > > > - Filtered_Data_2 = FILTER BagName BY ($0 matches 'RegEx-2');
> > > > - Filtered_Data_3 = FILTER BagName BY ($0 matches 'RegEx-3');
> > > > - and so on…
> > > >
> > > > Question:
> > > >
> > > > Is there any way to write the above statement once and then loop
> > > > through all possible RegEx values, substituting each one into the
> > > > Pig script?
> > > >
> > > > For example:
> > > >
> > > > Filtered_Data_X = FILTER BagName BY ($0 matches 'RegEx'); (write this
> > > > statement once)
> > > >
> > > > (loop through all possible RegEx values and substitute each one into
> > > > the statement)
> > > >
> > > > Right now I am calling the Pig script from a shell script, so a
> > > > solution from the shell script would also be welcome.
> > > >
> > > > Thanks in advance.
> > > >
> > > > Happy Pigging!!!!
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
> > datasyndrome.com
> >
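The original question asks how to write the FILTER statement once and substitute each RegEx. Here is a minimal sketch that assumes the regexes are static and known before the job runs. A small Python script, which the existing shell script could run before launching Pig, generates one FILTER and one STORE per RegEx. That way each selected file still gets its own bag and its own HDFS output, as required above. The relation name `BagName` comes from the thread. The output root, the `file_N` directory naming, the `PigStorage(',')` store function and the helper names are hypothetical. The escaping in `pig_quote` assumes Pig string literals use Java-style backslash escapes, so check it against your Pig version.

```python
import sys
from typing import List


def pig_quote(text: str) -> str:
    """Wrap text in a single-quoted Pig string literal.

    Assumption: Pig string literals treat backslash as an escape character
    (Java-style), so backslashes are doubled and single quotes are escaped.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def generate_filters(relation: str, regexes: List[str], output_root: str) -> str:
    """Emit one FILTER and one STORE statement per regex, in Pig Latin."""
    lines = []
    for i, regex in enumerate(regexes, start=1):
        alias = f"Filtered_Data_{i}"
        # $0 holds the file name, as in the original script.
        lines.append(f"{alias} = FILTER {relation} BY ($0 matches {pig_quote(regex)});")
        # Hypothetical output layout: one directory per selected file.
        target = pig_quote(f"{output_root}/file_{i}")
        lines.append(f"STORE {alias} INTO {target} USING PigStorage(',');")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical patterns; replace with the real file-name regexes.
    patterns = ["RegEx-1", "RegEx-2", "RegEx-3"]
    sys.stdout.write(generate_filters("BagName", patterns, "/output/selected"))
```

For example, the shell script could keep the LOAD statement in a template, append the generated lines with something like `python gen_filters.py >> job.pig` (both file names hypothetical), and then run `pig -f job.pig`. Keeping every FILTER and STORE in one script lets Pig's multi-query execution share the input, instead of rereading it once per RegEx as a shell loop that invokes Pig per pattern would.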
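Prashant's UDF suggestion would centre on a loop like the one below. This is only the matching logic, written as plain Python for illustration. The function name and the (label, regex) pairs are hypothetical, and registering it with Pig as a UDF is not shown. `re.fullmatch` is used because Pig's `matches` must match the whole string. Python and Java regex syntax differ in some details, so patterns may need adjusting.

```python
import re
from typing import Optional, Sequence, Tuple


def first_matching_label(file_name: str,
                         patterns: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the label of the first regex that matches the whole file name.

    Returns None when no pattern matches, so unselected files can be dropped.
    """
    for label, regex in patterns:
        # fullmatch mirrors Pig's `matches`, which is anchored at both ends.
        if re.fullmatch(regex, file_name):
            return label
    return None
```

Tagging each record with a label this way keeps the list of patterns in one place. A separate bag and STORE per label is still needed to write each selected file on its own.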