Pradeep,

Thanks for the pointers, but as I mentioned, I need to extract that string up to the semicolon, and that is where I am having trouble.

I need to print the part before the semicolon, and that is what causes the pain: when I put a semicolon in the regex, Pig treats it as the end of the statement and produces an error. Without the semicolon it works fine, but it returns everything from B75 onwards, e.g.

B=foreach D generate REGEX_EXTRACT(test,'(B75.*)',1);

Is there any way I can include a semicolon in the regex above, so that it prints only the string before it?

Thanks,
Kartik

On Mon, May 12, 2014 at 2:03 PM, Pradeep Gollakota <[email protected]> wrote:

> Check out
> http://archive.cloudera.com/cdh/3/pig/piglatin_ref2.html#REGEX_EXTRACT
>
> This may suit your needs
>
>
> On Mon, May 12, 2014 at 12:16 AM, kartik manocha <[email protected]> wrote:
>
> > Hi,
> >
> > I am new to Pig and am facing an issue extracting a string from a field. Here is the scenario:
> >
> > - I am loading data with several fields; among them is a field called 'test_data'.
> > - This field contains a lot of things. I want to extract the string that starts with B75 and ends with a semicolon.
> > - After extracting this string, I want to add it as a new field to the existing bag that was loaded.
> >
> > I tried using the INDEXOF UDF, but that works for a single character only. However, even when I used it with a single character, it returned only () instead of an index number. I was just testing, and when I manually provided the indexes to the SUBSTRING UDF, it did produce the string.
> >
> > But I am unable to get the position using the INDEXOF UDF. Or maybe there is a better way of doing this.
> >
> > If you have any pointers / suggestions, please share.
> >
> > Thanks in advance.
> >
> >
> > Best,
> > Kartik
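Because Pig's REGEX_EXTRACT takes Java regular expressions, the regex side of the question can be checked with a small Java sketch. The idea (an assumption, not something confirmed in this thread, and untested in Pig itself) is to stop at the semicolon with a negated character class and spell the semicolon as the regex escape \u003B, so that no literal semicolon appears in the Pig statement, e.g. REGEX_EXTRACT(test, '(B75[^\\u003B]*)', 1). The class name, the `extract` helper and the sample value are hypothetical; `extract` only approximates REGEX_EXTRACT by returning the requested group of the first `Matcher.find()` match, or null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractDemo {
    // Hypothetical helper approximating REGEX_EXTRACT(field, regex, index):
    // returns the requested capture group of the first match, or null.
    static String extract(String input, String regex, int index) {
        if (input == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String testData = "id=7;B75-XYZ-001;flag=1"; // made-up sample value

        // Greedy .* runs to the end of the field, past the semicolon.
        System.out.println(extract(testData, "(B75.*)", 1));          // prints B75-XYZ-001;flag=1

        // Negated class stops before the first semicolon; the semicolon is
        // written as a regex escape, so the pattern text contains none.
        System.out.println(extract(testData, "(B75[^\\u003B]*)", 1)); // prints B75-XYZ-001

        // Same pattern with a literal semicolon, for comparison.
        System.out.println(extract(testData, "(B75[^;]*)", 1));       // prints B75-XYZ-001
    }
}
```

A negated class is used here rather than a lazy `.*?` followed by a semicolon because it never needs the semicolon itself to appear in the pattern. One consequence: if no semicolon follows B75, the match simply runs to the end of the field.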

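The index-based approach from the original message (find where B75 starts, find the next semicolon after it, then take the substring between the two positions) can be sketched in Java with `String.indexOf` and `String.substring`. Pig's SUBSTRING likewise takes a start index and an exclusive stop index. The class name, the `between` helper and the sample value are hypothetical, and returning null when either marker is missing is an assumption about the desired behaviour.

```java
public class IndexSubstringDemo {
    // Hypothetical helper: returns the text from the first occurrence of
    // `start` up to (not including) the next `end` character after it,
    // or null if either marker is missing.
    static String between(String input, String start, char end) {
        if (input == null) {
            return null;
        }
        int from = input.indexOf(start);    // position of "B75", or -1
        if (from < 0) {
            return null;
        }
        int to = input.indexOf(end, from);  // first ';' at or after "B75", or -1
        if (to < 0) {
            return null;
        }
        return input.substring(from, to);   // stop index is exclusive
    }

    public static void main(String[] args) {
        String testData = "id=7;B75-XYZ-001;flag=1"; // made-up sample value
        System.out.println(between(testData, "B75", ';')); // prints B75-XYZ-001
    }
}
```

The second search starts at the position of B75, so semicolons earlier in the field (such as the one after `id=7`) are skipped. Whether a missing semicolon should yield null or the rest of the field is a choice for the script.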