Pradeep,

Thanks for the pointers, but as I mentioned, I need to extract the string
only up to the semicolon, and that is the part I am having trouble with.

I need to print the part before the semicolon. The trouble is that when I put
a semicolon in the regex, Pig treats it as the end of the statement and
produces an error.

Without the semicolon the regex works, but it returns everything from B75
to the end of the field, e.g.:
B = foreach D generate REGEX_EXTRACT(test, '(B75.*)', 1);
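
To show the difference outside Pig, here is a minimal Java sketch. It assumes
REGEX_EXTRACT behaves like a plain java.util.regex search, and the sample
value, class name and helper name are made up for illustration. It compares
the greedy pattern above with the one I would like to use. The last call
writes the semicolon as the regex hex escape \x3B; I have not verified
whether Pig's parser accepts that form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SemicolonRegexSketch {
    // Returns capture group 1 of the first match, or null when nothing matches.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample value, not real data from the field.
        String sample = "A1;B75xyz;C2";

        // Greedy pattern from the Pig statement: runs to the end of the value.
        System.out.println(extract(sample, "(B75.*)"));        // B75xyz;C2

        // Desired behaviour: stop before the first semicolon.
        System.out.println(extract(sample, "(B75[^;]*)"));     // B75xyz

        // Same pattern with the semicolon written as the hex escape \x3B.
        System.out.println(extract(sample, "(B75[^\\x3B]*)")); // B75xyz
    }
}
```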

Is there any way to include a semicolon in the regex above, so that it
returns only the string before the semicolon?


Thanks,
Kartik



On Mon, May 12, 2014 at 2:03 PM, Pradeep Gollakota <[email protected]> wrote:

> Check out
> http://archive.cloudera.com/cdh/3/pig/piglatin_ref2.html#REGEX_EXTRACT
>
> This may suit your needs
>
>
> On Mon, May 12, 2014 at 12:16 AM, kartik manocha <[email protected]> wrote:
>
> > Hi,
> >
> > I am new to Pig and am having trouble extracting a string from a field.
> > The scenario is as follows.
> >
> > - > I am loading data with several fields; one of them is named
> > 'test_data'.
> > - > This field contains a lot of content. I want to extract the string
> > that starts with B75 and ends with a semicolon.
> > - > After extracting this string, I want to add it as a new field to the
> > existing bag that was loaded.
> >
> > I tried the INDEXOF UDF, but it only works for a single character, and
> > even with a single character it returns only () instead of an index. As a
> > test, I passed indexes to the SUBSTRING UDF by hand, and that did produce
> > the string.
> >
> > However, I am unable to get the position using the INDEXOF UDF, and there
> > may be a better way of doing this.
> >
> > If you have any pointers / suggestions, please share.
> >
> > Thanks in advance.
> >
> >
> > Best,
> > Kartik
> >
>
