Kent:
Thank you. I got the right results with re.findall('(us::.*?);', line).

I used this code before:
>>> TAG_pattern = re.compile(r"(us::.*?)")
>>> TAG_pattern.findall(line)
and got the unwanted result: every match was just 'us::'.
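
A short sketch of why the earlier pattern seems to return only 'us::': with nothing after the lazy `.*?`, the quantifier is free to match zero characters, so each match stops right after the prefix. The sample `line` is shortened from the one quoted below, and the names `bare` and `anchored` are only illustrative.

```python
import re

# Shortened version of the sample line quoted later in this thread.
line = ("38166 us::Video_Cat::Other; us::Video_Cat::Today Show; "
        "us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;")

# Nothing follows the lazy .*?, so it matches the empty string each time
# and the group holds only the literal prefix.
bare = re.compile(r"(us::.*?)").findall(line)

# The trailing ';' forces .*? to grow until it reaches the next semicolon.
anchored = re.findall(r"(us::.*?);", line)

print(bare)
print(anchored)
```

The only difference between the two patterns is the `;` after the group, which gives the lazy match something to stop at.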

Tom:
Thank you. But the number of 'us::' terms varies, and Kent's solution works
well.
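
For the record, a minimal sketch of how the findall approach copes with a varying number of terms. The helper name `us_terms` and the two shortened sample lines are made up for illustration; the greedy pattern is included only for contrast.

```python
import re

def us_terms(line):
    """Return every 'us::...' term in line, without the trailing ';'."""
    return re.findall(r"(us::.*?);", line)

# Hypothetical lines with one and three 'us::' terms.
one = "38166 us::Video_Cat::Other; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;"
three = ("38166 us::Video_Cat::Other; us::Video_Cat::Today Show; "
         "us::VC_Supplier::bc; 1003::ms://bc.wd.net/a275/video/tdy_is_.fl;")

print(us_terms(one))
print(us_terms(three))

# With a greedy .* the single match runs on to the last ';' in the line,
# so the 1003:: entry is swallowed as well.
greedy = re.findall(r"(us::.*);", three)
print(greedy)
```

findall returns one list entry per match, so the same pattern works whether a line has one term, three, or none.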

Daniel

On 9/16/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
> 王超 wrote:
> > The number of items - (us::.*?) - varies.
> >
> > When I use re.findall with (us::.*?), only several 'us::' are extracted.
>
> I don't understand what is going wrong now. Please show the code, the
> data, and tell us what you get and what you want to get.
>
> Here is an example:
>
> Without a group you get the whole match:
>
> In [3]: import re
> In [4]: line = """38166 us::Video_Cat::Other; us::Video_Cat::Today Show;
> us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;
> 1003::ms://bc.wd.net/a275/video/tdy_is_.fl;"""
> In [5]: re.findall('us::.*?;', line)
> Out[5]: ['us::Video_Cat::Other;', 'us::Video_Cat::Today Show;',
> 'us::VC_Supplier::bc;']
>
> With a group you get just the group:
>
> In [6]: re.findall('(us::.*?);', line)
> Out[6]: ['us::Video_Cat::Other', 'us::Video_Cat::Today Show',
> 'us::VC_Supplier::bc']
>
> Kent
>
> >
> > Daniel
> >
> > On 9/16/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >
> >     王超 wrote:
> >      > yes, but I mean if I have the line like this:
> >      >
> >      > line = """38166 us::Video_Cat::Other; us::Video_Cat::Today Show;
> >      > us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;
> >      > 1003::ms://bc.wd.net/a275/video/tdy_is_.fl;"""
> >      >
> >      > I want to get the part "us::MSNVideo_Cat::Other; us::MSNVideo_Cat::Today
> >      > Show; us::VC_Supplier::Msnbc;"
> >      >
> >      > but re.compile(r"(us::.*) .*(1002|1003).*$") will get the
> >      > "1002::ms://bc.wd.net/a275/video/tdy_is.asf;" included in a lazy mode.
> >
> >     Of course, you have asked for all the text up to the end of the
> >     string.
> >
> >     Not sure what you mean by lazy mode...
> >
> >     If there will always be three items you could just repeat the
> >     relevant sections of the re, something like
> >
> >     r'(us::.*?); (us::.*?); (us::.*?);'
> >
> >     or even
> >
> >     r'(us::Video_Cat::.*?); (us::Video_Cat::.*?); (us::VC_Supplier::.*?);'
> >
> >     If the number of items varies then use re.findall() with (us::.*?);
> >
> >     The non-greedy match is not strictly needed in the first case but it
> >     is in the second.
> >
> >     Kent
> >
> >
>
>
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
