Kent: Thank you. I got the right results with re.findall('(us::.*?);', line).

I used this code before:

    >>> TAG_pattern = re.compile(r"(us::.*?)")
    >>> TAG_pattern.findall(line)

and got the unwanted results 'us::'.

Tom: Thank you. But the number of 'us::' terms varies, and Kent's solution works well.

Daniel

On 9/16/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
> 王超 wrote:
> > The number of items - (us::.*?) - varies.
> >
> > When I use re.findall with (us::.*?), only several 'us::' are extracted.
>
> I don't understand what is going wrong now. Please show the code, the
> data, and tell us what you get and what you want to get.
>
> Here is an example:
>
> Without a group you get the whole match:
>
> In [3]: import re
> In [4]: line = """38166 us::Video_Cat::Other; us::Video_Cat::Today Show;
> us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;
> 1003::ms://bc.wd.net/a275/video/tdy_is_.fl;"""
> In [5]: re.findall('us::.*?;', line)
> Out[5]: ['us::Video_Cat::Other;', 'us::Video_Cat::Today Show;',
> 'us::VC_Supplier::bc;']
>
> With a group you get just the group:
>
> In [6]: re.findall('(us::.*?);', line)
> Out[6]: ['us::Video_Cat::Other', 'us::Video_Cat::Today Show',
> 'us::VC_Supplier::bc']
>
> Kent
>
> > Daniel
> >
> > On 9/16/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >
> >     王超 wrote:
> >     > yes, but I mean if I have the line like this:
> >     >
> >     > line = """38166 us::Video_Cat::Other; us::Video_Cat::Today Show;
> >     > us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf;
> >     > 1003::ms://bc.wd.net/a275/video/tdy_is_.fl;"""
> >     >
> >     > I want to get the part "us::Video_Cat::Other;
> >     > us::Video_Cat::Today Show; us::VC_Supplier::bc;"
> >     >
> >     > but re.compile(r"(us::.*) .*(1002|1003).*$") will get the
> >     > "1002::ms://bc.wd.net/a275/video/tdy_is.asf;" included in a lazy
> >     > mode.
> >
> >     Of course, you have asked for all the text up to the end of the
> >     string.
> >
> >     Not sure what you mean by lazy mode...
> >
> >     If there will always be three items you could just repeat the
> >     relevant sections of the re, something like
> >
> >       r'(us::.*?); (us::.*?); (us::.*?);'
> >
> >     or even
> >
> >       r'(us::Video_Cat::.*?); (us::Video_Cat::.*?); (us::VC_Supplier::.*?);'
> >
> >     If the number of items varies then use re.findall() with (us::.*?);
> >
> >     The non-greedy match is not strictly needed in the first case but
> >     it is in the second.
> >
> >     Kent
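
For anyone reading this thread later, here is a minimal sketch of why the bare pattern (us::.*?) returned only 'us::' while the version with a trailing ';' works. It reuses the sample data from the thread, joined onto a single line; that joining is an assumption, since the line breaks in the quoted message look like mail wrapping. The names bare, lazy and greedy are only illustrative.

```python
import re

# Sample data from the thread, joined onto one line (assumption: the
# original line breaks were introduced by mail wrapping).
line = ("38166 us::Video_Cat::Other; us::Video_Cat::Today Show; "
        "us::VC_Supplier::bc; 1002::ms://bc.wd.net/a275/video/tdy_is.asf; "
        "1003::ms://bc.wd.net/a275/video/tdy_is_.fl;")

# A lazy .*? with nothing after it is satisfied by matching zero
# characters, so each match stops right after 'us::'.
bare = re.findall(r"(us::.*?)", line)

# A trailing ';' forces .*? to extend up to the first semicolon, and the
# group keeps the semicolon out of the result.
lazy = re.findall(r"(us::.*?);", line)

# A greedy .* runs on to the last ';' on the line, so the whole tail
# comes back as a single match.
greedy = re.findall(r"(us::.*);", line)

print(bare)
print(lazy)
print(greedy)
```

With this data, bare holds three copies of 'us::', lazy holds the three wanted items, and greedy holds one long string that also swallows the 1002:: and 1003:: entries.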
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor