Hi Marcin,
On 21 January 2013 23:11, Marcin Mleczko <marcin.mlec...@onet.eu> wrote:

> first thank you very much for the quick reply.

No problem...

> The functions used here i.e. re.match() are taken directly from the
> example in the mentioned HowTo. I'd rather use re.findall() but I
> think the general interpretation of the given regexp should be nearly
> the same in both functions.

... except that the results are fundamentally different, because the two
functions have different goals. The one (match) only tries to match the
regex starting at the first character of the string. (There is no
conceptual "walking forward" of the starting point; either the regex
matches from position 0 or it doesn't match at all.) The other (findall)
looks for matches anywhere in the string, conceptually walking the
starting point forward only as far as necessary to find a possible match.

> So I'd like to neglect the choice of a particular function for a
> moment and concentrate on the pure theory.
> What I got so far:
> in theory, for s = '<<html><head><title>Title</title>'
> '<.*?>' would match '<html>' '<head>' '<title>' '</title>'
> to achieve this the engine should:
> 1. walk forward along the text until it finds <
> 2. walk forward from that point until it finds >

Here, conceptually, the regex engine's work for your original regex is
complete and it returns a match.

> 3. walk backward from that point (the one of >) until it finds <

No. There is no further walking backward once you've already matched the
regex.

> 4. return the string between < from 3. and > from 2. as this gives the
> least possible string between < and >

"Non-greedy" doesn't imply conceptually moving the starting point
backwards after you've already found a match.

> Did I get this right so far? Is this (=least possible string between <
> and >), what non-greedy really translates to?

No, as explained above. Non-greedy only means that, from a fixed starting
point, the engine stops at the earliest place where the rest of the regex
can match. For your string, the first match of '<.*?>' is therefore
'<<html>', not '<html>'.

Walter
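
P.S. A minimal sketch of the points above, using the string and pattern
from your message. The variable names and the second pattern
('<[^<>]*>') are my own additions for illustration, not something from
the HowTo; the second pattern is just one possible way to get the tags
you expected.

```python
import re

s = '<<html><head><title>Title</title>'

# re.match() only tries the pattern at position 0 of the string.
# The lazy '.*?' stops at the first '>' it can reach from there,
# but it never moves the starting '<' to the right.
first_tag = re.match(r'<.*?>', s).group()

# re.findall() scans forward and collects every non-overlapping match.
# The first match still starts at the very first '<'.
lazy_tags = re.findall(r'<.*?>', s)

# Illustrative alternative (an assumption, not from the thread):
# forbid '<' and '>' between the brackets, so a match cannot start
# at the stray leading '<'.
strict_tags = re.findall(r'<[^<>]*>', s)

print(first_tag)    # <<html>
print(lazy_tags)    # ['<<html>', '<head>', '<title>', '</title>']
print(strict_tags)  # ['<html>', '<head>', '<title>', '</title>']
```

Note that re.match() with the second pattern returns None for this
string, since nothing matches starting at position 0.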