On 01/05/16 20:04, bruce wrote:
> Hey all..
>
> Yeah, the sample I'm dealing with is html.. I'm doing some "complex"
> extraction, and i'm modifying the text to make it easier/more robust..
>
> So, in this case, the ability to generate the line is what's needed
> for the test..
>

But as Peter explained, HTML has no concept of a "line". Extracting a
line from HTML depends entirely on how the author formatted the original
file. If you read it from a web server, the server may rearrange the
content (while keeping the HTML equivalent), thus breaking your code.
The same applies if it gets sent via email or some other mechanism.

What you really want will be defined by the tags within which it lives.
And that's what a parser does - finds tags and extracts the content.
A regex can only do that for a very limited set of inputs, and it certainly
can't guarantee a "line" of output. Even if it seems to work today, it
could fail completely next week even if the original HTML doesn't change.
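
To illustrate, here is a minimal sketch using the standard library's
html.parser module. The sample markup, the TagTextExtractor class and the
extract() helper are all made up for the example (they are not from your
data), and a real page will probably need more care, but it shows why the
tag matters and the line doesn't:

```python
import re
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.chunks = []    # text pieces of the element being read
        self.results = []   # one entry per completed element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace so line breaks in the source don't matter
                self.results.append(" ".join("".join(self.chunks).split()))
                self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, tag):
    """Return the text of each <tag> element found in html."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# The same (invented) markup, once on a single line and once reflowed
one_line = '<td class="name">Intro to <b>Python</b></td>'
reflowed = '<td\n  class="name">Intro to\n   <b>Python</b>\n</td>'

# A line-oriented regex that happens to work on the first layout only
pattern = re.compile(r'<td class="name">(.*)</td>')

print(extract(one_line, "td"))
print(extract(reflowed, "td"))
print(pattern.search(one_line) is not None)
print(pattern.search(reflowed) is not None)
```

The parser gives the same text for both layouts, while the regex only
matches the first one. For messier real-world pages a third-party parser
such as BeautifulSoup may be more forgiving, but the principle is the same.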

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
