On 24/03/16 21:03, Sam Starfas via Tutor wrote:
> I have written some python code to gather a bunch
> of sentences from many files.
> These sentences contain the content:

OK. They are presumably lines and not sentences. Lines end in a
newline character and sentences end with a period. They are
fundamentally different and need to be handled differently, so you
need to be precise about which one you have.

> blah blah blah blah <uicontrol>1-up printing</uicontrol>blah blah blah blah
> blah blah blah blah
> blah blah <uicontrol>Preset</uicontrol>blah blah blah blah

Also, they look like they might be out of an XML or HTML file. In that
case the easiest way is probably to use a parser for the original data
type (etree for XML, Beautiful Soup for HTML, for example). That's much
easier than trying to do it by yourself. That may involve going back a
step and not extracting the lines out first...

If it's not a recognised format like XML then you may need to do it
manually. In that case, if the formatting is as precise as you show
(no extra spaces etc.), then you can simply use string methods to
locate the ends of the tags.

    opentag = '<uicontrol>'
    endtag = '</uicontrol>'
    start = my_string.find(opentag) + len(opentag)

find() returns the position of the opening < (or -1 if the tag is not
found, so check for that). You then add the length of the tag to get
the start of your wanted text. Similarly

    end = my_string.find(endtag)

locates the start of the end tag. You can then use string slicing to
get the bit in between.

    data = my_string[start:end]

If the tags are not as clean then you might need to use regular
expressions to do it, and that's a whole new level of complexity.

Things you need to be clear about:

1) are there any irregularities in how the tags are spelled?
   (e.g. spaces, caps etc.)
2) do the tags ever have attributes?
3) can there be multiple tags in a single line/sentence?
4) can tags be nested?
5) can tags cross line/sentence boundaries?

Without more detail that's the best I can offer.

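To make the two approaches above a bit more concrete, here is a minimal sketch rather than a definitive solution. It assumes the tags are spelled exactly as in your sample, have no attributes, and never cross line boundaries. The function names and the "root" wrapper element are my own inventions for illustration, and the etree version only works if each line is well-formed XML once wrapped (a stray & or < in the text will raise a ParseError).

```python
import xml.etree.ElementTree as ET


def extract_with_find(text, tag="uicontrol"):
    """Collect the text between each <tag>...</tag> pair using str.find().

    Assumes clean tags: no attributes, no extra spaces, no nesting.
    """
    open_tag = "<" + tag + ">"
    end_tag = "</" + tag + ">"
    results = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break  # no more opening tags on this line
        start += len(open_tag)  # skip past the tag itself
        end = text.find(end_tag, start)
        if end == -1:
            break  # opening tag with no matching close; give up
        results.append(text[start:end])
        pos = end + len(end_tag)  # carry on searching after this pair
    return results


def extract_with_etree(text, tag="uicontrol"):
    """Same job using a real XML parser.

    The line is wrapped in a hypothetical <root> element so that the
    fragment has a single top-level element, which the parser requires.
    """
    root = ET.fromstring("<root>" + text + "</root>")
    return ["".join(elem.itertext()) for elem in root.iter(tag)]


line = ("blah <uicontrol>1-up printing</uicontrol> blah "
        "<uicontrol>Preset</uicontrol> blah")
print(extract_with_find(line))   # ['1-up printing', 'Preset']
print(extract_with_etree(line))  # ['1-up printing', 'Preset']
```

The find() version copes with several tags on one line (question 3) but not with attributes or nesting. The parser version copes with attributes too, which is one reason it is usually the safer choice if the data really is XML.
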
hth
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos