2009/5/22 Eduardo Vieira <eduardo.su...@gmail.com>:
> I will be looking for lines like these:
> Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23
>
> So, references in different chapters are separated by a semicolon. My
> main challenge would be make the program guess that 10:12 refers to
> the previous book. 15-20 means verses 15 thru 20 inclusive. I'm afraid
> that will take more than Regex and I never studied anything about
> parser tools, really.
Well, pyparsing is one of the usual Python parsing modules (a third-party package, not part of the standard library). It's not that bad, really :-)

Here's some code I knocked out:

from pyparsing import *

# A verse is a single number or a range such as 15-20
SingleVerse = Word(nums)
VerseRange = SingleVerse + '-' + SingleVerse
Verse = VerseRange | SingleVerse
Verse = Verse.setResultsName('Verse').setName('Verse')

# Verses within one chapter are separated by commas
Verses = Verse + ZeroOrMore(Suppress(',') + Verse)
Verses = Verses.setResultsName('Verses').setName('Verses')

# A chapter is either "chapter:verses" or just a bare chapter number
ChapterNum = Word(nums)
ChapterNum = ChapterNum.setResultsName('Chapter').setName('Chapter')
ChapVerses = ChapterNum + ':' + Verses
SingleChapter = Group(ChapVerses | ChapterNum)

# Chapters of the same book are separated by semicolons
Chapters = SingleChapter + ZeroOrMore(Suppress(';') + SingleChapter)
Chapters = Chapters.setResultsName('Chapters').setName('Chapters')

BookName = CaselessLiteral('Acts') | CaselessLiteral('Psalm') | CaselessLiteral('John')
BookName = BookName.setResultsName('Book').setName('Book')

# A book name followed by its chapters: a chapter that follows a ';'
# without a new book name (like 10:12) is attached to the preceding book
Book = Group(BookName + Chapters)
Books = Book + ZeroOrMore(Suppress(';') + Book)
Books = Books.setResultsName('Books').setName('Books')

All = CaselessLiteral('Lesson Text:') + Books + LineEnd()

s = 'Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23'
res = All.parseString(s)
for b in res.Books:
    for c in b.Chapters:
        if c.Verses:
            for v in c.Verses:
                print('Book', b[0], 'Chapter', c[0], 'Verse', v)
        else:
            print('Book', b[0], 'Chapter', c[0])
######

Hopefully you can get the idea of most of it from looking at the code. Suppress() means "parse this token, but don't include it in the results". Group() is necessary for getting access to a list of things; you can experiment by taking it out and seeing what you get.

Obviously you'll need to add more names to the BookName element. There is also a bit more work to be done on Verses: as written, a range such as 15-20 comes back as the three separate tokens '15', '-' and '20'. You might want to look into the concept of "parse actions". A really simple parse action might be this:

def convertToNumber(string_, location, tokens):
    """ Used in setParseAction to make numeric parsers return numbers.
    """
    return [int(tokens[0])]

SingleVerse.setParseAction(convertToNumber)
ChapterNum.setParseAction(convertToNumber)

That should get you Python integers instead of strings. It is safest to attach parse actions where SingleVerse and ChapterNum are first defined, before they are combined into bigger expressions, because setResultsName() returns a copy of the element it is called on. You can probably do more with parse actions to, for instance, turn something like '15-20' into [15, 16, 17, 18, 19, 20].

HTH!

--
John.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor