Nathan Harmston wrote:
> I know this isn't the pyparsing list, but there doesn't seem to be
> one. I'm trying to use pyparsing to parse a file, however I can't get
> the Optional keyword to work. My file generally looks like this:
>
> ALIGNMENT 1020 YS2-10a02.q1k chr09 1295 42 141045 142297 C 1254 95.06 1295 reject_bad_break 0
>
> or this:
>
> ALIGNMENT 36 YS2-10a08.q1k chrm 208 165 10745 10788 C 44 95.45 593 reject_low 10,14
>
> and my grammar works well for these lines; however, sometimes the row looks like:
>
> ALIGNMENT 53 YS2-10b03.p1k chr12 180 125 1067465 1067520 C 56 98.21 532|5,2 reject_low 25
>
> So I try to parse the 532|5,2 field using
>
> from pyparsing import *
>
> integer = Word( nums )
> float = Word( nums+".")
> identifier = Word( alphanums+"-_." )
>
> alignment = Literal("ALIGNMENT ").suppress()
> row_1 = integer.setResultsName("row_1")#.setParseAction(make_int)
> src_id = identifier.setResultsName("src_id")
> dest_id = identifier.setResultsName("dest_id")
> src_start = integer.setResultsName("src_start")#.setParseAction(make_int)
> src_stop = integer.setResultsName("src_stop")#.setParseAction(make_int)
> dest_start = integer.setResultsName("dest_start")#.setParseAction(make_int)
> dest_stop = integer.setResultsName("dest_stop")#.setParseAction(make_int)
> row_8 = oneOf("F C").setResultsName("row_8")
> length = integer.setResultsName("length")#.setParseAction(make_int)
> percent_id = float.setResultsName("percent_id")#.setParseAction(make_float)
> row_11 = (integer + Optional(Literal("|") + commaSeparatedList))#.setResultsName("row_11")#.setParseAction(make_int)
> result = Word(alphas+"_").setResultsName("result")
> row_13 = commaSeparatedList.setResultsName("row_13")
>
> def make_alilines_status_parser():
>     return (alignment + row_1 + src_id + dest_id + src_start + src_stop
>             + dest_start + dest_stop + row_8 + length + percent_id + row_11
>             + result + row_13)
>
> def parse_alilines_status(ifile):
>     alilines = make_alilines_status_parser()
>     for l in ifile:
>         yield alilines.parseString( l )
>
> However, my parser always fails on lines of the third type. Does anyone
> know why the Optional part is not working?
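commaSeparatedList pulls the rest of the line into its last item:

>>> commaSeparatedList.parseString("a,b c")
(['a', 'b c'], {})

So on your third line, once row_11 has matched "532" and "|", commaSeparatedList swallows "5,2 reject_low 25" as ['5', '2 reject_low 25']. That leaves nothing for result to match. The Optional itself is working. You can fix this by building your own list with delimitedList, using an item expression that doesn't accept whitespace, e.g.:

>>> delimitedList(Word(alphanums)).parseString("a,b c")
(['a', 'b'], {})

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list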
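Applied to the grammar from the question, the suggestion might look like the following sketch. It is not a definitive implementation, and it rests on a few assumptions:

- The comma-separated fields (the part after "|" and the last column) only ever hold integers, as in the three sample lines.
- `int_list`, `real` and the `row_11_extra` results name are illustrative additions, not part of the original code.
- The long `+` chain is wrapped in parentheses so it can span several lines.
- `parseAll=True` is passed so that leftover text is reported as an error instead of being silently ignored.

```python
from pyparsing import (Group, Literal, Optional, Word, alphanums, alphas,
                       delimitedList, nums, oneOf)

# Field tokens; "real" replaces the original "float" so the builtin isn't shadowed.
integer = Word(nums)
real = Word(nums + ".")
identifier = Word(alphanums + "-_.")

# Assumption: the comma-separated fields contain only integers.
# Unlike commaSeparatedList, delimitedList(integer) stops before the next
# whitespace-separated token, so "5,2 reject_low 25" yields only "5" and "2".
int_list = delimitedList(integer)

alignment = Literal("ALIGNMENT").suppress()
row_1 = integer.setResultsName("row_1")
src_id = identifier.setResultsName("src_id")
dest_id = identifier.setResultsName("dest_id")
src_start = integer.setResultsName("src_start")
src_stop = integer.setResultsName("src_stop")
dest_start = integer.setResultsName("dest_start")
dest_stop = integer.setResultsName("dest_stop")
row_8 = oneOf("F C").setResultsName("row_8")
length = integer.setResultsName("length")
percent_id = real.setResultsName("percent_id")
# Column 11: an integer, optionally followed by "|" and a list such as "5,2".
# "row_11_extra" is a hypothetical results name for that list.
row_11 = (integer.setResultsName("row_11")
          + Optional(Literal("|").suppress()
                     + Group(int_list).setResultsName("row_11_extra")))
result = Word(alphas + "_").setResultsName("result")
row_13 = Group(int_list).setResultsName("row_13")


def make_alilines_status_parser():
    # Parenthesized so the "+" chain can legally continue across lines.
    return (alignment + row_1 + src_id + dest_id + src_start + src_stop
            + dest_start + dest_stop + row_8 + length + percent_id + row_11
            + result + row_13)


def parse_alilines_status(ifile):
    alilines = make_alilines_status_parser()
    for line in ifile:
        # parseAll=True raises ParseException if any text is left unparsed.
        yield alilines.parseString(line, parseAll=True)


if __name__ == "__main__":
    sample = [
        "ALIGNMENT 53 YS2-10b03.p1k chr12 180 125 1067465 1067520 "
        "C 56 98.21 532|5,2 reject_low 25",
    ]
    for res in parse_alilines_status(sample):
        # prints: 532 ['5', '2'] reject_low ['25']
        print(res["row_11"], res["row_11_extra"].asList(),
              res["result"], res["row_13"].asList())
```

Grouping the lists keeps "5,2" and "10,14" as single named fields. Lines without the "|" part simply have no "row_11_extra" entry.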