Steven Bethard wrote:
> Within a larger pyparsing grammar, I have something that looks like::
>
>     wsj/00/wsj_0003.mrg
>
> When parsing this, I'd like to keep around both the full string, and the
> AAA_NNNN substring of it, so I'd like something like::
>
>     >>> foo.parseString('wsj/00/wsj_0003.mrg')
>     (['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})
>
> How do I go about this?  I was using something like::
>
>     >>> digits = pp.Word(pp.nums)
>     >>> alphas = pp.Word(pp.alphas)
>     >>> wsj_name = pp.Combine(alphas + '_' + digits)
>     >>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
>     ...                       '.mrg')
>
> But of course then all I get back is the full path::
>
>     >>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
>     (['wsj/00/wsj_0003.mrg'], {})
>

The tokens are what the tokens are, so if you want to replicate a
sub-field, then you'll need a parse action to insert it into the
returned tokens.  BUT, if all you want is to be able to easily *access*
that sub-field, then why not give it a results name?  Like this:

    wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")

Leave everything else the same, but now you can access the name field
independently from the rest of the combined tokens.

    result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
    print(result.dump())
    print(result.name)
    print(result.asList())

-- Paul
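
P.S. If you really do want the sub-field replicated in the token list itself, the parse action route mentioned in the reply could look roughly like the sketch below. It is only an illustration: `append_name` is a made-up helper name, and it assumes the "name" results name has been set on `wsj_name` as in the reply. It uses the same camelCase method names as the rest of this thread.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)

# Same grammar as in the thread, with the results name attached
wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")
wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name + '.mrg')

def append_name(tokens):
    # Hypothetical helper: by the time this runs, Combine has already
    # joined the matched pieces into one string, so we append the named
    # sub-field as an extra token after it.
    tokens.append(tokens["name"])

wsj_path.setParseAction(append_name)

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.asList())
print(result.name)
```

Because the parse action modifies the tokens in place and returns nothing, the results name stays available alongside the extra token.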