> I totally ignore the priority part the lex at PLY documents.
> The only difference between NUM and VECTOR tokens are NUM always after
> some special words like TSET or repeat.

In that case, you should merge them into a single token (or just delete one of them). Do you really need a different token?

> I got a way to do this: t_NUM=r'(?<=TSET\s)\d+|(?<=repeat\s)\d+'
> only problem is (?<=) grammar only support fixed length so I can not
> use \s* to match the spaces between them. So another scan to sub all
> duplicated \s is needed.

Usually, regular expression features of that kind are not needed. There is almost always a better way to do it.

> What would be the usually way to solve this kind problem: the same
> character at different locations meaning different TOKEN?

You could (as I stated above) merge the tokens, as they are both numbers. That would be the preferred way, I think. An alternative would be to look at lexer states, but it wouldn't be my first choice.

Dennis

zt wrote:
>Hi Dennis,
>
>Thanks a lot.
>
>I totally ignore the priority part the lex at PLY documents.
>The only difference between NUM and VECTOR tokens are NUM always after
>some special words like TSET or repeat.
>I got a way to do this: t_NUM=r'(?<=TSET\s)\d+|(?<=repeat\s)\d+'
>only problem is (?<=) grammar only support fixed length so I can not
>use \s* to match the spaces between them. So another scan to sub all
>duplicated \s is needed.
>
>What would be the usually way to solve this kind problem: the same
>character at different locations meaning different TOKEN?
>
>Best Regards,
>Adun
>
>On Dec 31 2008, 5:32 pm, "Hendriks, D." <[email protected]> wrote:
>
>>Hello zt,
>>
>>both r'\d+' and r'0|1|...' match the numbers 0 and 1. Since the r'0|1|...'
>>regular expression has a longer length, it is given priority (see Ply
>>documentation). Is there any way to differentiate the NUM and VECTOR tokens?
>>For instance, can NUM tokens start with a 0 at all? You will need to have two
>>regular expressions that only match the given input for that token (that is,
>>no overlap).
>>Well, you can have overlap, as long as you know it's there and
>>the one that is given priority is the one you want to have priority, but
>>still, I think it is better to avoid the overlap altogether...
>>
>>Dennis
>>
>>________________________________
>>
>>From: [email protected] on behalf of zt
>>Sent: Wed 31-12-2008 9:49
>>To: ply-hack
>>Subject: Lex token problem
>>
>>Hi all,
>>
>>I am still learning how to write parser with PLY. I need to parse
>>following format data:
>> TSET 1 001 X 0 00;
>> 001 X 0 00;
>> 001 X 0 00;
>> TSET 7 001 X 0 00;
>>repeat 12 001 X 0 00;
>>
>>The tokens are defined as:
>>t_TSET=r'TSET'
>>t_NUM=r'\d+'
>>t_MCODE=r'repeat'
>>t_VECTOR=r'0|1|H|L|X'
>>
>>but it kept treating the first "1" at line 1 as VECTOR instead of NUM
>>and the "1" after "repeat" as VECTOR.
>>Is there a good way to fix this?
>>
>>Thanks a lot!
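
The lexer-state alternative mentioned in the reply can be sketched as follows. This is only a minimal illustration using Python's standard `re` module, not PLY itself. The `tokenize` function, the `KEYWORDS` table, and the `SEMI` token for the trailing `;` are assumptions made for the example and do not appear in the thread. In PLY the same idea would roughly correspond to a `states` declaration plus `t.lexer.begin(...)` calls in the TSET and repeat rules.

```python
import re

# Keywords that switch the lexer into the "expect a count" state.
# The token names follow the thread: "repeat" is reported as MCODE.
KEYWORDS = {"TSET": "TSET", "repeat": "MCODE"}

# One master pattern per state. SEMI is an assumed token for ';'.
INITIAL = re.compile(
    r"(?P<KEYWORD>TSET|repeat)|(?P<VECTOR>[01HLX])|(?P<SEMI>;)|(?P<WS>\s+)"
)
COUNT = re.compile(r"(?P<NUM>\d+)|(?P<WS>\s+)")


def tokenize(text):
    """Yield (type, value) pairs; digits right after TSET/repeat become NUM."""
    state, pos = INITIAL, 0
    while pos < len(text):
        m = state.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "WS":
            continue  # whitespace is skipped in both states
        if kind == "KEYWORD":
            yield KEYWORDS[m.group()], m.group()
            state = COUNT  # the next number is a count, not vector data
        elif kind == "NUM":
            yield "NUM", int(m.group())
            state = INITIAL  # back to reading vector characters
        else:
            yield kind, m.group()


if __name__ == "__main__":
    for tok in tokenize("TSET 1 001 X 0 00;\nrepeat  12 001 X 0 00;"):
        print(tok)
```

Because whitespace is skipped inside the count state, any number of spaces between the keyword and the number is accepted, which is the case the fixed-width lookbehind could not handle.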
