Re: Lex token problem

D.Hendriks (Dennis) Mon, 05 Jan 2009 01:23:14 -0800

> I had completely overlooked the priority section of the lex part of the
> PLY documentation. The only difference between NUM and VECTOR tokens is
> that NUM always comes after a special word such as TSET or repeat.


In that case, you should merge them into a single token (or just delete
one of them). Do you really need a different token?

> I found a way to do this: t_NUM=r'(?<=TSET\s)\d+|(?<=repeat\s)\d+'
> The only problem is that the lookbehind syntax (?<=) only supports
> fixed-length patterns, so I cannot use \s* to match the spaces between
> them. So another pass to collapse all repeated \s is needed.

Usually those kinds of regular expression features are not needed. There
is almost always a better way to do it.
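
As an illustration of the lookbehind limitation and one way around it, here
is a minimal sketch using only Python's standard re module (not PLY). It
assumes the keyword and its count can be consumed together in one match; the
names COUNT and counts are hypothetical.

```python
import re

# Python's re module rejects variable-width lookbehind such as (?<=TSET\s*).
try:
    re.compile(r'(?<=TSET\s*)\d+')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Alternative: match the keyword, any amount of whitespace, and the count
# together, and pick the pieces out with groups instead of lookbehind.
COUNT = re.compile(r'\b(TSET|repeat)\s+(\d+)')

def counts(text):
    """Return (keyword, count) pairs found in text."""
    return [(m.group(1), int(m.group(2))) for m in COUNT.finditer(text)]

print(lookbehind_ok)
print(counts("TSET 1        001 X 0 00;\nrepeat 12      001 X 0 00;"))
```

Because \s+ absorbs any run of whitespace, no extra pass to collapse spaces
is needed with this approach.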

> What would be the usual way to solve this kind of problem: the same
> character at different locations meaning a different token?

You could (as I stated above) merge the tokens, since they are both
numbers. That would be the preferred way, I think.
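
A minimal sketch of the merged-token idea, assuming every run of digits
becomes NUM and the grammar decides by position whether it is a count or
vector data. Since NUM only covers digits, the letters H, L and X get their
own token here; the names STATE and SEMI are assumptions for illustration,
not part of the original token list.

```python
import ply.lex as lex

# One NUM token for all digits; STATE and SEMI are hypothetical names.
tokens = ('TSET', 'MCODE', 'NUM', 'STATE', 'SEMI')

t_TSET = r'TSET'
t_MCODE = r'repeat'
t_NUM = r'\d+'       # counts and 0/1 vector data alike
t_STATE = r'[HLX]'   # the non-numeric vector characters
t_SEMI = r';'
t_ignore = ' \t\n'   # skip whitespace between tokens

def t_error(t):
    raise SyntaxError("illegal character %r" % t.value[0])

lexer = lex.lex()

def tokenize(text):
    """Return a list of (type, value) pairs for text."""
    lexer.input(text)
    return [(tok.type, tok.value) for tok in lexer]

print(tokenize("TSET 1        001 X 0 00;"))
```

With this layout none of the regular expressions overlap, so the order in
which PLY tries them no longer matters. The parser rule for a line would
then treat the NUM right after TSET or repeat as the count and the rest as
vector data.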

An alternative would be to look at lexer states, but it wouldn't
be my first choice.
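
For completeness, a rough sketch of the lexer-state alternative. It assumes
an exclusive state, here given the hypothetical name 'count', that is
entered after TSET or repeat and left again as soon as one NUM has been
read. The SEMI token is also an assumption; the other token names come from
the original post.

```python
import ply.lex as lex

tokens = ('TSET', 'MCODE', 'NUM', 'VECTOR', 'SEMI')

# 'count' is a hypothetical state in which only a NUM is expected.
states = (('count', 'exclusive'),)

def t_TSET(t):
    r'TSET'
    t.lexer.begin('count')    # the next token must be a NUM
    return t

def t_MCODE(t):
    r'repeat'
    t.lexer.begin('count')
    return t

def t_count_NUM(t):
    r'\d+'
    t.lexer.begin('INITIAL')  # back to normal vector lexing
    return t

t_VECTOR = r'0|1|H|L|X'       # only active in the INITIAL state
t_SEMI = r';'

t_ignore = ' \t\n'
t_count_ignore = ' \t'

def t_error(t):
    raise SyntaxError("illegal character %r" % t.value[0])

def t_count_error(t):
    raise SyntaxError("expected a number, got %r" % t.value[0])

lexer = lex.lex()

def tokenize(text):
    """Return a list of (type, value) pairs for text."""
    lexer.input(text)
    return [(tok.type, tok.value) for tok in lexer]

print(tokenize("repeat 12      001 X 0 00;"))
```

Here the digits after repeat come out as a single NUM, while each digit of
the vector data is still a separate VECTOR token. The price is extra state
handling in the lexer, which is why merging the tokens is usually simpler.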

Dennis




zt wrote:

>Hi Dennis,
>
>Thanks a lot.
>
>I had completely overlooked the priority section of the lex part of the
>PLY documentation. The only difference between NUM and VECTOR tokens is
>that NUM always comes after a special word such as TSET or repeat.
>I found a way to do this: t_NUM=r'(?<=TSET\s)\d+|(?<=repeat\s)\d+'
>The only problem is that the lookbehind syntax (?<=) only supports
>fixed-length patterns, so I cannot use \s* to match the spaces between
>them. So another pass to collapse all repeated \s is needed.
>
>What would be the usual way to solve this kind of problem: the same
>character at different locations meaning a different token?
>
>Best Regards,
>Adun
>
>On Dec 31 2008, 5:32 pm, "Hendriks, D." <[email protected]> wrote:
>  
>
>>Hello zt,
>>
>>both r'\d+' and r'0|1|...' match the numbers 0 and 1. PLY adds tokens
>>defined as strings in order of decreasing regular expression length, so the
>>longer r'0|1|...' is tried first and wins (see the PLY documentation). Is
>>there any way to differentiate the NUM and VECTOR tokens? For instance, can
>>NUM tokens start with a 0 at all? Ideally, you have two regular expressions
>>that each match only the input for their own token (that is, no overlap).
>>You can have overlap, as long as you know it's there and the one that is
>>given priority is the one you want to have priority, but still, I think it
>>is better to avoid the overlap altogether...
>>
>>Dennis
>>
>>________________________________
>>
>>From: [email protected] on behalf of zt
>>Sent: Wed 31-12-2008 9:49
>>To: ply-hack
>>Subject: Lex token problem
>>
>>Hi all,
>>
>>I am still learning how to write a parser with PLY. I need to parse
>>data in the following format:
>> TSET 1        001 X 0 00;
>>                    001 X 0 00;
>>                    001 X 0 00;
>> TSET 7        001 X 0 00;
>>repeat 12      001 X 0 00;
>>
>>The tokens are defined as:
>>t_TSET=r'TSET'
>>t_NUM=r'\d+'
>>t_MCODE=r'repeat'
>>t_VECTOR=r'0|1|H|L|X'
>>
>>but the lexer keeps treating the first "1" on line 1 as VECTOR instead of
>>NUM, and likewise the "1" after "repeat".
>>Is there a good way to fix this?
>>
>>Thanks a lot!
>>

