Lexing numeric literals

Austin Hastings Tue, 07 Jul 2015 14:39:07 -0700

Consider the lowly number:

   0
   01
   0b010101
   0xdeadbeef
   0755
   0o123
   0d299
   0.
   0.0
   1
   1.0e+0
   0xcafe.babep-2
   .17
   1.7
   1..7
   17.
   17...
   0xfastf00d


There is an ANTLR snippet that shows a way to deal with various kinds of 
numeric literals in the presence of '.' and '..' as language tokens. 
(http://bit.ly/1HDwCX5)

My question is: can someone point me at a fairly performant PLY version of 
this? Ideally, it would be as robust as the ANTLR version above in the face 
of malformed constructs or range errors. Ideally, it would also be well 
documented. But I'll settle for something that works and is fast. I'm hoping 
for either the C or Perl 6 number formats, but I've got to deal with 
double-dot and triple-dot tokens, so the usual parsing-101 examples won't do.

Right now, I'm using a fairly large regexp. I'm kind of hating it, because 
there's so much backfilling that I have to do: Python's re engine insists 
on unique group names, so I can't have, for example, "(?P<exponent>...)" in 
more than one location. (Or "range_error".) That in turn leads to lots of 
separate code checks for different spellings of the same thing.
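
One possible way around the unique-name restriction, as a minimal sketch: 
generate the repeated sub-pattern with a per-use suffix, then look the value 
up by its base name. The helper names exponent() and field(), the suffix 
scheme, and the cut-down NUMBER pattern are all assumptions made for 
illustration; none of them come from PLY or from the ANTLR snippet.

```python
import re


def exponent(tag):
    # Hypothetical helper: the same sub-pattern each time, but with a
    # group name made unique by a suffix (exponent_float, exponent_int).
    return r"(?P<exponent_%s>[eE][+-]?\d+)" % tag


# Deliberately tiny pattern: only decimal floats and ints, to show the idea.
NUMBER = re.compile(
    r"(?P<float>\d+\.\d+" + exponent("float") + r"?)"
    r"|(?P<int>\d+" + exponent("int") + r"?)"
)


def field(match, name):
    """Return the first matched group called `name` or `name_<tag>`."""
    for key, value in match.groupdict().items():
        if value is not None and (key == name or key.startswith(name + "_")):
            return value
    return None
```

With this, field(m, "exponent") is a single check no matter which 
alternative matched, so the per-spelling checks collapse into one lookup.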

I have wondered if a lexer state would be the right way to deal with this, 
but I don't think it feels quite right. (The state would let me break the 
regexp into separate pieces, and I could then reassemble them in the 
parser. But whitespace and a missing end signal make me leery of this 
approach.)

I have also wondered if there is an efficient way to chew through the input 
text by hand. But I keep thinking this is PLY's job, so there should be a 
way for PLY to do it!
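
For comparison, here is a rough sketch of the by-hand route using only the 
standard re module (not PLY, and not a translation of the ANTLR snippet). 
The idea it illustrates is a negative lookahead after the decimal point, so 
that "1..7" lexes as 1, "..", 7 while "17." is still a float. The token 
names, the choice to treat "17." as a float, and the omission of hex floats 
and range checks are assumptions of this sketch.

```python
import re

# Alternatives are tried left to right, so prefixed and float forms
# come before plain integers, and "..." comes before ".." and ".".
TOKEN = re.compile(r"""
    (?P<HEX>      0[xX][0-9a-fA-F]+ )
  | (?P<BIN>      0[bB][01]+ )
  | (?P<OCT>      0[oO][0-7]+ )
  | (?P<DEC>      0[dD][0-9]+ )
  | (?P<FLOAT>    \d+ \. (?!\.) \d* (?:[eE][+-]?\d+)?
                | \. \d+ (?:[eE][+-]?\d+)?
                | \d+ [eE][+-]?\d+ )
  | (?P<INT>      \d+ )
  | (?P<ELLIPSIS> \.\.\. )
  | (?P<DOTDOT>   \.\. )
  | (?P<DOT>      \. )
  | (?P<WS>       \s+ )
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            # Crude malformed-input handling: stop at the first bad character.
            raise ValueError("bad character %r at offset %d" % (text[pos], pos))
        if m.lastgroup != "WS":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out
```

If the same alternatives were moved into PLY, they would probably need to be 
function rules defined in this order, since PLY tries function-defined 
tokens in definition order and sorts string-defined ones by regex length.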

Any advice or links appreciated.

=Austin
