[re2c-general] A bunch of features

Robert Mon, 18 Jan 2010 17:05:29 -0800

Hello list,

I have been using re2c for a while and I think I should share a few 
ideas with you to speed things up, make the result more readable, and 
add some nice new features.


### optimize output for certain compilers ###
If the compiler is known, it would be great to add a few 
compiler-specific options. The idea comes from gcc's short-hand syntax 
for switches, which lets you write:

case 'a':
case 'b':
case 'c':
case 'd':

as

case 'a' ... 'd':

If you use the complete alphabet or some other run of consecutive 
chars, it would make the source more readable, and maybe gcc will find a 
better way to optimize this (in the future). I think there could be 
other useful options, also for other compilers.
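
To make the idea concrete, here is a minimal sketch of what a 
compiler-aware output could look like. The function name is_a_to_d is 
made up for illustration, and guarding on __GNUC__ is only my assumption 
of how a generator might select the short form; the range syntax itself 
is a GNU C extension (note the three dots, with spaces around them).

```c
/* Returns 1 if c is one of 'a', 'b', 'c', 'd', otherwise 0.
   is_a_to_d is a hypothetical name, not something re2c emits. */
int is_a_to_d(char c)
{
    switch (c) {
#if defined(__GNUC__)
    case 'a' ... 'd':   /* GNU C case range extension */
        return 1;
#else
    case 'a':           /* portable form for other compilers */
    case 'b':
    case 'c':
    case 'd':
        return 1;
#endif
    default:
        return 0;
    }
}
```

Both branches behave the same; only the readability of the generated 
source differs.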

### optimize nest threshold ###
When I generate the output, I copy a few things from one place to 
another to save some jumps. For example:

switch(ch) {
case 'a' : goto yy9;
default: goto yy1;
}
yy9:
switch(ch) {
case 'b' : goto yy10;
default: goto yy1;
}

By threshold, I mean a value that controls when a jump is removed by 
copying the target switch directly into the referring switch: this 
happens if the number of references to a block is less than $threshold. 
For example, yy9 is used only once in this context, so it would be a 
good candidate for saving the jump and writing:

switch(ch) {
case 'a' :
   switch(ch) {
   case 'b' : goto yy10;
   default: goto yy1;
   }
default: goto yy1;
}

For example, with a default threshold of 4, this would shrink the 
output. Okay, maybe it is not readable anymore, but it should 
significantly improve performance, and yes, it also bloats the binary. 
But the threshold could also default to 1, which leaves the size 
unchanged and only improves performance.
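
As a rough, hand-written sketch of the two shapes (not actual re2c 
output): both functions below recognize the two-character prefix "ab". 
The function names and the explicit reads of the next character are my 
own assumptions, added only to make the snippets above compilable.

```c
/* Original shape: the second state lives in its own block (yy9)
   and is reached through a jump. */
int match_ab_jump(const char *p)
{
    char ch = *p++;
    switch (ch) {
    case 'a': goto yy9;
    default:  goto yy1;
    }
yy9:
    ch = *p++;
    switch (ch) {
    case 'b': goto yy10;
    default:  goto yy1;
    }
yy10:
    return 1;   /* input starts with "ab" */
yy1:
    return 0;   /* no match */
}

/* Inlined shape: yy9 had a single reference, so its switch is
   copied into the referring case and the label disappears. */
int match_ab_inlined(const char *p)
{
    char ch = *p++;
    switch (ch) {
    case 'a':
        ch = *p++;
        switch (ch) {
        case 'b': goto yy10;
        default:  goto yy1;
        }
    default: goto yy1;
    }
yy10:
    return 1;
yy1:
    return 0;
}
```

The two versions accept exactly the same input; whether the inlined one 
is really faster depends on the compiler, so this would need measuring.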

### return upper or lower case ###
This idea comes from an HTML parser I wrote using re2c for lexing. In 
non-strict HTML, tag names may be written in upper or lower case. If 
it is allowed to overwrite the original input (for example after it was 
dup'ed) and the machine uses ASCII (a=97 and A=65), the output of re2c 
could look like this:

case 'A':
*YYCURSOR |= 0x20; /* fall through */
case 'a':

...and vice versa. Maybe you have a better idea for implementing such 
functionality, but it would be a really cool feature.
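
A small self-contained sketch of the same trick, assuming an 
ASCII-compatible character set and a writable input buffer. The 
function name accept_b_tag_letter and the choice of the letter 'b' are 
hypothetical; the point is only that the upper-case branch rewrites the 
input and then falls through to the lower-case one.

```c
/* Accepts 'b' or 'B' at *cursor and normalizes it to lower case
   in place. Returns 1 on a match, 0 otherwise.
   Assumes ASCII, where 'B' (0x42) | 0x20 == 'b' (0x62). */
int accept_b_tag_letter(char *cursor)
{
    switch (*cursor) {
    case 'B':
        *cursor |= 0x20;   /* overwrite the input with the lower-case form */
        /* fall through */
    case 'b':
        return 1;
    default:
        return 0;          /* input is left untouched */
    }
}
```

For the opposite direction (returning upper case), the lower-case 
branch would clear the bit instead, e.g. with &= ~0x20.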

### getting sub elements ###
Writing a protocol parser is no fun with the current version of re2c. I 
would welcome a syntax to get sub-elements directly. A notional syntax 
for that could be:

method = {
 "GET"   {http_method = HTTP_GET; }
| "POST"   {http_method = HTTP_POST; }
| "HEAD"   {http_method = HTTP_HEAD; }
}

version = {
 "HTTP/1.1" {http_version = HTTP_1_1; }
| "HTTP/1.0" {http_version = HTTP_1_0; }
}

request = method " " uri " " version

This should be a very easy thing. You can also rewrite the re2c output 
to achieve this, but I don't like this annoying job.
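
To illustrate the behaviour I have in mind, here is a hand-written C 
sketch of what the notional rules could boil down to. Everything in it 
is an assumption made for illustration: the struct, the helper eat, the 
extra *_UNKNOWN enum values, and in particular the definition of uri as 
"anything up to the next space", which the rules above leave open.

```c
#include <string.h>

/* Hypothetical types mirroring the constants used in the rules. */
enum http_method  { HTTP_METHOD_UNKNOWN, HTTP_GET, HTTP_POST, HTTP_HEAD };
enum http_version { HTTP_VERSION_UNKNOWN, HTTP_1_0, HTTP_1_1 };

struct http_request {
    enum http_method  method;
    enum http_version version;
    const char *uri;     /* points into the input, not NUL-terminated */
    size_t uri_len;
};

/* If s starts with lit, return the position just past it, else NULL. */
static const char *eat(const char *s, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? s + n : NULL;
}

/* request = method " " uri " " version
   Returns 1 on success, 0 otherwise. The version must end the
   string exactly; a trailing CRLF is not handled in this sketch. */
int parse_request_line(const char *s, struct http_request *req)
{
    const char *p;

    /* method sub-element: each alternative runs its own action */
    if      ((p = eat(s, "GET")))  req->method = HTTP_GET;
    else if ((p = eat(s, "POST"))) req->method = HTTP_POST;
    else if ((p = eat(s, "HEAD"))) req->method = HTTP_HEAD;
    else return 0;

    if (*p++ != ' ') return 0;

    /* uri sub-element: assumed to be a non-empty run of non-space chars */
    req->uri = p;
    while (*p != '\0' && *p != ' ') ++p;
    req->uri_len = (size_t)(p - req->uri);
    if (req->uri_len == 0 || *p++ != ' ') return 0;

    /* version sub-element */
    if      (strcmp(p, "HTTP/1.1") == 0) req->version = HTTP_1_1;
    else if (strcmp(p, "HTTP/1.0") == 0) req->version = HTTP_1_0;
    else return 0;

    return 1;
}
```

With the proposed syntax, re2c would generate the equivalent of this 
from the three rules, and the actions would fire as each sub-element 
is matched.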

### done ###
I hope you can find something useful in it. What do you think about 
these ideas?

Greetings,
Robert

http://www.xarg.org


