On Oct 14, 2010, at 7:38 AM, A.T.Hofkamp wrote:

> David Beazley wrote:
>> lexer should be a generator that emits a stream of tokens. It should then be 
>> possible to
>> process/filter that stream with other generators/iterators as needed (for 
>> example, using various
>> functions in itertools). The only reason PLY doesn't use generators is that 
>> they didn't exist when
>> it was first written (2001).
> 
> *THE* thing I'd like to see happening is streaming support in the 
> lexer, that is, I could give it a file handle to a 50GB text file, and it 
> wouldn't try to load all of that file's data into memory.
> Unfortunately, that would probably mean rewriting RE, which is unlikely to 
> happen :(

+1000.   I agree completely on streaming.  For instance, I think it would be 
cool if PLY could be plugged straight into a network socket.    Does a 
streaming version of re (or any other re-like library) even exist for Python?  
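To make the idea concrete, here is a minimal sketch of what a generator-based streaming lexer could look like. This is not PLY's actual API; the token names, the regex, and the chunked-buffer strategy are all illustrative, and error handling for unmatchable input is omitted:

```python
import io
import re

# Illustrative token patterns; a real lexer would have many more.
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<WS>\s+)")

def tokenize(handle, bufsize=65536):
    """Yield (type, text) tokens from a file-like object, chunk by chunk.

    Only ``bufsize`` characters (plus one pending token) are ever held in
    memory, so the input can be arbitrarily large.
    """
    buf = ""
    while True:
        chunk = handle.read(bufsize)
        if not chunk:
            break
        buf += chunk
        pos = 0
        for m in TOKEN_RE.finditer(buf):
            if m.end() == len(buf):
                # The token touches the end of the buffer; it may continue
                # in the next chunk, so hold it back for now.
                break
            yield m.lastgroup, m.group()
            pos = m.end()
        buf = buf[pos:]
    # End of input: whatever remains in the buffer is complete.
    for m in TOKEN_RE.finditer(buf):
        yield m.lastgroup, m.group()

# Example: the source could just as well be a 50GB file or a socket wrapper.
source = io.StringIO("width 640 height 480")
tokens = [(t, v) for t, v in tokenize(source) if t != "WS"]
```

Because the result is an ordinary generator, downstream stages compose as plain generator pipelines, e.g. `itertools.filterfalse(lambda t: t[0] == "WS", tokenize(f))`.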

>> The only other big feature that I might get around to finishing is the 
>> ply/cpp.py module
>> (currently about 95% implemented). What is the cpp.py module you ask? Well, 
>> it's eventually going to
>> be a fully working C preprocessor--complete with support for macros and 
>> everything else. I'd like to
>> finish it simply because having a pure Python C preprocessor would be cool.
> 
> I didn't know that existed. I might be interested in that.
> Do you have a URL? What features are you missing?
> 
> In a processing program I am helping to write, the input is currently being 
> pre-processed by 'gcc -E' first (you may have seen the Position object 
> troubles with lineno that I had in the bug tracker).
> 
> Using your code may give a cleaner solution, if you can use ply/cpp.py as a 
> pre-processing step before the normal lexer.

There is no documentation except for code in the cpp.py file itself. Here are a 
few general things that I can tell you about that module:

1.  In its current form, cpp.py is unusable (or at least half-baked).
2.  The implementation is very roughly based on the preprocessor from Swig, 
which I also implemented.
3.  The preprocessor does not operate on raw text, but rather on a token 
stream.  In other words, it takes the output of lex, processes it, and then 
produces another token stream as output.  Because of this, it is more tightly 
coupled with the lexer than one might imagine.
4.  The coupling with the lexer presents an interesting challenge because 
ideally the preprocessor should also work with user-defined lexers.  However, 
this means the preprocessor has to integrate with user-defined tokens and 
whatever else is going on in the lexer specification.
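Point 3 above can be sketched as one more stage in a token pipeline. This is a hypothetical toy, not cpp.py's interface: the token names (`DEFINE`, `ID`) and the object-like-macro-only handling are invented for illustration, and real macro expansion (function-like macros, rescanning, etc.) is far more involved:

```python
def preprocess(tokens):
    """Consume a (type, value) token stream and yield another token stream,
    expanding simple object-like macros along the way."""
    macros = {}
    for tok_type, value in tokens:
        if tok_type == "DEFINE":
            # Hypothetical directive token: ('DEFINE', (name, replacement)).
            name, replacement = value
            macros[name] = replacement
        elif tok_type == "ID" and value in macros:
            # Expand the macro in place of the identifier.
            yield ("ID", macros[value])
        else:
            yield (tok_type, value)

# The input is lexer output; the output feeds straight into a parser.
stream = [("DEFINE", ("PI", "3.14159")), ("ID", "area"), ("LPAREN", "("),
          ("ID", "PI"), ("RPAREN", ")")]
expanded = list(preprocess(stream))
```

Working on tokens rather than raw text is what makes this composable, but it is also the source of the coupling described in point 4: the stage has to agree with the lexer about token names and shapes.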

There are several things missing from the cpp.py module that need to be 
addressed.  It needs tests--lots of tests.  The way in which it integrates with 
the rest of PLY also needs to be fleshed out and documented (as it is 
nontrivial).  For instance, I don't think the programming API for it was ever 
settled.  Frankly, I'd probably want to reevaluate the whole implementation 
before starting work on it again.

Cheers,
Dave

-- 
You received this message because you are subscribed to the Google Groups 
"ply-hack" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/ply-hack?hl=en.
