Pycparser doesn't use the C compiler to strip comments; it uses the C *preprocessor*.
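For a sense of what that comment-stripping step involves, here is a minimal Python sketch of the rule the C standard gives for translation phase 3: each comment is replaced by one space character. The function name `strip_c_comments` is hypothetical. This is not what pycparser or cpp actually run; pycparser's own route is to hand the file to the real preprocessor (for example via `parse_file(..., use_cpp=True)`). The sketch skips string and character literals so that `//` or `/*` inside them survive, but it ignores line splices, trigraphs and line-number bookkeeping.

```python
def strip_c_comments(source: str) -> str:
    """Replace each C comment with a single space (simplified sketch).

    Assumptions: no backslash-newline splices, no trigraphs, and no
    attempt to preserve line numbers the way a real preprocessor does.
    """
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch == "/" and nxt == "/":
            # Line comment: drop everything up to, but not including, the newline.
            end = source.find("\n", i)
            if end == -1:
                end = n
            out.append(" ")
            i = end
        elif ch == "/" and nxt == "*":
            # Block comment: drop everything through the closing */.
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            out.append(" ")
            i = end + 2
        elif ch in ("'", '"'):
            # String or character literal: copy verbatim, honouring escapes,
            # so comment markers inside literals are left alone.
            quote = ch
            j = i + 1
            while j < n and source[j] != quote:
                if source[j] == "\\":
                    j += 1  # skip the escaped character
                j += 1
            out.append(source[i:j + 1])
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, `strip_c_comments("a/*x*/b")` returns `"a b"`, while `'s = "/* not a comment */";'` comes back unchanged.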
Even without comments, it'd probably need the C preprocessor anyway for things like macros and includes.

On Mon, Mar 9, 2015 at 1:43 PM, anatoly techtonik <techto...@gmail.com> wrote:
> I'll start from afar, so that it will be easier to understand what I
> am thinking about.
>
> CFFI uses pycparser, which parses C files, but! uses C compiler
> to strip comments from C files and process defines, but almost
> all .c files contain comments, so pycparser is basically useless
> as a parser, but maybe it has a good API for working with AST.
>
> Anyway, I tried to see if I can teach pycparser to strip
> comments itself, and in c_lexer.py I found a list of tokens,
> among which there was no token representing the comment
> start. Stripped list:
>
>     ##
>     ## All the tokens recognized by the lexer
>     ##
>     tokens = keywords + (
>         # Identifiers
>         'ID',
>
>         # Type identifiers (identifiers previously defined as
>         # types with typedef)
>         'TYPEID',
>
>         # constants
>         'INT_CONST_DEC', 'INT_CONST_OCT', 'INT_CONST_HEX',
>         'FLOAT_CONST', 'HEX_FLOAT_CONST',
>         'CHAR_CONST',
>         'WCHAR_CONST',
>         ...
>
> So I thought that I need to add a name for a token
> corresponding to the comment start (// and /*) and end (*/),
> and it would be better if the token name were somewhat
> common among parsers, so that people looking at the token
> could immediately recognize that it is comment related.
> Apparently, proper naming is a little bit ambiguous for
> automated processing. Editors like Spyder could also
> benefit from information about tokens and their meaning in
> different programming languages. The processing of text
> comments that can be caught from the parsing stream is
> the same for any language and could be IDE independent.
> Right now you can't just reuse the language definitions
> (such as ASDL) to feed the IDE so that it can
> automatically figure out what parts of text it can attach
> its functions to.
>
> I read that ontologies are a way to express relations between
> objects in this automatic way, as triples. Like:
>
> COMMENTSTART is a TOKEN
> COMMENTSTART starts a COMMENT
>
> And I wonder, has anybody tried to apply this ontology
> stuff to designing and analysing computer languages?
> If yes, maybe there are some databases with such
> information about parsers. I would like to query the names of
> all tokens that represent a program comment.
>
> --
> anatoly t.
> _______________________________________________
> pypy-dev mailing list
> pypy-dev@python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
>

--
Ryan
If I were in a 10-story glass-sided building and forced to write either Go or autotools scripts, I’d jump out a window.
http://kirbyfan64.github.io/
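anatoly's closing question (querying the names of all tokens that represent a program comment) can be sketched with a tiny in-memory triple store. The first two triples come straight from his message and `ID` is taken from the quoted pycparser token list; `COMMENTEND`, `LINECOMMENT`, the `"ends a"` predicate, and the helpers `match` and `comment_tokens` are hypothetical, made up for illustration, and do not reflect pycparser's real token set. A real ontology setup would more likely use RDF with a SPARQL query; this plain-Python version only shows the pattern-matching idea under those assumptions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# First two triples are from the message; the rest are hypothetical examples.
TRIPLES = [
    Triple("COMMENTSTART", "is a", "TOKEN"),
    Triple("COMMENTSTART", "starts a", "COMMENT"),
    Triple("COMMENTEND", "is a", "TOKEN"),
    Triple("COMMENTEND", "ends a", "COMMENT"),
    Triple("LINECOMMENT", "is a", "TOKEN"),
    Triple("LINECOMMENT", "starts a", "COMMENT"),
    Triple("ID", "is a", "TOKEN"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def comment_tokens(triples):
    """Names of tokens related to COMMENT through any predicate, sorted."""
    tokens = {t.subject for t in match(triples, predicate="is a", obj="TOKEN")}
    related = {t.subject for t in match(triples, obj="COMMENT")}
    return sorted(tokens & related)


print(comment_tokens(TRIPLES))  # ['COMMENTEND', 'COMMENTSTART', 'LINECOMMENT']
```

Keeping the predicate open in `comment_tokens` means a new relation such as a hypothetical `"delimits a"` would be picked up without changing the query.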