Pycparser doesn't use the C compiler to strip comments; it uses the C
*preprocessor*.

Even without comments, it'd probably need the C preprocessor anyway for
things like macros and includes.
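
For example, pycparser's parse_file helper can run the preprocessor for
you before parsing (a rough sketch; the cpp path, flags and include
directory here are just placeholders for whatever your system needs):

    from pycparser import parse_file

    # use_cpp=True makes pycparser invoke the C preprocessor, which is
    # what strips comments, expands macros and handles #include.
    ast = parse_file(
        'example.c',
        use_cpp=True,
        cpp_path='gcc',                               # or plain 'cpp'
        cpp_args=['-E', r'-Iutils/fake_libc_include'],
    )
    ast.show()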

On Mon, Mar 9, 2015 at 1:43 PM, anatoly techtonik <techto...@gmail.com>
wrote:

> I'll start from afar, so that it will be easier to understand what I
> am thinking about..
>
> CFFI uses pycparser, which parses C files, but it relies on the C
> compiler to strip comments from C files and to process defines. Almost
> all .c files contain comments, so pycparser is basically useless as a
> parser on its own, but maybe it has a good API for working with the AST.
>
> Anyway, I tried to see if I could teach pycparser to strip
> comments itself, and in c_lexer.py I found a list of tokens,
> among which there was no token representing the start of a
> comment. Abridged list:
>
>     ##
>     ## All the tokens recognized by the lexer
>     ##
>     tokens = keywords + (
>         # Identifiers
>         'ID',
>
>         # Type identifiers (identifiers previously defined as
>         # types with typedef)
>         'TYPEID',
>
>         # constants
>         'INT_CONST_DEC', 'INT_CONST_OCT', 'INT_CONST_HEX',
>         'FLOAT_CONST', 'HEX_FLOAT_CONST',
>         'CHAR_CONST',
>         'WCHAR_CONST',
>     ...
>
> So I thought that I needed to add names for tokens corresponding
> to the comment starts // and /* and the end */, and that it would
> be better if the token names were somewhat common among parsers,
> so that people looking at a token could immediately recognize
> that it is comment-related. Apparently, proper naming is a little
> bit ambiguous for automated processing. Editors like Spyder could
> also benefit from information about tokens and their meaning in
> different programming languages. The processing of text comments
> caught from the parsing stream is the same for any language and
> could be IDE-independent. Right now you can't just reuse the
> language definitions (such as ASDL) to feed the IDE so that it
> can automatically figure out which parts of the text it can
> attach its functions to.
>
> I read that ontologies are a way to express relations between
> objects in this automated way, as triples. Like:
>
>   COMMENTSTART is a TOKEN
>   COMMENTSTART starts a COMMENT
>
> And I wonder, has anybody tried to apply this ontology stuff
> to designing and analysing computer languages? If so, maybe
> there are some databases with such information about parsers.
> I would like to query the names of all tokens that represent
> a program comment.
>
> --
> anatoly t.
>
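
On the lexer-token idea in the quoted message: pycparser's lexer is
built on PLY, and in PLY comments are usually handled by a rule that
matches the text and throws it away rather than by an entry in the
tokens tuple. A standalone sketch (not pycparser's actual code; the
tiny token set is invented just for illustration):

    import ply.lex as lex

    tokens = ('ID', 'SEMI')

    t_ID = r'[A-Za-z_][A-Za-z0-9_]*'
    t_SEMI = r';'
    t_ignore = ' \t'

    def t_COMMENT(t):
        r'/\*(.|\n)*?\*/|//[^\n]*'
        t.lexer.lineno += t.value.count('\n')  # keep line numbers right
        # returning nothing discards the match: no COMMENT token is emitted

    def t_newline(t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    def t_error(t):
        t.lexer.skip(1)

    lexer = lex.lex()
    lexer.input("int x; /* block */ int y; // trailing\n")
    for tok in lexer:
        print(tok.type, tok.value)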

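And on the ontology question: I don't know of a database of parser
token names, but the triple idea itself is easy to prototype with
plain (subject, predicate, object) tuples before reaching for a real
RDF library such as rdflib; every name below is made up for the
example:

    # a toy triple store; token and relation names are invented
    triples = {
        ('CPP_COMMENT',   'is_a',  'TOKEN'),
        ('CPP_COMMENT',   'marks', 'COMMENT'),
        ('BLOCK_COMMENT', 'is_a',  'TOKEN'),
        ('BLOCK_COMMENT', 'marks', 'COMMENT'),
        ('ID',            'is_a',  'TOKEN'),
    }

    # "query the names of all tokens that represent a program comment"
    comment_tokens = sorted(s for (s, p, o) in triples
                            if p == 'marks' and o == 'COMMENT')
    print(comment_tokens)   # ['BLOCK_COMMENT', 'CPP_COMMENT']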


-- 
Ryan
If I were in a 10-story glass-sided building and forced to write either
Go or autotools scripts, I’d jump out a window.
http://kirbyfan64.github.io/
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
