Hi ho!

On 07/22/2016 11:08 PM, Jochen Becher wrote:
> currently I am working on a new feature for QtCreator. I need to
> analyze the C++ source code (non preprocessed) very precisely. Seems to
> be difficult because I couldn't find the token stream of the
> non-preprocessed source code.

Yup, the preprocessor lexer runs on demand (Preprocessor::preprocess); the tokens are not pre-generated.

Side note: The C++ lexer, which operates on preprocessed code, does pre-generate tokens (TranslationUnit::parse). Those tokens are released together with the AST after CppModelManager::documentUpdated is emitted.

> I tried to use the information from CppTools::SemanticInfo. It gives me
> most of the info I need but not in a simple token stream.

If needed, the token stream is generated on demand, at least for small scopes like "current line". See e.g. the use of SimpleLexer in CppHighlighter::highlightBlock or CodeFormatter::tokenizeBlock.

> Analyzing the macro definitions I found that CPlusPlus::Macro is
> inconsistent: the utf16CharOffset() points to the start of the macro
> name while the length() includes the #define token. So I can neither
> know the start of the #define token nor the end of the macro definition
> (because I do not know how many whitespaces were skipped between
> #define and macro name).
>
> Today Macro::length() is used only once and only to compare the values
> from two Macro entities. Thus I could fix the length() to not include
> the #define token and whitespaces.
>
> But unfortunately that still does not give me the start of the #define
> token. Shall I include another offset in Macro pointing to that #define
> token start or is there any other simple way (instead of parsing the
> source code myself) to find this #define token start position?

Hmm, your code would be the only client and it would increase the memory consumption (probably not significantly). Not sure whether that's worth it - is the exact position really necessary? E.g. for FollowSymbol we simply position the cursor at the start of the #define line.

If you can access the source code without reading the file from disk (e.g. because it's an open document), you could run SimpleLexer on the line given by Macro::line(), or even inspect that line manually. Finding the 'd' in "<WS>#<WS>d" shouldn't be too expensive.
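
To illustrate the manual inspection, here is a rough sketch using only the standard library. Note that findDefine and DefinePosition are made-up names for this mail, not existing Qt Creator API, and the sketch assumes the caller already has the text of the line reported by Macro::line(). It ignores comments between '#' and "define" as well as digraphs/trigraphs, so treat it as a starting point only.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of Qt Creator): 0-based columns of the
// '#' and of the 'd' of "define" within one source line.
struct DefinePosition {
    std::size_t hashColumn;
    std::size_t defineColumn;
};

// Matches "<WS>#<WS>define" at the start of `line`.
// Returns std::nullopt if the line is not a #define directive.
std::optional<DefinePosition> findDefine(std::string_view line)
{
    // Skip spaces and tabs starting at index i; never runs past the end.
    auto skipBlanks = [&](std::size_t i) {
        while (i < line.size() && (line[i] == ' ' || line[i] == '\t'))
            ++i;
        return i;
    };

    const std::size_t hash = skipBlanks(0);
    if (hash >= line.size() || line[hash] != '#')
        return std::nullopt;

    const std::size_t keyword = skipBlanks(hash + 1);
    constexpr std::string_view define = "define";
    if (line.substr(keyword, define.size()) != define)
        return std::nullopt;

    // Reject longer identifiers such as "#defined".
    const std::size_t end = keyword + define.size();
    if (end < line.size() && line[end] != ' ' && line[end] != '\t')
        return std::nullopt;

    return DefinePosition{hash, keyword};
}
```

The returned columns are relative to the start of the line, so turning them into a document offset still requires the offset of the line start, which an open document should be able to provide.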

> BTW: Is there a simple way to get the full token stream of the
> non-preprocessed code?

No, see above, the stream is not pre-generated.

> I wouldn't like to start the preprocessor though this might be a solution.

I don't see how this would help; you probably meant the lexer?

Nikolai


_______________________________________________
Qt-creator mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/qt-creator
