Thanks Thomas. I've read the document
http://docs.python.org/py3k/reference/lexical_analysis.html
but I worry that it might leave out some language features, like the "tab magic". Since I'm writing a parser in JavaScript, I need a more strictly defined spec.

Currently I have a highlighter here:
http://shaofei.name/python/PyHighlighter.html
(and the lexer: http://shaofei.name/python/PyLexer.html)

As you can see, I just made its behavior match CPython's, but I'm not sure what the real Python lexical grammar is like. Does anyone know whether there is a lexical grammar spec like the ones other languages have (e.g. http://bclary.com/2004/11/07/#annex-a)?

Please help me. Thanks a lot.

At 2011-09-21 19:41:33, "Thomas Jollans" <t...@jollybox.de> wrote:
>On 21/09/11 11:44, 程劭非 wrote:
>> Hi, everyone,
>> I've found there are several tokens used in Python's
>> grammar (http://docs.python.org/reference/grammar.html) but I didn't see
>> their definitions anywhere. The tokens are listed here:
>
>They should be documented in
>http://docs.python.org/py3k/reference/lexical_analysis.html - though
>apparently not using these exact terms.
>
>> NEWLINE
>Trivial: U+000A
>
>> ENDMARKER
>End of file.
>
>> NAME
>documented as "identifier" in 2.3
>
>> INDENT
>> DEDENT
>Documented in 2.1.8.
>
>> NUMBER
>Documented in 2.4.3 - 2.4.6
>
>> STRING
>Documented in 2.4.2
>
>> I've got some information from the source
>> code (http://svn.python.org/projects/python/trunk/Parser/tokenizer.c) but
>> I'm not sure which features are specific to this implementation. (I
>> saw that the tab stop can be modified with comments using "tab-width:",
>> ":tabstop=", ":ts=" or "set tabsize="; is this feature really in the spec?)
>
>That sounds like a legacy feature that is no longer used. Somebody
>familiar with the early history of Python might be able to shed more
>light on the situation. It is inconsistent with the spec (section 2.1.8):
>
>"""
>Indentation is rejected as inconsistent if a source file mixes tabs and
>spaces in a way that makes the meaning dependent on the worth of a tab
>in spaces; a TabError is raised in that case.
>"""
>
>- Thomas
>--
>http://mail.python.org/mailman/listinfo/python-list
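
For anyone comparing their own lexer against CPython, here is a minimal sketch of one way to see the tokens discussed above (NEWLINE, INDENT, DEDENT, NAME, NUMBER, STRING, ENDMARKER) and the TabError case from section 2.1.8. It uses the standard library `tokenize` and `token` modules and the built-in `compile()`. The helper names `token_names` and `raises_tab_error` and the sample sources are made up for illustration, and the output reflects what CPython 3 does rather than a formal spec.

```python
import io
import token
import tokenize

# Hypothetical sample input: one compound statement with an indented body.
SOURCE = "if x:\n    y = 'a' + 1\n"


def token_names(source):
    """Return (token name, token text) pairs from the stdlib tokenizer."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


def raises_tab_error(source):
    """Return True if CPython rejects the source with a TabError."""
    try:
        compile(source, "<sample>", "exec")
    except TabError:
        return True
    return False


# Print each token as the tokenizer reports it.
for name, text in token_names(SOURCE):
    print(f"{name:<10} {text!r}")

# A tab on one line and eight spaces on the next: the block structure
# depends on how many spaces a tab is worth, so this is rejected.
MIXED = "if True:\n\tx = 1\n        y = 2\n"
print("mixed tabs/spaces rejected:", raises_tab_error(MIXED))
```

Running a JavaScript lexer over the same input and diffing the token names against this listing is one way to spot where it diverges from CPython. Note that the `tokenize` module also reports token types such as OP, COMMENT and NL, which do not all appear in the grammar file.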