On Sat, Jan 26, 2013 at 11:38 AM, Santosh Kumar <[email protected]> wrote:
>
> Everything starting with a hash character in Python is a comment and is
> not interpreted by the interpreter. So how does that work? Give me a
> full explanation.
The encoding declaration is a special case that is handled while the source is compiled: the tokenizer inspects the first two lines of the file for a coding spec, in addition to treating them as ordinary comments. CPython does this in the function get_coding_spec in Parser/tokenizer.c. CPython 2.7.3 source link:

http://hg.python.org/cpython/file/70274d53c1dd/Parser/tokenizer.c#l205

You can use the parser module to represent the nodes of a parsed source tree as a sequence of nested tuples. The first item in each tuple is the node type number. The names for these numbers are split across two dictionaries: symbol.sym_name maps non-terminal node types, and token.tok_name maps terminal node types (i.e. leaf nodes in the tree). In CPython 2.7/3.3, node types below 256 are terminal.

Here's an example source tree for two forms of encoding declaration:

>>> import parser, symbol, token
>>> src1 = '# -*- coding: utf-8 -*-'
>>> parser.suite(src1).totuple()
(339, (257, (0, '')), 'utf-8')
>>> src2 = '# coding=utf-8'
>>> parser.suite(src2).totuple()
(339, (257, (0, '')), 'utf-8')

As expected, src1 and src2 are equivalent. Now look up the names of node types 339, 257, and 0:

>>> symbol.sym_name[339]
'encoding_decl'
>>> symbol.sym_name[257]
'file_input'
>>> token.ISTERMINAL(0)
True
>>> token.tok_name[0]
'ENDMARKER'

The root node is type 339 (encoding_decl). Its child is type 257 (file_input), which is just the empty body of the source (to keep it simple, src1 and src2 contain no statements). Appended at the end is the string value of the encoding_decl (e.g. 'utf-8').

_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
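The rule that get_coding_spec applies comes from PEP 263: a comment on the first or second line matching the regular expression ^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+) declares the source encoding. Below is a rough Python approximation of that check, offered as a sketch and not as a port of the C code. The helper name find_coding_spec is hypothetical. Its result is compared with tokenize.detect_encoding, the standard-library function that performs the same detection in Python 3.

```python
import io
import re
import tokenize

# Coding-declaration pattern from PEP 263.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
# A line that is blank or holds only a comment; only then is line 2 checked.
BLANK_OR_COMMENT_RE = re.compile(r'^[ \t\f]*(?:[#\r\n]|$)')


def find_coding_spec(source):
    """Hypothetical helper: return the declared encoding name, or None.

    Only the first two lines are examined, and the second one only when
    the first is blank or a comment (tokenize.detect_encoding does the same).
    """
    lines = source.splitlines()[:2]
    for index, line in enumerate(lines):
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        if index == 0 and not BLANK_OR_COMMENT_RE.match(line):
            break  # a line of real code ends the search
    return None


samples = [
    '# -*- coding: utf-8 -*-\n',
    '# coding=utf-8\n',
    '#!/usr/bin/env python\n# vim: set fileencoding=latin-1 :\n',
    'x = 1\n# coding: latin-1\n',  # too late: code comes first
]

for src in samples:
    ours = find_coding_spec(src)
    # detect_encoding wants a readline callable that returns bytes.
    theirs, _ = tokenize.detect_encoding(io.BytesIO(src.encode('ascii')).readline)
    print(repr(src), ours, theirs)
```

Note that detect_encoding also normalizes the name it finds (for example 'latin-1' becomes 'iso-8859-1') and falls back to 'utf-8' when no declaration is present, whereas the sketch returns the raw spelling or None.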

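The parser and symbol modules used in the session above were removed in Python 3.10, so that transcript only runs on older versions. As a rough modern counterpart (a sketch, not what the original session did), the standard-library tokenize module exposes the same information: the detected encoding comes out as a separate ENCODING token, while the declaration line itself is still reported as an ordinary COMMENT token. The second half shows why this particular "comment" matters: compile() uses it to decode a bytes source. The variable names are illustrative only.

```python
import io
import token
import tokenize

# 1. tokenize.tokenize() reads bytes, detects the coding declaration first and
#    emits it as an ENCODING token, then emits the declaration line as a COMMENT.
ascii_src = b'# coding=utf-8\nx = 1\n'
tokens = list(tokenize.tokenize(io.BytesIO(ascii_src).readline))
for tok in tokens[:3]:
    print(token.tok_name[tok.type], repr(tok.string))

# 2. The declaration changes how compile() decodes bytes: with a latin-1
#    declaration, the byte 0xE9 inside the string literal becomes 'é'.
latin1_src = b'# -*- coding: latin-1 -*-\ns = "\xe9"\n'
namespace = {}
exec(compile(latin1_src, '<latin-1 example>', 'exec'), namespace)
print(repr(namespace['s']))

# 3. Drop the declaration line and the same bytes are decoded as UTF-8,
#    where a lone 0xE9 byte is invalid, so compilation fails.
no_cookie_src = latin1_src.split(b'\n', 1)[1]
rejected = False
try:
    compile(no_cookie_src, '<no cookie>', 'exec')
except SyntaxError as exc:
    rejected = True
    print('SyntaxError:', exc.msg)
```

The first token printed is ENCODING with the string 'utf-8', followed by the declaration as a COMMENT token, which mirrors the encoding_decl node wrapping file_input in the old parse tree.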