On 12/24/05, Tim Peters <[EMAIL PROTECTED]> wrote:
> [Tim]
> >> FWIW, test_builtin and test_pep263 both passed on WinXP in rev 39757.
> >> That's the last revision before the AST branch was merged.
> >>
> >> I can't build rev 39758 on WinXP (VC complains that pythoncore.vcproj
> >> can't be loaded -- looks like it got checked in with unresolved SVN
> >> conflict markers -- which isn't easy to do under SVN ;-( ), so don't
> >> know about that.
> >>
> >> The first revision at which Python built again was 39791 (23 Oct), and
> >> test_builtin and test_pep263 both fail under that the same way they
> >> fail today.
>
> [Brett]
> > Both syntax errors, right?
>
> In test_builtin, yes, two syntax errors.  test_pep263 is different:
>
>     test test_pep263 failed -- Traceback (most recent call last):
>       File "C:\Code\python\lib\test\test_pep263.py", line 12, in test_pep263
>         '\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
>     AssertionError:
>     '\xc3\xb0\xc3\x89\xc3\x94\xc3\x8f\xc3\x8e' !=
>     '\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
>
> That's not a syntax error, it's a wrong result.  There are other
> parsing-related test failures, but those are the only two I've written
> up so far (partly because I expect they all have the same underlying
> cause, and partly because nobody so far seems to understand the code
> well enough to explain why the first one works on any platform ;-)).

> > My mind is partially gone thanks to being on vacation so following
> > this thread has been abnormally hard. =)
> >
> > Since it is a syntax error there won't be any bytecode to compare
> > against.
>
> Shouldn't be needed.  The snippet:
>
>     bom = '\xef\xbb\xbf'
>     compile(bom + 'print 1\n', '', 'exec')
>
> treats the `bom` prefix like any other sequence of illegal characters.
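[Editorial aside: for contrast, the behavior the front end is supposed to deliver is easy to demonstrate with modern Python. Transliterated to Python 3 syntax (where compile() accepts bytes and the print statement is gone), the tokenizer strips a leading UTF-8 BOM instead of handing it to the parser as illegal characters:]

```python
# Python 3 transliteration of the snippet under discussion; a working
# tokenizer strips the leading UTF-8 BOM rather than treating it as
# a sequence of illegal characters.
bom = b'\xef\xbb\xbf'
code = compile(bom + b'x = 1\n', '<test>', 'exec')  # no SyntaxError
ns = {}
exec(code, ns)
assert ns['x'] == 1
```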
> That's why it raises SyntaxError:
>
> It peels off the first character (\xef), and says "syntax error" at
> that point:
>
>     Py_CompileStringFlags ->
>         PyParser_ASTFromString ->
>             PyParser_ParseStringFlagsFilename ->
>                 parsetok ->
>                     PyTokenizer_Get
>
> That sets `a` to point at the start of the string, `b` to point at the
> second character, and returns type==51.  Then `len` is set to 1, `str`
> is malloc'ed to hold 2 bytes, and `str` is filled in with "\xef\x00"
> (the first byte of the input, as a NUL-terminated C string).
>
> PyParser_AddToken then calls classify(), which falls off the end of
> its last loop and returns -1: syntax error.

and later:

> I'm getting a strong suspicion that I'm the only developer to _try_
> building the trunk on WinXP since the AST merge was done, and that
> something obscure is fundamentally broken with it on this box.  For
> example, in tokenizer.c, these functions don't even exist on Windows
> today (because an enclosing #ifdef says not to compile them):
>
>     error_ret
>     new_string
>     get_normal_name
>     get_coding_spec
>     check_coding_spec
>     check_bom
>     fp_readl
>     fp_setreadl
>     fp_getc
>     fp_ungetc
>     decoding_fgets
>     decoding_feof
>     buf_getc
>     buf_ungetc
>     buf_setreadl
>     translate_into_utf8
>     decode_str
>
> OK, that's not quite true.  "Degenerate" forms of three of those
> functions exist on Windows:
>
>     static char *
>     decoding_fgets(char *s, int size, struct tok_state *tok)
>     {
>         return fgets(s, size, tok->fp);
>     }
>
>     static int
>     decoding_feof(struct tok_state *tok)
>     {
>         return feof(tok->fp);
>     }
>
>     static const char *
>     decode_str(const char *str, struct tok_state *tok)
>     {
>         return str;
>     }
>
> In the simple failing test, that degenerate decode_str() is getting
> called.  If the "fancy" decode_str() were being used instead, that one
> _does_ call check_bom().  Why do we have two versions of these
> functions?  Which set is supposed to be in use now?  What's the
> meaning of "#ifdef PGEN" today?  Should it be true or false?
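[Editorial aside: the #ifdef'ed-out functions listed above implement PEP 263's encoding detection: check for a byte-order mark first, then for a coding cookie. The following is only a rough Python sketch of what check_bom contributes (the name is borrowed from tokenizer.c; the real code is C). Python 3 later exposed the complete detection logic as tokenize.detect_encoding.]

```python
def check_bom(src):
    """Rough sketch of tokenizer.c's check_bom: if the source starts
    with a byte-order mark, strip it and report the encoding it
    implies; otherwise leave the source untouched."""
    if src.startswith(b'\xef\xbb\xbf'):       # UTF-8 BOM
        return 'utf-8', src[3:]
    if src.startswith(b'\xfe\xff'):           # UTF-16, big-endian
        return 'utf-16-be', src[2:]
    if src.startswith(b'\xff\xfe'):           # UTF-16, little-endian
        return 'utf-16-le', src[2:]
    return None, src                          # no BOM; go look for a
                                              # PEP 263 coding cookie

# The failing snippet's input begins with the UTF-8 BOM, so the "fancy"
# decode_str() path would strip it instead of handing \xef to the parser:
print(check_bom(b'\xef\xbb\xbfprint 1\n'))    # ('utf-8', b'print 1\n')
```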
Looking at the logs for tokenizer.c, tokenizer.h, and tokenizer_pgen.c,
it looks like this stuff has not been heavily touched since Martin did
stuff for PEP 263.

> >> I'm darned near certain that we're not using the _intended_ parsing
> >> code on Windows now -- PGEN is still #define'd when the "final"
> >> parsing code is compiled into python25.dll.  Don't know how to fix
> >> that (I don't understand it).
>
> > But the AST branch didn't touch the parser (unless you are grouping
> > ast.c and compile.c under the "parser" umbrella just to throw me off
> > =).
>
> Possibly.  See above for unanswered questions about tokenizer.c, which
> appears to be the whole problem wrt test_builtin.  Python couldn't be
> built under VC 7.1 on Windows after the AST merge.  However that got
> repaired, it left parsing/tokenizing broken on Windows wrt (at least)
> some encoding gimmicks.  Since the tests passed immediately before the
> AST merge, and failed the first time Python could be built again after
> that merge, it's the only natural candidate for finger-wagging.

Did it lead to tokenizer_pgen.c suddenly being used for the build
instead of tokenizer.c?  The former seems to be the only place where
PGEN is defined.

> > What can I do to help?
>
> I don't know.  Enjoying Christmas couldn't hurt :-)  What this needs
> is someone who understands how
>
>     bom = '\xef\xbb\xbf'
>     compile(bom + 'print 1\n', '', 'exec')
>
> is supposed to work at the front-end level.

Hopefully Martin will have some inkling, since he committed the phase 1
stuff for PEP 263.

> > Do you need me to step through something?
>
> Why doesn't the little code snippet above fail anywhere else?
> "Should" the degenerate decode_str() be getting called during it --
> or should the other decode_str() be getting called?  If the latter,
> what got broken on Windows during the merge so that the wrong one is
> getting called now?
>
> > Do you need to know how gcc is preprocessing some file?
> No, I just need to know how to fix Python on Windows ;-)

=)

-Brett
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
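[Archive postscript: the bad bytes in the test_pep263 failure quoted at the top of this message are consistent with one specific failure mode. The expected value is 'Питон' encoded as UTF-8; the observed value is exactly what results if the test file's koi8-r bytes are decoded as Latin-1 (that is, the coding cookie is ignored) and then re-encoded as UTF-8. A Python 3 sketch of that hypothesis; the diagnosis is the editor's, not the thread's:]

```python
# Hypothesis: the tokenizer ignored test_pep263.py's koi8-r coding
# cookie and decoded the file's bytes as Latin-1 instead.
expected = 'Питон'.encode('utf-8')              # what the test asserts
assert expected == b'\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'

raw = 'Питон'.encode('koi8-r')                  # bytes in the source file
observed = raw.decode('latin-1').encode('utf-8')
assert observed == b'\xc3\xb0\xc3\x89\xc3\x94\xc3\x8f\xc3\x8e'
```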