On 5/3/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Untangling the parser from stdio - sure. I also think it would
> be desirable to read the whole source into a buffer, rather than
> applying line-by-line input. That might be a bigger change,
> making the tokenizer a multi-stage algorithm:
> 1. read the input into a buffer
> 2. determine the source encoding (looking at a BOM, else a
>    declaration within the first two lines, else default to UTF-8)
> 3. if the source encoding is not UTF-8, pass it through a codec
>    (decode to string, encode to UTF-8); otherwise, check that all
>    bytes are really well-formed UTF-8
> 4. start parsing

So people could hook in their own "codec" that, say, replaced
native-language keywords with standard Python keywords? Part of me
says that should be an import hook instead of something pretending to
be a codec... (rough sketches of both ideas in the postscript below).

-jJ
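P.S. To make Martin's stages concrete, here is a minimal sketch in
Python. The decode_source() helper is a hypothetical name of mine,
and the coding-declaration regex is a simplified stand-in for the
stricter PEP 263 rules, so treat this as a sketch of the idea rather
than the tokenizer's actual code:

    import codecs
    import re

    # Simplified stand-in for a PEP 263 coding declaration; the real
    # rules are stricter about where the comment may appear.
    CODING_RE = re.compile(br'coding[=:]\s*([-\w.]+)')

    def decode_source(path):
        # Stage 1: read the whole input into a buffer.
        with open(path, 'rb') as f:
            raw = f.read()
        # Stage 2: determine the source encoding: a BOM, else a
        # declaration within the first two lines, else UTF-8.
        if raw.startswith(codecs.BOM_UTF8):
            raw = raw[len(codecs.BOM_UTF8):]
            encoding = 'utf-8'
        else:
            encoding = 'utf-8'
            for line in raw.splitlines()[:2]:
                m = CODING_RE.search(line)
                if m:
                    encoding = m.group(1).decode('ascii')
                    break
        # Stage 3: pass non-UTF-8 input through a codec (decode to
        # str, encode back to UTF-8). Decoding UTF-8 input doubles as
        # the well-formedness check: a bad byte raises
        # UnicodeDecodeError.
        utf8 = raw.decode(encoding).encode('utf-8')
        # Stage 4: hand the UTF-8 buffer to the parser (not shown).
        return utf8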
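And for the import-hook version of the keyword-translation idea, a
sketch in today's importlib terms (importlib did not exist yet in
2007; a PEP 302 hook would have played the same role). The KEYWORDS
table and TranslatingLoader are hypothetical names of mine:

    import importlib.machinery
    import re
    import sys

    # Hypothetical table mapping native-language keywords to Python's.
    KEYWORDS = {'wenn': 'if', 'sonst': 'else', 'solange': 'while'}

    class TranslatingLoader(importlib.machinery.SourceFileLoader):
        def source_to_code(self, data, path='<string>'):
            source = data.decode('utf-8') if isinstance(data, bytes) else data
            # Naive substitution: a real hook would retokenize so
            # that matching words inside strings and comments are
            # left alone.
            for native, std in KEYWORDS.items():
                source = re.sub(r'\b%s\b' % native, std, source)
            return super().source_to_code(source, path)

    # Route .py files on sys.path through the translating loader. A
    # production hook would register the standard extension/bytecode
    # loaders alongside this one so other imports keep working.
    hook = importlib.machinery.FileFinder.path_hook(
        (TranslatingLoader, ['.py']))
    sys.path_hooks.insert(0, hook)
    sys.path_importer_cache.clear()

Doing it this way keeps the trick at a layer that exists for exactly
this kind of source transformation, instead of overloading the codec
machinery.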