On 5/3/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Untangling the parser from stdio - sure. I also think it would
> be desirable to read the whole source into a buffer, rather than
> processing it line by line. That might be a bigger change,
> making the tokenizer a multi-stage algorithm:

> 1. read input into a buffer
> 2. determine source encoding (looking at a BOM, else a
>    declaration within the first two lines, else default
>    to UTF-8)
> 3. if the source encoding is not UTF-8, pass it through
>    a codec (decode to string, encode to UTF-8). Otherwise,
>    check that all bytes are really well-formed UTF-8.
> 4. start parsing
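
Concretely, the four stages might look like the sketch below. The
function name read_source() and the simplified coding-declaration
regex are mine for illustration, not CPython's tokenizer API:

    import codecs
    import re

    # Simplified from the PEP 263 declaration syntax.
    _CODING_RE = re.compile(br'coding[:=]\s*([-\w.]+)')

    def read_source(path):
        # Stage 1: read the whole file into a buffer.
        with open(path, 'rb') as f:
            raw = f.read()

        # Stage 2: determine the source encoding -- a BOM wins,
        # else a declaration in the first two lines, else UTF-8.
        encoding = 'utf-8'
        if raw.startswith(codecs.BOM_UTF8):
            raw = raw[len(codecs.BOM_UTF8):]
        else:
            for line in raw.splitlines(keepends=True)[:2]:
                m = _CODING_RE.search(line)
                if m:
                    encoding = m.group(1).decode('ascii')
                    break

        # Stage 3: decode with the declared codec and re-encode as
        # UTF-8; for UTF-8 input the decode doubles as the
        # well-formedness check.
        return raw.decode(encoding).encode('utf-8')

    # Stage 4: hand the UTF-8 buffer to the parser (not shown).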

So people could hook in their own "codec" that, say, replaced
native-language keywords with standard Python keywords?
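
Something along those lines can already be spelled with the codec
machinery; a rough sketch, where the codec name 'mykeywords' and the
keyword table are made up, and the naive replace() would also rewrite
strings and comments:

    import codecs

    _KEYWORDS = {'mientras': 'while', 'si': 'if'}

    def _decode(data, errors='strict'):
        # Decode as UTF-8 first, then swap in the standard keywords.
        text, consumed = codecs.utf_8_decode(data, errors, True)
        for native, std in _KEYWORDS.items():
            text = text.replace(native, std)
        return text, consumed

    def _search(name):
        if name == 'mykeywords':
            return codecs.CodecInfo(encode=codecs.utf_8_encode,
                                    decode=_decode, name='mykeywords')
        return None

    codecs.register(_search)

A module would then declare "# -*- coding: mykeywords -*-" on its
first line; whether the interpreter's source decoding would actually
route through such a codec is exactly the question.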

Part of me says that should be an import hook instead of pretending to
be a codec...
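
For comparison, a PEP 302 hook version, as it might be written with a
modern importlib (the loader class, the made-up '.pyes' suffix, and
the keyword table are all illustrative):

    import importlib.machinery
    import sys

    _KEYWORDS = {'mientras': 'while', 'si': 'if'}

    class KeywordLoader(importlib.machinery.SourceFileLoader):
        def source_to_code(self, data, path, *, _optimize=-1):
            # Rewrite the decoded source, then compile as usual.
            text = data.decode('utf-8')
            for native, std in _KEYWORDS.items():
                text = text.replace(native, std)  # naive, as above
            return compile(text, path, 'exec', dont_inherit=True,
                           optimize=_optimize)

    # Route files with the made-up suffix through the loader.
    sys.path_hooks.insert(0, importlib.machinery.FileFinder.path_hook(
        (KeywordLoader, ['.pyes'])))
    sys.path_importer_cache.clear()

The advantage of the hook is that the rewrite happens visibly at
import time instead of being smuggled in as an "encoding".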

-jJ