On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
> 8) (Content specific variable size caches)  Index each codepoint that is a
> different byte size than the previous codepoint, allowing indexing to be
> used in the intervals. Worst case size is like 2, best case size is a single
> entry for the end, when all code points are represented by the same number
> of bytes.

Conceptually interesting, and I'd love to know how well that'd perform
in real-world usage. It would do very nicely on blocks of text that are
all from the same range of codepoints, but if you intersperse high and
low codepoints it'll be like option 2, only with significantly more
complicated lookups. Imagine a "name=value\nname=value\n" stream where
the names and values are all in the same non-ASCII language: every
ASCII "=" and "\n" is a transition, so you'll have a lot of them.
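
To make the trade-off concrete, here's a rough sketch of how such an
index might work. It assumes the string is stored as UTF-8, and the names
(build_run_index, byte_offset) are made up for illustration, not taken
from any real implementation. It also records the start of each run
rather than a single entry for the end, which is a small variation on
the proposal as worded.

```python
from bisect import bisect_right

def build_run_index(text):
    """Record one (start_codepoint, start_byte, width) entry each time
    the UTF-8 byte width differs from the previous codepoint's."""
    runs = []
    byte_pos = 0
    for i, ch in enumerate(text):
        width = len(ch.encode("utf-8"))
        if not runs or runs[-1][2] != width:
            runs.append((i, byte_pos, width))
        byte_pos += width
    return runs

def byte_offset(runs, index):
    """Map a codepoint index to its byte offset via binary search."""
    # Rebuilt on every call for clarity; a real index would keep this
    # list around instead.
    starts = [run[0] for run in runs]
    k = bisect_right(starts, index) - 1
    start_cp, start_byte, width = runs[k]
    # Inside a run every codepoint has the same width, so plain
    # arithmetic finishes the lookup.
    return start_byte + (index - start_cp) * width

uniform = "a" * 20
mixed = "имя=значение\n" * 3  # Cyrillic names/values, ASCII separators

uniform_runs = build_run_index(uniform)
mixed_runs = build_run_index(mixed)
print(len(uniform_runs), len(mixed_runs))
```

The uniform string needs a single entry, while the three-line
name=value stream needs four entries per line, since each "=" and "\n"
starts a new run and so does the text that follows it.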

ChrisA
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev