Re: [Python-Dev] Internal representation of strings and Micropython

Glenn Linderman Wed, 04 Jun 2014 15:06:12 -0700

On 6/4/2014 2:28 PM, Chris Angelico wrote:

On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman <[email protected]> wrote:

8) (Content specific variable size caches)  Index each codepoint that is a
different byte size than the previous codepoint, allowing indexing to be
used in the intervals. Worst case size is like 2, best case size is a single
entry for the end, when all code points are represented by the same number
of bytes.

Conceptually interesting, and I'd love to know how well that'd perform
in real-world usage.


So would I :)

Would do very nicely on blocks of text that are
all from the same range of codepoints, but if you intersperse high and
low codepoints it'll be like 2 but with significantly more complicated
lookups (imagine a "name=value\nname=value\n" stream where the names
and values are all in the same language - you'll have a lot of
transitions).

Lookup is binary search on code point index or a search for same in some tree structure, I would think.
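
To make the binary-search lookup for #8 concrete, here is a minimal sketch, assuming the string is stored as UTF-8 and that the index keeps one entry at the start of each run of equal-width code points. The names `utf8_width`, `build_run_index` and `char_at` are made up for illustration; nothing here is taken from MicroPython or CPython.

```python
from bisect import bisect_right

def utf8_width(lead: int) -> int:
    """Byte length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")

def build_run_index(data: bytes) -> dict:
    """One entry per run: code point index, byte offset, bytes per code point."""
    starts, offsets, widths = [], [], []
    pos = cp = 0
    prev_width = 0
    while pos < len(data):
        width = utf8_width(data[pos])
        if width != prev_width:          # width changed: a new run begins here
            starts.append(cp)
            offsets.append(pos)
            widths.append(width)
            prev_width = width
        pos += width
        cp += 1
    return {"starts": starts, "offsets": offsets, "widths": widths, "count": cp}

def char_at(data: bytes, index: dict, i: int) -> str:
    """Return code point i using a binary search over the run starts."""
    if not 0 <= i < index["count"]:
        raise IndexError(i)
    run = bisect_right(index["starts"], i) - 1   # last run starting at or before i
    width = index["widths"][run]
    begin = index["offsets"][run] + (i - index["starts"][run]) * width
    return data[begin:begin + width].decode("utf-8")
```

With this layout a pure-ASCII string needs a single entry, while a "name=value\n" stream in a two-byte script needs one entry per name, per "=", per value and per newline.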

"like 2 but ..." well, the data structure would be bigger than for 2, but your example shows 4-5 high codepoints per low codepoint (for some languages).

I did just think of another refinement to this technique (my list was not intended to be all-inclusive... just a bunch of variations I thought of then).

10) (Content specific variable size caches) Like 8, but the last character in a run is allowed (but not required) to be a different number of bytes than prior characters, because the offset calculation will still work for the first character of a different size.

So #10 would halve the size of the index for your imagined stream that intersperses one low-byte character with each sequence of high-byte characters.
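
A rough sketch of #10, again assuming UTF-8 storage and using made-up names (`utf8_width`, `build_run_index_10`, `char_at_10`): each run is a body of equal-width code points plus at most one trailing code point of a different width. The offset arithmetic uses the body width only, and the width of the code point actually found at that offset is read from its lead byte.

```python
from bisect import bisect_right

def utf8_width(lead: int) -> int:
    """Byte length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")

def build_run_index_10(data: bytes) -> dict:
    """One entry per run; a run may end with one odd-width code point."""
    starts, offsets = [], []
    pos = cp = 0
    n = len(data)
    while pos < n:
        width = utf8_width(data[pos])    # width of the run's body
        starts.append(cp)
        offsets.append(pos)
        while pos < n and utf8_width(data[pos]) == width:
            pos += width
            cp += 1
        if pos < n:                      # absorb one trailing odd-width code point
            pos += utf8_width(data[pos])
            cp += 1
    return {"starts": starts, "offsets": offsets, "count": cp}

def char_at_10(data: bytes, index: dict, i: int) -> str:
    """Return code point i; the trailing odd-width one is still addressable."""
    if not 0 <= i < index["count"]:
        raise IndexError(i)
    run = bisect_right(index["starts"], i) - 1
    run_offset = index["offsets"][run]
    body_width = utf8_width(data[run_offset])
    begin = run_offset + (i - index["starts"][run]) * body_width
    width = utf8_width(data[begin])      # may differ from body_width at run end
    return data[begin:begin + width].decode("utf-8")
```

For a "name=value\n" stream in a two-byte script, each name absorbs its "=" and each value absorbs its newline, so this needs one entry where the #8 layout needs two.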

