Re: [Python-Dev] Internal representation of strings and Micropython

Serhiy Storchaka Wed, 04 Jun 2014 09:50:55 -0700

On 04.06.14 18:38, Paul Sokolovsky wrote:

>> Any non-trivial text parsing uses indices or regular expressions (and
>> regular expressions themselves use indices internally).


> I keep hearing this, and unfortunately I haven't yet had enough time
> to collect it all and write a detailed response. So here's a
> spur-of-the-moment reply; hopefully we share enough context that it is
> easy to follow.
>
> So, gentlemen, you keep conflating character-by-character random
> access to a string with taking substrings of a string.
>
> Character-by-character random access implies that you would need to
> scan through (possibly almost) all characters in a string. That's
> O(N), where N is the length of the string. With a variable-length
> encoding (which takes O(N) to index an arbitrary character), there is
> thus a concern that this would become an O(N^2) operation.
>
> But show me a real-world case for that. The common use case is
> scanning a string left to right, which should be done using an
> iterator and is thus O(N). Right-to-left scanning would be an order
> (or orders) of magnitude less frequent, and can also be handled by an
> iterator.
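To make the complexity argument concrete, here is a minimal Python sketch that models a UTF-8 string buffer as a `bytes` object. The helpers `utf8_char_at` and `iter_utf8` are hypothetical names chosen for illustration, not MicroPython or CPython APIs, and the decoder assumes well-formed UTF-8 input. Indexing each position in turn rescans from the start (O(N) per call, O(N^2) for a full loop), while the iterator visits each byte once (O(N) total).

```python
def _seq_len(lead: int) -> int:
    """Byte length of the UTF-8 sequence whose first byte is `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def utf8_char_at(buf: bytes, index: int) -> str:
    """Random access: walk from the start of the buffer, O(N) per call."""
    pos = 0
    for _ in range(index):
        pos += _seq_len(buf[pos])  # skip one whole character
    end = pos + _seq_len(buf[pos])
    return buf[pos:end].decode("utf-8")


def iter_utf8(buf: bytes):
    """Sequential access: each character is decoded once, O(N) in total."""
    pos = 0
    while pos < len(buf):
        end = pos + _seq_len(buf[pos])
        yield buf[pos:end].decode("utf-8")
        pos = end


text = "naïve café"
buf = text.encode("utf-8")
slow = [utf8_char_at(buf, i) for i in range(len(text))]  # O(N^2) overall
fast = list(iter_utf8(buf))                              # O(N) overall
assert slow == fast == list(text)
```

The iterator only helps while the scan stays strictly sequential; code that stores an integer position and later slices or searches from it pays the scan cost again.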

html.HTMLParser, json.JSONDecoder, re.compile and tokenize.tokenize don't
use iterators. They use indices, str.find and/or regular expressions.
The common use case is to quickly find a substring starting from the
current position using str.find or re.search, process the found token,
advance the position, and repeat.
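The find/process/advance loop described above looks roughly like the following sketch. The `{{...}}` placeholder syntax and the function names are invented for illustration; they are not taken from html.parser, json or tokenize. Both variants carry an integer position `pos` between searches: `str.find` takes a start index, and a compiled pattern's `search` method accepts a `pos` argument.

```python
import re


def find_placeholders(text: str) -> list[str]:
    """Collect the names inside {{...}} using str.find and an advancing index."""
    names = []
    pos = 0
    while True:
        start = text.find("{{", pos)      # jump straight to the next opener
        if start == -1:
            break
        end = text.find("}}", start + 2)  # search only from the opener onward
        if end == -1:
            break                         # unterminated token: stop scanning
        names.append(text[start + 2:end].strip())
        pos = end + 2                     # advance past the token and repeat
    return names


# The same loop driven by a compiled regex; Pattern.search takes a start position.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_placeholders_re(text: str) -> list[str]:
    names = []
    pos = 0
    while (m := PLACEHOLDER.search(text, pos)) is not None:
        names.append(m.group(1))
        pos = m.end()                     # resume right after this match
    return names


sample = "Hello {{ name }}, your order {{order_id}} ships {{ date }}."
assert find_placeholders(sample) == ["name", "order_id", "date"]
assert find_placeholders_re(sample) == ["name", "order_id", "date"]
```

Every stored position here is a character index into the str, so with a variable-length internal encoding each starting offset and each slice such as `text[start + 2:end]` would have to be mapped to a byte offset, which is the cost under discussion.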



_______________________________________________
Python-Dev mailing list
https://mail.python.org/mailman/listinfo/python-dev
