On 2014-06-04 14:57, Marko Rauhamaa wrote:
> > If you use UTF-8 for everything, then you end up in a world where
> > string-indexing (see ChrisA's other side thread on this topic) is
> > no longer an O(1) operation, but an O(N) operation.
> Most string operations are O(N) anyway. Besides, you could try and
> be smart and keep a recent index cached so simple for loops would
> be O(N) instead of O(N**2). So the idea of keeping strings
> internally in UTF-8 might not be all that bad.
As mentioned elsewhere, I've got a LOT of code that expects that
string indexing is O(1) and rarely are those strings/offsets reused.
I'm streaming through customer/provider data files, so caching
wouldn't do much good other than waste space and the time to maintain it.
If I knew that string indexing was O(something non-constant), I'd
have retooled my algorithms to take that into consideration, but that
would be a lot of code I'd need to touch.
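
To make the cached-index idea quoted above concrete, here is a rough
sketch of what such a string might look like. This is not how CPython
or any real implementation stores strings; the class name
CachedUtf8Str, its attributes, and the "steps" counter are all made up
for illustration. It keeps the text as UTF-8 bytes and remembers the
code point index and byte offset of the most recent lookup.

```python
class CachedUtf8Str:
    """Hypothetical UTF-8-backed string that caches the last index looked up."""

    def __init__(self, text):
        self._buf = text.encode("utf-8")
        self._char = 0   # code point index of the cached position
        self._byte = 0   # byte offset of that code point
        self.steps = 0   # code points scanned so far (illustration only)

    def _width(self, pos):
        # Byte length of the UTF-8 sequence starting at pos, from its lead byte.
        b = self._buf[pos]
        if b < 0x80:
            return 1
        if b < 0xE0:
            return 2
        if b < 0xF0:
            return 3
        return 4

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes not supported in this sketch")
        if index < self._char:
            # Target is behind the cached position: rescan from the start.
            self._char = self._byte = 0
        while self._char < index:
            if self._byte >= len(self._buf):
                raise IndexError("string index out of range")
            self._byte += self._width(self._byte)
            self._char += 1
            self.steps += 1
        if self._byte >= len(self._buf):
            raise IndexError("string index out of range")
        end = self._byte + self._width(self._byte)
        return self._buf[self._byte:end].decode("utf-8")


text = "naïve café ☃ 😀"
n = len(text)

forward = CachedUtf8Str(text)
forward_chars = [forward[i] for i in range(n)]

backward = CachedUtf8Str(text)
backward_chars = [backward[i] for i in reversed(range(n))]

print(forward.steps, backward.steps)
```

With this sketch a simple forward loop scans each code point once, but
any access that lands behind the cached position rescans from the
start, so a backward walk is quadratic again. A smarter cache could
scan backwards too, at the cost of more bookkeeping.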