On Aug 2, 2006, at 7:28 PM, Theodore H. Smith wrote:

From: Brad Rhine <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 14:40:43 -0400

On Aug 2, 2006, at 2:37 PM, Theodore H. Smith wrote:

Why make your code do all sorts of awkward tricks with encodings
(including but not limited to auto-convert on append), when you can
just assume all your data is UTF-8?

Because assumptions are dangerous. ;)

What if it's a guideline and not an assumption? Something like "use UTF-8 for most data processing, and UTF-16 only for input/output"?

Part of the speed increase my FastString class gets over Charles's class-based approach is that I don't do anything with encodings. Why should I? I've never had a problem with it, and no users have reported one to me.

And you're writing in C, while I'm writing mostly in REALbasic. Now I find it fastest to use Split and Join.


By eliminating a case which might occur less than 1% of the time, I can get maybe 30% extra speed. And even that 1% of the time only proves to be a design error on the developer's part, because he'd get better speed by using UTF-8 throughout his app.

If there's anything I've learnt about string processing, it's that it's really best to use one model for your data. Whether that's C++ or RB or anything.

In C++ we have so many string classes: CString (via MFC), the STL's string, char*, and then most libraries tend to have their own string class, like CFString or NSString. Then you need to write an app using libraries, some of which use char*, others string, others NSString... it becomes a mess, complex and slow, to do all the interconversion.

Far quicker to just use one model, where possible.

Sure, but RS has chosen to opt for convenience, and it works pretty well for most situations.


Just the same for encodings. UTF-8 does everything so there's no advantage in using anything other than UTF-8 except for input and output.

It should be considered a design error to be processing strings in more than one encoding, except to convert it to and from the dominant encoding.
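
To make the "one dominant encoding" idea concrete, here is a rough sketch in Python (not REALbasic or C, and not how FastString actually works). The function names decode_input, append_all and encode_output are made up for illustration. It assumes the data arrives as UTF-16 and leaves as UTF-16, with everything in between held as UTF-8 bytes:

```python
from typing import Iterable


def decode_input(raw: bytes, source_encoding: str) -> bytes:
    """Convert incoming bytes to UTF-8 once, at the input boundary."""
    return raw.decode(source_encoding).encode("utf-8")


def append_all(chunks: Iterable[bytes]) -> bytes:
    """Concatenate chunks that are all assumed to be UTF-8 already.

    No per-append encoding check or conversion happens here; that is
    the case being eliminated.
    """
    return b"".join(chunks)


def encode_output(utf8_data: bytes, target_encoding: str) -> bytes:
    """Convert from UTF-8 to the target encoding once, at the output boundary."""
    return utf8_data.decode("utf-8").encode(target_encoding)


# Hypothetical input arriving as UTF-16 (little-endian, no BOM).
incoming = "héllo ".encode("utf-16-le")

# Everything inside the app is UTF-8.
internal = [decode_input(incoming, "utf-16-le"), "wörld".encode("utf-8")]
joined = append_all(internal)

# Convert only when the data leaves the app.
outgoing = encode_output(joined, "utf-16-le")
```

The trade-off is the one argued above: append_all does no encoding work at all, so a chunk that is not really UTF-8 goes unnoticed until the output step.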

Well, I think you can assume that people should stick to your suggested design principles.

Probably Apple has some good developers, and they think UTF-16 is the better choice.

Charles Yeomans


