Re: [PD] request for objections: any2string -> unsigned char

Mathieu Bouchard Thu, 15 Jan 2009 11:47:36 -0800

On Thu, 15 Jan 2009, Bryan Jurish wrote:

Unicode might be more immediately intuitive to most users, but when it
comes down to it, byte-strings are IMHO the more basic representation (a
char* is still a char*, even in this post-unicode world).

What happened is that people switched to UTF-8 instead of some fixed-size
encoding because many apps that assume that a character is a byte will
work anyway. Just don't ask those apps to say how many characters there
are in a string though. You have to pretend that all the "special"
characters are pairs of characters instead (when they are not triplets).
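
To make the byte-versus-character mismatch concrete, here is a minimal
sketch in Python. It is not taken from any2string or from Pd, and the
function names are made up for illustration. It counts the same text both
ways, and shows how a byte-oriented program could still recover the
character count by skipping UTF-8 continuation bytes.

```python
def utf8_lengths(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    encoded = text.encode("utf-8")
    return len(text), len(encoded)


def count_code_points(raw):
    """Count characters in UTF-8 encoded bytes without decoding them.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new character, so only those are counted.
    """
    return sum(1 for b in raw if (b & 0xC0) != 0x80)
```

For example, utf8_lengths("t\u00e9l") gives (3, 4): three characters, but
four bytes, because the accented letter is encoded as a pair of bytes. An
app that assumes one byte per character would report 4.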

A good string handling mechanism should have a good general default
representation (e.g. as UTF-${MachineWordBits}), but should likewise
allow access to "raw" byte strings, and be able to accommodate various
encodings. Not that I'm really hankering to write any of that, mind you
;-) Perhaps a better name for the external as I think of it would be
[any2bytes]. I'm perfectly willing to cede the "string" name to
something better (Martin's string patch comes to mind),


I gather that it'll take a long time before Pd gets unicode support...

... except if you're building rsp. reading a persistent index for a
large file, in which case tell() & seek() are likely to be a wee bit
faster than parsing and counting variable-length-encoded characters ...


right.
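
The persistent-index case can be sketched roughly as follows, again in
Python and with hypothetical helper names (build_line_index and
read_line_at are not part of any Pd external). The index stores byte
offsets obtained from tell(), so a later lookup is a single seek() rather
than a scan that decodes and counts variable-length characters. An
in-memory io.BytesIO stands in for the large file.

```python
import io


def build_line_index(stream):
    """Record the byte offset at which each line of a binary stream starts."""
    offsets = []
    stream.seek(0)
    while True:
        pos = stream.tell()        # byte position before reading the line
        line = stream.readline()
        if not line:               # empty bytes object means end of stream
            break
        offsets.append(pos)
    return offsets


def read_line_at(stream, offsets, n):
    """Jump straight to line n using its stored byte offset."""
    stream.seek(offsets[n])
    return stream.readline().decode("utf-8").rstrip("\n")


# Stand-in for a large UTF-8 file opened in binary mode.
data = "t\u00e9l\nQu\u00e9bec\nPd\n".encode("utf-8")
stream = io.BytesIO(data)
offsets = build_line_index(stream)
```

Here offsets comes out as [0, 5, 13], which are byte positions and not
character positions: the first line holds four characters including the
newline but occupies five bytes.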

 _ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal, Québec

