Re: [Caml-list] Storing UTF-8 in plain strings

2009-08-13 Thread Florian Hars
Dario Teixeira wrote:
> So, can someone find any problems with this reasoning?

No, the kind of compatibility with legacy code you described is one of the original design goals of UTF-8; see http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt - Florian.

Re: [Caml-list] Storing UTF-8 in plain strings

2009-08-13 Thread Dario Teixeira
Hi, I'm using Ulex + Menhir to parse UTF-8 encoded source code, and I'm relying on plain strings for processing and storing data.  I *think* I can get away with using only the String module to handle this variable-length encoding as long as I am careful with the way I treat these strings. 
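A minimal sketch of the kind of care described above, not taken from the original message. It assumes the strings already hold valid UTF-8 (for example, because the lexer rejected malformed input), and the names `utf8_length` and `split_at_ascii` are hypothetical helpers invented for illustration. It relies on two properties of UTF-8: continuation bytes always have the bit pattern 10xxxxxx, and bytes below 0x80 never appear inside a multi-byte sequence.

```ocaml
(* Count code points in a string ASSUMED to hold valid UTF-8.
   Every byte except a continuation byte (10xxxxxx) starts a code point.
   Hypothetical helper, not part of the String module. *)
let utf8_length (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

(* Split at the first occurrence of an ASCII delimiter.
   This is safe on UTF-8 data because a byte below 0x80 can only be a
   complete ASCII character, never part of a multi-byte sequence.
   Hypothetical helper, not part of the String module. *)
let split_at_ascii (delim : char) (s : string) : (string * string) option =
  assert (Char.code delim < 0x80);
  try
    let i = String.index s delim in
    Some (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  with Not_found -> None

(* "naïve=café" written with explicit byte escapes for the accented letters. *)
let sample = "na\xc3\xafve=caf\xc3\xa9"
let byte_count = String.length sample   (* counts bytes, not characters *)
let char_count = utf8_length sample     (* counts code points *)
let halves = split_at_ascii '=' sample
```

The point of the sketch is that `String.length` reports bytes, so any index arithmetic has to land on code point boundaries. Concatenation and splitting at ASCII delimiters preserve validity, whereas `String.sub` at an arbitrary offset may cut a multi-byte sequence in half.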

Re: [Caml-list] Storing UTF-8 in plain strings

2009-08-13 Thread Richard Jones
On Wed, Aug 12, 2009 at 10:36:56AM -0700, Dario Teixeira wrote:
> Hi, I'm using Ulex + Menhir to parse UTF-8 encoded source code, and I'm relying on plain strings for processing and storing data. I *think* I can get away with using only the String module to handle this variable-length

Re: [Caml-list] Storing UTF-8 in plain strings

2009-08-13 Thread Dario Teixeira
Hi, Thank you all for your comments.  Ulex has caught all the intentionally malformed code points I've inserted in the stream, so I'm fairly confident it's up to the task.  But if I find a problem I'll keep Netconversion's and Extlib's validation functions in mind... By the way, I just

[Caml-list] IFL 2009: Final Call for Papers and Participation

2009-08-13 Thread IFL 2009
Call for Papers and Participation
IFL 2009
Seton Hall University
South Orange, NJ, USA
http://tltc.shu.edu/blogs/projects/IFL2009/
Register at: http://tltc.shu.edu/blogs/projects/IFL2009/registration.html
* NEW * Registration and talk submission deadline fast approaching: August 23,