On 08/19/2011 08:40 PM, Jonathan M Davis wrote:
On Friday, August 19, 2011 19:58:34 Benjamin Shropshire wrote:
On 08/18/2011 02:21 AM, unDEFER wrote:
Hello!
The D language specification says that it supports UTF-8 strings, but I
can't find how to slice a UTF-8 string by character index rather than by
byte offset. Why is there no simple slice function in std.utf like the
attached code?
BTW: your code is flawed. Feed it some of the stuff near the end of this
post and it will fail:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
tl;dr: your code doesn't slice on characters but on something called (IIRC)
code points. If you start worrying about diacritics (and many end users
will want you to), you need to do a bunch more processing.
http://en.wikipedia.org/wiki/Diacritic
His code works as well as slicing a dstring does - save for the efficiency
issues. There is no way in Phobos at present to deal with graphemes. All of
the string processing in Phobos deals with code points. For the most part,
this works great, but it is true that it isn't complete. I expect that we'll
get grapheme support eventually (I believe that Dmitry has done some work on a
grapheme range for the updates that he's been doing to std.regex for GSoC, so
we may get it from there). But for now, none of the string processing in D
worries about graphemes - just code points.
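To make the code point vs. grapheme distinction concrete, here is a minimal sketch of slicing by code point index. The helper name sliceByCodePoint is hypothetical and not part of Phobos; it only assumes std.utf.toUTFindex and std.range.walkLength. It walks the string from the start twice, so it costs O(n) per slice, and it shows how a combining diacritic gets cut off from its base letter:

```d
import std.range : walkLength;
import std.utf : toUTFindex;

// Hypothetical helper (not in Phobos): slice a UTF-8 string by code point
// index rather than by byte offset. Assumes first <= last and that both are
// no larger than the number of code points in s.
string sliceByCodePoint(string s, size_t first, size_t last)
{
    immutable a = toUTFindex(s, first); // byte offset of code point `first`
    immutable b = toUTFindex(s, last);  // byte offset of code point `last`
    return s[a .. b];
}

void main()
{
    string pre = "na\u00EFve";  // precomposed: U+00EF is a single code point
    string dec = "nai\u0308ve"; // decomposed: 'i' followed by combining U+0308

    // .length counts UTF-8 code units (bytes); walkLength counts code points.
    assert(pre.length == 6 && pre.walkLength == 5);
    assert(dec.length == 7 && dec.walkLength == 6);

    // Both strings render the same, but the code point slice differs:
    assert(sliceByCodePoint(pre, 0, 3) == "na\u00EF");
    assert(sliceByCodePoint(dec, 0, 3) == "nai"); // the diaeresis is lost
}
```

The second assertion is the failure mode described earlier in the thread: the slice is valid UTF-8, but it splits what the user sees as one character.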
My thought on that subject is: I can see good reasons to index on proper
characters (get the 4th char in the word), good reasons to index to a
character (or sometimes a code point) near some byte position, and there
are clearly good reasons to iterate through code points, but I don't see
much value in asking for a random Nth code point when the same result
can be had via something that has fewer problems and/or is cheaper.
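As a rough sketch of the "code point near some byte position" case: UTF-8 continuation bytes all have the bit pattern 10xxxxxx, so backing up to the start of a code point takes at most three steps and no scan from the beginning. The helper name codePointStartAtOrBefore is made up for illustration, and the code assumes the input is valid UTF-8:

```d
// Hypothetical helper (not in Phobos): given a byte offset i into a valid
// UTF-8 string, return the offset of the first byte of the code point that
// contains it. An offset equal to s.length is returned unchanged.
size_t codePointStartAtOrBefore(string s, size_t i)
{
    assert(i <= s.length);
    if (i == s.length)
        return i;
    // Continuation bytes look like 10xxxxxx; step back past them.
    while (i > 0 && (s[i] & 0xC0) == 0x80)
        --i;
    return i;
}

void main()
{
    string s = "na\u00EFve"; // bytes: 'n' 'a' 0xC3 0xAF 'v' 'e'

    assert(codePointStartAtOrBefore(s, 2) == 2); // lead byte of U+00EF
    assert(codePointStartAtOrBefore(s, 3) == 2); // continuation byte, back up
    assert(codePointStartAtOrBefore(s, 4) == 4); // 'v'
    assert(codePointStartAtOrBefore(s, s.length) == s.length);

    // The result is always a safe place to slice:
    assert(s[codePointStartAtOrBefore(s, 3) .. $] == "\u00EFve");
}
```

This still lands on code points, not graphemes, so a combining mark that follows the chosen position can still be separated from its base letter.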
- Jonathan M Davis
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos