[Issue 5016] to!() can not convert from wide characters to char

2011-01-22 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=5016


Andrei Alexandrescu and...@metalanguage.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


--- Comment #4 from Andrei Alexandrescu and...@metalanguage.com 2011-01-22 
15:11:51 PST ---
std.conv.to for narrowing conversions acts as a checked cast. This bug was
fixed in http://www.dsource.org/projects/phobos/changeset/2359 and
http://www.dsource.org/projects/phobos/changeset/2363
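
A minimal sketch of what "checked cast" means for a dchar to char conversion. It assumes a current Phobos, where an out-of-range narrowing conversion throws std.conv.ConvOverflowException; the exact exception type at the time of the changesets above is not stated in this thread.

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;

void main()
{
    // An ASCII code point fits in a single char, so the narrowing
    // conversion succeeds and preserves the value.
    dchar ascii = 'a';
    assert(to!char(ascii) == 'a');

    // U+20AC (euro sign) is larger than char.max, so the checked
    // conversion throws instead of silently truncating the value.
    dchar euro = '\u20AC';
    assertThrown!ConvOverflowException(to!char(euro));
}
```

A plain cast(char) would compile for both values and simply truncate the second one, which is the difference the comment is pointing at.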

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
--- You are receiving this mail because: ---


[Issue 5016] to!() can not convert from wide characters to char

2011-01-09 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=5016


Andrei Alexandrescu and...@metalanguage.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |and...@metalanguage.com
         AssignedTo|nob...@puremagic.com        |and...@metalanguage.com




[Issue 5016] to!() can not convert from wide characters to char

2011-01-09 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=5016



--- Comment #1 from Marcin Kuszczak aa...@interia.pl 2011-01-09 13:18:44 PST 
---
After rethinking the problem, it seems that the real problem is that char and
wchar are not real characters. These two types are just artificial things which
cause more trouble than necessary. The only true character is dchar, and all
other character types should be deprecated.

In such a case:
string = ubyte[] = dchar[]
wstring = ushort[] = dchar[]

... and maybe also:
dstring = uint[] = dchar[]

where "=" means "can be viewed as".

It would cleanly and properly solve the problems with strange and unnecessary
conversions like dchar -> char.



[Issue 5016] to!() can not convert from wide characters to char

2011-01-09 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=5016


Jonathan M Davis jmdavisp...@gmx.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisp...@gmx.com


--- Comment #2 from Jonathan M Davis jmdavisp...@gmx.com 2011-01-09 15:16:32 
PST ---
char is explicitly defined to be a UTF-8 code unit. wchar is explicitly defined
to be a UTF-16 code unit. dchar is explicitly defined to be a UTF-32 code unit.
In UTF-8 and UTF-16, it can take multiple code units to make up a code point,
whereas it always takes exactly one UTF-32 code unit to make a code point. A
code point is what you would normally think of as a character. This is all
standard Unicode stuff and getting rid of it would be foolish. It's used all
over the place in computing, not just in D.
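
To make the code unit / code point distinction concrete, here is a small sketch. The sample characters (U+00E9 and U+1D11E) are my own choice, not from the thread; .length counts code units, while std.range.walkLength on a string counts code points.

```d
import std.range : walkLength;

void main()
{
    // U+00E9 (e with acute accent) in the three encodings.
    string  s8  = "\u00E9";
    wstring s16 = "\u00E9"w;
    dstring s32 = "\u00E9"d;

    assert(s8.length  == 2); // two UTF-8 code units
    assert(s16.length == 1); // one UTF-16 code unit
    assert(s32.length == 1); // one UTF-32 code unit

    // U+1D11E (musical G clef) lies outside the Basic Multilingual Plane.
    string  g8  = "\U0001D11E";
    wstring g16 = "\U0001D11E"w;
    dstring g32 = "\U0001D11E"d;

    assert(g8.length  == 4); // four UTF-8 code units
    assert(g16.length == 2); // a UTF-16 surrogate pair
    assert(g32.length == 1); // still a single UTF-32 code unit

    // Each of these strings holds exactly one code point.
    assert(s8.walkLength == 1);
    assert(g16.walkLength == 1);
}
```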

Part of the trick to dealing with char and wchar correctly is that if you wish
to deal with code points / characters (_not_ code units), then _never_ deal
with char and wchar individually. That's why most of std.string deals with
entire strings at a time. If you want to deal with an individual character, you
either use a dchar or one of the string types - e.g. 'a' as a dchar or "a" as a
string type. You shouldn't be converting from dchar to char and vice versa (or
between either of those and wchar). It really doesn't make sense. What makes
sense is converting between string types.
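
As an illustration of converting between string types rather than between individual character types, the following sketch uses std.conv.to to transcode a whole string. The sample text is an assumption chosen for the example.

```d
import std.conv : to;

void main()
{
    string s = "caf\u00E9"; // "cafe" with an accented e, in UTF-8

    // Converting whole strings transcodes every code point correctly.
    wstring w = to!wstring(s);
    dstring d = to!dstring(s);

    assert(s.length == 5); // the accented e needs two UTF-8 code units
    assert(w.length == 4);
    assert(d.length == 4);

    // The round trip back to UTF-8 loses nothing.
    assert(to!string(d) == s);
    assert(to!string(w) == s);
}
```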

On the whole, what D does works fantastically, but you need to understand the
basics of Unicode. The best place to look would probably be The D Programming
Language by Andrei Alexandrescu, since it applies directly to D, but there are
plenty of places online to find info on Unicode, and you can look at the online
docs on arrays for more info about them: http://is.gd/krYRH .

What it comes down to really is that you use whatever string type you need
based on size - string, wstring, or dstring - or the need to be able to treat
an individual array index as a character. If you need to be able to use random
access on a string (including using them in algorithms in std.algorithm which
require random access ranges), or if you need to be able to alter individual
characters in place, then use dstring or dchar[]. Otherwise, save space and use
either string or wstring (string would generally be better unless you're using
primarily Asian characters, since they tend to take 3 bytes in UTF-8 and 2 in
UTF-16).
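
A short sketch of the random-access point, assuming a current Phobos in which narrow strings are treated as ranges of code points. The sample strings are made up for the example.

```d
import std.range.primitives : isRandomAccessRange;

// Only the UTF-32 string type is a random-access range; string and
// wstring are not, because one index is not always one character.
static assert( isRandomAccessRange!dstring);
static assert(!isRandomAccessRange!string);
static assert(!isRandomAccessRange!wstring);

void main()
{
    // In a dstring, index 2 is the third character.
    dstring d = "na\u00EFve"d;
    assert(d[2] == '\u00EF');

    // In a string, index 2 is only the first code unit (0xC3) of the
    // two-unit UTF-8 sequence for U+00EF.
    string s = "na\u00EFve";
    assert(s.length == 6);
    assert(s[2] == 0xC3);

    // A dchar[] lets you replace a character in place.
    dchar[] buf = "abc"d.dup;
    buf[1] = '\u00E9';
    assert(buf == "a\u00E9c"d);
}
```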

There are functions which specifically take a dchar, so you can give them a
character then, but most deal entirely in strings, even if what you really care
about is an individual character. So, generally just treat individual
characters as strings with one character.

Take a look at the functions in std.utf: http://is.gd/krZLW . e.g.
std.utf.count() can be used to tell you how many code points / characters there
are in a string, and std.utf.stride() will tell you how many code units a
particular character is so that you can index into a string or wstring if you
have to.
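
For example, a rough sketch of how std.utf.count and std.utf.stride might be used together. The sample string and the loop are illustrative assumptions, not code from the bug report.

```d
import std.utf : count, stride;

void main()
{
    string s = "h\u00E9llo"; // "hello" with an accented e

    assert(s.length == 6);   // code units
    assert(count(s) == 5);   // code points

    // stride reports how many code units the character starting at a
    // given index occupies.
    assert(stride(s, 0) == 1); // 'h'
    assert(stride(s, 1) == 2); // U+00E9

    // Walk the string one character at a time by index.
    size_t characters = 0;
    for (size_t i = 0; i < s.length; i += stride(s, i))
        ++characters;
    assert(characters == 5);
}
```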

When using foreach, make sure that you give the type as dchar. e.g.

import std.stdio;

string str = "hello world";

foreach(dchar c; str)
    writeln(c);

will print out each character individually, whereas using char (which is the
default if you don't give a type) or wchar would print out the individual code
units (which isn't generally very useful). foreach is smart enough to convert
the string to the appropriate type on the fly while iterating over it, so if
you give it dchar, it'll take each code point at a time instead of each code
unit.
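
The difference only shows up with non-ASCII text, so here is a small sketch that counts loop iterations for each element type. The sample string is an assumption for the example.

```d
void main()
{
    string s = "h\u00E9"; // 'h' followed by U+00E9

    size_t utf8Units, utf16Units, codePoints;

    foreach (char c; s)  ++utf8Units;  // iterates UTF-8 code units
    foreach (wchar c; s) ++utf16Units; // transcodes to UTF-16 code units
    foreach (dchar c; s) ++codePoints; // decodes whole code points

    assert(utf8Units  == 3);
    assert(utf16Units == 2);
    assert(codePoints == 2);
}
```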

I'm sure that there are other things that would be useful to point out, but
that's all that comes to mind at the moment. On the whole, the way D handles
strings is fantastic. You just have to realize that you're dealing with UTF-8,
UTF-16, and UTF-32 code units instead of code points when you have a char,
wchar, or dchar respectively. dchar/UTF-32 is the only one of the three in
which a single code unit always holds a whole code point.

There has been some talk of various improvements to how all of this works (like
possibly making dchar the default type for foreach with string types), so some
incremental improvements may be made to iron out some of the wrinkles, but
strings in D are designed the way that they are on purpose, and it's not likely
to be drastically changed. For the most part, the problem is not the design but
rather understanding what the design is so that you can use it properly.

If you want to avoid the whole issue, then you can just use dstring everywhere,
but that _will_ result in using about 4 times the amount of memory as you would
need with string if you're dealing primarily with ASCII characters.
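
The factor of four can be checked directly. This is a sketch using a made-up ASCII-only sample string; it counts only the character data, not the array's pointer and length.

```d
import std.conv : to;

void main()
{
    string  s = "hello world"; // 11 ASCII characters
    dstring d = to!dstring(s);

    static assert(char.sizeof  == 1);
    static assert(dchar.sizeof == 4);

    // Same number of characters, four times the storage.
    assert(s.length * char.sizeof  == 11);
    assert(d.length * dchar.sizeof == 44);
}
```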



[Issue 5016] to!() can not convert from wide characters to char

2011-01-09 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=5016



--- Comment #3 from Jonathan M Davis jmdavisp...@gmx.com 2011-01-09 15:20:07 
PST ---
std.conv.to!() does need to be fixed to better handle the situation though. It
should probably do one of two things. Either it should outright refuse to
convert between the character types, on the theory that there's pretty much no
way that that's a good idea and that the programmer can just use cast if they
really, actually need to do such a conversion. Or it should throw when the
character can't fit in a single code unit of the target type, though that's
going to result in code that is rather hit or miss as to whether it succeeds,
and it wouldn't likely be a good idea to use in code generally.
