Re: auto-decoding

2018-04-01 Thread Seb via Digitalmars-d-learn

On Sunday, 1 April 2018 at 02:44:32 UTC, Uknown wrote:
If you want to stop auto-decoding, you can use 
std.string.representation like this:


import std.string : representation;
auto no_decode = some_string.representation;

Now no_decode won't be auto-decoded, and you can use it in place 
of some_string. You can also use std.uni.byGrapheme to iterate by 
graphemes instead.


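.representation gives you an immutable(ubyte)[] for a string (or a 
const(ubyte)[] for a const(char)[]).

What you typically want is a range that still has char elements but 
is not decoded; for this you can use std.utf.byCodeUnit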


https://dlang.org/phobos/std_utf.html#byCodeUnit
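
A minimal sketch of the difference between the two, assuming a sample string "Bär" (my own choice, not from the post above). It only checks element types and lengths:

```d
import std.range.primitives : ElementType;
import std.string : representation;
import std.utf : byCodeUnit;

void main()
{
    string s = "Bär"; // sample data; 'ä' takes two UTF-8 code units

    // .representation reinterprets the string as raw bytes.
    auto bytes = s.representation;
    static assert(is(typeof(bytes) == immutable(ubyte)[]));

    // byCodeUnit keeps char as the element type but does not decode.
    auto units = s.byCodeUnit;
    static assert(is(ElementType!(typeof(units)) == immutable(char)));

    // Plain range iteration over a string auto-decodes to dchar.
    static assert(is(ElementType!string == dchar));

    // Both views see four code units.
    assert(bytes.length == 4);
    assert(units.length == 4);
}
```

byCodeUnit is usually the more convenient choice when the result is passed on to range algorithms that expect characters rather than integers.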

There's also this good article:

https://tour.dlang.org/tour/en/gems/unicode


Re: auto-decoding

2018-03-31 Thread Uknown via Digitalmars-d-learn

On Sunday, 1 April 2018 at 01:19:08 UTC, auto wrote:

What is auto-decoding and why is it a problem?


Auto-decoding is tied to the UTF representation of 
Unicode strings. In D, `char[]` and `string` represent UTF-8 
strings, `wchar[]` and `wstring` represent UTF-16 strings, and 
`dchar[]` and `dstring` represent UTF-32 strings. You need to know 
how UTF works in order to understand auto-decoding. Since in 
practice most code deals with UTF-8, I'll explain with respect to 
that. The problem comes down to the fact that not all 
Unicode characters fit in a single 8-bit `char`. 
Only ASCII characters are encoded as one `char`. ASCII never sets 
the high bit of a byte, so UTF-8 uses the leading bits of the first 
`char` of a character to tell how many `char`s that character is 
encoded in. You can watch this video for a better 
understanding[0]. As a result, if one were to traverse a 
`char[]` one element at a time looking for characters, they would 
get unexpected results with non-ASCII data.
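
To make the leading-bits idea concrete, here is a rough sketch. `sequenceLength` is a hypothetical helper written for illustration (it is not a Phobos function), and "Bär" is assumed sample data; `std.utf.stride` is the real library function that does the same job:

```d
import std.range : walkLength;
import std.utf : stride;

/// Hypothetical helper: number of `char`s in the UTF-8 sequence that
/// starts with `lead`, read off its leading bits. Returns 0 for a
/// continuation byte or a byte that cannot start a sequence.
size_t sequenceLength(char lead)
{
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // 10xxxxxx or invalid
}

void main()
{
    string s = "Bär"; // bytes: 'B', 0xC3, 0xA4, 'r'

    assert(s.length == 4);     // number of code units
    assert(s.walkLength == 3); // number of code points (auto-decoded)

    assert(sequenceLength(s[0]) == 1); // 'B'
    assert(sequenceLength(s[1]) == 2); // lead byte of 'ä'
    assert(sequenceLength(s[2]) == 0); // continuation byte of 'ä'
    assert(stride(s, 1) == 2);         // Phobos reports the same length
}
```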


Auto-decoding tries to solve this by automatically applying the 
algorithm to decode the characters to Unicode "Code-Points". This 
is where my knowledge ends though. I'll give you pros and cons of 
auto-decoding.


Pros:
 * It makes Unicode string handling much easier for 
beginners.

 * Much less effort in general, it seems to "just work™"

Cons:
 * It makes string handling slow by default
 * It may be the wrong thing, since you may not want Unicode 
code-points, but graphemes instead.
 * Auto-decoding throws exceptions on reaching invalid UTF 
sequences, so string handling code in general can throw exceptions.
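
A small sketch of the exception point, assuming a made-up byte sequence that is not valid UTF-8. It only uses `front`, `std.utf.validate` and `std.utf.byCodeUnit`:

```d
import std.exception : assertThrown;
import std.range.primitives : front;
import std.utf : byCodeUnit, UTFException, validate;

void main()
{
    // Sample data (assumed): 'a' followed by 0xFF, which never
    // appears in well-formed UTF-8.
    immutable(ubyte)[] raw = [0x61, 0xFF];
    string bad = cast(string) raw;

    // The valid prefix decodes fine.
    assert(bad.front == 'a');

    // Decoding the invalid byte throws.
    assertThrown!UTFException(bad[1 .. $].front);

    // validate checks a whole string up front.
    assertThrown!UTFException(validate(bad));

    // Without decoding, the byte is simply handed back.
    assert(bad.byCodeUnit[1] == 0xFF);
}
```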

If you want to stop auto-decoding, you can use 
std.string.representation like this:


import std.string : representation;
auto no_decode = some_string.representation;

Now no_decode won't be auto-decoded, and you can use it in place 
of some_string. You can also use std.uni.byGrapheme to iterate by 
graphemes instead.
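
To illustrate why code points may not be what you want, here is a short sketch comparing the three levels of iteration. The sample string (an 'e' followed by a combining accent) is my own assumption, and graphemes come from std.uni.byGrapheme:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // displayed as a single character.
    string s = "e\u0301";

    assert(s.byCodeUnit.walkLength == 3); // UTF-8 code units
    assert(s.walkLength == 2);            // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 1); // graphemes
}
```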


You should also read this blog post: 
https://jackstouffer.com/blog/d_auto_decoding_and_you.html


And this forum post: 
https://forum.dlang.org/post/eozguhavggchzzruz...@forum.dlang.org


[0]: https://www.youtube.com/watch?v=MijmeoH9LT4


auto-decoding

2018-03-31 Thread auto via Digitalmars-d-learn

What is auto-decoding and why is it a problem?




Re: Auto-decoding

2017-07-15 Thread Seb via Digitalmars-d-learn

On Saturday, 15 July 2017 at 18:47:25 UTC, Joakim wrote:

On Saturday, 15 July 2017 at 18:14:48 UTC, aberba wrote:

So what is the current plan? :)


Andrei has talked about having a non-auto-decoding path for 
those who know what they're doing and actively choose that 
path, while keeping auto-decoding the default, so as not to 
break existing code.  Jack has been submitting PRs for this, 
but it is probably tedious work, so progress is slow and I 
don't know how much more remains to be done:


https://github.com/dlang/phobos/pulls?q=is%3Apr+auto-decoding+is%3Aclosed


The idea is that once DIP1000 has matured, more focus on compiler 
support for reference-counting will be given with the aim of 
improving the @nogc experience. One example is DIP1008 for @nogc 
exceptions [1], but another one that is important in this context 
is RCString [2]. The idea is that RCString will be a new opt-in 
string type without auto-decoding and GC.


Another idea in the game is `version(NoAutoDecode)`:

https://github.com/dlang/phobos/pull/5513

However, it's unfortunately still unclear whether that could 
result in a working solution.


[1] https://github.com/dlang/DIPs/blob/master/DIPs/DIP1008.md
[2] https://github.com/dlang/phobos/pull/4878


Re: Auto-decoding

2017-07-15 Thread Joakim via Digitalmars-d-learn

On Saturday, 15 July 2017 at 18:14:48 UTC, aberba wrote:

On Saturday, 15 July 2017 at 05:54:32 UTC, ag0aep6g wrote:

On 07/15/2017 06:21 AM, bauss wrote:

[...]


1) Drop two elements from "Bär". With auto-decoding you get 
"r", which is nice. Without auto-decoding you get [0xA4, 'r'] 
where 0xA4 is the second half of the encoding of 'ä'. You have 
to know your Unicode to understand what is going on there.


[...]


So what is the current plan? :)


Andrei has talked about having a non-auto-decoding path for those 
who know what they're doing and actively choose that path, while 
keeping auto-decoding the default, so as not to break existing 
code.  Jack has been submitting PRs for this, but it is probably 
tedious work, so progress is slow and I don't know how much more 
remains to be done:


https://github.com/dlang/phobos/pulls?q=is%3Apr+auto-decoding+is%3Aclosed


Re: Auto-decoding

2017-07-15 Thread ag0aep6g via Digitalmars-d-learn

On 07/15/2017 08:14 PM, aberba wrote:

So what is the current plan? :)


As far as I'm aware, there's no concrete plan to change anything. We 
just gotta deal with auto-decoding for the time being.


Re: Auto-decoding

2017-07-15 Thread aberba via Digitalmars-d-learn

On Saturday, 15 July 2017 at 05:54:32 UTC, ag0aep6g wrote:

On 07/15/2017 06:21 AM, bauss wrote:

[...]


1) Drop two elements from "Bär". With auto-decoding you get 
"r", which is nice. Without auto-decoding you get [0xA4, 'r'] 
where 0xA4 is the second half of the encoding of 'ä'. You have 
to know your Unicode to understand what is going on there.


[...]


So what is the current plan? :)


Re: Auto-decoding

2017-07-14 Thread ag0aep6g via Digitalmars-d-learn

On 07/15/2017 06:21 AM, bauss wrote:
I understand what it is and how it works, but I don't understand 
how it solves any problems.


Could someone give an example of when auto-decoding actually is useful 
in contrast to not using it?


1) Drop two elements from "Bär". With auto-decoding you get "r", which 
is nice. Without auto-decoding you get [0xA4, 'r'] where 0xA4 is the 
second half of the encoding of 'ä'. You have to know your Unicode to 
understand what is going on there.


2) Search for 'ä' (one wchar/dchar) in the `string` "Bär". With 
auto-decoding, you pop the 'B' and then there's your 'ä'. Without 
auto-decoding, you can't find 'ä', because "Bär" doesn't have a single 
element that matches 'ä'. You have to search for "ä" (two `char`s) instead.
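
Both cases can be sketched roughly like this, using std.string.representation as the view without auto-decoding (one possible choice; std.utf.byCodeUnit would behave similarly):

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : canFind;
import std.range : drop;
import std.string : representation;

void main()
{
    string s = "Bär"; // bytes: 'B', 0xC3, 0xA4, 'r'

    // 1) Drop two elements.
    assert(s.drop(2) == "r");                              // decoded
    assert(equal(s.representation.drop(2), [0xA4, 0x72])); // raw: [0xA4, 'r']

    // 2) Search for 'ä'.
    assert(s.canFind('ä'));                 // decoded: one element matches
    assert(!s.representation.canFind('ä')); // raw: no single element matches
    assert(s.representation.canFind("ä".representation)); // two-element needle
}
```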


The goal of auto-decoding was to make it so that you don't have to think 
about Unicode all the time when processing strings. Instead you could 
think in terms of "characters". But auto-decoding falls flat on that 
goal, which is why it's disliked. You still have to think about Unicode 
stuff for correctness (combining characters, graphemes), and now you 
also have to worry about the performance of auto-decoding.


Auto-decoding

2017-07-14 Thread bauss via Digitalmars-d-learn
I understand what it is and how it works, but I don't understand 
how it solves any problems.


Could someone give an example of when auto-decoding actually is 
useful in contrast to not using it?


Just trying to get an understanding of what exactly its purpose 
is.


I did read 
https://jackstouffer.com/blog/d_auto_decoding_and_you.html


But I still feel like there's not a clear explanation of what 
issues exist when you don't have it.


If I need to be more clear, just let me know.