[Issue 15710] Replacement for std.utf.validate which does not throw

2022-12-17 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15710

Iain Buclaw  changed:

   What|Removed |Added

   Priority|P1  |P4

--


2020-01-02 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15710

berni44  changed:

   What|Removed |Added

 CC||bugzi...@d-ecke.de
 Blocks||16262


Referenced Issues:

https://issues.dlang.org/show_bug.cgi?id=16262
[Issue 16262] assumeUTF attributes change between debug and release mode
--


2018-10-20 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15710

Ioana Stefan  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||ioana...@yahoo.com

--


2018-03-31 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15710

--- Comment #2 from Jonathan M Davis  ---
Well, having isValidUTF rather than validate would just make it so that you can
use an if-else block instead of a try-catch. It wouldn't really clean that code
up much.

If you really want to clean up what you're doing in that example, you need to
use byCodeUnit or byUTF, which use the replacement character. In that case, you
wouldn't need to check for valid Unicode. If you want a string out the other 
side instead of a range of code units or code points, you then just call
to!string or toUTF8 on it. e.g.

auto codeUnits = str.byCodeUnit();

or

auto dchars = str.byDchar(); // byUTF!dchar

or

str = str.byCodeUnit().to!string();

Now, if you want a string and don't want to allocate a new string if the string
is valid, then you'd need to check whether the string is valid Unicode, but in
that case you still don't need anything as complicated as what you wrote for
toValidUTF. You'd just need something like

try
    str.validate();
catch(UnicodeException)
    str = str.byCodeUnit().to!string();

and if we had isValidUTF, then you'd have

if(!str.isValidUTF())
    str = str.byCodeUnit().to!string();

So, while isValidUTF would help, it's mostly just getting rid of an unnecessary
exception, which does clean up the code in this case, but not drastically. It's
byCodeUnit or byUTF that really clean things up here.
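
A minimal sketch of the validate-then-fall-back pattern described above, assuming a reasonably recent Phobos. The name sanitizeUTF is hypothetical and not part of std.utf. The fallback path uses byDchar rather than byCodeUnit, on the assumption that decoding to dchar is the step that substitutes U+FFFD for invalid sequences.

```d
import std.utf : byDchar, toUTF8, validate, UTFException;

/// Hypothetical helper: returns s unchanged if it is valid UTF-8,
/// otherwise a new string with invalid sequences replaced by U+FFFD.
string sanitizeUTF(string s)
{
    try
    {
        validate(s);   // throws UTFException on invalid UTF
        return s;      // valid input: no new allocation
    }
    catch (UTFException)
    {
        // byDchar decodes using the replacement character by default;
        // toUTF8 re-encodes the decoded range into a fresh string.
        return s.byDchar.toUTF8;
    }
}

unittest
{
    // Build an invalid string explicitly: 0xFF is never valid in UTF-8.
    immutable(ubyte)[] raw = ['a', 'b', 0xFF, 'c', 'd'];
    string bad = cast(string) raw;

    assert(sanitizeUTF("abc") == "abc");
    assert(sanitizeUTF(bad) == "ab\uFFFDcd");
}
```

With a non-throwing isValidUTF, the try-catch would collapse into a plain if, but the fallback line would stay the same.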

Now, as for adding isValidUTF, I have a PR for it in the PR queue, and Andrei
approved the symbol, but he rejected the implementation. He basically wanted me
to completely redesign how decode works internally, and he said that the
superficial changes I had made to make it work with isValidUTF were too ugly to
live. So, at some point here, I need to go back and figure out how to rework
all that again, which is not going to be pretty.

--


2018-03-31 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15710

Seb  changed:

   What|Removed |Added

   Keywords||bootcamp
 CC||greensunn...@gmail.com

--- Comment #1 from Seb  ---
Yeah, it would be really cool if I didn't have to do such ugly hacks when I just
want to handle invalid UTF.

---
private string toValidUTF(string s)
{
    import std.algorithm.iteration : map;
    import std.range : iota;
    import std.string : representation;
    import std.utf;
    return s.representation.length
        .iota
        .map!(i => s.decode!(UseReplacementDchar.yes)(i))
        .toUTF8;
}

try {
    outStream.validate;
} catch (UTFException) {
    outStream = outStream.toValidUTF;
}
---

--