[Issue 20134] autodecode should use replacementDchar rather than throwing on invalid

2019-08-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=20134

Jon Degenhardt  changed:

   What|Removed |Added

 CC||jrdemail2000-dl...@yahoo.com

--- Comment #5 from Jon Degenhardt  ---
Correct handling of invalid UTF sequences is often known only by the
application. That is, it is task dependent. And in some applications, the
appropriate handling may not be known until runtime, making compile-time
decisions problematic.

A related piece of the puzzle is that in many high performance string
processing applications, it is useful to switch between modes of processing
where strings are handled as bytes for some algorithms, then switch back to
modes where strings are character sequences. When operating as bytes, UTF
interpretation is not needed or desired (so no detection of invalid UTF
sequences). But when algorithms are operating on characters, then invalid UTF
detection/handling is desired/required. (Note: Many of these algorithms are
possible because ASCII characters in UTF-8 can be used as single byte markers
without interpretation of other parts of the byte stream.)
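
A minimal sketch of the byte-mode/character-mode switch described above. It is written in Python purely as a language-neutral illustration, not as Phobos code, and `split_then_decode` is a hypothetical name. Fields are located by scanning for the ASCII tab byte with no UTF interpretation; validity is checked only when a field is decoded.

```python
def split_then_decode(line: bytes) -> list:
    """Split on an ASCII tab in byte mode, then decode each field strictly."""
    # Byte mode: the tab byte (0x09) never occurs inside a multi-byte
    # UTF-8 sequence, so splitting needs no decoding and no validity checks.
    fields = line.split(b"\t")
    # Character mode: strict decoding raises UnicodeDecodeError on invalid input.
    return [field.decode("utf-8") for field in fields]


print(split_then_decode("na\u00efve\tcaf\u00e9".encode("utf-8")))

try:
    split_then_decode(b"ok\t\xff")
except UnicodeDecodeError as err:
    print("invalid UTF-8 detected only at decode time:", err.reason)
```

The split itself succeeds even on the invalid input; the error surfaces only in the character-mode step, which is the separation the paragraph describes.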

This makes it difficult for libraries to implement a single policy and still
nicely support the wide range of application use-cases. Especially when there
may be many layers of code between the application layer making a call and the
lower level function where opportunity for detection occurs.

As an application developer, what I'd really like to have is a magical context
object where the current detection and handling policies are set, and have all
code invoked within the scope of that object obey them. I'd gladly take a
performance hit to get it. This may be too big a change, but it's worth
considering how well other solutions compare from an application development
perspective.

--


[Issue 20134] autodecode should use replacementDchar rather than throwing on invalid

2019-08-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=20134

--- Comment #4 from Vladimir Panteleev  ---
(In reply to Walter Bright from comment #1)
> Over time, common practice has evolved from rejecting malformed UTF to
> replacing it with replacementDchar, which enables the application (like a
> web browser) to continue processing.

BTW, I don't think this is quite correct. Web browsers both raise an error (in
the dev console) AND continue processing. By using replacementDchar implicitly,
D programs would not know that there was ever a problem.
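
The "report AND continue" behaviour can be sketched as follows, in Python and only as an analogy. The `problems` list is a hypothetical stand-in for a dev console or log, and the handler name `replace_and_report` is made up for this example. Decoding continues with U+FFFD, but the caller can still find out that something was wrong.

```python
import codecs

problems = []  # hypothetical sink standing in for a log or dev console


def replace_and_report(err):
    """Codec error handler: record the problem, then substitute U+FFFD."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    # Remember where the invalid bytes were and what they looked like.
    problems.append((err.start, err.object[err.start:err.end]))
    # Return the replacement text and the position at which to resume decoding.
    return ("\ufffd", err.end)


codecs.register_error("replace_and_report", replace_and_report)

text = b"ab\xffcd".decode("utf-8", errors="replace_and_report")
print(repr(text))
print(problems)
```

With plain `errors="replace"` the decoded text would be the same, but `problems` would stay empty, which is the silent case this comment objects to.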

--


[Issue 20134] autodecode should use replacementDchar rather than throwing on invalid

2019-08-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=20134

Vladimir Panteleev  changed:

   What|Removed |Added

 CC||dlang-bugzilla@thecybershadow.net

--- Comment #3 from Vladimir Panteleev  ---
(In reply to Walter Bright from comment #1)
> Over time, common practice has evolved from rejecting malformed UTF to
> replacing it with replacementDchar, which enables the application (like a
> web browser) to continue processing.

In applications where not crashing is preferable to corrupting data, yes, but
I don't think we can make that decision in place of the user. Corrupted data
spreads and seeps into archives and can be very hard to rectify once it's
discovered, but crashes are immediately visible and usually easily fixable.
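
A small illustration of why silent replacement counts as corruption, again in Python as a stand-in rather than D. The byte strings are made-up sample data. Two different invalid inputs decode to the same text, so the original bytes cannot be recovered afterwards, whereas strict decoding fails at the point of the problem.

```python
a = b"id=\xff17"
b = b"id=\xfe17"

# Lossy: both invalid bytes collapse into the same U+FFFD character.
decoded_a = a.decode("utf-8", errors="replace")
decoded_b = b.decode("utf-8", errors="replace")
print(decoded_a == decoded_b)  # True: the inputs are no longer distinguishable

# Strict: the failure is immediate and points at the offending byte.
try:
    a.decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected at byte offset", err.start)
```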

> Code should also be faster with this change.

So should either assuming that the strings are valid, or throwing Errors
instead of Exceptions, right?

--


[Issue 20134] autodecode should use replacementDchar rather than throwing on invalid

2019-08-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=20134

Walter Bright  changed:

   What|Removed |Added

   See Also||https://issues.dlang.org/show_bug.cgi?id=14519

--


[Issue 20134] autodecode should use replacementDchar rather than throwing on invalid

2019-08-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=20134

Dlang Bot  changed:

   What|Removed |Added

   Keywords||pull

--- Comment #2 from Dlang Bot  ---
@WalterBright updated dlang/phobos pull request #7144 "fix Issue 20134 -
autodecode should use replacementDchar rather than throwing on invalid" fixing
this issue:

- fix Issue 20134 - autodecode should use replacementDchar rather than throwing
on invalid

https://github.com/dlang/phobos/pull/7144

--


[Issue 20134] autodecode should use replacementDchar rather than throwing on invalid

2019-08-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=20134

--- Comment #1 from Walter Bright  ---
Over time, common practice has evolved from rejecting malformed UTF to
replacing it with replacementDchar, which enables the application (like a web
browser) to continue processing.

Code should also be faster with this change.

--