[konsole] [Bug 395171] Remove UTF-16 and other non ASCII compatible encodings

2021-05-02 Thread Egmont Koblinger
https://bugs.kde.org/show_bug.cgi?id=395171

--- Comment #15 from Egmont Koblinger  ---
I did indeed make a technical mistake.

For some mysterious reason, what I incorrectly had in mind was that the
boundary beyond the BMP (i.e. at U+10000) is where UTF-8 increases from 2 bytes
to 3. This is not correct: that is where it increases from 3 bytes to 4.
The increase from 2 bytes to 3 happens at a much lower code point (U+0800).
Therefore, indeed, there are many letter-based scripts that use 3 bytes per
letter in UTF-8.
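
The corrected boundaries can be written down directly: 1 byte up to U+007F, 2 bytes up to U+07FF, 3 bytes up to U+FFFF (the end of the BMP), and 4 bytes beyond that. A minimal Python sketch; `utf8_len` is a hypothetical helper, cross-checked against Python's built-in UTF-8 encoder on a few arbitrary sample characters:

```python
def utf8_len(cp: int) -> int:
    """Number of bytes UTF-8 uses for one Unicode scalar value (hypothetical helper)."""
    if cp < 0x80:      # U+0000..U+007F: ASCII
        return 1
    if cp < 0x800:     # U+0080..U+07FF: Latin supplements, Greek, Cyrillic, Hebrew, Arabic, ...
        return 2
    if cp < 0x10000:   # U+0800..U+FFFF: rest of the BMP (Indic scripts, Thai, CJK, ...)
        return 3
    return 4           # U+10000..U+10FFFF: beyond the BMP


# Cross-check against Python's own encoder.
for ch in ["A", "é", "Ж", "क", "ก", "中", "😀"]:
    assert utf8_len(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_len(ord(ch))} bytes")
```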

I wrote the CJK stuff with this mistake in mind. I was technically incorrect,
and for this I apologize to everyone.

* * *

That being said:

- Jayadevan still does not understand, and apparently still refuses to even try
to understand, that UTF-16 cannot be made to work in terminals for a plethora
of reasons, including, but not limited to, the fact that all components of the
system that interact with terminals only support ASCII-compatible encodings
there;

- ignores that UTF-16 mode never worked; that is, firmly speaks up against
removing something that, again, *never* *worked*, *still* *does* *not* *work*
and *can* *not* *be* *fixed*;

- ignores that the work happening inside terminal emulators is typically
English-centric by its nature;

- ignores that the difference in the byte count simply does not matter at all;

- ignores that choosing one technical solution over the other, even if that
technical solution results in higher network traffic for some languages vs.
lower traffic for some others, is no discrimination whatsoever (and even if it
were, the right place to complain would be the Unicode Consortium, and not
Konsole);

- ignores that even if terminals and their surrounding infrastructure could
implement UTF-16 support (which, again, they cannot reasonably do), this would
mean switching from 1 standard to 2 incompatible ones being used concurrently,
which would obviously bring plenty of problems (the very problem that
Unicode was meant to fix) and would not solve anything;

- in comments 7 & 9 used wording that is at the very least borderline
unacceptable.

-- 
You are receiving this mail because:
You are watching all bug changes.

2021-05-02 Thread bugzilla_noreply

--- Comment #14 from tcanabr...@kde.org ---
Jayadevan,

As a Konsole developer, I prefer to have things based on facts as explained by
Egmont. *Even* if there were just *one* issue with a language, that would be
reason enough not to use UTF-16.

*I don't care* if UTF-16 makes things less Anglo-centric; I care that "cat"
understands what a newline is, and I care that grep understands what a "\t" is,
not to mention pipe, <<, and other special chars that are handled by the
terminal running inside of Konsole.
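
The property relied on here is ASCII compatibility: byte-oriented tools such as cat and grep, and the kernel's tty layer, scan for single bytes like 0x0A (newline) and 0x09 (tab). A short Python sketch with an arbitrary sample string shows why that is safe in UTF-8 but not in UTF-16:

```python
# Arbitrary sample: letters with diacritics, a tab, a CJK character, two newlines.
text = "Ċ\tñ\n中\n"

# UTF-8: every byte of a multi-byte character has its high bit set (>= 0x80),
# so it can never be mistaken for an ASCII control byte such as '\n' or '\t'.
utf8 = text.encode("utf-8")
assert all(b >= 0x80 for b in "Ċñ中".encode("utf-8"))
print(utf8.count(b"\n"))     # 2: only the real newlines

# UTF-16LE: U+010A (Ċ) becomes the bytes 0x0A 0x01, i.e. a "newline" byte,
# and every ASCII character is paired with a 0x00 byte.
utf16 = text.encode("utf-16-le")
print(utf16[:2])             # b'\n\x01'
print(utf16.count(b"\n"))    # 3: one spurious match plus the two real ones
print(b"\x00" in utf16)      # True: a NUL byte ends a C string early
```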


You stepped out of line as soon as you played the 'racist' card, and *I will
not* tolerate this kind of accusation. You have had one warning; I urge you not
to have three.

2021-05-02 Thread Jayadevan

--- Comment #13 from Jayadevan  ---
(In reply to tcanabrava from comment #11)
> This thread is now under Community Working Group supervision.
> 
> (1) All strings should be sanitised, so that they will be perfectly safe,
> and will not break anything.
> 
> You clearly are ignoring the issues pointed out by Egmont; sanitization has
> nothing to do with this.
> 
> (2) It is racist to suggest that all non-English people are Chinese (or
> Japanese or Korean). 
> 
> Please take a look at the KDE Code of Conduct. We will not tolerate
> accusations of racism over what was meant to be an explanation based on an
> example. Whether scripts other than CJK also use more bytes per encoded
> character is irrelevant.
> 
> Most scripts in the world are given only 3 byte encodings per character in
> UTF-8, and not a code point per spoken word, as you say. That is a lie.
> 
> (3) The world has still not settled on UTF-16. But modern languages and
> platforms tend to do so. Java, Dotnet, ICU, KDE, QT, Windows NT, JavaScript,
> Dart, Flutter...
> In today's world, support for both the modern UTF-16 and the legacy UTF-8 is
> needed.
> 
> Patches welcome. I won't spend time working on this until the *base
> software* (bash, zsh, etc.) supports it.


He claimed that in scripts other than English, one code point stands for a
"syllable or an entire word", and he used CJK as the example. That is a
cherry-picked example used to prove a wrong point. The conclusion was that
spending 3 bytes on one code point is fair for non-Latin scripts, as they have
one word per code point.

Most scripts, like Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil,
Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Georgian,
Ethiopic, Cherokee, Unified Canadian Aboriginal Syllabics, Khmer, and many
others, used by billions of people, take 3 bytes per code point in UTF-8 and
have only one phoneme per code point, contrary to what he said.

He cherry-picked examples to prove a wrong point. He said: "The only sense
in which one can perhaps claim that UTF-8 is Anglo-centric, is that it uses 1
byte for English letters vs. 3 bytes for CJK (Chinese, Japanese, Korean)
symbols; whereas UTF-16 uses 2 for both. Given that an English letter
represents, well, a single letter of a word, whereas a CJK symbol represents a
syllable or an entire word, I actually do think UTF-8's 1:3 split is a way more
fair system." The implication is clearly that, other than English (or Latin),
the only scripts that matter are CJK. That is clearly inappropriate towards
people from South Asia, SE Asia, Cherokee, Canadian Aboriginals, etc.

The scripts of South Asia, SE Asia, Cherokee, Canadian Aboriginals etc. deserve
equal status with English. These scripts are used by billions of people. Claiming
that "The only sense in which one can perhaps claim that UTF-8 is
Anglo-centric, is that it uses 1 byte for English letters vs. 3 bytes for CJK"
ignores the importance of scripts used by billions of humans. It is a factually
wrong statement, and not just a case of using a bad example.

2021-05-02 Thread Egmont Koblinger

--- Comment #12 from Egmont Koblinger  ---
tcanabrava,

Thanks a lot for stepping in!

As anyone can see here, Jayadevan accused my proposal of being discriminatory,
and then, when I provided purely technical arguments to disprove this claim (by
the way, those technical arguments are backed up by 20+ years of understanding
terminal emulators and their surrounding infrastructure; including 6+ years of
being a developer of a terminal emulator (not Konsole)), the said person,
obviously without even attempting to understand the technical arguments,
accused my words of being racist.

This behavior is utterly outrageous and unacceptable, and I believe that the
said person already deserves to be banned. I was about to report this behavior,
but as you say you've already taken action – thanks again for that –, I think
there's nothing more I could or should do here.

2021-05-02 Thread bugzilla_noreply

tcanabr...@kde.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tcanabr...@kde.org

--- Comment #11 from tcanabr...@kde.org ---
This thread is now under Community Working Group supervision.

(1) All strings should be sanitised, so that they will be perfectly safe, and
will not break anything.

You clearly are ignoring the issues pointed out by Egmont; sanitization has
nothing to do with this.

(2) It is racist to suggest that all non-English people are Chinese (or
Japanese or Korean). 

Please take a look at the KDE Code of Conduct. We will not tolerate accusations
of racism over what was meant to be an explanation based on an example. Whether
scripts other than CJK also use more bytes per encoded character is irrelevant.

Most scripts in the world are given only 3 byte encodings per character in
UTF-8, and not a code point per spoken word, as you say. That is a lie.

(3) The world has still not settled on UTF-16. But modern languages and
platforms tend to do so. Java, Dotnet, ICU, KDE, QT, Windows NT, JavaScript,
Dart, Flutter...
In today's world, support for both the modern UTF-16 and the legacy UTF-8 is
needed.

Patches welcome. I won't spend time working on this until the *base software*
(bash, zsh, etc.) supports it.

2021-05-02 Thread Jayadevan

--- Comment #10 from Jayadevan  ---
(In reply to Jayadevan from comment #9)
> You said you won't respond, but for the sake of clarity for others, I have
> to reply.
> 
> 
> (1) All strings should be sanitised, so that they will be perfectly safe,
> and will not break anything.
> (2) It is racist to suggest that all non-English people are Chinese (or
> Japanese or Korean). Most scripts in the world are given only 3 byte
> encodings per character in UTF-8, and not a code point per spoken word, as
> you say. That is a lie.
> (3) The world has still not settled on UTF-16. But modern languages and
> platforms tend to do so. Java, Dotnet, ICU, KDE, QT, Windows NT, JavaScript,
> Dart, Flutter...
> 
> In today's world, support for both the modern UTF-16 and the legacy UTF-8 is
> needed.

(The above comment is in response to Egmont Koblinger.)

2021-05-02 Thread Jayadevan

--- Comment #9 from Jayadevan  ---
You said you won't respond, but for the sake of clarity for others, I have to
reply.


(1) All strings should be sanitised, so that they will be perfectly safe, and
will not break anything.
(2) It is racist to suggest that all non-English people are Chinese (or
Japanese or Korean). Most scripts in the world are given only 3 byte encodings
per character in UTF-8, and not a code point per spoken word, as you say. That
is a lie.
(3) The world has still not settled on UTF-16. But modern languages and
platforms tend to do so. Java, Dotnet, ICU, KDE, QT, Windows NT, JavaScript,
Dart, Flutter...

In today's world, support for both the modern UTF-16 and the legacy UTF-8 is
needed.

2021-04-30 Thread Egmont Koblinger

--- Comment #8 from Egmont Koblinger  ---
(In reply to Jayadevan from comment #7)

I stopped working on terminal emulation about a year ago. Yet, I'm making a
single exception here to respond (i.e. I most likely won't follow up, so don't
write expecting a response from me).


> Please reject such proposals, as those are discriminatory.

I firmly refute this claim.

There is nothing discriminatory in the proposal whatsoever.

The reason behind this request – and this should be obvious to everyone who
takes time to really _understand_ the post and the linked article – is that
UTF-16 (and a few friends) as the _I/O_ encoding *does not work*, *never
worked* and even more importantly, *cannot be fixed to work*.

More precisely, you can write a terminal emulator that speaks this encoding,
but when placed in its context (i.e. surrounded by a Unix kernel, libc, higher
level libraries, tools, apps, tmux-likes, other computers to ssh to/from, etc.)
it won't do anything that makes sense, since all the surrounding infrastructure
only supports ASCII-compatible encodings for the communication with the
terminal.

In order to support UTF-16 as the _I/O_ encoding, in a way that you actually
get a working ecosystem around the terminal with this encoding, you'd need
modifications to the kernel's tty handling (line discipline, stty special
characters etc.), the kernel's tty-accessing API (to enforce UTF-16, or at
least an even number of bytes on all operations that write to / read from a
tty, or work with 16-bit units instead of 8-bit ones, in order to exclude the
possibility of going out of sync, causing permanent breakages), accompanied
with the corresponding changes in standards (e.g. POSIX), you'd need these
changes in libc too, you'd need heavy modifications in all the apps (e.g.
change from '\0'-terminated byte strings to wide strings or whatnot); you'd
need to throw out any shell script that contains even an "echo foo" (in an
ASCII-compatible encoding) because that would outright break the terminal if
sent out as-is, you'd need to rethink "cat" (how to transfer potentially odd
number of bytes into a channel that expects even numbers), you'd need to add
UTF-16 locales, and so on and so forth... I just sketched up a tiny subset of
the problems. You'd need to essentially rethink and adjust all the APIs,
libraries, every single tool or application inside the terminal, literally
everything. All these in order to create a system that's utterly incompatible
with what we already have, and whose user-visible outcome is not one tad
better. It's clearly not going to happen, and even if it happened, it would
be clearly harmful.
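
The "going out of sync" and odd-byte-count problems described above can be reproduced in a few lines of Python. This only illustrates the decoding arithmetic with arbitrary sample bytes; it does not model a real tty or line discipline:

```python
# Terminal and application agree on UTF-16LE...
payload = "hello\n".encode("utf-16-le")    # 12 bytes, an even number

# ...until a single ASCII byte slips in, e.g. from an ASCII-only `echo`
# in a shell script, or from `cat` of a file with an odd byte count.
stream = b"x" + payload                     # 13 bytes, now odd

decoded = stream.decode("utf-16-le", errors="replace")
print(repr(decoded))
# Every 16-bit unit after the stray byte is misaligned: "hello" is gone,
# replaced by mostly CJK-range code points, and the dangling last byte is U+FFFD.
assert "hello" not in decoded

# The same accident in UTF-8 costs nothing beyond the stray byte itself.
utf8_stream = b"x" + "hello\n".encode("utf-8")
assert utf8_stream.decode("utf-8") == "xhello\n"

# And plain ASCII `echo foo` output, read as UTF-16LE, is not "foo" at all.
print(repr(b"foo\n".decode("utf-16-le")))
```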

There is no politics or discrimination at all here, this is purely technical.


> UTF-8 is Anglo-centric. UTF-16 treats each writing system more fairly.

UTF-8 can represent the exact same things as UTF-16. They support all writing
systems to the very same extent.
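
That both encodings cover exactly the same repertoire can be checked by brute force. A Python sketch that round-trips every Unicode scalar value (every code point except the surrogates U+D800..U+DFFF) through UTF-8 and UTF-16:

```python
# Every Unicode scalar value: U+0000..U+10FFFF except the surrogate range
# U+D800..U+DFFF, which UTF-16 reserves for its own pairing mechanism.
all_text = "".join(chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)

for codec in ("utf-8", "utf-16-le"):
    # Lossless round trip: encode, decode, compare with the original.
    assert all_text.encode(codec).decode(codec) == all_text

print(len(all_text))   # 1112064
```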

The only sense in which one can perhaps claim that UTF-8 is Anglo-centric, is
that it uses 1 byte for English letters vs. 3 bytes for CJK (Chinese, Japanese,
Korean) symbols; whereas UTF-16 uses 2 for both. Given that an English letter
represents, well, a single letter of a word, whereas a CJK symbol represents a
syllable or an entire word, I actually do think UTF-8's 1:3 split is a way more
fair system. (Let alone that the typical work happening inside a terminal is
usually English-centric.)
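
The byte ratios discussed here, and the Indic case raised in comment #13, can be counted directly. A small Python sketch; the sample words are arbitrary, and it only counts code points and bytes, taking no position on what a code point represents:

```python
# Arbitrary sample words; Python's len() on a str counts code points.
samples = {
    "English": "terminal",
    "Chinese": "终端",         # "terminal" in Simplified Chinese
    "Devanagari": "नमस्ते",    # 6 code points: 4 consonants, a virama, a vowel sign
}

counts = {}
for script, word in samples.items():
    counts[script] = (
        len(word),                       # code points
        len(word.encode("utf-8")),       # UTF-8 bytes
        len(word.encode("utf-16-le")),   # UTF-16 bytes ("-le": no byte order mark)
    )
    cps, u8, u16 = counts[script]
    print(f"{script:<10} {cps:>2} code points, UTF-8 {u8:>2} bytes, UTF-16 {u16:>2} bytes")
```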

By the way: who cares? With today's network speeds, combined with the tiny
amount of terminal data compared to any other activity you do over any network,
the difference in the byte count just simply does not matter at all.


> Since KDE Internally uses UTF-16, UTF-16 should be supported.

Trying to make a connection between the _internal_ encoding and the _I/O_
encoding is not justified at all.

As an occasional user of Konsole I don't have the slightest idea what encoding
it uses _internally_, and it should be this way. Users shouldn't care, users
shouldn't need to care. If users needed to care, it would mean that the
developers did a terrible job. The internal encoding is subject to change by
the developers at any time, without any user noticing it.

What _I/O_ encodings Konsole supports (or, in this case: incorrectly claims to
support) is an utterly independent story.
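
This split between internal and I/O encoding is the one most runtimes make (Qt's QString, for instance, stores UTF-16 in memory). A minimal Python sketch: the text lives in a str whose in-memory layout is CPython's business, and UTF-8 is chosen only where bytes are written out. The BytesIO standing in for a terminal is an assumption of the example:

```python
import io

# The text as the program holds it: a Python str. CPython picks a per-string
# in-memory layout (1, 2 or 4 bytes per code point); callers never see it.
message = "Grüße, 世界\n"

# The I/O encoding is chosen separately, at the boundary. A BytesIO stands in
# for the terminal here; newline="" disables newline translation.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="utf-8", newline="")
out.write(message)
out.flush()
wire = sink.getvalue()

print(wire)
assert wire == message.encode("utf-8")
```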


> Also, UTF-16 is used by KDE, QT, C/C++ (From ICU), Java, Windows,
> JavaScript, Android, DartVM, Dart Language, and modern frameworks
> like Flutter.

You see: they made a choice. They don't offer alternatives, they decided on one
encoding.

The same goes for terminals. They decided on UTF-8; unsurprisingly, since for
millions of technical reasons, the encoding needs to be ASCII-compatible,
whereas there's a natural need to encode any text.

Many modern terminal emulators only support UTF-8 encoding and nothing else.
Many other terminal emulators support some legacy deprecated ones for backwards
compatibility, back from 

2021-04-29 Thread bugzilla_noreply

jayadevanr...@yandex.com changed:

   What|Removed |Added

 CC||jayadevanr...@yandex.com

--- Comment #7 from jayadevanr...@yandex.com ---
Please reject such proposals, as those are discriminatory. UTF-8 is
Anglo-centric. UTF-16 treats each writing system more fairly.

Since KDE Internally uses UTF-16, UTF-16 should be supported. Also, UTF-16 is
used by KDE, QT, C/C++ (From ICU), Java, Windows, JavaScript, Android, DartVM,
Dart Language, and modern frameworks like Flutter. UTF-16 should get first
class native support.

2021-02-08 Thread Kurt Hindenburg

--- Comment #6 from Kurt Hindenburg  ---
Konsole uses KCodecAction which uses KCodecs/KCharsets.  I'm not sure it is
even possible to ask for a certain sub-set or how much extra work would be
required.

Leave this BKO open; perhaps someone will have time to research it.
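
One way to state the selection criterion, independent of how KCodecs/KCharsets exposes its codec list: an I/O encoding must at least map the 128 ASCII characters to the identical 128 bytes. The Python sketch below is hypothetical (`keeps_ascii_as_is` and the candidate list are made up for illustration, and this is not the KCodecs API). The check is necessary but not sufficient, since an encoding can also reuse ASCII byte values inside multi-byte characters:

```python
ASCII_BYTES = bytes(range(0x80))
ASCII_TEXT = ASCII_BYTES.decode("ascii")


def keeps_ascii_as_is(codec: str) -> bool:
    """Hypothetical check: does `codec` map the 128 ASCII characters to the
    identical 128 bytes, in both directions? Necessary, but not sufficient,
    for a terminal I/O encoding."""
    try:
        return (ASCII_TEXT.encode(codec) == ASCII_BYTES
                and ASCII_BYTES.decode(codec) == ASCII_TEXT)
    except UnicodeError:
        return False


# Hypothetical candidate list, standing in for whatever the codec menu offers.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "koi8-r", "utf-16", "utf-16-le", "utf-32"]
offered = [c for c in CANDIDATES if keeps_ascii_as_is(c)]
print(offered)   # ['utf-8', 'latin-1', 'cp1252', 'koi8-r']
```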

2020-11-18 Thread Justin Zobel

--- Comment #5 from Justin Zobel  ---
(In reply to Egmont Koblinger from comment #4)
> > Can you please confirm this issue still occurs in recent konsole versions.
> 
> I have version 19.12.3.
> 
> It's Settings -> Edit Current Profile -> Advanced -> Default character
> encoding.
> 
> Also right-click on the terminal -> Set Encoding.
> 
> I assume you're a Konsole developer. I'm pretty certain that you can come up
> with a definite answer, that is, either locate this feature in newest
> Konsole (even if the menus were rearranged), or find the commit which
> removed it, in no more time than it would take for me to test the newest
> version. Honestly, I don't quite understand why you needed this feedback
> from me at all. If the said menus are no longer there, could you please do
> the research yourself? Thanks!

Thank you for the update, Egmont. I am not a Konsole developer; I am part of
the KDE Bug Triage team, and we are working to confirm bugs that have been
reported so that the developers can work on the fixes.

I've asked one of the developers to look in on this bug as it's a bit above my
level of knowledge.

2020-11-18 Thread Egmont Koblinger

Egmont Koblinger  changed:

   What|Removed |Added

 Resolution|WAITINGFORINFO  |---
 Status|NEEDSINFO   |REPORTED

--- Comment #4 from Egmont Koblinger  ---
> Can you please confirm this issue still occurs in recent konsole versions.

I have version 19.12.3.

It's Settings -> Edit Current Profile -> Advanced -> Default character
encoding.

Also right-click on the terminal -> Set Encoding.

I assume you're a Konsole developer. I'm pretty certain that you can come up
with a definite answer, that is, either locate this feature in newest Konsole
(even if the menus were rearranged), or find the commit which removed it, in no
more time than it would take for me to test the newest version. Honestly, I
don't quite understand why you needed this feedback from me at all. If the said
menus are no longer there, could you please do the research yourself? Thanks!

2020-11-17 Thread Bug Janitor Service

--- Comment #3 from Bug Janitor Service  ---
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!

2020-11-02 Thread Justin Zobel

Justin Zobel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WAITINGFORINFO
             Status|REPORTED                    |NEEDSINFO
                 CC|                            |justin.zo...@gmail.com

--- Comment #2 from Justin Zobel  ---
Thanks for the detailed report, Egmont. Can you please confirm this issue still
occurs in recent konsole versions. I couldn't find anything about encodings in
konsole.

2018-06-09 Thread Egmont Koblinger

--- Comment #1 from Egmont Koblinger  ---
(Forget the last paragraph. I didn't realize it wasn't a konsole bug.)
