Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
T> Then this item can be closed. I think you had in mind something like #578825

Well, I read that, but I still don't know the workaround that will make U+9109 and other very common characters show up in xterm.
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
- Original Message -
| From: "Zenaan Harkness"
| To: "Thomas Dickey" , "872778" <872...@bugs.debian.org>
| Sent: Wednesday, September 19, 2018 8:32:09 PM
| Subject: Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
|
|> > Thomas is there any other test I can run on Debian stable?
|>
|> fwiw "locale" says
|>
|> LANG=en_US.UTF-8
|> LANGUAGE=
|> LC_CTYPE="en_US.UTF-8"
|> LC_NUMERIC="en_US.UTF-8"
|> LC_TIME="en_US.UTF-8"
|> LC_COLLATE="en_US.UTF-8"
|> LC_MONETARY="en_US.UTF-8"
|> LC_MESSAGES="en_US.UTF-8"
|> LC_PAPER="en_US.UTF-8"
|> LC_NAME="en_US.UTF-8"
|> LC_ADDRESS="en_US.UTF-8"
|> LC_TELEPHONE="en_US.UTF-8"
|> LC_MEASUREMENT="en_US.UTF-8"
|> LC_IDENTIFICATION="en_US.UTF-8"
|> LC_ALL=en_US.UTF-8
|>
|> and "env|grep -E '(LANG|LC_)'" says
|>
|> LANG=en_US.UTF-8
|> GDM_LANG=en_US.UTF-8
|
| Thank you - I set all those and xterm now works the same as
| xfce4-terminal. Problem isolated, thanks.

Then we can close this item.

--
Thomas E. Dickey
http://invisible-island.net
ftp://ftp.invisible-island.net
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
> > Thomas is there any other test I can run on Debian stable?
>
> fwiw "locale" says
>
> LANG=en_US.UTF-8
> LANGUAGE=
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=en_US.UTF-8
>
> and "env|grep -E '(LANG|LC_)'" says
>
> LANG=en_US.UTF-8
> GDM_LANG=en_US.UTF-8

Thank you - I set all those and xterm now works the same as
xfce4-terminal. Problem isolated, thanks.
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
On Wed, Sep 19, 2018 at 08:17:54PM +1000, Zenaan Harkness wrote:
> On Mon, Sep 17, 2018 at 04:47:28AM -0400, Thomas Dickey wrote:
> > On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> > > Create a text file containing e.g. the musical natural symbol, and
> > > the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> > > symbols, a space, and three natural symbols, inside plain quotes).
> > >
> > > Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
> > >
> > > xterm displays the function and the natural symbols.
> > >
> > > Now start the utf-8 compatible gui editor Geany, and open the same
> > > file in Geany.
> > >
> > > Copy and paste those characters from Geany, into Geany - works.
> > >
> > > Copy from Geany, paste to xterm - this also works.
> > >
> > > Select/copy from xterm, middle-click paste into Geany - only the
> > > natural symbols, and not the function symbols, are pasted, also
> > > pasting to xterm (from copying from xterm) does not work.
> > >
> > > SO, xterm is not properly copying some UTF-8 Unicode characters.
> >
> > This update is unrelated to the original report, which deals with
> > characters past BMP (the example uses U+0192 and U+266E).
> >
> > I have not been able to reproduce the problem.
> >
> > See also:
> > > https://lists.debian.org/debian-user/2017/09/msg00518.html
> > > https://lists.debian.org/debian-user/2017/09/msg00527.html
> > >
> > > Should I file a different bug for this, or just leave this here?
> >
> > It might be related to #901249, but I cannot say. The other client
> > (Geany) seems to be a factor - if you can reproduce the problem with
> > xsel, that would be helpful. copy and paste rely on the source to
> > provide the data in different formats, and the target to request
> > what's appropriate.
>
> OK, so I've tested just using xsel:
>
> The string I start with is "# ƒƒ ♮♮" without the quotes, and that
> should appear as:

None of your comments mention the locale you're using.

The manual page description of "-lc" and the associated "locale"
resource goes into some detail, mentioning that it uses LC_CTYPE.
If you have some unexpected value for that, you'll get unexpected results.

>
> hash space function function space natural natural
>
> In vim in xfce4-terminal (to write this email), that sequence pastes
> correctly.
>
> Now, in xfce4-terminal, after selecting those chars, xsel -o
> correctly dumps them.
>
> Jumping immediately to xterm -lc, then:
>
> xsel -o -also- correctly dumps those chars to the xterm.
>
> That's good.
>
> Next, select those chars in xterm, and xsel -o no longer dumps the
> function symbols;
>
> That's not good.
>
> xfce4-terminal now has the same problem with xsel -o NOT dumping the
> function symbols, as does middle click pasting into geany -
> SO, in my setup at least, the problem is copying the function symbol
> -from- xterm (copying from other apps, such as geany and from vim in
> xfce4-terminal, and straight from xfce4-terminal, all works
> correctly for xsel -o (in both xfce4-terminal and xterm -lc).

I made a shell script to check 901249, and adapted it to this report.
Running that, I don't see any discrepancies (that exercises both
primary and clipboard).

> According to https://en.wikipedia.org/wiki/%C6%91 this "function
> symbol" is actually called the "florin sign", but in any case has the
> code U+0192 which seems well within the 16-bit code plane.
>
>
> Here's what a little test run looks like in xterm -l (I've bound the
> function symbol to my keyboard so I can type it successfully):
>
> $ echo
>
> $ # select above string, and:
> $ xsel -o
> $
> $ # now middle click:
> $ ?^C
> $ # now select from xfce4-terminal, then come back here:
> $ xsel -o
> $
> $ # now middle click:
> $

I tried this also, without seeing a problem.

> Thomas is there any other test I can run on Debian stable?

fwiw "locale" says

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

and "env|grep -E '(LANG|LC_)'" says

LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8

--
Thomas E. Dickey
https://invisible-island.net
ftp://ftp.invisible-island.net
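Thomas's point is that xterm -lc takes its encoding from LC_CTYPE. Which value LC_CTYPE actually ends up with follows the POSIX precedence rules. A non-empty LC_ALL overrides every category variable. A non-empty LC_CTYPE comes next. LANG is only the fallback. As a result, a stray LC_ALL or LC_CTYPE can silently override a UTF-8 LANG.

The Perl sketch below models that lookup and adds a rough equivalent of the env|grep check. The helper name effective_ctype is hypothetical. The sketch does not check whether the named locale is actually installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: resolve the value that governs LC_CTYPE using the
# POSIX precedence LC_ALL > LC_CTYPE > LANG. Empty values are skipped.
# It takes a hash reference so it can be exercised without touching %ENV.
sub effective_ctype {
    my ($env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$var};
        return $value if defined $value && length $value;
    }
    # Nothing set: the implementation default is the "C" (POSIX) locale.
    return 'C';
}

# Roughly what env|grep -E '(LANG|LC_)' shows (this matches names only),
# followed by the value setlocale(LC_CTYPE, "") would pick up.
my @vars = grep { /LANG|LC_/ } keys %ENV;
for my $var (sort @vars) {
    print "$var=$ENV{$var}\n";
}
print "effective LC_CTYPE: ", effective_ctype(\%ENV), "\n";
```

Run it in the same shell that launches xterm -lc. If the last line does not name a UTF-8 locale, that setting is worth checking first.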
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
On Mon, Sep 17, 2018 at 04:47:28AM -0400, Thomas Dickey wrote:
> On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> > Create a text file containing e.g. the musical natural symbol, and
> > the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> > symbols, a space, and three natural symbols, inside plain quotes).
> >
> > Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
> >
> > xterm displays the function and the natural symbols.
> >
> > Now start the utf-8 compatible gui editor Geany, and open the same
> > file in Geany.
> >
> > Copy and paste those characters from Geany, into Geany - works.
> >
> > Copy from Geany, paste to xterm - this also works.
> >
> > Select/copy from xterm, middle-click paste into Geany - only the
> > natural symbols, and not the function symbols, are pasted, also
> > pasting to xterm (from copying from xterm) does not work.
> >
> > SO, xterm is not properly copying some UTF-8 Unicode characters.
>
> This update is unrelated to the original report, which deals with
> characters past BMP (the example uses U+0192 and U+266E).
>
> I have not been able to reproduce the problem.
>
> See also:
> > https://lists.debian.org/debian-user/2017/09/msg00518.html
> > https://lists.debian.org/debian-user/2017/09/msg00527.html
> >
> > Should I file a different bug for this, or just leave this here?
>
> It might be related to #901249, but I cannot say. The other client
> (Geany) seems to be a factor - if you can reproduce the problem with
> xsel, that would be helpful. copy and paste rely on the source to
> provide the data in different formats, and the target to request
> what's appropriate.

OK, so I've tested just using xsel:

The string I start with is "# ƒƒ ♮♮" without the quotes, and that
should appear as:

hash space function function space natural natural

In vim in xfce4-terminal (to write this email), that sequence pastes
correctly.

Now, in xfce4-terminal, after selecting those chars, xsel -o
correctly dumps them.

Jumping immediately to xterm -lc, then:

xsel -o -also- correctly dumps those chars to the xterm.

That's good.

Next, select those chars in xterm, and xsel -o no longer dumps the
function symbols;

That's not good.

xfce4-terminal now has the same problem with xsel -o NOT dumping the
function symbols, as does middle click pasting into geany -
SO, in my setup at least, the problem is copying the function symbol
-from- xterm (copying from other apps, such as geany and from vim in
xfce4-terminal, and straight from xfce4-terminal, all works
correctly for xsel -o (in both xfce4-terminal and xterm -lc).

According to https://en.wikipedia.org/wiki/%C6%91 this "function
symbol" is actually called the "florin sign", but in any case has the
code U+0192 which seems well within the 16-bit code plane.


Here's what a little test run looks like in xterm -l (I've bound the
function symbol to my keyboard so I can type it successfully):

$ echo

$ # select above string, and:
$ xsel -o
$
$ # now middle click:
$ ?^C
$ # now select from xfce4-terminal, then come back here:
$ xsel -o
$
$ # now middle click:
$

Thomas is there any other test I can run on Debian stable?
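Zenaan's reading of U+0192 is correct. The Basic Multilingual Plane covers U+0000 through U+FFFF, so among the code points discussed in this bug only U+1F618 lies beyond it. The Perl sketch below classifies a code point and prints its UTF-8 bytes using the core Encode module. The helper name describe is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: report whether a code point is in the Basic
# Multilingual Plane (U+0000..U+FFFF) and list its UTF-8 bytes in hex.
sub describe {
    my ($code) = @_;
    my $plane = $code <= 0xFFFF ? 'BMP' : 'beyond BMP';
    my $bytes = encode_utf8(chr $code);    # byte string, one char per byte
    my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    return sprintf 'U+%04X %-10s %s', $code, $plane, $hex;
}

# The five code points mentioned in this bug report.
print describe($_), "\n" for 0x0192, 0x266E, 0x7522, 0x9109, 0x1F618;
```

The byte sequences it prints are:

- U+0192: C6 92
- U+266E: E2 99 AE
- U+7522: E7 94 A2
- U+9109: E9 84 89
- U+1F618: F0 9F 98 98

The florin sign therefore needs only two bytes, fewer than the natural sign that survived the paste. This suggests the "past BMP" distinction does not explain the loss here.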
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
- Original Message -
| From: "積丹尼 Dan Jacobson"
| To: "Thomas Dickey"
| Cc: 872...@bugs.debian.org
| Sent: Tuesday, September 18, 2018 9:54:28 PM
| Subject: Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
| These three show as boxes in xterm.
| T> (0x1f618, "FACE THROWING A KISS");
| T> (0x7522, "CJK UNIFIED IDEOGRAPH-7522");
| T> (0x9109, "CJK UNIFIED IDEOGRAPH-9109");
| Yes, copying them from xterm to elsewhere reveals them.

Then this item can be closed. I think you had in mind something like #578825

--
Thomas E. Dickey
http://invisible-island.net
ftp://ftp.invisible-island.net
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
These three show as boxes in xterm.
T> (0x1f618, "FACE THROWING A KISS");
T> (0x7522, "CJK UNIFIED IDEOGRAPH-7522");
T> (0x9109, "CJK UNIFIED IDEOGRAPH-9109");
Yes, copying them from xterm to elsewhere reveals them.
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
All I know is
U+7522 CJK UNIFIED IDEOGRAPH-7522
U+9109 CJK UNIFIED IDEOGRAPH-9109
show up as empty squares.
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
On Tue, Sep 18, 2018 at 04:37:00AM +0800, 積丹尼 Dan Jacobson wrote:
> All I know is
> U+7522 CJK UNIFIED IDEOGRAPH-7522
> U+9109 CJK UNIFIED IDEOGRAPH-9109
> show up as empty squares.

Agreed, but that wasn't the point of this particular bug report.

I just created a test-script to print the 5 codepoints mentioned to a
text-file, and use xterm to copy and paste the result. As I pointed out,
"select/paste should work". It worked for me - there may be some locale
dependency or resource-setting which is creating the problem you reported.

(script attached)

--
Thomas E. Dickey
https://invisible-island.net
ftp://ftp.invisible-island.net

#!/usr/bin/env perl
use strict;
use warnings;
use Encode 'encode_utf8';

# Write wide characters to standard output as UTF-8.
binmode( STDOUT, ":utf8" );

# Print one line per code point: "U+XXXX", the character three times
# inside quotes, and its Unicode name.
sub show($$) {
    my $code = shift;
    my $name = shift;
    my $show = sprintf("U+%04X", $code);
    printf "%8s \"%c%c%c\" %s\n", $show, $code, $code, $code, $name;
}

# The code points discussed in this bug report.
show(0x1f618, "FACE THROWING A KISS");
show(0x7522, "CJK UNIFIED IDEOGRAPH-7522");
show(0x9109, "CJK UNIFIED IDEOGRAPH-9109");
show(0x0192, "LATIN SMALL LETTER F WITH HOOK");
show(0x266E, "MUSIC NATURAL SIGN");

1;
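The script covers the printing side of the test. The other half is checking what actually arrives after a paste. The sketch below assumes the pasted text has been saved to a file (for example with xsel -o > pasted.txt) and decoded as UTF-8 before comparison. The helper name missing_codepoints is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

binmode( STDOUT, ":utf8" );

# Hypothetical helper: given the expected text and the text that arrived
# after a paste (both as decoded Perl strings, e.g. read through
# '<:encoding(UTF-8)'), return each code point that occurs in the
# expected text but nowhere in the pasted text, formatted as U+XXXX.
sub missing_codepoints {
    my ($expected, $pasted) = @_;
    my %seen = map { $_ => 1 } split //, $pasted;
    my %reported;
    my @missing;
    for my $char (split //, $expected) {
        # Skip characters that survived, and report each loss only once.
        next if $seen{$char} || $reported{$char}++;
        push @missing, sprintf 'U+%04X', ord $char;
    }
    return @missing;
}

# Sample data shaped like the report: florin signs lost, naturals kept.
my $expected = "\x{192}\x{192} \x{266E}\x{266E}";
my $pasted   = " \x{266E}\x{266E}";
print join(' ', missing_codepoints($expected, $pasted)), "\n";
```

With the sample strings it prints U+0192, which matches the symptom described for the florin sign.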
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> Create a text file containing e.g. the musical natural symbol, and
> the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> symbols, a space, and three natural symbols, inside plain quotes).
>
> Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
>
> xterm displays the function and the natural symbols.
>
> Now start the utf-8 compatible gui editor Geany, and open the same
> file in Geany.
>
> Copy and paste those characters from Geany, into Geany - works.
>
> Copy from Geany, paste to xterm - this also works.
>
> Select/copy from xterm, middle-click paste into Geany - only the
> natural symbols, and not the function symbols, are pasted, also
> pasting to xterm (from copying from xterm) does not work.
>
> SO, xterm is not properly copying some UTF-8 Unicode characters.

This update is unrelated to the original report, which deals with
characters past BMP (the example uses U+0192 and U+266E).

I have not been able to reproduce the problem.

See also:
> https://lists.debian.org/debian-user/2017/09/msg00518.html
> https://lists.debian.org/debian-user/2017/09/msg00527.html
>
> Should I file a different bug for this, or just leave this here?

It might be related to #901249, but I cannot say. The other client
(Geany) seems to be a factor - if you can reproduce the problem with
xsel, that would be helpful. copy and paste rely on the source to
provide the data in different formats, and the target to request
what's appropriate.

--
Thomas E. Dickey
https://invisible-island.net
ftp://ftp.invisible-island.net
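The point that the source offers several formats and the target picks one can be illustrated with the STRING selection target. The ICCCM defines STRING as ISO Latin-1. Neither U+0192 nor U+266E exists in Latin-1, so a requestor that ends up with STRING rather than UTF8_STRING cannot receive either one intact.

The Perl sketch below models only the encoding step, using the core Encode module. It does not talk to an X server. It shows that a narrower target can lose characters, not why this particular report lost ƒ but kept ♮. The helper name bytes_for_target is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical model of the bytes a selection requestor receives for a
# given target name. STRING is ISO Latin-1 (ICCCM); UTF8_STRING is UTF-8.
# Assumption: unmappable characters are replaced the way Encode does by
# default (with "?"); a real selection owner may behave differently.
sub bytes_for_target {
    my ($text, $target) = @_;
    return encode('UTF-8', $text)      if $target eq 'UTF8_STRING';
    return encode('iso-8859-1', $text) if $target eq 'STRING';
    die "target $target is not modelled in this sketch\n";
}

# The string from the xsel test: "# ƒƒ ♮♮".
my $sample = "# \x{192}\x{192} \x{266E}\x{266E}";
for my $target (qw(UTF8_STRING STRING)) {
    my $bytes = bytes_for_target($sample, $target);
    printf "%-11s %s\n", $target,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```

The two targets come out differently:

- UTF8_STRING keeps every character: 23 20 C6 92 C6 92 20 E2 99 AE E2 99 AE.
- STRING comes out as 23 20 3F 3F 20 3F 3F, that is "# ?? ??".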
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars
Create a text file containing e.g. the musical natural symbol, and
the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
symbols, a space, and three natural symbols, inside plain quotes).

Now in an xterm -lc instance, with a UTF-8 locale, cat the file.

xterm displays the function and the natural symbols.

Now start the utf-8 compatible gui editor Geany, and open the same
file in Geany.

Copy and paste those characters from Geany, into Geany - works.

Copy from Geany, paste to xterm - this also works.

Select/copy from xterm, middle-click paste into Geany - only the
natural symbols, and not the function symbols, are pasted, also
pasting to xterm (from copying from xterm) does not work.

SO, xterm is not properly copying some UTF-8 Unicode characters.

See also:
https://lists.debian.org/debian-user/2017/09/msg00518.html
https://lists.debian.org/debian-user/2017/09/msg00527.html

Should I file a different bug for this, or just leave this here?
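The first step of the report is creating a file that contains "ƒƒƒ ♮♮♮". Scripting it means the bytes on disk are known before any copying starts. The Perl sketch below does this. The file name florin-natural.txt is a hypothetical choice.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical file name for the test data from the report.
my $file = 'florin-natural.txt';

# Three LATIN SMALL LETTER F WITH HOOK (U+0192), a space, and three
# MUSIC NATURAL SIGN (U+266E), followed by a newline.
my $text = ("\x{192}" x 3) . ' ' . ("\x{266E}" x 3) . "\n";

# Write the string as UTF-8.
open my $out, '>:encoding(UTF-8)', $file or die "cannot write $file: $!";
print {$out} $text;
close $out or die "cannot close $file: $!";

# Each U+0192 takes 2 bytes and each U+266E takes 3, so the file
# should be 3*2 + 1 + 3*3 + 1 = 17 bytes long.
printf "%s: %d bytes\n", $file, -s $file;
```

Running cat on the file in xterm -lc should show the six symbols. If a later paste comes back shorter than 17 bytes, characters were lost on the way.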