Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-24 Thread 積丹尼 Dan Jacobson
T> Then this item can be closed.  I think you had in mind something like #578825

Well, I read that, but I still don't know the workaround that will make
xterm show U+9109 and other very common characters.



Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-19 Thread Thomas Dickey
- Original Message -
| From: "Zenaan Harkness" 
| To: "Thomas Dickey" , "872778" <872...@bugs.debian.org>
| Sent: Wednesday, September 19, 2018 8:32:09 PM
| Subject: Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

|> > Thomas is there any other test I can run on Debian stable?
|> 
|> fwiw "locale" says
|> 
|> LANG=en_US.UTF-8
|> LANGUAGE=
|> LC_CTYPE="en_US.UTF-8"
|> LC_NUMERIC="en_US.UTF-8"
|> LC_TIME="en_US.UTF-8"
|> LC_COLLATE="en_US.UTF-8"
|> LC_MONETARY="en_US.UTF-8"
|> LC_MESSAGES="en_US.UTF-8"
|> LC_PAPER="en_US.UTF-8"
|> LC_NAME="en_US.UTF-8"
|> LC_ADDRESS="en_US.UTF-8"
|> LC_TELEPHONE="en_US.UTF-8"
|> LC_MEASUREMENT="en_US.UTF-8"
|> LC_IDENTIFICATION="en_US.UTF-8"
|> LC_ALL=en_US.UTF-8
|> 
|> and "env|grep -E '(LANG|LC_)'" says
|> 
|> LANG=en_US.UTF-8
|> GDM_LANG=en_US.UTF-8
| 
| Thank you - I set all those and xterm now works the same as
| xfce4-terminal. Problem isolated, thanks.

Then we can close this item.

-- 
Thomas E. Dickey 
http://invisible-island.net
ftp://ftp.invisible-island.net



Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-19 Thread Zenaan Harkness
> > Thomas is there any other test I can run on Debian stable?
> 
> fwiw "locale" says
> 
> LANG=en_US.UTF-8
> LANGUAGE=
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=en_US.UTF-8
> 
> and "env|grep -E '(LANG|LC_)'" says
> 
> LANG=en_US.UTF-8
> GDM_LANG=en_US.UTF-8

Thank you - I set all those and xterm now works the same as
xfce4-terminal. Problem isolated, thanks.



Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-19 Thread Thomas Dickey
On Wed, Sep 19, 2018 at 08:17:54PM +1000, Zenaan Harkness wrote:
> On Mon, Sep 17, 2018 at 04:47:28AM -0400, Thomas Dickey wrote:
> > On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> > > Create a text file containing e.g. the musical natural symbol, and
> > > the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> > > symbols, a space, and three natural symbols, inside plain quotes).
> > > 
> > > Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
> > > 
> > > xterm displays the function and the natural symbols.
> > > 
> > > Now start the utf-8 compatible gui editor Geany, and open the same
> > > file in Geany.
> > > 
> > > Copy and paste those characters from Geany, into Geany - works.
> > > 
> > > Copy from Geany, paste to xterm - this also works.
> > > 
> > > Select/copy from xterm, middle-click paste into Geany - only the
> > > natural symbols, and not the function symbols, are pasted, also
> > > pasting to xterm (from copying from xterm) does not work.
> > > 
> > > SO, xterm is not properly copying some UTF-8 Unicode characters.
> > 
> > This update is unrelated to the original report, which deals with
> > characters past the BMP; the example here uses U+0192 and U+266E,
> > both of which are within the BMP.
> > 
> > I have not been able to reproduce the problem.
> >  
> > > See also:
> > > https://lists.debian.org/debian-user/2017/09/msg00518.html
> > > https://lists.debian.org/debian-user/2017/09/msg00527.html
> > > 
> > > Should I file a different bug for this, or just leave this here?
> > 
> > It might be related to #901249, but I cannot say.  The other client
> > (Geany) seems to be a factor - if you can reproduce the problem with
> > xsel, that would be helpful.  copy and paste rely on the source to
> > provide the data in different formats, and the target to request
> > what's appropriate.
> 
> OK, so I've tested just using xsel:
> 
> The string I start with is "# ƒƒ ♮♮" without the quotes, and that
> should appear as:

None of your comments mention the locale you're using.  The manual page
description of "-lc" and the associated "locale" resource goes into some
detail, mentioning that it uses LC_CTYPE.  If you have some unexpected
value for that, you'll get unexpected results.
> 
> hash space function function space natural natural
> 
> In vim in xfce4-terminal (to write this email), that sequence pastes
> correctly.
> 
> Now, in xfce4-terminal, after selecting those chars, xsel -o
> correctly dumps them.
> 
> Jumping immediately to xterm -lc, then:
> 
>   xsel -o -also- correctly dumps those chars to the xterm.
> 
> That's good.
> 
> Next, select those chars in xterm, and xsel -o no longer dumps the
> function symbols;
> 
> That's not good.
> 
> xfce4-terminal now has the same problem with xsel -o NOT dumping the
> function symbols, as does middle click pasting into geany -
> SO, in my setup at least, the problem is copying the function symbol
> -from- xterm (copying from other apps, such as geany and from vim in
> xfce4-terminal, and straight from xfce4-terminal, all works
> correctly for xsel -o (in both xfce4-terminal and xterm -lc).

I made a shell script to check 901249, and adapted it to this report.
Running that, I don't see any discrepancies (that exercises both primary
and clipboard).
 
> According to https://en.wikipedia.org/wiki/%C6%91 this "function
> symbol" is actually called the "florin sign", but in any case has the
> code U+0192 which seems well within the 16-bit code plane.
> 
> 
> Here's what a little test run looks like in xterm -l (I've bound the
> function symbol to my keyboard so I can type it successfully):
> 
> $ echo 
> 
> $ # select above string, and:
> $ xsel -o
> $ 
> $ # now middle click:
> $ ?^C
> $ # now select from xfce4-terminal, then come back here:
> $ xsel -o
> $ 
> $ # now middle click:
> $ 

I tried this also, without seeing a problem.

> Thomas is there any other test I can run on Debian stable?

fwiw "locale" says

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

and "env|grep -E '(LANG|LC_)'" says

LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8
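
The order in which these variables are consulted matters here: for
LC_CTYPE, a non-empty LC_ALL wins, then LC_CTYPE itself, then LANG.
What follows is only a minimal shell sketch of that lookup, not
anything taken from xterm.  The function names effective_ctype and
is_utf8_locale are made up for illustration, and the UTF-8 test is a
heuristic based on the locale name alone.

```shell
#!/bin/sh
# Illustrative sketch (not from xterm): work out which value applies to
# LC_CTYPE, following the POSIX order LC_ALL > LC_CTYPE > LANG.
# Arguments: $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (empty string if unset).
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        # Nothing set: assume the usual default, the C/POSIX locale.
        printf '%s\n' "C"
    fi
}

# Heuristic only: treat a locale name as UTF-8 if its codeset suffix says so.
is_utf8_locale() {
    case "$1" in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

ctype=$(effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}")
if is_utf8_locale "$ctype"; then
    echo "LC_CTYPE resolves to $ctype (looks like UTF-8)"
else
    echo "LC_CTYPE resolves to $ctype (does not look like UTF-8)"
fi
```

Running this in the shell that starts xterm shows which value
"xterm -lc" would be expected to pick up.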

-- 
Thomas E. Dickey 
https://invisible-island.net
ftp://ftp.invisible-island.net




Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-19 Thread Zenaan Harkness
On Mon, Sep 17, 2018 at 04:47:28AM -0400, Thomas Dickey wrote:
> On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> > Create a text file containing e.g. the musical natural symbol, and
> > the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> > symbols, a space, and three natural symbols, inside plain quotes).
> > 
> > Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
> > 
> > xterm displays the function and the natural symbols.
> > 
> > Now start the utf-8 compatible gui editor Geany, and open the same
> > file in Geany.
> > 
> > Copy and paste those characters from Geany, into Geany - works.
> > 
> > Copy from Geany, paste to xterm - this also works.
> > 
> > Select/copy from xterm, middle-click paste into Geany - only the
> > natural symbols, and not the function symbols, are pasted, also
> > pasting to xterm (from copying from xterm) does not work.
> > 
> > SO, xterm is not properly copying some UTF-8 Unicode characters.
> 
> This update is unrelated to the original report, which deals with
> characters past the BMP; the example here uses U+0192 and U+266E,
> both of which are within the BMP.
> 
> I have not been able to reproduce the problem.
>  
> > See also:
> > https://lists.debian.org/debian-user/2017/09/msg00518.html
> > https://lists.debian.org/debian-user/2017/09/msg00527.html
> > 
> > Should I file a different bug for this, or just leave this here?
> 
> It might be related to #901249, but I cannot say.  The other client
> (Geany) seems to be a factor - if you can reproduce the problem with
> xsel, that would be helpful.  copy and paste rely on the source to
> provide the data in different formats, and the target to request
> what's appropriate.

OK, so I've tested just using xsel:

The string I start with is "# ƒƒ ♮♮" without the quotes, and that
should appear as:

hash space function function space natural natural

In vim in xfce4-terminal (to write this email), that sequence pastes
correctly.

Now, in xfce4-terminal, after selecting those chars, xsel -o
correctly dumps them.

Jumping immediately to xterm -lc, then:

  xsel -o -also- correctly dumps those chars to the xterm.

That's good.

Next, select those chars in xterm, and xsel -o no longer dumps the
function symbols;

That's not good.

xfce4-terminal now has the same problem, with xsel -o NOT dumping the
function symbols, as does middle-click pasting into Geany.

So, in my setup at least, the problem is copying the function symbol
-from- xterm.  Copying from other apps, such as Geany, vim in
xfce4-terminal, and xfce4-terminal itself, all works correctly with
xsel -o (in both xfce4-terminal and xterm -lc).
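
One way to take the terminal's rendering out of the picture is to
compare the selection byte for byte with the expected UTF-8 encoding.
The sketch below is only an illustration and assumes od and tr are
available.  to_hex is a made-up helper name, and the xsel line is left
commented out because it needs a running X session.

```shell
#!/bin/sh
# Illustrative helper (name is made up): print stdin as a lowercase hex string.
to_hex() {
    od -An -v -tx1 | tr -d ' \t\n'
}

# Expected UTF-8 bytes for "# ƒƒ ♮♮", written as octal escapes so the
# script itself stays pure ASCII:
#   '#' = 23, space = 20, U+0192 = c6 92, U+266E = e2 99 ae
expected=$(printf '# \306\222\306\222 \342\231\256\342\231\256' | to_hex)
echo "expected: $expected"

# With the string selected in xterm, the primary selection could then be
# compared like this (needs X and xsel, so it is commented out here):
# actual=$(xsel -o | to_hex)
# [ "$actual" = "$expected" ] && echo "selection matches" || echo "selection differs: $actual"
```

If the function symbols are being dropped, the "c6 92" pairs would be
missing or replaced in the actual output.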

According to https://en.wikipedia.org/wiki/%C6%91 this "function
symbol" is actually called the "florin sign", but in any case has the
code U+0192 which seems well within the 16-bit code plane.


Here's what a little test run looks like in xterm -l (I've bound the
function symbol to my keyboard so I can type it successfully):

$ echo 

$ # select above string, and:
$ xsel -o
$ 
$ # now middle click:
$ ?^C
$ # now select from xfce4-terminal, then come back here:
$ xsel -o
$ 
$ # now middle click:
$ 



Thomas is there any other test I can run on Debian stable?



Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-19 Thread Thomas Dickey
- Original Message -
| From: "積丹尼 Dan Jacobson" 
| To: "Thomas Dickey" 
| Cc: 872...@bugs.debian.org
| Sent: Tuesday, September 18, 2018 9:54:28 PM
| Subject: Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

| These three show as boxes in xterm.
| T> (0x1f618, "FACE THROWING A KISS");
| T> (0x7522, "CJK UNIFIED IDEOGRAPH-7522");
| T> (0x9109, "CJK UNIFIED IDEOGRAPH-9109");
| Yes, copying them from xterm to elsewhere reveals them.

Then this item can be closed.  I think you had in mind something like #578825

-- 
Thomas E. Dickey 
http://invisible-island.net
ftp://ftp.invisible-island.net



Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-18 Thread 積丹尼 Dan Jacobson
These three show as boxes in xterm.
T> (0x1f618, "FACE THROWING A KISS");
T> (0x7522, "CJK UNIFIED IDEOGRAPH-7522");
T> (0x9109, "CJK UNIFIED IDEOGRAPH-9109");
Yes, copying them from xterm to elsewhere reveals them.



Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-18 Thread 積丹尼 Dan Jacobson
All I know is
U+7522 CJK UNIFIED IDEOGRAPH-7522
U+9109 CJK UNIFIED IDEOGRAPH-9109
show up as empty squares.



Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-17 Thread Thomas Dickey
On Tue, Sep 18, 2018 at 04:37:00AM +0800, 積丹尼 Dan Jacobson wrote:
> All I know is
> U+7522 CJK UNIFIED IDEOGRAPH-7522
> U+9109 CJK UNIFIED IDEOGRAPH-9109
> show up as empty squares.

Agreed, but that wasn't the point of this particular bug report.

I just created a test script to print the 5 code points mentioned to a
text file, and used xterm to copy and paste the result.  As I pointed
out, "select/paste should work".  It worked for me - there may be some
locale dependency or resource setting which is creating the problem
you reported.

(script attached)

-- 
Thomas E. Dickey 
https://invisible-island.net
ftp://ftp.invisible-island.net
#!/usr/bin/env perl

use strict;
use warnings;

use Encode 'encode_utf8';
binmode( STDOUT, ":utf8" );

sub show($$) {
	my $code = shift;
	my $name = shift;
	my $show = sprintf("U+%04X", $code);
	printf "%8s \"%c%c%c\" %s\n", $show, $code, $code, $code, $name;
}

show(0x1f618, "FACE THROWING A KISS");
show(0x7522, "CJK UNIFIED IDEOGRAPH-7522");
show(0x9109, "CJK UNIFIED IDEOGRAPH-9109");
show(0x0192, "LATIN SMALL LETTER F WITH HOOK");
show(0x266E, "MUSIC NATURAL SIGN");


1;




Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-09-17 Thread Thomas Dickey
On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> Create a text file containing e.g. the musical natural symbol, and
> the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> symbols, a space, and three natural symbols, inside plain quotes).
> 
> Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
> 
> xterm displays the function and the natural symbols.
> 
> Now start the utf-8 compatible gui editor Geany, and open the same
> file in Geany.
> 
> Copy and paste those characters from Geany, into Geany - works.
> 
> Copy from Geany, paste to xterm - this also works.
> 
> Select/copy from xterm, middle-click paste into Geany - only the
> natural symbols, and not the function symbols, are pasted, also
> pasting to xterm (from copying from xterm) does not work.
> 
> SO, xterm is not properly copying some UTF-8 Unicode characters.

This update is unrelated to the original report, which deals with
characters past the BMP; the example here uses U+0192 and U+266E,
both of which are within the BMP.
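
To make the distinction concrete: the BMP covers U+0000 through
U+FFFF, so a code point lies past it only when its value exceeds
0xFFFF.  A small shell sketch of that check, using the five code
points discussed in this thread (plane_of is a made-up helper name,
not part of any tool mentioned here):

```shell
#!/bin/sh
# Illustrative helper (name is made up): print the Unicode plane number of a
# code point given as a hex literal such as 0x0192.  Plane 0 is the BMP.
plane_of() {
    echo $(( $1 / 65536 ))
}

for cp in 0x1f618 0x7522 0x9109 0x0192 0x266E; do
    if [ "$(plane_of "$cp")" -eq 0 ]; then
        echo "$cp: within the BMP"
    else
        echo "$cp: past the BMP (plane $(plane_of "$cp"))"
    fi
done
```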

I have not been able to reproduce the problem.
 
> See also:
> https://lists.debian.org/debian-user/2017/09/msg00518.html
> https://lists.debian.org/debian-user/2017/09/msg00527.html
> 
> Should I file a different bug for this, or just leave this here?

It might be related to #901249, but I cannot say.  The other client
(Geany) seems to be a factor - if you can reproduce the problem with
xsel, that would be helpful.  copy and paste rely on the source to
provide the data in different formats, and the target to request
what's appropriate.
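
For reference, the formats a selection owner offers can be listed by
requesting the TARGETS target.  The sketch below uses xclip rather
than xsel, since xclip accepts a -t option for naming the target; that
xclip is installed is an assumption.  show_selection_targets is a
made-up wrapper name, and it simply reports that it skipped when no X
display is available.

```shell
#!/bin/sh
# Illustrative wrapper (name is made up): list the formats ("targets") the
# current owner of the primary selection offers, then fetch the selection
# as UTF8_STRING.
show_selection_targets() {
    if [ -z "${DISPLAY:-}" ] || ! command -v xclip >/dev/null 2>&1; then
        echo "skipped: no X display or xclip not installed"
        return 0
    fi
    echo "targets offered by the selection owner:"
    xclip -o -selection primary -t TARGETS
    echo "contents requested as UTF8_STRING:"
    xclip -o -selection primary -t UTF8_STRING
}
```

Select some text in xterm, then call show_selection_targets from
another terminal to see what xterm offers for that selection.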

-- 
Thomas E. Dickey 
https://invisible-island.net
ftp://ftp.invisible-island.net




Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars

2018-08-22 Thread Zenaan Harkness
Create a text file containing e.g. the musical natural symbol, and
the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
symbols, a space, and three natural symbols, inside plain quotes).

Now in an xterm -lc instance, with a UTF-8 locale, cat the file.

xterm displays the function and the natural symbols.

Now start the utf-8 compatible gui editor Geany, and open the same
file in Geany.

Copy and paste those characters from Geany, into Geany - works.

Copy from Geany, paste to xterm - this also works.

Select/copy from xterm, middle-click paste into Geany - only the
natural symbols, and not the function symbols, are pasted.  Pasting
back into xterm (after copying from xterm) does not work either.

SO, xterm is not properly copying some UTF-8 Unicode characters.

See also:
https://lists.debian.org/debian-user/2017/09/msg00518.html
https://lists.debian.org/debian-user/2017/09/msg00527.html

Should I file a different bug for this, or just leave this here?