Bug#419832: zsh: expanding non-ASCII filenames with TAB

2007-08-18 Thread Alan Curry
Here's a backtrace showing where the extra characters get lost during an
expand operation (touch a$'\300' and then cat a*TAB)

#0  stringaszleline (instr=0x100e0850 "cat a\300", incs=6, outll=0xfc97658,
    outsz=0xfc97890, outcs=0xfc97684) at zle_utils.c:244
#1  0x0fc70224 in unmetafy_line () at zle_tricky.c:979
#2  0x0fc74ac0 in docomplete (lst=3) at zle_tricky.c:870
#3  0x0fc7697c in expandorcompleteprefix (args=0x100e0850) at zle_tricky.c:2742
#4  0x0fc60870 in execzlefunc (func=0xfc94990, args=0xfc97614,
    set_bindk=264861272) at zle_main.c:1261
#5  0x0fc60dbc in zlecore () at zle_main.c:1019
#6  0x0fc614cc in zleread (lp=<value optimized out>, rp=<value optimized out>,
    flags=<value optimized out>, context=<value optimized out>)
    at zle_main.c:1174
#7  0x1003df68 in inputline () at input.c:278
#8  0x1003e970 in ingetc () at input.c:214
#9  0x1003813c in ihgetc () at hist.c:240
#10 0x1004a70c in yylex () at lex.c:646
#11 0x100709d8 in parse_event () at parse.c:451
#12 0x1003c73c in loop (toplevel=1, justonce=0) at init.c:129
#13 0x1003d8cc in zsh_main (argc=<value optimized out>,
    argv=<value optimized out>) at init.c:1347
#14 0x1000b410 in main (argc=269355088, argv=0x6) at ./main.c:93

stringaszleline() gets an MB_INVALID from mbrtowc and doesn't handle it well.

When doing completion instead of expanding, the string gets generated by
add_match_data, which handles MB_INVALID by generating the $'\300'
replacement string. Maybe stringaszleline should be doing that too.
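
The approach suggested above can be sketched in standalone C. This is only an
illustration under assumptions, not zsh's code: the function names
append_octal_quote and quote_undecodable are hypothetical, and the sketch
simply treats both error returns of mbrtowc(), (size_t)-1 and (size_t)-2, as
an undecodable byte to be written in $'\ooo' form.

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), for the caller */
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: write the $'\ooo' form of one byte to out.
 * Returns the number of characters written (7), or 0 if there is no room. */
static size_t append_octal_quote(unsigned char byte, char *out, size_t room)
{
    if (room < 8)            /* 7 characters plus the terminating NUL */
        return 0;
    snprintf(out, room, "$'\\%03o'", (unsigned)byte);
    return 7;
}

/* Hypothetical sketch: copy in[0..len) to out, replacing every byte that
 * mbrtowc() cannot decode in the current LC_CTYPE locale with $'\ooo'.
 * Returns the length of the result, or (size_t)-1 if out is too small. */
size_t quote_undecodable(const char *in, size_t len, char *out, size_t outsz)
{
    mbstate_t st;
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    memset(&st, 0, sizeof st);

    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, in + i, len - i, &st);

        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Invalid or truncated sequence: quote one byte and resync. */
            size_t w = append_octal_quote((unsigned char)in[i],
                                          out + o, outsz - o);
            if (w == 0)
                return (size_t)-1;
            o += w;
            i += 1;
            memset(&st, 0, sizeof st);  /* state is unspecified after an error */
            continue;
        }
        if (n == 0)           /* an embedded NUL byte counts as one byte */
            n = 1;
        if (o + n >= outsz)   /* keep room for the terminating NUL */
            return (size_t)-1;
        memcpy(out + o, in + i, n);   /* decodable character: copy as-is */
        o += n;
        i += n;
    }
    out[o] = '\0';
    return o;
}
```

A caller would be expected to select the locale first with
setlocale(LC_CTYPE, ""). In a locale where the byte \300 is not a valid
character, the input "a\300" should then come out as a$'\300' rather than
losing the byte.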

-- 
Alan Curry
[EMAIL PROTECTED]


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#419832: zsh: expanding non-ASCII filenames with TAB

2007-08-17 Thread Alan Curry
Clint Adams writes the following:
>
> On Wed, Apr 18, 2007 at 02:31:42AM -0400, Alan Curry wrote:
> > In the following demonstration, the first TAB keypress inserted the $'\300'
> > for me. The second TAB keypress, typed immediately after the asterisk,
> > should expand the glob into $'\300' also, but instead it just erases the
> > asterisk, replacing it with nothing at all. If Return is pressed after the
> > tab, the cat is executed with no arguments and reads from the tty.
> >
> > Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
>
> Non-ASCII characters don't exist in the C locale; maybe you want to pick
> a better one.


That's a pretty lame brush-off.

My locale is set correctly (to be precise, it is unset correctly; none of
those environment variables are set). It represents the type of output I want
to get from all programs that recognize locales: text in English if possible,
and traditional sort order, not that new-fangled chaotic LANG=en order, where
ls hides your Makefile in the middle of all your lowercase source files! (Why
do you think they made make(1) recognize Makefiles with a capital M? Because
it belongs at the start of the listing, that's why.)

If you think this behavior is justified, for what am I being punished? Using
the default (C) locale? It accurately describes what language I can read.
Having a file that is not a valid sequence of characters in that locale?
Maybe I should go file bug reports on all the programs that allow me to
create a file with such a name. That will be a lot of bug reports.

Or maybe we could admit that regardless of one's preferred locale, it is
inevitable that one will occasionally obtain files whose names are not valid
character strings in that locale. It would be nice if our tools would not
choke on those, would it not?

The $'\300' notation is a vast improvement over what older zsh versions did,
which was to just dump the wacky bytes directly to the terminal. The current
version already automatically inserts $'\300' when completing; I only suggest
that it behave identically when expanding.

Expanding a glob to an empty list, when in fact it matched something, surely
can't be considered acceptable behavior. Even worse, if it matched several
things and only one of them had a nasty byte and got omitted, you might not
notice, and then go ahead and act on the wrong set of files.

Come on.






Bug#419832: zsh: expanding non-ASCII filenames with TAB

2007-08-17 Thread Clint Adams
On Fri, Aug 17, 2007 at 03:22:10PM +0100, Peter Stephenson wrote:
> I can believe there's some logic missing here, but it's not currently
> clear to me here what.  Could you post an explicit recipe for
> getting from an unconfigured shell to an expansion that doesn't
> (somehow) display all the elements?

% zsh -f
percebes% autoload -U compinit;compinit
percebes% mkdir /tmp/blah
percebes% cd !$
cd /tmp/blah
percebes% export LANG=en_US.UTF-8
percebes% touch a b$'\300' c
percebes% cat *TAB

(expands to)
percebes% cat a b





Bug#419832: zsh: expanding non-ASCII filenames with TAB doesn't work

2007-08-16 Thread Clint Adams
On Wed, Apr 18, 2007 at 02:31:42AM -0400, Alan Curry wrote:
> In the following demonstration, the first TAB keypress inserted the $'\300'
> for me. The second TAB keypress, typed immediately after the asterisk,
> should expand the glob into $'\300' also, but instead it just erases the
> asterisk, replacing it with nothing at all. If Return is pressed after the
> tab, the cat is executed with no arguments and reads from the tty.
>
> Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Non-ASCII characters don't exist in the C locale; maybe you want to pick
a better one.





Bug#419832: zsh: expanding non-ASCII filenames with TAB doesn't work

2007-04-18 Thread Alan Curry
Package: zsh
Version: 4.3.2-25
Severity: normal

In the following demonstration, the first TAB keypress inserted the $'\300'
for me. The second TAB keypress, typed immediately after the asterisk,
should expand the glob into $'\300' also, but instead it just erases the
asterisk, replacing it with nothing at all. If Return is pressed after the
tab, the cat is executed with no arguments and reads from the tty.

% ls -b
\300
% cat TAB$'\300' 
This is the content of a file whose name is a single non-ASCII character
% cat *
This is the content of a file whose name is a single non-ASCII character
% cat *TAB

If any files are present whose names are ASCII-only, the glob expands into a
list of those, with the non-ASCII files excluded from the list.

Previous versions of zsh, for example the one in sarge, handle this situation
better. (The $'\300' construct isn't used; instead the non-ASCII character is
passed through as-is to the terminal, which is not ideal but better than
completely refusing to expand certain filenames.)

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: powerpc (ppc)
Shell:  /bin/sh linked to /bin/dash
Kernel: Linux 2.6.20.4
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages zsh depends on:
ii  debconf [debconf-2.0]   1.5.11        Debian configuration management sy
ii  libc6                   2.3.6.ds1-13  GNU C Library: Shared libraries
ii  libncurses5             5.5-5         Shared libraries for terminal hand

Versions of packages zsh recommends:
ii  libcap1                 1:1.10-14     support for getting/setting POSIX.
ii  libpcre3                6.7-1         Perl 5 Compatible Regular Expressi

-- debconf-show failed

