Marc Lehmann wrote:-
> On Sat, Apr 26, 2008 at 10:34:10AM +0900, Neil Booth <[EMAIL PROTECTED]>
> wrote:
>
> Thanks for digging out this issue :) One minor nitpick, however:
No problem. I encountered the same issue trying to get one of my
own apps to handle multibyte conversion properly. In doing so I
found bugs in NetBSD too. And having had the experience of requiring
a state reset on error, I realized that is probably what was missing
in rxvt-unicode when I had issues with mojibake with it. I just
grepped the source looking for these functions, and got a bit
overzealous "fixing" them as you point out.
> > @@ -2380,13 +2380,19 @@ rxvt_term::next_char () NOTHROW
> >
> > if (len == (size_t)-2)
> > {
> > + // Reset to initial conversion state from undefined
> > + mbrtowc (0, 0, 0, mbstate);
>
> This would introduce a bug (it would eat characters at the end of the
> buffer). Are you sure netbsd requires this? If yes, then netbsd is buggy.
I'm not sure about your point about a buffer, I assume this is related
to the context in which rxvt uses this function. I've not looked into
that, but you're right that we shouldn't be resetting state on a return
of -2; the characters were fine just there wasn't enough of them, and
the function remains in mid-multibyte-character state ready for the
remaining bytes. My mistake.
> > if (len == (size_t)-1)
> > + {
> > + // Reset to initial conversion state from undefined
> > + mbrtowc (0, 0, 0, mbstate);
>
> well spotted, this is a real bug! thanks for tracking it down.
>
> (gnu/linux manpages are abysmally bad)
I agree this is a bug. The C standard is not great either in this area...
> > if (l < 0)
> > + {
> > + // Reset to initial conversion state from undefined
> > + wcrtomb (0, 0, mbs);
>
>
> this is a bug, too!
Agreed.
> however, should not, in theory, in this case, the buffer returned by
> wcrtomb added to the string (wcrtomb returns a reste shift state sequence
> in this case)?
Indeed, you're right, if I understand what you're saying the attached
revised patches are probably better. I can confirm that that
rxvt-unicode with these revised patches applied also behaves
properly on NetBSD after cat-ting a file badly encoded for the current
locale.
> although, after na illegal sequence was encountered, it doesn't really
> matter in what state the conversion really is.
>
> Both fixes are applied and will be part of the next release, which will
> include a fix for the select backend used on netbsd as well... :)
That's great, thanks! Not sure what the select backend is though....
Neil.
$NetBSD$
--- src/command.C.orig 2008-04-26 14:51:29.000000000 +0900
+++ src/command.C
@@ -2386,7 +2386,11 @@ rxvt_term::next_char () NOTHROW
}
if (len == (size_t)-1)
+ {
+ // Return to well-defined initial shift state
+ mbrtowc (0, 0, 0, mbstate);
return (unsigned char)*cmdbuf_ptr++; // the _occasional_ latin1
character is allowed to slip through
+ }
// assume wchar == unicode
cmdbuf_ptr += len;
$NetBSD$
--- src/misc.C.orig 2008-04-26 14:51:36.000000000 +0900
+++ src/misc.C
@@ -39,10 +39,12 @@ rxvt_wcstombs (const wchar_t *str, int l
{
ssize_t l = wcrtomb (dst, *str++, mbs);
+ // On encoding error, return to initial shift state for both
+ // our buffer and wcrtomb's internal state.
if (l < 0)
- *dst++ = '?';
- else
- dst += l;
+ l = wcrtomb (dst, 0, mbs);
+
+ dst += l;
}
*dst++ = 0;
_______________________________________________
rxvt-unicode mailing list
[email protected]
http://lists.schmorp.de/cgi-bin/mailman/listinfo/rxvt-unicode