Hi,

2014/11/11 Tue 5:56:06 UTC+9 Bram Moolenaar wrote:
> Ken Takata wrote:
> 
> > 2014/11/6 Thu 5:11:26 UTC+9 Bram Moolenaar wrote:
> > > Yasuhiro Matsumoto wrote:
> > > 
> > > > Bram, you seem to have removed this issue from the todo list.
> > > > But I think merging the patch above is better than keeping the
> > > > current status.
> > > > 
> > > > There are two problems.
> > > > 
> > > > 1. diff.vim contains several encodings, so if a DBCS encoding is used
> > > > in Vim, Vim may have to handle invalid characters.
> > > > 2. svn's localized messages are encoded in the system locale encoding,
> > > > so they do not match Vim's encoding.
> > > > 
> > > > The first of those problems will be fixed with my patch.
> > > > To fix the second of the problems, I suggest removing syntax of
> > > > 'diffOnly' for multi-byte encodings.
> > > 
> > > If I remember correctly, your patch breaks recognizing diff headers if
> > > the text does not match the current locale.  E.g., when my locale is
> > > German and I edit a diff file generated by someone in Italy, I still
> > > expect the headers to be recognized.
> > > 
> > > When the file's encoding differs from what Vim has detected then all
> > > bets are off, it will be impossible to compare the text correctly.
> > > Unless we have a regexp that works around it, it's probably very
> > > difficult.
> > > 
> > > What is the error that is reported when using a DBCS encoding?
> > > A reproducible example is useful.
> > 
> > It occurs when enc=cp932 on Cygwin/MSYS/Linux.
> > E.g.:
> > 
> > $ vim -u NONE -N -c "set enc=cp932" -c "syntax on" -c "set ft=diff"
> > Error detected while processing /usr/local/share/vim/vim74/syntax/diff.vim:
> > line  128:
> > E401: Pattern delimiter not found: "^\\ 
> > ????????????????????????????????????? ??
> > ?? ???????
> > E475: Invalid argument: diffNoEOL^I"^\\ 
> > ????????????????????????????????????? ??
> > ?? ???????
> > 
> > It doesn't occur on Win32. Maybe it occurs only when libiconv is used.
> > libiconv fails to convert the encoding of diff.vim from utf-8 to cp932, so
> > Vim opens diff.vim without converting the encoding.
> 
> Yes, libiconv is strict about rejecting characters it cannot convert.
> 
> > The root cause of this problem is handling of invalid characters.
> 
> I suppose you mean characters that are valid in utf-8 but cannot be
> converted to cp932.

No, it doesn't matter whether the characters are valid in utf-8 or not. It
only matters that the resulting bytes are invalid in cp932.

The last two characters of line 128 are <U+05d7><U+0022>, and their UTF-8 byte
sequence is <d7><97><22>.  The character <U+05d7> cannot be converted to cp932,
so Vim loads diff.vim without conversion and the line is read as is.  The last
two bytes <97><22> are then handled as one cp932 character <9722>, so the
closing " disappears, even though <9722> is not actually a valid character in
cp932.  This is the cause of E401.
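
As a rough illustration of the above, here is a small Python sketch (not Vim's
actual code). The helper names are made up for this example, and the lead/trail
byte ranges are the usual cp932 ones. It splits the tail of line 128 once by
looking at the lead byte only and once by also checking the trail byte:

```python
# Sketch only, not Vim's implementation. Byte ranges are the usual cp932 ones.

def is_cp932_lead(b):
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def is_cp932_trail(b):
    return 0x40 <= b <= 0xFC and b != 0x7F

def split_chars(data, check_trail):
    """Split bytes into characters the way a DBCS scanner would."""
    chars, i = [], 0
    while i < len(data):
        two = is_cp932_lead(data[i]) and i + 1 < len(data)
        if two and check_trail and not is_cp932_trail(data[i + 1]):
            two = False  # stray lead byte: keep it as a single byte
        n = 2 if two else 1
        chars.append(data[i:i + n])
        i += n
    return chars

tail = "\u05d7\"".encode("utf-8")   # last two characters of line 128
print(tail.hex(" "))                 # d7 97 22

try:
    "\u05d7".encode("cp932")
except UnicodeEncodeError:
    print("U+05D7 has no cp932 mapping")

print(split_chars(tail, False))      # [b'\xd7', b'\x97"']
print(split_chars(tail, True))       # [b'\xd7', b'\x97', b'"']
```

With the lead-byte-only split, the " is swallowed into the second "character";
with the trail-byte check it stays a separate byte.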

BTW, when you open diff.vim with the encoding set explicitly
(:e ++enc=utf-8), the character sequence <U+05d7><U+0022> is converted to '?"'.
In this case, the problem doesn't occur.


> > The last two bytes of line 128 are 0x97 0x22 (").
> > 0x97 can be a lead byte in cp932, but 0x22 cannot be a trail byte in cp932.
> > However, Vim wrongly handles the byte sequence 0x97 0x22 as one character.
> > Thus Vim cannot find the ending double quotation mark (0x22).
> > Maybe we also need to check the trail byte (not only the lead byte), but it
> > might be a little bit slow.  BTW, I think enc=cp932 is a legacy setting
> > (especially on Cygwin/Linux), so I don't want to make an effort to fix this.
> > 
> > Instead of fixing Vim itself, I have two ideas to work around this problem:
> > 
> > 1. Add a dummy ending quotation ( | ") at the end of line 128.
> > 
> > --- a/runtime/syntax/diff.vim
> > +++ b/runtime/syntax/diff.vim
> > @@ -125,7 +125,7 @@
> >  syn match diffDiffer       "^הזמ הז םינוש `.*'-ו `.*' םיצבקה$"
> >  syn match diffBDiffer      "^הזמ הז םינוש `.*'-ו `.*' םיירניב םיצבק$"
> >  syn match diffIsA  "^.* .*-ל .* .* תוושהל ןתינ אל$"
> > -syn match diffNoEOL        "^\\ ץבוקה ףוסב השדח-הרוש ות רסח"
> > +syn match diffNoEOL        "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
> 
> The quotes seem wrong here.  Is it supposed to be:
> 
>       syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח| "

No, it's exactly what I intended. This is a trick: | is a command separator
and the last " is the start of a comment:

        syn match diffNoEOL     "pattern<U+05d7>" | "
                                                    ^ start of a comment

But in cp932 the line is handled as follows:

        syn match diffNoEOL     "pattern<d7><9722> | "
                                                     ^ ending quotation

Now Vim can find an ending quotation, so the error disappears.
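
The effect of the trick can be sketched in Python as well. This is only a
simplified model: find_closing_quote is a made-up helper, "pattern" is a
placeholder for the real text, backslash escapes are ignored, and every cp932
lead byte is assumed to swallow the next byte without a trail-byte check.

```python
# Simplified model of a lead-byte-only scan for the closing quote.

def is_cp932_lead(b):
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def find_closing_quote(line):
    """Return the byte index of the quote that closes the first string, or -1."""
    i = line.index(b'"') + 1          # start right after the opening quote
    while i < len(line):
        if is_cp932_lead(line[i]):
            i += 2                    # the lead byte swallows the next byte
            continue
        if line[i] == 0x22:
            return i
        i += 1
    return -1

original = 'syn match diffNoEOL "pattern\u05d7"'.encode("utf-8")
patched = original + b' | "'

print(find_closing_quote(original))   # -1: the real closing quote was swallowed
print(find_closing_quote(patched))    # 34: the appended quote ends the string
```

When the file is read as utf-8, the appended " is just an empty comment, so
the meaning of the line does not change.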


> Note that one can also use "." to match any character.  So long as there
> are enough characters left to avoid a false match.

I don't know the output of the diff command in Hebrew, but comparing with
other translations, the line might end with <U+05d7>, so the last "." won't
match.  ".\?" would be better.
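
A quick way to see the difference is with Python's re module, where ".?" is
the equivalent of Vim's ".\?". The text "pattern" below is only a placeholder
for the real Hebrew message, and the line is assumed to end with <U+05d7>:

```python
import re

line = "\\ pattern\u05d7"             # assumed: the message ends with U+05D7

# A trailing "." needs one more character after U+05D7, so it fails here.
with_dot = re.match("^\\\\ pattern\u05d7.", line)
# A trailing ".?" also accepts the end of the line.
with_optional_dot = re.match("^\\\\ pattern\u05d7.?", line)

print(with_dot is None)               # True
print(with_optional_dot is not None)  # True
```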

So, there are three workarounds:

1. A tricky way.
        syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "

2. Exactly the same meaning as before.
        syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח\%(\)"

3. Not exactly the same, but easier.
        syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח.\?"

Is No. 3 the best?

Regards,
Ken Takata

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
