Ken Takata wrote:

> 2014/11/11 Tue 5:56:06 UTC+9 Bram Moolenaar wrote:
> > Ken Takata wrote:
> > 
> > > 2014/11/6 Thu 5:11:26 UTC+9 Bram Moolenaar wrote:
> > > > Yasuhiro Matsumoto wrote:
> > > > 
> > > > > Bram, you seem to have removed this issue from the todo list.
> > > > > But I think merging the patch above is better than keeping the
> > > > > current status.
> > > > > 
> > > > > There are two problems.
> > > > > 
> > > > > 1. diff.vim contains text in several encodings. So if a DBCS encoding
> > > > > is used in Vim, Vim may have to handle invalid characters.
> > > > > 2. The localized messages of svn are encoded in the system locale
> > > > > encoding, so they may not match Vim's encoding.
> > > > > 
> > > > > The first of those problems will be fixed with my patch.
> > > > > To fix the second of the problems, I suggest removing syntax of
> > > > > 'diffOnly' for multi-byte encodings.
> > > > 
> > > > If I remember correctly, your patch breaks recognizing diff headers if
> > > > the text does not match the current locale.  E.g., when my locale is
> > > > German and I edit a diff file generated by someone in Italy, I still
> > > > expect the headers to be recognized.
> > > > 
> > > > When the file's encoding differs from what Vim has detected then all
> > > > bets are off, it will be impossible to compare the text correctly.
> > > > Unless we have a regexp that works around it, it's probably very
> > > > difficult.
> > > > 
> > > > What is the error that is reported when using a DBCS encoding?
> > > > A reproducible example is useful.
> > > 
> > > It occurs when enc=cp932 on Cygwin/MSYS/Linux.
> > > E.g.:
> > > 
> > > $ vim -u NONE -N -c "set enc=cp932" -c "syntax on" -c "set ft=diff"
> > > Error detected while processing /usr/local/share/vim/vim74/syntax/diff.vim:
> > > line  128:
> > > E401: Pattern delimiter not found: "^\\ 
> > > ????????????????????????????????????? ??
> > > ?? ???????
> > > E475: Invalid argument: diffNoEOL^I"^\\ 
> > > ????????????????????????????????????? ??
> > > ?? ???????
> > > 
> > > It doesn't occur on Win32. Maybe it occurs only when libiconv is used.
> > > libiconv fails to convert the encoding of diff.vim from utf-8 to cp932,
> > > so Vim opens diff.vim without converting the encoding.
> > 
> > Yes, libiconv is strict about rejecting characters it cannot convert.
> > 
> > > The root cause of this problem is handling of invalid characters.
> > 
> > I suppose you mean characters that are valid in utf-8 but cannot be
> > converted to cp932.
> 
> No, it doesn't matter whether the characters are valid or not in utf-8. It
> only matters when the characters are invalid in cp932.
> 
> The last two characters of line 128 are <U+05d7><U+0022>; the byte
> sequence is <d7><97><22>.  The character <U+05d7> cannot be converted
> to cp932, so Vim loads diff.vim without converting and the line is loaded
> as is.  Then the last two bytes <97><22> are handled as one character
> <9722> in cp932, so the last " disappears.  But actually <9722> is not a
> valid character in cp932.  This is the cause of E401.

Yes, that's what I meant.  The original file is valid utf-8, but because
of the failing conversion you end up with something invalid.
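
To make the byte-level picture concrete, here is a minimal Python sketch (not
Vim's code). The two helper functions are hypothetical names introduced for
illustration; they encode the usual cp932 lead-byte and trail-byte ranges.

```python
# Byte ranges for cp932 double-byte characters (helper names are illustrative).
def is_cp932_lead(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def is_cp932_trail(b: int) -> bool:
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

# End of line 128 as stored in the UTF-8 file: U+05D7 followed by '"'.
raw = "\u05d7\"".encode("utf-8")

print(" ".join(f"{b:02x}" for b in raw))  # d7 97 22
print(is_cp932_lead(raw[1]))              # True:  0x97 looks like a lead byte
print(is_cp932_trail(raw[2]))             # False: 0x22 is not a valid trail byte
```

So a reader that checks only the lead byte would pair 0x97 with the quote,
while one that also checks the trail byte would not.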

> BTW, when you open diff.vim with the encoding set explicitly
> (:e ++enc=utf-8), the character sequence <U+05d7><U+0022> is converted
> to '?"'.  In this case, the problem doesn't occur.
> 
> 
> > > The last two bytes of line 128 are 0x97 0x22 (").
> > > 0x97 can be a lead byte in cp932, but 0x22 cannot be a trail byte in cp932.
> > > However, Vim wrongly handles the byte sequence 0x97 0x22 as one character.
> > > Thus Vim cannot find the ending double quotation mark (0x22).
> > > Maybe we also need to check the trail byte (not only the lead byte), but it
> > > might be a little bit slow.  BTW, I think enc=cp932 is a legacy setting
> > > (especially on Cygwin/Linux), so I don't want to make an effort to fix this.
> > > 
> > > Instead of fixing Vim itself, I have two ideas to work around this problem:
> > > 
> > > 1. Add a dummy ending quotation ( | ") at the end of line 128.
> > > 
> > > --- a/runtime/syntax/diff.vim
> > > +++ b/runtime/syntax/diff.vim
> > > @@ -125,7 +125,7 @@
> > >  syn match diffDiffer     "^הזמ הז םינוש `.*'-ו `.*' םיצבקה$"
> > >  syn match diffBDiffer    "^הזמ הז םינוש `.*'-ו `.*' םיירניב םיצבק$"
> > >  syn match diffIsA        "^.* .*-ל .* .* תוושהל ןתינ אל$"
> > > -syn match diffNoEOL      "^\\ ץבוקה ףוסב השדח-הרוש ות רסח"
> > > +syn match diffNoEOL      "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
> > 
> > The quotes seem wrong here.  Is it supposed to be:
> > 
> >     syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח| "
> 
> No, it's exactly what I intended. This is a trick. | is a command separator
> and the last " is the start of a comment:
> 
>       syn match diffNoEOL     "pattern<U+05d7>" | "
>                                                   ^ start of a comment
> 
> But the line is handled as the following in cp932: 
> 
>       syn match diffNoEOL     "pattern<d7><9722> | "
>                                                    ^ ending quotation
> 
> Now Vim can find an ending quotation, so the error disappears.

Aha, clever.
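
The effect of the trick can be modelled with a toy scanner. This is only a
sketch under the assumption that the reader skips two bytes whenever it sees a
cp932 lead byte; it is not how Vim's parser is actually written, and the
function names are made up for the example.

```python
def is_lead(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def is_trail(b: int) -> bool:
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

def find_closing_quote(data: bytes, check_trail: bool = False) -> int:
    """Return the index of the first '"' after the opening one at data[0], or -1."""
    i = 1
    while i < len(data):
        b = data[i]
        if b == 0x22:
            return i
        # Treat lead byte + next byte as one character (optionally validated).
        if is_lead(b) and i + 1 < len(data) and (
            not check_trail or is_trail(data[i + 1])
        ):
            i += 2
        else:
            i += 1
    return -1

original = '"pattern\u05d7"'.encode("utf-8")
patched = '"pattern\u05d7" | "'.encode("utf-8")

print(find_closing_quote(original))                    # -1: the quote is swallowed
print(find_closing_quote(patched))                     # 14: the dummy quote is found
print(find_closing_quote(original, check_trail=True))  # 10: the real quote is found
```

With the trail-byte check enabled the original line already parses, which
corresponds to the fix in Vim itself that was mentioned but not pursued.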

> > Note that one can also use "." to match any character, so long as there
> > are enough characters left to avoid a false match.
> 
> I don't know the output of the diff command in Hebrew, but comparing with
> other translations, the line might end with <U+05d7>, so the last "." won't
> match.  ".\?" would be better.
> 
> So, there are three workarounds:
> 
> 1. A tricky way.
>       syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
> 
> 2. Exactly the same meaning as before.
>       syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח\%(\)"
> 
> 3. Not exactly the same, but easier.
>       syn match diffNoEOL     "^\\ ץבוקה ףוסב השדח-הרוש ות רסח.\?"
> 
> Is No. 3 the best?

I was thinking of dropping the character that causes the conversion error:

        syn match diffNoEOL     "^\\ ץבוקה ףוסב השד.-הרוש ות רס."

Are there any other characters that can't be converted?
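
One rough way to look for candidates is a small Python script. It is only a
sketch: it assumes Python's "cp932" codec is close enough to the table that
libiconv uses, which may not hold for every character, and both function
names are hypothetical.

```python
def unconvertible(text: str, encoding: str = "cp932") -> list:
    """List the distinct characters of text that the codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad

def risky_before_quote(ch: str) -> bool:
    """True if the last UTF-8 byte of ch falls in the cp932 lead-byte range."""
    last = ch.encode("utf-8")[-1]
    return 0x81 <= last <= 0x9F or 0xE0 <= last <= 0xFC

sample = "abc\u05d7\u05e8"
for ch in unconvertible(sample):
    print(f"U+{ord(ch):04X}", risky_before_quote(ch))
```

Running unconvertible() over the contents of diff.vim would list every
character of this kind. Only those for which risky_before_quote() is true can
swallow a following quote when the file is loaded unconverted.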


-- 
ARTHUR:   Ni!
BEDEVERE: Nu!
ARTHUR:   No.  Ni!  More like this. "Ni"!
BEDEVERE: Ni, ni, ni!
                 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
