Ken Takata wrote:
> 2014/11/11 Tue 5:56:06 UTC+9 Bram Moolenaar wrote:
> > Ken Takata wrote:
> >
> > > 2014/11/6 Thu 5:11:26 UTC+9 Bram Moolenaar wrote:
> > > > Yasuhiro Matsumoto wrote:
> > > >
> > > > > Bram, you seem to have removed this issue from the todo list.
> > > > > But I think merging the patch above is better than keeping the
> > > > > current status.
> > > > >
> > > > > There are two problems.
> > > > >
> > > > > 1. diff.vim contains several encodings. So if a DBCS encoding is
> > > > > used in Vim, Vim may handle invalid characters.
> > > > > 2. The localized messages of svn are encoded in the system locale
> > > > > encoding, so they do not match Vim's encoding.
> > > > >
> > > > > The first of those problems will be fixed with my patch.
> > > > > To fix the second of the problems, I suggest removing syntax of
> > > > > 'diffOnly' for multi-byte encodings.
> > > >
> > > > If I remember correctly, your patch breaks recognizing diff headers if
> > > > the text does not match the current locale. E.g., when my locale is
> > > > German and I edit a diff file generated by someone in Italy, I still
> > > > expect the headers to be recognized.
> > > >
> > > > When the file's encoding differs from what Vim has detected then all
> > > > bets are off, it will be impossible to compare the text correctly.
> > > > Unless we have a regexp that works around it, it's probably very
> > > > difficult.
> > > >
> > > > What is the error that is reported when using a DBCS encoding?
> > > > A reproducible example is useful.
> > >
> > > It occurs when enc=cp932 on Cygwin/MSYS/Linux.
> > > E.g.:
> > >
> > > $ vim -u NONE -N -c "set enc=cp932" -c "syntax on" -c "set ft=diff"
> > > Error detected while processing
> > > /usr/local/share/vim/vim74/syntax/diff.vim:
> > > line 128:
> > > E401: Pattern delimiter not found: "^\\
> > > ????????????????????????????????????? ??
> > > ?? ???????
> > > E475: Invalid argument: diffNoEOL^I"^\\
> > > ????????????????????????????????????? ??
> > > ?? ???????
> > >
> > > It doesn't occur on Win32. Maybe it occurs only when libiconv is used.
> > > libiconv fails to convert the encoding of diff.vim from utf-8 to cp932,
> > > so Vim opens diff.vim without converting the encoding.
> >
> > Yes, libiconv is strict about rejecting characters it cannot convert.
> >
> > > The root cause of this problem is handling of invalid characters.
> >
> > I suppose you mean characters that are valid in utf-8 but cannot be
> > converted to cp932.
>
> No, it doesn't matter whether the characters are valid or not in utf-8. It
> only matters when the characters are invalid in cp932.
>
> The last two characters of line 128 are <U+05d7><U+0022>; the byte
> sequence is <d7><97><22>. The character <U+05d7> cannot be converted to
> cp932, so Vim loads diff.vim without converting, and the line is loaded
> as is. Then the last two bytes <97><22> are handled as one character
> <9722> in cp932, so the last " disappears. But <9722> is actually not a
> valid character in cp932. This is the cause of E401.

Yes, that's what I meant. The original file is valid utf-8, but because
of the failing conversion you end up with something invalid.

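The byte-level claims above can be checked outside Vim. The following is a
minimal Python sketch, not Vim's or libiconv's code: the helper names
is_cp932_lead and is_cp932_trail are made up for illustration, and the byte
ranges are the usual Shift_JIS/cp932 lead and trail ranges. Python's cp932
codec stands in for libiconv here and may differ from it in details.

```python
# Usual cp932 (Shift_JIS family) double-byte ranges; helper names are
# hypothetical and exist only for this illustration.
def is_cp932_lead(b):
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def is_cp932_trail(b):
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

# The last two characters of line 128: <U+05d7><U+0022>.
tail = "\u05d7\"".encode("utf-8")
print(tail.hex(" "))            # d7 97 22

# U+05D7 has no cp932 representation, so a strict converter gives up.
try:
    "\u05d7".encode("cp932")
    convertible = True
except UnicodeEncodeError:
    convertible = False
print(convertible)              # False

# 0xd7 is a single-byte character in cp932, 0x97 looks like a lead byte,
# but 0x22 (") is not a valid trail byte.
print(is_cp932_lead(0xD7), is_cp932_lead(0x97), is_cp932_trail(0x22))
```

The last line prints False True False, which matches the <d7><9722> grouping
described above: a check of the lead byte alone pairs 0x97 with the quote.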
> BTW, when you open diff.vim and set the encoding explicitly
> (:e ++enc=utf-8), the character sequence <U+05d7><U+0022> is converted to
> '?"'. In this case, the problem doesn't occur.
>
>
> > > The last two bytes of line 128 are 0x97 0x22 (").
> > > 0x97 can be a lead byte in cp932, but 0x22 cannot be a trail byte in
> > > cp932. However, Vim wrongly handles the byte sequence 0x97 0x22 as one
> > > character. Thus Vim cannot find the ending double quotation mark (0x22).
> > > Maybe we also need to check the trail byte (not only the lead byte), but
> > > it might be a little bit slow. BTW, I think enc=cp932 is a legacy setting
> > > (especially on Cygwin/Linux), so I don't want to make an effort to fix
> > > this.
> > >
> > > Instead of fixing Vim itself, I have two ideas to work around this
> > > problem:
> > >
> > > 1. Add a dummy ending quotation ( | ") at the end of the line 128.
> > >
> > > --- a/runtime/syntax/diff.vim
> > > +++ b/runtime/syntax/diff.vim
> > > @@ -125,7 +125,7 @@
> > > syn match diffDiffer "^הזמ הז םינוש `.*'-ו `.*' םיצבקה$"
> > > syn match diffBDiffer "^הזמ הז םינוש `.*'-ו `.*' םיירניב םיצבק$"
> > > syn match diffIsA "^.* .*-ל .* .* תוושהל ןתינ אל$"
> > > -syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח"
> > > +syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
> >
> > The quotes seem wrong here. Is it supposed to be:
> >
> > syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח| "
>
> No, it's exactly what I intended. This is a trick. | is a command separator
> and the last " is the start of a comment:
>
> syn match diffNoEOL "pattern<U+05d7>" | "
>                                         ^ start of a comment
>
> But the line is handled as the following in cp932:
>
> syn match diffNoEOL "pattern<d7><9722> | "
>                                          ^ ending quotation
>
> Now Vim can find an ending quotation, so the error disappears.

Aha, clever.

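To see why the extra | " helps, here is a rough Python model of a scanner
that decides character length from the lead byte only. It is a sketch of the
behaviour described above, not Vim's actual implementation; split_lead_only
and has_closing_quote are hypothetical names, and "pattern" stands in for
the Hebrew text.

```python
def is_cp932_lead(b):
    # Usual cp932 lead-byte ranges.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def split_lead_only(data):
    """Group bytes into 'characters' by looking at the lead byte only."""
    chars, i = [], 0
    while i < len(data):
        n = 2 if is_cp932_lead(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + n])
        i += n
    return chars

def has_closing_quote(arg):
    """True if a stand-alone '"' follows the opening quote."""
    return b'"' in split_lead_only(arg)[1:]

# The pattern argument as raw utf-8 bytes, loaded without conversion.
original = '"^\\\\ pattern\u05d7"'.encode("utf-8")
patched = original + b' | "'

print(has_closing_quote(original))   # False: <97><22> is taken as one character
print(has_closing_quote(patched))    # True: the appended " ends the pattern
```

With a correct utf-8 load the appended " starts a comment instead, so the
same source line works in both cases.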
> > Note that one can also use "." to match any character. So long as there
> > are enough characters left to avoid a false match.
>
> I don't know the output of the diff command in Hebrew, but comparing with
> other translations, the line might end with <U+05d7>, so the last "." won't
> match. ".\?" would be better.
>
> So, there are three workarounds:
>
> 1. A tricky way.
> syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
>
> 2. Exactly the same meaning as before.
> syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח\%(\)"
>
> 3. Not exactly the same, but easier.
> syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח.\?"
>
> Is No. 3 the best?

I was thinking of dropping the character that causes the conversion error:

syn match diffNoEOL "^\\ ץבוקה ףוסב השד.-הרוש ות רס."

Are there any other characters that can't be converted?
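
One way to look for other problem characters is to scan the text with a
strict encoder. The sketch below uses Python's cp932 codec as a stand-in for
libiconv, which is an assumption since the two tables may differ slightly.
The names unconvertible and swallows_next_byte are hypothetical, and the
sample string is made up rather than taken from diff.vim.

```python
def is_cp932_lead(b):
    # Usual cp932 lead-byte ranges.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def unconvertible(text, encoding="cp932"):
    """Distinct characters the codec cannot encode, in order of appearance."""
    seen = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in seen:
                seen.append(ch)
    return seen

def swallows_next_byte(ch):
    """True if the last utf-8 byte of ch is in the cp932 lead-byte range."""
    return is_cp932_lead(ch.encode("utf-8")[-1])

sample = "abc \u05d7 \u05e8 \u3042"   # made-up sample text
for ch in unconvertible(sample):
    print(f"U+{ord(ch):04X}", swallows_next_byte(ch))
```

For this sample it reports U+05D7 (last byte 0x97, which pairs with the next
byte) and U+05E8 (last byte 0xa8, a single-byte character in cp932). Any
character reported by unconvertible makes the whole-file conversion fail;
only those for which swallows_next_byte is true can hide a following quote.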
--
ARTHUR: Ni!
BEDEVERE: Nu!
ARTHUR: No. Ni! More like this. "Ni"!
BEDEVERE: Ni, ni, ni!
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD
/// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///