Hi,

2014/11/11 Tue 5:56:06 UTC+9 Bram Moolenaar wrote:
> Ken Takata wrote:
>
> > 2014/11/6 Thu 5:11:26 UTC+9 Bram Moolenaar wrote:
> > > Yasuhiro Matsumoto wrote:
> > >
> > > > Bram, you seem to have removed this issue from the todo list.
> > > > But I think merging the patch above is better than keeping the
> > > > current status.
> > > >
> > > > There are two problems.
> > > >
> > > > 1. diff.vim contains several encodings. So if DBCS is used in Vim, Vim
> > > > may handle invalid characters.
> > > > 2. The localized messages of svn are encoded in the system locale
> > > > encoding, so they do not match Vim's encoding.
> > > >
> > > > The first of those problems will be fixed with my patch.
> > > > To fix the second of the problems, I suggest removing syntax of
> > > > 'diffOnly' for multi-byte encodings.
> > >
> > > If I remember correctly, your patch breaks recognizing diff headers if
> > > the text does not match the current locale. E.g., when my locale is
> > > German and I edit a diff file generated by someone in Italy, I still
> > > expect the headers to be recognized.
> > >
> > > When the file's encoding differs from what Vim has detected then all
> > > bets are off, it will be impossible to compare the text correctly.
> > > Unless we have a regexp that works around it, it's probably very
> > > difficult.
> > >
> > > What is the error that is reported when using a DBCS encoding?
> > > A reproducible example is useful.
> >
> > It occurs when enc=cp932 on Cygwin/MSYS/Linux.
> > E.g.:
> >
> > $ vim -u NONE -N -c "set enc=cp932" -c "syntax on" -c "set ft=diff"
> > Error detected while processing /usr/local/share/vim/vim74/syntax/diff.vim:
> > line 128:
> > E401: Pattern delimiter not found: "^\\
> > ????????????????????????????????????? ??
> > ?? ???????
> > E475: Invalid argument: diffNoEOL^I"^\\
> > ????????????????????????????????????? ??
> > ?? ???????
> >
> > It doesn't occur on Win32. Maybe it occurs only when libiconv is used.
> > libiconv fails to convert the encoding of diff.vim from utf-8 to cp932, so
> > Vim opens diff.vim without converting the encoding.
>
> Yes, libiconv is strict about rejecting characters it cannot convert.
>
> > The root cause of this problem is handling of invalid characters.
>
> I suppose you mean characters that are valid in utf-8 but cannot be
> converted to cp932.

No, it doesn't matter whether the characters are valid in utf-8. It only
matters that the characters are invalid in cp932.

The last two characters of line 128 are <U+05d7><U+0022>; the byte sequence
is <d7><97><22>. The character <U+05d7> cannot be converted to cp932, so Vim
loads diff.vim without converting it and the line is read as is. Vim then
treats the last two bytes <97><22> as one cp932 character <9722>, so the
closing " disappears, even though <9722> is not actually a valid character in
cp932. This is the cause of E401.

BTW, when you open diff.vim with the encoding set explicitly
(:e ++enc=utf-8), the character sequence <U+05d7><U+0022> is converted to '?"'.
In this case, the problem doesn't occur.

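To illustrate the byte-level explanation above, here is a minimal Python
sketch. It is not Vim's actual code: the helper names (is_cp932_lead,
is_cp932_trail, split_chars) are hypothetical, and the "lead byte only" mode
is just an approximation of the behavior described above. The byte ranges are
the usual cp932 lead/trail byte ranges.

```python
def is_cp932_lead(byte):
    # Usual cp932 lead-byte ranges (assumption: no vendor extensions).
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def is_cp932_trail(byte):
    # Usual cp932 trail-byte ranges; 0x22 (") is outside them.
    return 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC


def split_chars(data, check_trail):
    """Split bytes into 'characters' the way a simple DBCS scanner would."""
    chars = []
    i = 0
    while i < len(data):
        two_bytes = is_cp932_lead(data[i]) and i + 1 < len(data)
        if two_bytes and check_trail and not is_cp932_trail(data[i + 1]):
            two_bytes = False  # invalid trail byte: keep the lead byte alone
        step = 2 if two_bytes else 1
        chars.append(data[i:i + step])
        i += step
    return chars


# The last two characters of line 128: <U+05d7><U+0022>.
tail = '\u05d7"'.encode("utf-8")

# <U+05d7> has no cp932 representation, so a strict conversion fails.
try:
    "\u05d7".encode("cp932")
    convertible = True
except UnicodeEncodeError:
    convertible = False

lead_only = split_chars(tail, check_trail=False)  # the quote is swallowed
checked = split_chars(tail, check_trail=True)     # the quote survives

print(tail, convertible)
print(lead_only)
print(checked)
```

With the lead-byte-only scan, the closing " ends up inside a two-byte
"character"; with the trail-byte check it stays a separate byte.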
> > The last two bytes of the line 128 are 0x97 0x22 (").
> > 0x97 can be a lead byte in cp932, but 0x22 cannot be a trail byte in cp932.
> > However, Vim wrongly handles the byte sequence 0x97 0x22 as one character.
> > Thus Vim cannot find the ending double quotation mark (0x22).
> > Maybe we also need to check the trail byte (not only the lead byte), but it
> > might be a little bit slow. BTW, I think enc=cp932 is a legacy setting
> > (especially on Cygwin/Linux), so I don't want to make an effort to fix this.
> >
> > Instead of fixing Vim itself, I have two ideas to work around this problem:
> >
> > 1. Add a dummy ending quotation ( | ") at the end of the line 128.
> >
> > --- a/runtime/syntax/diff.vim
> > +++ b/runtime/syntax/diff.vim
> > @@ -125,7 +125,7 @@
> > syn match diffDiffer "^הזמ הז םינוש `.*'-ו `.*' םיצבקה$"
> > syn match diffBDiffer "^הזמ הז םינוש `.*'-ו `.*' םיירניב םיצבק$"
> > syn match diffIsA "^.* .*-ל .* .* תוושהל ןתינ אל$"
> > -syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח"
> > +syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
>
> The quotes seem wrong here. Is it supposed to be:
>
> syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח| "

No, it's exactly what I intended. This is a trick. | is a command separator
and the last " is the start of a comment:

  syn match diffNoEOL "pattern<U+05d7>" | "
                                          ^ start of a comment

But in cp932 the line is handled as follows:

  syn match diffNoEOL "pattern<d7><9722> | "
                                           ^ ending quotation

Now Vim can find an ending quotation, so the error disappears.

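The effect of the trick can be sketched in Python as follows. This is only a
simplified model under my assumptions, not how Vim really parses the line:
quoted_pattern is a hypothetical helper, it ignores backslash escapes, and it
treats every cp932 lead byte plus the following byte as one character, as
described above. "pattern" stands in for the real Hebrew text.

```python
def is_cp932_lead(byte):
    # Usual cp932 lead-byte ranges (assumption: no vendor extensions).
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def quoted_pattern(line):
    """Return the bytes between the first two standalone '"', or None."""
    start = None
    i = 0
    while i < len(line):
        if is_cp932_lead(line[i]) and i + 1 < len(line):
            i += 2  # lead byte swallows the next byte, whatever it is
            continue
        if line[i] == 0x22:  # a standalone double quote
            if start is None:
                start = i + 1
            else:
                return line[start:i]
        i += 1
    return None  # no closing delimiter (compare E401)


# The line as loaded without conversion: utf-8 bytes read as cp932.
original = b'syn match diffNoEOL "' + "pattern\u05d7".encode("utf-8") + b'"'
workaround = original + b' | "'

print(quoted_pattern(original))
print(quoted_pattern(workaround))
```

In this model the original line has no closing delimiter, while the line with
the extra | " does, at the cost of the pattern also containing the bytes up to
the final quote.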
> Note that one can also use "." to match any character. So long as there
> are enough characters left to avoid a false match.

I don't know the output of the diff command in Hebrew, but comparing with
other translations, the line might end with <U+05d7>, so a trailing "." won't
match. ".\?" would be better.

So, there are three workarounds:

1. A tricky way.
   syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "

2. Exactly the same meaning as before.
   syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח\%(\)"

3. Not exactly the same, but easier.
   syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח.\?"

Is No.3 the best?

Regards,
Ken Takata