Hi,

2014/11/12 Wed 21:08:32 UTC+9 Bram Moolenaar wrote:
> Ken Takata wrote:
>
> > 2014/11/11 Tue 5:56:06 UTC+9 Bram Moolenaar wrote:
> > > Ken Takata wrote:
> > >
> > > > 2014/11/6 Thu 5:11:26 UTC+9 Bram Moolenaar wrote:
> > > > > Yasuhiro Matsumoto wrote:
> > > > >
> > > > > > Bram, you seem to have removed this issue from the todo list.
> > > > > > But I think merging the patch above is better than keeping the
> > > > > > current status.
> > > > > >
> > > > > > There are two problems.
> > > > > >
> > > > > > 1. diff.vim contains several encodings. So if a DBCS encoding is
> > > > > > used in Vim, Vim may have to handle invalid characters.
> > > > > > 2. The locale messages of svn are encoded in the system locale
> > > > > > encoding, so they do not match Vim's encoding.
> > > > > >
> > > > > > The first of those problems will be fixed with my patch.
> > > > > > To fix the second of the problems, I suggest removing the syntax of
> > > > > > 'diffOnly' for multi-byte encodings.
> > > > >
> > > > > If I remember correctly, your patch breaks recognizing diff headers if
> > > > > the text does not match the current locale. E.g., when my locale is
> > > > > German and I edit a diff file generated by someone in Italy, I still
> > > > > expect the headers to be recognized.
> > > > >
> > > > > When the file's encoding differs from what Vim has detected then all
> > > > > bets are off, it will be impossible to compare the text correctly.
> > > > > Unless we have a regexp that works around it, it's probably very
> > > > > difficult.
> > > > >
> > > > > What is the error that is reported when using a DBCS encoding?
> > > > > A reproducible example is useful.
> > > >
> > > > It occurs when enc=cp932 on Cygwin/MSYS/Linux.
> > > > E.g.:
> > > >
> > > > $ vim -u NONE -N -c "set enc=cp932" -c "syntax on" -c "set ft=diff"
> > > > Error detected while processing
> > > > /usr/local/share/vim/vim74/syntax/diff.vim:
> > > > line 128:
> > > > E401: Pattern delimiter not found: "^\\
> > > > ????????????????????????????????????? ??
> > > > ?? ???????
> > > > E475: Invalid argument: diffNoEOL^I"^\\
> > > > ????????????????????????????????????? ??
> > > > ?? ???????
> > > >
> > > > It doesn't occur on Win32. Maybe it occurs only when libiconv is used.
> > > > libiconv fails to convert the encoding of diff.vim from utf-8 to cp932,
> > > > so Vim opens diff.vim without converting the encoding.
> > >
> > > Yes, libiconv is strict about rejecting characters it cannot convert.
> > >
> > > > The root cause of this problem is handling of invalid characters.
> > >
> > > I suppose you mean characters that are valid in utf-8 but cannot be
> > > converted to cp932.
> >
> > No, it doesn't matter whether the characters are valid or not in utf-8. It
> > only matters when the characters are invalid in cp932.
> >
> > The last two characters of line 128 are <U+05d7><U+0022>; the byte
> > sequence is <d7><97><22>. The character <U+05d7> cannot be converted to
> > cp932, so Vim loads diff.vim without converting, and the line is loaded
> > as is. Then the last two bytes <97><22> are handled as one character
> > <9722> in cp932, so the last " disappears. But actually <9722> is not a
> > valid character in cp932. This is the cause of E401.
>
> Yes, that's what I meant. The original file is valid utf-8, but because
> of the failing conversion you end up with something invalid.
>
> > BTW, when you open diff.vim with the encoding set explicitly
> > (:e ++enc=utf-8), the character sequence <U+05d7><U+0022> is converted to
> > '?"'. In this case, the problem doesn't occur.
> >
> >
> > > > The last two bytes of line 128 are 0x97 0x22 (").
> > > > 0x97 can be a lead byte in cp932, but 0x22 cannot be a trail byte in
> > > > cp932. However, Vim wrongly handles the byte sequence 0x97 0x22 as one
> > > > character. Thus Vim cannot find the ending double quotation mark (0x22).
> > > > Maybe we also need to check the trail byte (not only the lead byte),
> > > > but it might be a little bit slow. BTW, I think enc=cp932 is a legacy
> > > > setting (especially on Cygwin/Linux), so I don't want to make an effort
> > > > to fix this.
> > > >
> > > > Instead of fixing Vim itself, I have two ideas to work around this
> > > > problem:
> > > >
> > > > 1. Add a dummy ending quotation ( | ") at the end of the line 128.
> > > >
> > > > --- a/runtime/syntax/diff.vim
> > > > +++ b/runtime/syntax/diff.vim
> > > > @@ -125,7 +125,7 @@
> > > > syn match diffDiffer "^הזמ הז םינוש `.*'-ו `.*' םיצבקה$"
> > > > syn match diffBDiffer "^הזמ הז םינוש `.*'-ו `.*' םיירניב םיצבק$"
> > > > syn match diffIsA "^.* .*-ל .* .* תוושהל ןתינ אל$"
> > > > -syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח"
> > > > +syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
> > >
> > > The quotes seem wrong here. Is it supposed to be:
> > >
> > > syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח| "
> >
> > No, it's exactly what I intended. This is a trick. | is a command separator
> > and the last " is start of a comment:
> >
> > syn match diffNoEOL "pattern<U+05d7>" | "
> > ^ start of a comment
> >
> > But the line is handled as the following in cp932:
> >
> > syn match diffNoEOL "pattern<d7><9722> | "
> > ^ ending quotation
> >
> > Now Vim can find an ending quotation, so the error disappears.
>
> Aha, clever.
>
> > > Note that one can also use "." to match any character. So long as there
> > > are enough characters left to avoid a false match.
> >
> > I don't know the output of the diff command in Hebrew, but comparing with
> > other translations, the line might end with <U+05d7>, so the last "." won't
> > match. ".\?" would be better.
> >
> > So, there are three workarounds:
> >
> > 1. A tricky way.
> > syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח" | "
> >
> > 2. Exactly the same meaning as before.
> > syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח\%(\)"
> >
> > 3. Not exactly the same, but easier.
> > syn match diffNoEOL "^\\ ץבוקה ףוסב השדח-הרוש ות רסח.\?"
> >
> > No.3 is the best?
>
> I was thinking of dropping the character that causes the conversion error:
>
> syn match diffNoEOL "^\\ ץבוקה ףוסב השד.-הרוש ות רס."

Ah, now I understand. Seems good.

> Are there any other characters that can't be converted?

Many other characters still cannot be converted to cp932, but none of
them causes an error. So this fix is enough.
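
For anyone who wants to see the byte-level effect discussed in this thread, here is a minimal Python sketch. It is not Vim's code: the helpers is_lead, is_trail and split_chars are hypothetical names, and the lead/trail byte ranges are the commonly documented cp932 ranges, taken here as an assumption. It only illustrates how a lead-byte-only check swallows the closing quote, and how a trail-byte check or the ` | "` trick leaves a quote to be found.

```python
# Illustrative sketch only; this is not Vim's implementation.
# Assumed cp932 ranges: lead 0x81-0x9F / 0xE0-0xFC, trail 0x40-0x7E / 0x80-0xFC.

def is_lead(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def is_trail(b: int) -> bool:
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

def split_chars(data: bytes, check_trail: bool) -> list:
    """Split raw bytes into cp932 'characters' (hypothetical helper)."""
    chars = []
    i = 0
    while i < len(data):
        # Treat a lead byte plus the following byte as one character.
        two = is_lead(data[i]) and i + 1 < len(data)
        # Optionally reject the pair when the second byte is not a valid trail.
        if two and check_trail and not is_trail(data[i + 1]):
            two = False
        n = 2 if two else 1
        chars.append(data[i:i + n])
        i += n
    return chars

# End of line 128 as stored in diff.vim (UTF-8): U+05D7 followed by ".
tail = "\u05d7".encode("utf-8") + b'"'

lead_only = split_chars(tail, check_trail=False)   # lead-byte check only
with_trail = split_chars(tail, check_trail=True)   # lead and trail check
patched = split_chars(tail + b' | "', check_trail=False)  # workaround 1

print(tail.hex())
print(lead_only)
print(with_trail)
print(patched)
```

With the lead-byte-only check, the quote byte 0x22 ends up inside a two-byte unit, so no standalone closing quote remains. With the trail check, or with ` | "` appended, a standalone quote byte is still present.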
Regards,
Ken Takata