Re: line2byte() returns wrong result at multi-byte characters

Дмитрий Франк Mon, 19 Dec 2011 06:16:02 -0800

It would be great if line2byte() could return the file offset instead of the
internal offset (an optional flag seems like a good solution).
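
To illustrate the difference between the two offsets, here is a rough Python
sketch (not Vim code). The function name line_to_byte and its parameters are
made up for this example; it only assumes that a line's offset is one plus the
encoded size of all preceding lines, each followed by a one-byte line ending
as with ff=unix. Passing the internal or the file encoding plays the role of
the optional flag.

```python
def line_to_byte(lines, lnum, encoding="cp1251", eol="\n"):
    """Return the 1-based byte offset of line `lnum` when `lines` are
    encoded with `encoding` (hypothetical helper, not a Vim API)."""
    if not 1 <= lnum <= len(lines) + 1:
        return -1  # line2byte() also returns -1 for an invalid line number
    preceding = lines[:lnum - 1]
    # Sum the encoded size of every earlier line plus its line ending.
    return 1 + sum(len((line + eol).encode(encoding)) for line in preceding)


buffer_lines = ["fooba" + chr(1049), ""]  # chr(1049) is the Cyrillic letter used in the test

# Offset in the internal representation ('encoding' is cp1251 here).
internal_offset = line_to_byte(buffer_lines, 2, encoding="cp1251")
# Offset in the file as written ('fileencoding' is utf-8 here).
file_offset = line_to_byte(buffer_lines, 2, encoding="utf-8")

print(internal_offset, file_offset)  # prints: 8 9
```

The sketch ignores byte order marks. For UTF-16 or UTF-32 one would have to
pass an explicit-endian codec such as "utf-16-le", because Python's plain
"utf-16" codec prepends a BOM on every encode call.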


Regards,
Dmitry.

On 19 December 2011 at 18:04, Ingo Karkat <[email protected]> wrote:

> On 19-Dec-2011 14:40, Дмитрий Франк wrote:
>
> > On 19 December 2011 at 17:03, Ingo Karkat <[email protected]> wrote:
> >
> >     On 19-Dec-2011 13:35, Дмитрий Франк wrote:
> >
> >     > Quote from the help: "Return the *byte count* from the start of the
> >     > buffer for line {lnum}"
> >     >
> >     > The returned *byte count* is wrong: it is the character count, not
> >     > the byte count.
> >
> >     I cannot reproduce this, either with Vim 7.3.0 on Windows/x64 or with
> >     Vim 7.3.353 on Linux/x86:
> >
> >     $ vim -N -u NONE --cmd "set enc=utf-8" -c "call setline(1, ['foobaN', ''])" -c "2|echo line2byte('.')"
> >     8
> >     $ vim -N -u NONE --cmd "set enc=utf-8" -c "call setline(1, ['fooba'.nr2char(1049), ''])" -c "2|echo line2byte('.')"
> >     9
> >
> >     Please post your Vim version, and steps to reproduce.
> >
> >     -- regards, ingo
> >
> >     PS: Please bottom-post on vim_dev.
> >
> >
> > I use Windows, and I have to keep Vim's encoding set to cp1251 (the
> > standard encoding for Russian Windows).
> >
> > Vim 7.3.46 on Windows/x86
> >
> > $ vim -N -u NONE --cmd "set enc=cp1251 | set fenc=utf-8 | set ff=unix" -c "call setline(1,['foobaN', ''])" -c "2|echo line2byte('.')"
> > 8
> > $ vim -N -u NONE --cmd "set enc=cp1251 | set fenc=utf-8 | set ff=unix" -c "call setline(1,['fooba'.nr2char(1049), ''])" -c "2|echo line2byte('.')"
> > 8
> >
> > It seems that line2byte() uses &encoding, but it should use
> > &fileencoding.
>
> Your analysis looks right, and probably doesn't surprise the devs, because
> internally Vim always uses 'encoding' to represent the buffer (and only
> converts to 'fileencoding' during writes).
>
> This raises the question how line2byte() (and go/:goto commands) should
> behave. I would side with you, using byte counts of the file, not the
> internal representation (especially because for all Unicode encodings, Vim
> internally uses UTF-8, so it wouldn't be possible to jump to UTF-16 / UTF-32
> offsets even by setting 'encoding' to it).
>
> :help line2byte() indirectly supports this (*emphasis* mine):
> > This can also be used to get the byte count for the line just
> > below the last line: >
> >       line2byte(line("$") + 1)
> > *This is the file size plus one.*
>
> I think this issue needs at least a note in the documentation, and I wonder
> whether it's feasible to implement in the way you suggest. (For maximum
> flexibility, line2byte() could take an optional flag whether file- or
> internal-offset is wanted.)
>
> -- regards, ingo
>
> --
> You received this message from the "vim_dev" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>

