Re: [vim/vim] GVIM not reporting correct byte offsets (Issue #13731)

2023-12-20 Thread Gary Johnson
On 2023-12-20, zeertzjq wrote:
> And, some bytes in the file correspond to a multibyte char in latin-1
> encoding, so such a byte counts as two bytes.

I didn't understand that statement at first, but now I do.  Thanks.

When Vim's 'encoding' is utf-8 and it reads a file it sees as having
a 'fileencoding' of latin1, it expands the latin1-encoded characters
into utf-8-encoded characters in the buffer.  Latin1-encoding uses
1 byte per character while UTF-8 uses 1, 2, 3 or 4 bytes per
character.  So the number of bytes in Vim's buffer may exceed the
number of bytes in the file, as it does in the OP's case.

If that's a problem, you can fix it by forcing Vim to use latin1
internally:

$ vim --cmd 'set enc=latin1 nofixeol' index_video_5_0_1.mp4

or set binary mode:

$ vim -b --cmd 'set noeol' index_video_5_0_1.mp4

Regards,
Gary

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups 
"vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to vim_dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_dev/20231220200821.GE4044%40phoenix.


Re: [vim/vim] GVIM not reporting correct byte offsets (Issue #13731)

2023-12-19 Fir de Conversatie Gary Johnson
On 2023-12-19, 3052 wrote:
> Steps to reproduce
> 
> using this file (inside, not the zip):
> 
> https://github.com/vim/vim/files/13720982/index_video_5_0_1.zip
> 
> If I open the same file in GVIM and enter /mdat, enter, g, ctrl+g I get:
> 
> Byte 2785
> 
> Expected behaviour
> 
> if I run this Go program:
> 
> package main
> 
> import (
>     "bytes"
>     "os"
> )
> 
> func main() {
>     b, err := os.ReadFile("index_video_5_0_1.mp4")
>     if err != nil {
>         panic(err)
>     }
>     i := bytes.Index(b, []byte("mdat"))
>     println(i)
> }
> 
> I get 2578. why is Vim off by over 200 bytes?
> 
> Version of Vim
> 
> https://github.com/vim/vim-win32-installer/releases/tag/v9.0.2175
> 
> Environment
> 
> https://github.com/vim/vim-win32-installer/releases/tag/v9.0.2175

I can replicate it with vim 9.0.2130 on Ubuntu 20.04, but I can't
explain it.  Rather than use a custom program to count the bytes,
I just used hexdump.  The full output of g Ctrl-G in vim is this:

Col 551-983 of 1063-1559; Line 8 of 4914; Word 713 of 35890; Char 2579 of 1282363; Byte 2785 of 1920490

Note that it reports "Char 2579", which is what the byte number should
be: the Go program's 0-based index 2578 is 1-based byte 2579.
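
The Byte and Char figures differ by 2785 - 2579 = 206.  One way to check
whether multibyte characters account for a gap like that is to count the
bytes of 0x80 or above that precede the match.  The Go sketch below does
this under two assumptions: every file byte is read as one Latin-1
character, and each byte of 0x80 or above takes two bytes in a UTF-8
buffer.  bufferOffsets is a hypothetical helper, the sample data is made
up, and Vim's own line-ending handling is ignored.

```go
package main

import (
	"bytes"
	"fmt"
)

// bufferOffsets is a hypothetical helper. It returns the 1-based character
// and byte positions at which needle would start if file were read as
// Latin-1 and stored as UTF-8. ok is false when needle is not found.
func bufferOffsets(file, needle []byte) (char, byteOff int, ok bool) {
	i := bytes.Index(file, needle)
	if i < 0 {
		return 0, 0, false
	}
	high := 0
	for _, b := range file[:i] {
		if b >= 0x80 {
			high++ // this byte needs two bytes in UTF-8 instead of one
		}
	}
	// One character per file byte; one extra buffer byte per high byte.
	return i + 1, i + 1 + high, true
}

func main() {
	// Made-up sample data, not taken from the file in the issue.
	sample := []byte{0xFF, 0x10, 0xA0, 'm', 'd', 'a', 't'}
	char, byteOff, ok := bufferOffsets(sample, []byte("mdat"))
	fmt.Println(char, byteOff, ok) // 4 6 true
}
```

To try it on the real file, the sample slice could be replaced with the
result of os.ReadFile, as in the program quoted above.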

Regards,
Gary
