On 11 September 2014, John Little <[email protected]> wrote:
> On Friday, September 12, 2014 2:55:17 AM UTC+12, Ben Fritz wrote:
>
> > I have an idea:
> >
> > If the unsorted file has "bad" characters early in the file, then
> > the early encodings in 'fileencodings' will fail quickly.
> >
> > But if the sorted file places those bad characters late in the file,
> > then the conversion may need to read most of the file before it
> > fails, repeated for possibly multiple encodings.
> 
> Yes, something like this is happening.  After
> :g/[^ -~]/move 1
> 
> The file then loads quickly.  If those 13 lines are moved to the end
> of the file the file takes nearly 3 minutes to load.

    Here's a simple experiment that shows that this is indeed what's
going on.

    In what follows ascii.txt is a 35M file of purely ASCII text:

        $ LC_ALL=C pcregrep '[^[:print:]]' ascii.txt

        $ ls -hs ascii.txt
        35M ascii.txt

    Then we add the byte \x83 at the beginning of one copy and at the
end of another:

        $ perl -e 'print "\x83\n"' | cat - ascii.txt >test1.txt
        $ perl -e 'print "\x83\n"' | cat ascii.txt - >test2.txt
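
    To confirm the byte landed where intended, here is a scaled-down
sketch of the same setup.  The 1000-line stand-in file, the temporary
directory and the first_bad_line helper are all made up for
illustration; only the pattern [^ -~] comes from the :g command
quoted above.

```shell
#!/bin/sh
# Scaled-down stand-ins for ascii.txt, test1.txt and test2.txt.
# The file size (1000 lines) is illustrative, not the 35M original.
dir=$(mktemp -d)
i=0
while [ "$i" -lt 1000 ]; do
    echo "plain ASCII line $i"
    i=$((i + 1))
done >"$dir/ascii.txt"

# Octal \203 is the same byte as \x83.
printf '\203\n' | cat - "$dir/ascii.txt" >"$dir/test1.txt"
printf '\203\n' | cat "$dir/ascii.txt" - >"$dir/test2.txt"

# Hypothetical helper: number of the first line that holds a byte
# outside the printable ASCII range (space through tilde).
first_bad_line() {
    LC_ALL=C awk '/[^ -~]/ { print NR; exit }' "$1"
}

bad1=$(first_bad_line "$dir/test1.txt")
bad2=$(first_bad_line "$dir/test2.txt")
echo "test1.txt: first bad line is $bad1"
echo "test2.txt: first bad line is $bad2"
rm -rf "$dir"
```

    With these stand-ins the bad byte should be reported on line 1 of
test1.txt and on line 1001 of test2.txt, so anything validating the
file from the top has to get through all of test2.txt before it can
reject it.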

    Opening the first file is fast, and opening the second one is slow:

        $ time LC_CTYPE=en_US.UTF-8 vim -u NONE -i NONE -N -X test1.txt -c q
        real    0m0.273s
        user    0m0.247s
        sys     0m0.025s

        $ time LC_CTYPE=en_US.UTF-8 vim -u NONE -i NONE -N -X test2.txt -c q
        real    0m1.296s
        user    0m1.256s
        sys     0m0.042s

    But setting LC_CTYPE to C makes opening both files a lot faster:

        $ time LC_CTYPE=C vim -u NONE -i NONE -N -X test1.txt -c q
        real    0m0.109s
        user    0m0.084s
        sys     0m0.024s

        $ time LC_CTYPE=C vim -u NONE -i NONE -N -X test2.txt -c q
        real    0m0.111s
        user    0m0.098s
        sys     0m0.013s

    The difference is fileencodings:

        $ LC_CTYPE=en_US.UTF-8 vim -u NONE -i NONE -N -X \
              -c 'redir >out1 | echo &fencs | q'
        $ cat out1
        ucs-bom,utf-8,default,latin1

        $ LC_CTYPE=C vim -u NONE -i NONE -N -X \
              -c 'redir >out2 | echo &fencs | q'
        $ cat out2
        ucs-bom

    And indeed, setting fileencodings to ucs-bom makes reading test2.txt
fast:

        $ time LC_CTYPE=en_US.UTF-8 vim -u NONE -i NONE -N -X \
              -c 'set fencs=ucs-bom | e test2.txt | q'
        real    0m0.119s
        user    0m0.106s
        sys     0m0.013s

> However, using
> 
>     vim -u NONE ++enc=latin1 file.txt

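    That's because ++enc is an argument to Ex commands such as :e, not
a command-line option.  On the command line Vim reads it as +{command}
and tries to run "+enc=latin1" as an Ex command: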

        E492: Not an editor command: +enc=latin1

> or
>
>     vim -u NONE -c "set fencs=latin1" file.txt

    That's because "-c" commands run only after the first file has been
loaded, so the setting arrives too late to affect how it is read:

        $ vim -h | fgrep -w -- -c
             -c <command>         Execute <command> after loading the first file
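
    If the goal is to change 'fileencodings' before the first file is
read, one option is --cmd, which Vim executes before processing any
vimrc file and therefore before loading the file.  The following is
only a sketch: it assumes test2.txt from the experiment above, and I
have not timed it here.

```shell
# Assumes test2.txt from the experiment above and a vim binary on PATH.
# --cmd commands are executed before any vimrc and before the first
# file is read, so 'fileencodings' is already set when test2.txt loads.
LC_CTYPE=en_US.UTF-8 vim -u NONE -i NONE -N -X \
    --cmd 'set fencs=ucs-bom' test2.txt -c q
```

    Prefixing the command with "time", as in the runs above, would
show whether it matches the fast case.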

> or setting fencs=latin1 in my .vimrc do not avoid the
> slowness. Starting vim with just -u NONE then
> 
> :e ++enc=latin1 file.txt
> 
> does.  I don't understand.

    That's because your test file contains character \x83, which is
illegal in latin1.  Try ucs-bom instead.

    /lcd
