Re: A performance question (patch included)

2007-05-25 Thread Charles E Campbell Jr

John Beckett wrote:


A.J.Mechelynck wrote:


What about a different function to return, say, the number of
1K blocks (or the number of times 2^n bytes, with a parameter
passed to the function) that a file uses?



Yes, that's a much more general and better idea.

Since there's probably not much need for this, I think that
simplicity would be good. That is, have the function work in a
fixed way with no options.

Re Dr.Chip's LargeFile script: It occurs to me that another
workaround would be to use system() to capture the output of
'ls -l file' or 'dir file' (need an option for which).

Then do some funky editing to calculate the number of digits in
the file length. If more than 9, treat file as large.

I'm playing with a tiny utility to help the LargeFile script.
Bluesky: Its code (64-bit file size) could potentially be
incorporated in Vim. I'll post results in vim-dev.



(I've moved this over to vim-dev)

I've attached a patch to vim 7.1 which extends getfsize(); with the 
patch, getfsize() takes an optional
second parameter which gives one the ability to specify a unitsize.  
In other words,


getfsize("eval.c")        ->  478347   (after the patch)

getfsize("eval.c", 1000)  ->  479      (truncated upwards, i.e. rounded up)

I'll be awaiting Bram's input before making use of this in LargeFile.vim !

Regards,
Chip Campbell



*** src/o_eval.c2007-05-25 08:52:12.0 -0400
--- src/eval.c  2007-05-25 09:04:43.0 -0400
***
*** 7094,7100 
      {"getcwd",        0, 0, f_getcwd},
      {"getfontname",   0, 1, f_getfontname},
      {"getfperm",      1, 1, f_getfperm},
!     {"getfsize",      1, 1, f_getfsize},
      {"getftime",      1, 1, f_getftime},
      {"getftype",      1, 1, f_getftype},
      {"getline",       1, 2, f_getline},
--- 7094,7100 
      {"getcwd",        0, 0, f_getcwd},
      {"getfontname",   0, 1, f_getfontname},
      {"getfperm",      1, 1, f_getfperm},
!     {"getfsize",      1, 2, f_getfsize},
      {"getftime",      1, 1, f_getftime},
      {"getftype",      1, 1, f_getftype},
      {"getline",       1, 2, f_getline},
***
*** 10135,10142 
      {
          if (mch_isdir(fname))
              rettv->vval.v_number = 0;
!         else
              rettv->vval.v_number = (varnumber_T)st.st_size;
      }
      else
          rettv->vval.v_number = -1;
--- 10135,10151 
      {
          if (mch_isdir(fname))
              rettv->vval.v_number = 0;
!         else if (argvars[1].v_type == VAR_UNKNOWN)
              rettv->vval.v_number = (varnumber_T)st.st_size;
+         else
+         {
+             unsigned long unitsize;
+             unsigned long stsize;
+             unitsize = get_tv_number(&argvars[1]);
+             stsize   = st.st_size/unitsize;
+             if (stsize*unitsize < st.st_size) ++stsize;
+             rettv->vval.v_number = (varnumber_T)stsize;
+         }
      }
      else
          rettv->vval.v_number = -1;
*** runtime/doc/o_eval.txt  2007-05-25 09:00:08.0 -0400
--- runtime/doc/eval.txt2007-05-25 09:06:19.0 -0400
***
*** 1615,1621 
  getcmdtype()  String  return the current command-line type
  getcwd()  String  the current working directory
  getfperm( {fname})String  file permissions of file {fname}
! getfsize( {fname})Number  size in bytes of file {fname}
  getfontname( [{name}])String  name of font being used
  getftime( {fname})Number  last modification time of file
  getftype( {fname})String  description of type of file {fname}
--- 1615,1621 
  getcmdtype()  String  return the current command-line type
  getcwd()  String  the current working directory
  getfperm( {fname})String  file permissions of file {fname}
! getfsize( {fname} [,unitsize])Number  size in bytes of file {fname}
  getfontname( [{name}])String  name of font being used
  getftime( {fname})Number  last modification time of file
  getftype( {fname})String  description of type of file {fname}
***
*** 2819,2827 
  getcwd()  The result is a String, which is the name of the current
working directory.
  
! getfsize({fname}) *getfsize()*
The result is a Number, which is the size in bytes of the
given file {fname}.
If {fname} is a directory, 0 is returned.
If the file {fname} can't be found, -1 is returned.
  
--- 2819,2829 
  getcwd()  The result is a String, which is the name of the current
working directory.
  
! getfsize({fname} [,unitsize]) *getfsize()*
The result is a Number, which is the size in bytes of the
given file {fname}.
+   If unitsize is given, then the file {fname}'s size will be
+   returned in units of size unitsize, rounded upwards.
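For example, a script like LargeFile.vim could then test sizes in mebibytes without ever overflowing a 32-bit Number.  A minimal sketch, assuming the patched getfsize() above (the file name and the 100 MiB threshold are only illustrative):

    " ask for the size in MiB so even multi-GB files fit in a Number
    if getfsize("big.log", 1048576) >= 100
        echo "big.log should get the large-file treatment"
    endif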

Re: A performance question (patch included)

2007-05-25 Thread Charles E Campbell Jr

A.J.Mechelynck wrote:

I'm not sure what varnumber_T means: will st.stsize (the dividend) be 
wide enough to avoid losing bits on the left?


varnumber_T is an int (long if sizeof(int) <= 3).

st.st_size's size depends on whether 32-bit or 64-bit integers are available.

So, it's possible to lose bits: pick a small enough unitsize and a large
enough file, and st.st_size will end up not being able to fit into a
varnumber_T.  After all, unitsize could be 1, and getfsize() will behave no
differently than it does now.  However, unitsize could be 100.  My patch
divides st.st_size by the unitsize first, presumably in whatever
arithmetic is appropriate for working with st.st_size.
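A worked example of why dividing first matters (hedged: the file name is made up, and this assumes the patched getfsize()):

    " a 6 GiB file is 6442450944 bytes, too big for a 32-bit varnumber_T,
    " but 6442450944 / 1048576 = 6144, which fits comfortably:
    echo getfsize("huge.dat")           " overflows a 32-bit Number
    echo getfsize("huge.dat", 1048576)  " -> 6144 with the patch applied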

Regards,
Chip Campbell



Re: A performance question (patch included)

2007-05-25 Thread Charles E Campbell Jr

A.J.Mechelynck wrote:

Yes, yes, but before the division, will it be able to hold the file 
size? (sorry, I meant st.st_size) Will mch_stat (at line 10134, one 
line before the context of your patch) be able to return huge file 
sizes?


mch_stat is variously defined, depending on o/s.
Under unix, that's the fstat function.
This function returns a pointer to a struct stat; the member in question 
is: st_size.

(off_t st_size;    /* total size, in bytes */)

So, st_size is an off_t.

Under linux, an off_t is:  typedef __kernel_off_t off_t;

So, I suspect that st_size will be sized by the o/s to handle whatever 
size files it can handle.

Someone with a 64-bit machine, perhaps, could examine this further?

BTW, I'm also under the impression that ls itself uses fstat(), so it's not
likely to be any more informative.

Regards,
Chip Campbell



Re: A performance question (patch included)

2007-05-25 Thread A.J.Mechelynck

Charles E Campbell Jr wrote:

A.J.Mechelynck wrote:

I'm not sure what varnumber_T means: will st.stsize (the dividend) be 
wide enough to avoid losing bits on the left?


varnumber_T is an int (long if sizeof(int) <= 3).

st.st_size's size depends on whether 32-bit or 64-bit integers are available.

So, it's possible to lose bits: pick a small enough unitsize and a large
enough file, and st.st_size will end up not being able to fit into a
varnumber_T.  After all, unitsize could be 1, and getfsize() will behave no
differently than it does now.  However, unitsize could be 100.  My patch
divides st.st_size by the unitsize first, presumably in whatever
arithmetic is appropriate for working with st.st_size.

Regards,
Chip Campbell



Yes, yes, but before the division, will it be able to hold the file size? 
(sorry, I meant st.st_size) Will mch_stat (at line 10134, one line before the 
context of your patch) be able to return huge file sizes?



Best regards,
Tony.
--
Real Programmers don't play tennis, or any other sport that requires
you to change clothes.  Mountain climbing is OK, and real programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the machine room.


Re: A performance question (patch included)

2007-05-25 Thread A.J.Mechelynck

Yakov Lerner wrote:
[...]
stat() on Linux has a 32-bit st_size field (off_t is 32-bit). There is a
stat64() syscall which uses a 'struct stat64' structure where st_size is
64-bit. By
defining __USE_LARGEFILE64 at compile-time, stat() is redirected to
stat64(). I don't know whether default Linux vim build defines
__USE_LARGEFILE64 or not.

Yakov



:version says:

[...]
Compilation: gcc -c -I. -Iproto -DHAVE_CONFIG_H -DFEAT_GUI_GTK 
-I/usr/include/cairo -I/usr/include/freetype2 -I/usr/include/libpng12 
-I/opt/gnome/include/gtk-2.0 -I/opt/gnome/lib/gtk-2.0/include 
-I/opt/gnome/include/atk-1.0 -I/opt/gnome/include/pango-1.0 
-I/opt/gnome/include/glib-2.0 -I/opt/gnome/lib/glib-2.0/include   -DORBIT2=1 
-pthread -I/usr/include/libart-2.0 -I/usr/include/cairo 
-I/usr/include/freetype2 -I/usr/include/libpng12 -I/usr/include/libxml2 
-I/opt/gnome/include/libgnomeui-2.0 -I/opt/gnome/include/libgnome-2.0 
-I/opt/gnome/include/libgnomecanvas-2.0 -I/opt/gnome/include/gtk-2.0 
-I/opt/gnome/include/gconf/2 -I/opt/gnome/include/libbonoboui-2.0 
-I/opt/gnome/include/gnome-vfs-2.0 -I/opt/gnome/lib/gnome-vfs-2.0/include 
-I/opt/gnome/include/gnome-keyring-1 -I/opt/gnome/include/glib-2.0 
-I/opt/gnome/lib/glib-2.0/include -I/opt/gnome/include/orbit-2.0 
-I/opt/gnome/include/libbonobo-2.0 -I/opt/gnome/include/bonobo-activation-2.0 
-I/opt/gnome/include/pango-1.0 -I/opt/gnome/lib/gtk-2.0/include 
-I/opt/gnome/include/atk-1.0 -O2 -fno-strength-reduce -Wall 
-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING 
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 
-I/usr/lib/perl5/5.8.8/i586-linux-thread-multi/CORE  -I/usr/include/python2.5 
-pthread -I/usr/include  -D_LARGEFILE64_SOURCE=1  -I/usr/lib/ruby/1.8/i586-linux

Linking: [...]

so, maybe we'll have to check what happens when _LARGEFILE64_SOURCE is 
defined? I don't find a match in src/ or src/auto/.



Best regards,
Tony.
--
If all these sweet young things were laid end-to-end, I wouldn't be a
bit surprised.
-- Dorothy Parker


Re: A performance question (utility included)

2007-05-25 Thread John Beckett

Charles E Campbell Jr wrote:

I've attached a patch to vim 7.1 which extends getfsize()


As I've mentioned, I think further testing will be needed before
patching Vim for 64-bit file lengths.

Here is a possible interim workaround to allow Dr.Chip's
LargeFile.vim script to accurately detect large files on many
platforms.

Attached is a tiny C program to build a tool called filemeg.
 Example usage:  filemeg /path/to/file
 gives output:   42

which means that the specified file is 42 megabytes (actually,
any value from 42 to nearly 43).

I was going to work out how to adapt the LargeFile script to use
this tool, if the user sets an option to invoke it. But it's
taking too long because I don't know enough about Vim, so I'm
just presenting the tool at this stage.

People may like to check how filemeg works on various systems
and report back. I have tried it on files over 4GB on Fedora
Core 6 and Windows XP (x86-32 platform).

Putting something like this inside Vim would be a bit of a
nightmare IMHO because of the extraordinary range of supported
compilers, operating systems and hardware.

Adapting LargeFile.vim to work with filemeg:
- Compile the source and test running at command line.
- Put the executable in your path (better: in a Vim
 directory which the script could invoke somehow).
- Set a new script option to use filemeg.
- The script BufReadPre would call a new script function.
- That function would check the file size with:
   let result = system('filemeg /path/to/file')
- If result is a number, it is the file size in megabytes.
- Otherwise, result starts with "Error" and the script should
 treat the file as large (or maybe not...).
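A rough Vim script sketch of that glue (hedged: the function name and the 100 MB limit are made up, and filemeg is assumed to be somewhere in the PATH):

    " meant to be called from the BufReadPre autocommand described above
    fun! s:IsLargeByFilemeg(fname)
        let out = system('filemeg ' . a:fname)
        if out =~ '^\d'
            " output is the size in megabytes; 100 MB is an arbitrary limit
            return (out + 0) >= 100
        endif
        " output started with "Error": play it safe and call the file large
        return 1
    endfun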

I've attached the C source, and included it below for those who
don't mind a little wrapping.

John

/* Output length of specified file in megabytes.
* John Beckett 2007/05/25
* For Linux with LFS (large file support), and Win32.
*
* Output is suitable for reading by a script.
* Output is always one line.
* If any problem occurs, line starts with "Error".
* Otherwise, line is the size of the specified file in megabytes.
* Size is a truncated integer (file of 3.9MB would give result 3).
* The size won't overflow a 32-bit signed integer ("Error" if it does).
* If argument is a directory, result is 0 (done by stat64()).
*/

#if defined(__linux)
# define _LARGEFILE64_SOURCE
#elif defined(_WIN32)
# define off64_t __int64
# define stat64  _stati64
#endif

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    off64_t size, overflowmask;
    struct stat64 sb;
    if ( argc != 2 ) {
        puts("Error: Need path of file to report its size in megabytes.");
        return 1;
    }
    if ( stat64(argv[1], &sb) != 0 ) {
        puts("Error: Could not get file information.");
        return 1;
    }
    size = sb.st_size >> 20;      /* 2^20 = 1 meg */
    overflowmask = 0x7fffffff;    /* ensure 64-bit calculation */
    if ( (size & ~overflowmask) != 0 ) {
        puts("Error: File size in megabytes overflows 32-bit signed integer.");
        return 1;
    }
    printf("%d\n", (int)size);
    return 0;
}




Re: A performance question (patch included)

2007-05-25 Thread John Beckett

Charles E Campbell Jr wrote:

I'm also under the impression that ls itself uses fstat(),
so it's not likely to be any more informative.


That's likely on some systems, but 'ls -l' gives correct results
for files over 4GB on Fedora Core 6 using x86-32.

John



Re: A performance question

2007-05-25 Thread Yakov Lerner

On 5/25/07, John Beckett [EMAIL PROTECTED] wrote:

A.J.Mechelynck wrote:
 What about a different function to return, say, the number of
 1K blocks (or the number of times 2^n bytes, with a parameter
 passed to the function) that a file uses?

Yes, that's a much more general and better idea.

Since there's probably not much need for this, I think that
simplicity would be good. That is, have the function work in a
fixed way with no options.

Re Dr.Chip's LargeFile script: It occurs to me that another
workaround would be to use system() to capture the output of
'ls -l file' or 'dir file' (need an option for which).

Then do some funky editing to calculate the number of digits in
the file length. If more than 9, treat file as large.


9-digit number can still be larger than 2^32-1, or than 2^31-1. It's
possible to compare large numbers safely with the following method:
(1) right-align them to a fixed width (say, to width 20 to be on the
safe side), then
(2) prepend a non-digit character on the left to prevent vim from treating
 them as numbers and getting numeric overflow, then
(3) compare them as strings.
Like this:

let bignum1 = "987654321"
let bignum2 = "876543210"
let x = printf("x%20s", bignum1)   " do not use %d to avoid overflow, use %s
let y = printf("x%20s", bignum2)   " do not use %d to avoid overflow, use %s
if x > y
  " bignum1 is bigger than bignum2
endif

Yakov


Re: A performance question

2007-05-25 Thread Yakov Lerner

On 5/25/07, Yongwei Wu [EMAIL PROTECTED] wrote:

On 24/05/07, Robert M Robinson [EMAIL PROTECTED] wrote:

 On Wed, 23 May 2007, fREW wrote:
 |Someone recently was emailing the list about looking at a small
 |section of DNA with vim as text and it was a number of gigs.  I think
 |he ended up using other unix tools (sed and grep I think), but
 |nontheless, text files can be big too ;-)
 |
 |-fREW
 |

 A maxim that comes up here is A lack of imagination doesn't prove anything.
 The fact that Condoleeza Rice couldn't imagine the degree of chaos that would
 ensue if we invaded Iraq does not prove that Iraq is not currently in chaos!

 I use vim for _structured_ text files, largely because regular expression
 search is much more useful than word search when the text is structured.
 Whether those files are large or not usually depends on whether I'm editing
 programs (small) or viewing/editing their output (often quite large).  Emacs
 also provides regular expression search, but I find vim's commands simpler
 and easier to type--and therefore faster to use.

I do not understand your statements: what's your problem of using
regular expressions in grep and sed?


I think Robert implied that it takes a lot of imagination
to use vim on multi-gigabyte files. I might be wrong.

I don't exactly understand the connection between the size of one's
imagination and the size of the file on which one applies vim.
But the connection is perfectly possible. For example, I never tried to
run vim on anything bigger than 0.5GB and I do indeed have
average or less than average imagination.

Hell, starting tomorrow, I am going to vim 2+0.2*day_count sized
files, every day.
It only remains to buy an imagine-o-meter, and apply it daily.

Yakov average-sized imagination Lerner


Re: A performance question

2007-05-25 Thread fREW

On 5/25/07, Yakov Lerner [EMAIL PROTECTED] wrote:

On 5/25/07, Yongwei Wu [EMAIL PROTECTED] wrote:
 On 24/05/07, Robert M Robinson [EMAIL PROTECTED] wrote:
 
  On Wed, 23 May 2007, fREW wrote:
  |Someone recently was emailing the list about looking at a small
  |section of DNA with vim as text and it was a number of gigs.  I think
  |he ended up using other unix tools (sed and grep I think), but
  |nontheless, text files can be big too ;-)
  |
  |-fREW
  |
 
  A maxim that comes up here is A lack of imagination doesn't prove 
anything.
  The fact that Condoleeza Rice couldn't imagine the degree of chaos that 
would
  ensue if we invaded Iraq does not prove that Iraq is not currently in chaos!
 
  I use vim for _structured_ text files, largely because regular expression
  search is much more useful than word search when the text is structured.
  Whether those files are large or not usually depends on whether I'm editing
  programs (small) or viewing/editing their output (often quite large).  Emacs
  also provides regular expression search, but I find vim's commands simpler
  and easier to type--and therefore faster to use.

 I do not understand your statements: what's your problem of using
 regular expressions in grep and sed?

I think Robert implied that it takes lot of imagination
to use vim on multi-gigabyte size. I might be wrong.

I don't exactly understand the connection size of one's
imagination and size of the file on which one applies vim.
But the connection is perfectly possible. For example, I never tried to
run vim on anything bigger than 0.5GB and I do indeed have
average or lesser than average imagination.

Hell starting tomorrow, I am going to vim the 2+0.2*day_count sized
files, every day,
It only remains to buy imagine-o-meter, and apply it daily.

Yakov average-sized imagination Lerner



You should use that as your standard sig from now on.  Awesome.

-fREW


Re: A performance question

2007-05-25 Thread Robert M Robinson


No, I implied vim has more uses than any one person could possibly imagine.

I also meant that any question like "Why would anyone want ...?" really just
means "I can't imagine wanting ...", so if that isn't what you meant to
say you might want to rephrase your question.  I would ask why anyone
would want to say they had a limited imagination, but if I did I'd be
doing it myself!

If anyone took offense, my apologies; I meant it as a wry observation on
how people in general use language, not as anything personal.

Max

On Fri, 25 May 2007, Yakov Lerner wrote:

|I think Robert implied that it takes lot of imagination
|to use vim on multi-gigabyte size. I might be wrong.
|
|I don't exactly understand the connection size of one's
|imagination and size of the file on which one applies vim.
|But the connection is perfectly possible. For example, I never tried to
|run vim on anything bigger than 0.5GB and I do indeed have
|average or lesser than average imagination.
|
|Hell starting tomorrow, I am going to vim the 2+0.2*day_count sized
|files, every day,
|It only remains to buy imagine-o-meter, and apply it daily.
|
|Yakov average-sized imagination Lerner
|





Re: A performance question

2007-05-25 Thread John Beckett

Yakov Lerner wrote:

9-digit number can still be larger than 2^32-1, or than
2^31-1.


Just for the record:
 2^30 = 1,073,741,824

So 999,999,999 (largest 9-digit number) won't overflow a 32-bit
signed integer.

John



Re: A performance question

2007-05-24 Thread John Beckett

Charles E Campbell Jr wrote:

Sounds like the filesize is getting stored in a 32bit signed
number, and overflowing.


Yes, definitely.


Please let me know what getfsize() is actually returning


The return value is the bit pattern for the low 32 bits of the
true 64-bit file size:
3,146,839,490 file actual size
   -1,148,127,806 returned by getfsize()

The sum of the absolute values is exactly 4G. I confirmed the
above with a file exactly 8G + 512 bytes (getfsize() said it was
512 bytes).

I was going to suggest that you treat a negative getfsize()
value as a large file (may as well do that even if the value is
-1 for an error indication).

I suppose that would be useful until some better solution is
implemented. I didn't propose it because it would be quite ugly
for the script to give results like this:
   3GB file is large
   5GB file is not large
   7GB file is large
   9GB file is not large

Another ugly (but accurate) workaround would be:
- Provide the source for a small executable called islargefile
 to do the calculation.
- Provide an option in the script to use the executable.
- Have the script execute system('islargefile xxx 123456').
 Return value 0 means no, 1 means yes, 2 means error
 (return is first character of system() string).

Need to work out how to pass arguments:
 xxx = path of file to be tested
 123456 = limit at which file is large

John



Re: A performance question

2007-05-24 Thread John Beckett

Yongwei Wu wrote:

Even FAT32 supports files much larger than 4GB.


Not true. FAT32 supports files up to 4 GB.


Sorry I shot my mouth off there - I realised my blunder about ten
minutes after sending. I haven't actually used a FAT32 partition
for over ten years, and was confusing the maximum size of a FAT32
partition with its maximum file size.

On NTFS of course, as you mentioned, the sky is the limit. I
have made files larger than 4GB, and have written a couple of
simple programs to work with such files, so my basic point is
valid. The Win32 API supports files much larger than 4GB.

John



Re: A performance question

2007-05-24 Thread John Beckett

panshizhu wrote:

Yes, but on all systems, vim script could not take 64-bit
integers


I know that. My proposal is for a new Vim script function:
   islargefile({fname}, {limit})

which would return nonzero if the size of the file is greater
than the 32-bit signed {limit} argument.

Vim could easily handle the 64-bit arithmetic that is available
on several systems, so the proposed islargefile() would
accurately indicate whether the file size was greater than the
specified limit. The limit would be up to 2G - I don't think
there's any need to get cute and allow the caller to pass a
negative value which would then be treated as unsigned.
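For illustration, a call site might then look like this (a hedged sketch: islargefile() is only the proposal above, not an existing function, and the limit is arbitrary):

    " inside a BufReadPre autocommand; nonzero when the file exceeds the limit
    if islargefile(expand("<afile>"), 100 * 1024 * 1024)
        setlocal noswapfile
        syntax off
    endif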

John



Re: A performance question

2007-05-24 Thread A.J.Mechelynck

John Beckett wrote:

Charles E Campbell Jr wrote:

Sounds like the filesize is getting stored in a 32bit signed
number, and overflowing.


Yes, definitely.


Please let me know what getfsize() is actually returning


The return value is the bit pattern for the low 32 bits of the
true 64-bit file size:
3,146,839,490 file actual size
   -1,148,127,806 returned by getfsize()

The sum of the absolute values is exactly 4G. I confirmed the
above with a file exactly 8G + 512 bytes (getfsize() said it was
512 bytes).

I was going to suggest that you treat a negative getfsize()
value as a large file (may as well do that even if the value is
-1 for an error indication).

I suppose that would be useful until some better solution is
implemented. I didn't propose it because it would be quite ugly
for the script to give results like this:
   3GB file is large
   5GB file is not large
   7GB file is large
   9GB file is not large

Another ugly (but accurate) workaround would be:
- Provide the source for a small executable called islargefile
 to do the calculation.
- Provide an option in the script to use the executable.
- Have the script execute system('islargefile xxx 123456').
 Return value 0 means no, 1 means yes, 2 means error
 (return is first character of system() string).

Need to work out how to pass arguments:
 xxx = path of file to be tested
 123456 = limit at which file is large

John



What about a different function to return, say, the number of 1K blocks (or 
the number of times 2^n bytes, with a parameter passed to the function) that a 
file uses?



Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
252. You vote for foreign officials.


Re: A performance question

2007-05-24 Thread A.J.Mechelynck

John Beckett wrote:

Yongwei Wu wrote:

Even FAT32 supports files much larger than 4GB.


Not true. FAT32 supports files up to 4 GB.


Sorry I shot my mouth off there - I realised my blunder about ten
minutes after sending. I haven't actually used a FAT32 partition
for over ten years, and was confusing the maximum size of a FAT32
partition with its maximum file size.

On NTFS of course, as you mentioned, the sky is the limit. I
have made files larger than 4GB, and have written a couple of
simple programs to work with such files, so my basic point is
valid. The Win32 API supports files much larger than 4GB.

John



...not to mention Unix/Linux, with its variety of not only FAT12, FAT16, FAT32 
and NTFS but also ext2, ext3, reiserfs, etc., supported. I see a backups.tgz 
file of 7GB (which normally isn't mounted) so big files exist here too.



Best regards,
Tony.
--
While I, with my usual enthusiasm,
Was exploring in Ermintrude's busiasm,
She explained, They are flat,
But think nothing of that --
You will find that my sweet sister Susiasm.


Re: A performance question

2007-05-24 Thread John Beckett

A.J.Mechelynck wrote:

What about a different function to return, say, the number of
1K blocks (or the number of times 2^n bytes, with a parameter
passed to the function) that a file uses?


Yes, that's a much more general and better idea.

Since there's probably not much need for this, I think that
simplicity would be good. That is, have the function work in a
fixed way with no options.

Re Dr.Chip's LargeFile script: It occurs to me that another
workaround would be to use system() to capture the output of
'ls -l file' or 'dir file' (need an option for which).

Then do some funky editing to calculate the number of digits in
the file length. If more than 9, treat file as large.

I'm playing with a tiny utility to help the LargeFile script.
Bluesky: Its code (64-bit file size) could potentially be
incorporated in Vim. I'll post results in vim-dev.

John



Re: A performance question

2007-05-23 Thread Peter Palm
Op woensdag 23 mei 2007, schreef fREW:
 Another thing that might help with speed that was mentioned a month
 or so ago is the following script specifically aimed at increasing
 speed for large files:
 http://www.vim.org/scripts/script.php?script_id=1506.

Indeed, among other things, this disables the swap file for 'large' 
files, which should really speed up things.


Peter



Re: A performance question

2007-05-23 Thread John Beckett

Peter Palm wrote:

http://www.vim.org/scripts/script.php?script_id=1506.


Indeed, among other things, this disables the swap file for
'large' files, which should really speed up things.


I was going to report the following issue to vim-dev after I got
a chance to investigate it a little further, but it seems
appropriate to mention it now.

I did some work with a 3GB text file on Win32, using Vim 7.0 and
Dr.Chip's LargeFile.vim script from Tip 1506 above.

The result was really ugly. The script failed to notice that 3GB
was large because the Vim function getfsize(f) returned a
negative number.

Vim eventually opened the file and was able to search, etc. So
Vim doesn't rely on the code behind getfsize().

I started looking at what could be done to fix the bug, but have
had to leave it for another day. I was starting to think that it
wouldn't be too hard to use the extended functions available in
recent Win32 and Linux to get a 64-bit file size. Then maybe
provide a function to compare a 64-bit file size with a
specified 32-bit limit, so LargeFile.vim could work reliably.

I haven't checked getfsize() on 32-bit Linux yet, nor am I
sufficiently patient to try opening the 3GB file with Vim 7.1.

John



Re: A performance question

2007-05-23 Thread panshizhu
John Beckett [EMAIL PROTECTED] wrote on 2007-05-23 18:39:22:
 The result was really ugly. The script failed to notice that 3GB
 was large because the Vim function getfsize(f) returned a
 negative number.

 I haven't checked getfsize() on 32-bit Linux yet, nor am I
 sufficiently patient to try opening the 3GB file with Vim 7.1.

 John

As far as I know, Windows does not support files larger than 4GB. So it's
okay to use unsigned 32-bit for filesize in Windows. It is not that 32 bits
aren't enough, but the fsize should be *unsigned*.

The problem is that Vim script can use only a *signed* 32-bit int as its
internal type, so the improvement might belong in the Vim script engine
rather than in getting the 64-bit file size.
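A quick illustration of that 32-bit Number limit (hedged: this is the behaviour of a typical 32-bit Vim 7 build):

    echo 2147483647        " the largest value a Vim script Number can hold
    echo 2147483647 + 1    " wraps around to -2147483648 on a typical build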

--
Sincerely, Pan, Shi Zhu. ext: 2606

Re: A performance question

2007-05-23 Thread John Beckett

panshizhu wrote:

As far as I know, Windows does not support files larger than
4GB. So its okay to use unsigned 32-bit for filesize in
windows.


It's not as bad as that! Even FAT32 supports files much larger
than 4GB.

The Win32 API includes function _stati64() to get a 64-bit file
size (the API really wants you to use GetFileSize(); _stati64()
is for old timers).

I was envisaging some new Vim script function like:
   islargefile({fname}, {limit})

which would return nonzero if the size of the file is greater
than the 32-bit signed {limit} argument.

On many systems, the calculation could use 64-bit integers.

John



Re: A performance question

2007-05-23 Thread panshizhu
John Beckett [EMAIL PROTECTED] wrote on 2007-05-23 19:32:25:
 On many systems, the calculation could use 64-bit integers.

 John

Yes, but on all systems, Vim script cannot take 64-bit integers:

see eval.txt line 38:

1.1 Variable types ~
  *E712*
There are five types of variables:

Number      A 32 bit signed number.
Examples:  -123  0x10  0177

The only integer type supported by Vim script is 32-bit signed. So even if
you can get a 64-bit file size, you cannot save it in any variable in Vim
script.

--
Sincerely, Pan, Shi Zhu. ext: 2606

Re: A performance question

2007-05-23 Thread Charles E Campbell Jr

Robert M Robinson wrote:

That brings me to my question.  I have noticed that when editing large 
files (millions of lines), deleting a large number of lines (say, 
hundreds of thousands to millions) takes an unbelieveably long time in 
VIM--at least on my systems.  This struck me as so odd, I looked you 
up (for the first time in all my years of use) so I could ask why!


The LargeFile.vim plugin helps to speed up the editing of large files:

http://vim.sourceforge.net/scripts/script.php?script_id=1506

It does so by changing a number of things: no syntax highlighting, no 
undo, etc.


Regards,
Chip Campbell



Re: A performance question

2007-05-23 Thread Charles E Campbell Jr

John Beckett wrote:


Peter Palm wrote:


http://www.vim.org/scripts/script.php?script_id=1506.



Indeed, among other things, this disables the swap file for
'large' files, which should really speed up things.



I was going to report the following issue to vim-dev after I got
a chance to investigate it a little further, but it seems
appropriate to mention it now.

I did some work with a 3GB text file on Win32, using Vim 7.0 and
Dr.Chip's LargeFile.vim script from Tip 1506 above.

The result was really ugly. The script failed to notice that 3GB
was large because the Vim function getfsize(f) returned a
negative number.



Sounds like the filesize is getting stored in a 32bit signed number, and 
overflowing.
Is the negative number -1 (that would mean file can't be found)?  If 
not, then perhaps
that fact could be used to extend the LargeFile's ability to catch 
large files: trigger
when getfsize() returns a number < -1.  Please let me know what
getfsize() is actually returning when applied to that 3GB file.

Regards,
Chip Campbell



Re: A performance question

2007-05-23 Thread Robert Maxwell Robinson


In that case, I'll have to thank Bram for fixing my problem before I even 
asked him to do so!  Thanks Gary, when I get a chance I'll download vim 7.


To those of you who provided links to work-around scripts etc., thank you 
for your help.  If any of you are having trouble with large files I'd 
recommend doing what I _didn't_ do:  make sure you're using the most 
recent version of vim before looking for other solutions.  You may not 
need to reduce vim's capabilities in order to work with large files, 
either!


Cheers,

Max

On Tue, 22 May 2007, Gary Johnson wrote:


It turns out that this Red Hat installation also has vim 6.3.82 in
/usr/bin/vim, so I tried that, too.

  /usr/bin/vim -u NONE two_million_lines

  50%
  :.,$d

2 minutes 30 seconds!  Eureka!  According to the System Monitor CPU
bar color, that was almost all User time, whereas with vim 7.1, it
was a more balanced mix of User and Kernel time.  (Kudos to Bram for
such a performance improvement from vim 6 to 7!)



Re: A performance question

2007-05-23 Thread A.J.Mechelynck

Robert Maxwell Robinson wrote:


In that case, I'll have to thank Bram for fixing my problem before I 
even asked him to do so!  Thanks Gary, when I get a chance I'll download 
vim 7.


To those of you who provided links to work-around scripts etc., thank 
you for your help.  If any of you are having trouble with large files 
I'd recommend doing what I _didn't_ do:  make sure you're using the most 
recent version of vim before looking for other solutions.  You may not 
need to reduce vim's capabilities in order to work with large files, 
either!


Cheers,

Max



That, my dear Max, is golden advice indeed, and not only with Vim: if you have 
a problem with some software, indeed any problem with any software, try to 
reproduce your problem using the latest available version of that software 
(the latest one which doesn't break your system of course). You shouldn't come 
out worse off for installing it, and, who knows? maybe your problem was solved 
by one of the intervening bugfixes. This advice is particularly worth while 
for fast-moving software such as, among others, Vim.



Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
251. You've never seen your closest friends who usually live WAY too far away.


Re: A performance question

2007-05-23 Thread panshizhu
Charles E Campbell Jr [EMAIL PROTECTED] wrote on 2007-05-23 21:38:27:
 Sounds like the filesize is getting stored in a 32bit signed number, and
 overflowing.
 Is the negative number -1 (that would mean file can't be found)?  If
 not, then perhaps
 that fact could be used to extend the LargeFile's ability to catch
 large files: trigger
 when getfsize() returns a number < -1.  Please let me know what
 getfsize() is actually
 returning when applied to that 3GB file.

 Regards,
 Chip Campbell


Yes, getfsize() does return a number < -1 when the filesize is between 2G and
4G. But if the filesize is 4G to 6G, this may not work.

Anyway, triggering when getfsize() returns a number < -1 does help in 50% of
cases. It can't be wrong, so please just do it.
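A sketch of that trigger as it might sit in a BufReadPre handler (hedged: g:LargeFile_bytes is a made-up name for the byte limit):

    let s:sz = getfsize(expand("<afile>"))
    " s:sz < -1 means the 32-bit result wrapped around, so the file is huge
    if s:sz >= g:LargeFile_bytes || s:sz < -1
        " apply the large-file settings here
    endif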

--
Sincerely, Pan, Shi Zhu. ext: 2606

Re: A performance question

2007-05-23 Thread Yongwei Wu

On 23/05/07, John Beckett [EMAIL PROTECTED] wrote:

panshizhu wrote:
 As far as I know, Windows does not support files larger than
 4GB. So its okay to use unsigned 32-bit for filesize in
 windows.

It's not as bad as that! Even FAT32 supports files much larger
than 4GB.


Not true. FAT32 supports files up to 4 GB. Check

 http://support.microsoft.com/kb/314463

NTFS does support big files. I can copy big movie files to a USB hard
disk only when it is formatted in NTFS.

Who really want to edit TEXT files as large as that? I cannot think of
scenarios other than log files. Maybe Vim does not fit in this role.

Best regards,

Yongwei

--
Wu Yongwei
URL: http://wyw.dcweb.cn/


Re: A performance question

2007-05-23 Thread panshizhu
Yongwei Wu [EMAIL PROTECTED] wrote on 2007-05-24 11:28:06:
 Who really want to edit TEXT files as large as that? I cannot think of
 scenarios other than log files. Maybe Vim does not fit in this role.

 Best regards,

 Yongwei
 --

Yes, it fits in this role, and frankly speaking this was the reason I first
chose Vim. (tail -h might be better for that, but if we want to do a search,
we may have to use vi.)

Six years ago I often needed to check the logs on the servers of my company
(those are generally HP, DEC and IBM minicomputers with different Unixes
without vim installed); we had more than 30 servers like that and the files
were usually 2GB to 11GB. Opening such a log file in plain vi took way
too long.

One day I happened to compile and install Vim (it was 5.x or 6.x) on one
of the servers, and found that Vim opened those files about 10 to 20
times faster than the plain vi, so I installed Vim on all of our servers
and felt that Vim greatly sped up my work. After that I began to learn Vim.

So you see, if Vim could not handle big text files, I would not have come to
know and use Vim at all. Vim 5 and Vim 6 open big files fast; if opening a
3GB text file takes someone more than 60 seconds, I'd doubt whether it is an
issue in Vim 7 or something wrong with his own configuration.

--
Sincerely, Pan, Shi Zhu. ext: 2606

Re: A performance question

2007-05-23 Thread fREW

On 5/23/07, Yongwei Wu [EMAIL PROTECTED] wrote:

On 23/05/07, John Beckett [EMAIL PROTECTED] wrote:
 panshizhu wrote:
  As far as I know, Windows does not support files larger than
  4GB. So its okay to use unsigned 32-bit for filesize in
  windows.

 It's not as bad as that! Even FAT32 supports files much larger
 than 4GB.

Not true. FAT32 supports files up to 4 GB. Check

  http://support.microsoft.com/kb/314463

NTFS does support big files. I can copy big movie files to a USB hard
disk only when it is formatted in NTFS.

Who really want to edit TEXT files as large as that? I cannot think of
scenarios other than log files. Maybe Vim does not fit in this role.

Best regards,

Yongwei

--
Wu Yongwei
URL: http://wyw.dcweb.cn/



Someone recently was emailing the list about looking at a small
section of DNA with vim as text and it was a number of gigs.  I think
he ended up using other unix tools (sed and grep I think), but
nonetheless, text files can be big too ;-)

-fREW


Re: A performance question

2007-05-22 Thread A.J.Mechelynck

Robert M Robinson wrote:


First, thanks very much for creating VIM!  I have been using it on Linux 
systems for years, and now use it via cygwin at home as well.  I vastly 
prefer VIM to EMACS, especially at home.  I learned vi on a VAX/VMS 
system long ago (a friend of mine had ported it), when our computer 
science department was loading so many people on the VAXen that EDT was 
rendered unusably slow.  I still like VIM largely because I can do so 
much with so little effort in so little time.


That brings me to my question.  I have noticed that when editing large 
files (millions of lines), deleting a large number of lines (say, 
hundreds of thousands to millions) takes an unbelieveably long time in 
VIM--at least on my systems.  This struck me as so odd, I looked you up 
(for the first time in all my years of use) so I could ask why!


Seriously, going to line 1 million of a 2 million line file and typing 
the command :.,$d takes _minutes_ on my system (Red Hat Linux on a 
2GHz Athlon processor (i686), 512kb cache, 3 Gb memory), far longer than 
searching the entire 2 million line file for a single word 
(:g/MyQueryName/p).  Doing it this way fits way better into my usual 
workflow than using head -n 1000000, because of course I'm using a
regular expression search to determine that I want to truncate my file at
line 1000000 in the first place.

I looked in the archive, and couldn't see that this issue had been 
raised before.  Is there any chance it can get added to the list of 
performance enhancement requests?


Thanks,

Max Robinson, PhD



I think this is just part of how Vim behaves.

When you edit a file, Vim holds the whole file in memory (IIUC). When you 
delete a million lines, Vim frees (i.e., releases to the OS) the memory those 
lines were using. That takes some time.



Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
250. You've given up the search for the perfect woman and instead,
 sit in front of the PC until you're just too tired to care.


Re: A performance question

2007-05-22 Thread Tim Chase

That brings me to my question.  I have noticed that when
editing large files (millions of lines), deleting a large
number of lines (say, hundreds of thousands to millions) takes
an unbelieveably long time in VIM--at least on my systems.


The issue of editing large files comes up occasionally.  A few 
settings can be tweaked to vastly improve performance.  Notably, 
the 'undolevels' setting can be reduced to -1 or 0 for improved 
performance.  If your lines are long, it can also help to disable 
syntax highlighting as well.  You can drop in on one such thread 
here:


  http://www.nabble.com/Re%3A-editing-large-file-p3665161.html

or the associated vim-tip at

  http://www.vim.org/tips/tip.php?tip_id=611

Another option might be to use a stream-oriented tool such as sed 
to edit your file:


  sed '10q' < infile.txt > outfile.txt

Fortunately, Vim has oodles of knobs to twiddle, so you can 
monkey with 'undolevels', 'swapfile', and the 'bufhidden' 
setting, as well as turning off syntax highlighting, all of which 
can improve the performance of vim under uncommon load.
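Taken together, a typical set of tweaks along those lines might look like this (a sketch of the settings named above, not a complete recipe):

    set undolevels=-1          " no undo history at all
    setlocal noswapfile        " skip the swap file for this buffer
    setlocal bufhidden=unload  " unload the buffer when it is hidden
    syntax off                 " long lines highlight slowly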



This struck me as so odd, I looked you up (for the first time
in all my years of use) so I could ask why!


Welcome aboard...the list is friendly, informative, on-topic, and 
an all-round example of what a mailing-list should be. :)


-tim







Re: A performance question

2007-05-22 Thread Robert Maxwell Robinson


Thanks, Tim.  I'll look at the options you recommended--and those you 
didn't, so I may not need to ask next time.  :)


Cheers,

Max

On Tue, 22 May 2007, Tim Chase wrote:

The issue of editing large files comes up occasionally.  A few settings can 
be tweaked to vastly improve performance.  Notably, the 'undolevels' 
setting can be reduced to -1 or 0 for improved performance.  If your lines 
are long, it can also help to disable syntax highlighting as well.  You can 
drop in on one such thread here:


 http://www.nabble.com/Re%3A-editing-large-file-p3665161.html

or the associated vim-tip at

 http://www.vim.org/tips/tip.php?tip_id=611

Another option might be to use a stream-oriented tool such as sed to edit 
your file:


 sed '10q' < infile.txt > outfile.txt

Fortunately, Vim has oodles of knobs to twiddle, so you can monkey with 
'undolevels', 'swapfile', and the 'bufhidden' setting, as well as turning 
off syntax highlighting, all of which can improve the performance of vim 
under uncommon load.



This struck me as so odd, I looked you up (for the first time
in all my years of use) so I could ask why!


Welcome aboard...the list is friendly, informative, on-topic, and an 
all-round example of what a mailing-list should be. :)


-tim


Re: A performance question

2007-05-22 Thread Robert Maxwell Robinson


Well, I don't mean to.  :set says this:

--
autoindent            helplang=en         scroll=11           t_Sb=Esc[4%dm
backspace=2           history=50          ttyfast             t_Sf=Esc[3%dm
cscopetag             hlsearch            ttymouse=xterm
cscopeverbose         ruler               viminfo='20,50
cscopeprg=/usr/bin/cscope
fileencoding=utf-8
fileencodings=utf-8,latin1
formatoptions=tcql
--
So, do I have syntax highlighting enabled?  The t_Sb and t_Sf look 
suspiciously like formatting commands, but I confess I'm not conversant 
on vim options.


Thanks,

Max

On Tue, 22 May 2007, Gary Johnson wrote:


Do you have syntax highlighting enabled?  That can really slow vim
down.

I created and opened a file as follows:

  while true
  do
  echo '123456789012345678901234567890123456789012345678901234567890'
  done | head -2000000 > two_million_lines
  vim two_million_lines

Then within vim executed:

  50%
  :.,$d

Using vim 7.1 under Cygwin and Windows XP on a 3.6 GHz Pentium with
2 GB of RAM:  9 seconds.

Using vim 7.1 under Red Hat Enterprise Linux WS release 4 on a 2.8
GHz Pentium with 500 MB RAM:  16 seconds.

Regards,
Gary

--
Gary Johnson | Agilent Technologies
[EMAIL PROTECTED] | Mobile Broadband Division
| Spokane, Washington, USA



Re: A performance question

2007-05-22 Thread Tim Chase

Do you have syntax highlighting enabled?  That can really slow vim
down.


Well, I don't mean to.  :set says this:


It can be toggled via

:syntax on

and

:syntax off

To see what flavor of syntax highlighting you currently have, you 
can query the 'syntax' setting:


:set syntax?

-tim





Re: A performance question

2007-05-22 Thread Robert Maxwell Robinson


I just tried deleting 1133093 lines of a 1133093+1133409 line file, after 
typing :syntax off.  It took about 3 minutes.


Max

On Tue, 22 May 2007, Tim Chase wrote:


Do you have syntax highlighting enabled?  That can really slow vim
down.


Well, I don't mean to.  :set says this:


It can be toggled via

:syntax on

and

:syntax off

To see what flavor of syntax highlighting you currently have, you can query 
the 'syntax' setting:


:set syntax?

-tim






Re: A performance question

2007-05-22 Thread Robert Maxwell Robinson


:set syntax? replies syntax=.  I don't think it's syntax highlighting. 
I've used that with C and Prolog code before; I gave it up because it was 
too slow.  I'm editing text output from one of my programs; truncating the 
output of a day-long run to match a run in progress for testing purposes, 
hunting down rare bugs.


Max

On Tue, 22 May 2007, Tim Chase wrote:


Do you have syntax highlighting enabled?  That can really slow vim
down.


Well, I don't mean to.  :set says this:


It can be toggled via

:syntax on

and

:syntax off

To see what flavor of syntax highlighting you currently have, you can query 
the 'syntax' setting:


:set syntax?

-tim






Re: A performance question

2007-05-22 Thread Andy Wokula

A.J.Mechelynck schrieb:

Robert M Robinson wrote:


First, thanks very much for creating VIM!  I have been using it on 
Linux systems for years, and now use it via cygwin at home as well.  I 
vastly prefer VIM to EMACS, especially at home.  I learned vi on a 
VAX/VMS system long ago (a friend of mine had ported it), when our 
computer science department was loading so many people on the VAXen 
that EDT was rendered unusably slow.  I still like VIM largely because 
I can do so much with so little effort in so little time.


That brings me to my question.  I have noticed that when editing large 
files (millions of lines), deleting a large number of lines (say, 
hundreds of thousands to millions) takes an unbelieveably long time in 
VIM--at least on my systems.  This struck me as so odd, I looked you 
up (for the first time in all my years of use) so I could ask why!


Seriously, going to line 1 million of a 2 million line file and typing 
the command :.,$d takes _minutes_ on my system (Red Hat Linux on a 
2GHz Athlon processor (i686), 512kb cache, 3 Gb memory), far longer 
than searching the entire 2 million line file for a single word 
(:g/MyQueryName/p).  Doing it this way fits way better into my usual 
workflow than using head -n 100, because of course I'm using a 
regular expression search to determine that I

want to truncate my file at line 100 in the first place.

I looked in the archive, and couldn't see that this issue had been 
raised before.  Is there any chance it can get added to the list of 
performance enhancement requests?


Thanks,

Max Robinson, PhD



I think this is just part of how Vim behaves.

When you edit a file, Vim holds the whole file in memory (IIUC). When 
you delete a million lines, Vim frees (i.e., releases to the OS) the 
memory those lines were using. That takes some time.



Best regards,
Tony.


What about the numbered registers?
  :h "1

After freeing the lines, they are copied to "1.
And the content of "1 is shifted to "2 (before, of course)
And so on, until register "9.

To avoid the copies, the blackhole register can be used:
   :.,$d _

If there are copies, registers can be cleared by hand:
   :let @1 = ""
   :let @2 = ""
   ...
   :let @9 = ""
This also takes time, but frees the memory.

--
Regards,
Andy


Re: A performance question

2007-05-22 Thread Robert Maxwell Robinson


Thanks, Andy; the black hole register is a new idea to me.  Unfortunately, 
:.,$d _ to the black hole register appears to take the same amount of 
time as :.,$d itself.  set undolevels=-1 speeds it up, but set 
undolevels=0 does not; this suggests to me that the problem isn't related 
to how many undo buffers are around, just that the undo facility is 
available at all.


Honestly, the 3 minutes it takes has to involve a significant amount of 
waste, such as timing out for some system resource; reading the 2 million 
line file into memory doesn't take that long in the first place, and the 
delete is the first change I make to the file, so there isn't a stack of 
buffers filled with undo information to start with.


Max

On Tue, 22 May 2007, Andy Wokula wrote:


A.J.Mechelynck schrieb:

Robert M Robinson wrote:


First, thanks very much for creating VIM!  I have been using it on Linux 
systems for years, and now use it via cygwin at home as well.  I vastly 
prefer VIM to EMACS, especially at home.  I learned vi on a VAX/VMS 
system long ago (a friend of mine had ported it), when our computer 
science department was loading so many people on the VAXen that EDT was 
rendered unusably slow.  I still like VIM largely because I can do so 
much with so little effort in so little time.


That brings me to my question.  I have noticed that when editing large 
files (millions of lines), deleting a large number of lines (say, 
hundreds of thousands to millions) takes an unbelieveably long time in 
VIM--at least on my systems.  This struck me as so odd, I looked you up 
(for the first time in all my years of use) so I could ask why!


Seriously, going to line 1 million of a 2 million line file and typing 
the command :.,$d takes _minutes_ on my system (Red Hat Linux on a 
2GHz Athlon processor (i686), 512kb cache, 3 Gb memory), far longer than 
searching the entire 2 million line file for a single word 
(:g/MyQueryName/p).  Doing it this way fits way better into my usual 
workflow than using head -n 100, because of course I'm using a 
regular expression search to determine that I

want to truncate my file at line 100 in the first place.

I looked in the archive, and couldn't see that this issue had been 
raised before.  Is there any chance it can get added to the list of 
performance enhancement requests?


Thanks,

Max Robinson, PhD



I think this is just part of how Vim behaves.

When you edit a file, Vim holds the whole file in memory (IIUC). When you 
delete a million lines, Vim frees (i.e., releases to the OS) the memory 
those lines were using. That takes some time.



Best regards,
Tony.


What about the numbered registers?
 :h "1

After freeing the lines, they are copied to "1.
And the content of "1 is shifted to "2 (before, of course)
And so on, until register "9.

To avoid the copies, the blackhole register can be used:
  :.,$d _

If there are copies, registers can be cleared by hand:
  :let @1 = ""
  :let @2 = ""
  ...
  :let @9 = ""
This also takes time, but frees the memory.

--
Regards,
Andy



Re: A performance question

2007-05-22 Thread Gary Johnson
On 2007-05-22, Robert Maxwell Robinson [EMAIL PROTECTED] wrote:
  :set undolevels=-1 caused my test to run in less than 15 sec, with no 
  other options fiddled with.  Thanks Tim, now I have a work-around!
 
  Now, does having the undo facility available _necessarily_ mean deleting a 
  large chunk of a file takes so long, or can that be added to the list of 
  desired performance enhancements?

Not in my experience.  In both experiments I reported earlier I 
hadn't done anything special with 'undolevels' and checking them now 
shows undolevels=1000.

I repeated the experiment on the Linux system starting vim as

   vim -u NONE two_million_lines

:.,$d took 13 seconds.  I did notice that the CPU was railed at 
100% during that time, so loading of your CPU by other tasks may 
have an effect, as might the actual physical memory available to 
vim.

:set undolevels=-1 did reduce the time to 10 seconds.

Regards,
Gary

-- 
Gary Johnson | Agilent Technologies
[EMAIL PROTECTED] | Mobile Broadband Division
 | Spokane, Washington, USA


Re: A performance question

2007-05-22 Thread Robert Maxwell Robinson


Hmm, interesting.  I've noticed before that the CPU is pegged when I'm 
deleting, but I don't think my machine's behavior is due to CPU load; the 
machine has two CPUs, I'm typically the only (serious) user (as top 
confirms is the case right now), and I get the same behavior whether I'm 
running another large job or not.  My other large job takes about 1 GB, 
leaving almost 2 GB of memory free, so I don't think I'm running out of 
physical memory, either.


Given the difference between your results and mine, I finally checked my 
software versions, which are old:  Red Hat 3.4.6, vim 6.3.82. 
Unfortunately I don't have permission to update this system, and the 
administrator hasn't been willing to do so in the past.


I went looking for release notes for vim, but the announcements I 
found didn't go into detail about what bugs were fixed in which version. 
Can someone point me in the right direction?


Thanks.  --Max




Re: A performance question

2007-05-22 Thread Gary Johnson
On 2007-05-22, Robert Maxwell Robinson [EMAIL PROTECTED] wrote:
  Hmm, interesting.  I've noticed before that the CPU is pegged when I'm 
  deleting, but I don't think my machine's behavior is due to CPU load; the 
  machine has two CPUs, I'm typically the only (serious) user, as top has 
  confirmed is the case now, and I get the same behavior whether I'm running 
  another large job or not.  My other large job takes about 1 Gb leaving 
  almost 2 Gb of memory free, so I don't think I'm running out of physical 
  memory, either.
 
  Given the difference between your results and mine, I finally checked my 
  software versions, which are old:  Red Hat 3.4.6, vim 6.3.82. Unfortunately 
  I don't have permission to update this system, and the administrator hasn't 
  been willing to do so in the past.

It turns out that this Red Hat installation also has vim 6.3.82 in 
/usr/bin/vim, so I tried that, too.

   /usr/bin/vim -u NONE two_million_lines

   50%
   :.,$d

2 minutes 30 seconds!  Eureka!  According to the System Monitor CPU 
bar color, that was almost all User time, whereas with vim 7.1, it 
was a more balanced mix of User and Kernel time.  (Kudos to Bram for 
such a performance improvement from vim 6 to 7!)

I'm not allowed to update anything under /usr on this system, 
either, so I build the latest and greatest versions of tools under 
$HOME/src and put the binaries in $HOME/bin.

Building vim under Linux is really easy.  I do the following.

   mkdir ~/src/Linux/vim-7.1
   cd ~/src/Linux/vim-7.1

Download vim-7.1.tar.bz2 from vim.sf.net.

   tar jxf vim-7.1.tar.bz2
   cd vim71
   ./configure --prefix=$HOME/src/Linux/vim-7.1 --enable-cscope
   make
   make install
   ln -s $HOME/src/Linux/vim-7.1/bin/vim ~/bin/Linux/vim

My PATH includes $HOME/bin/Linux, and that directory contains most of 
the symbolic links to vim that you will find in 
$HOME/src/Linux/vim-7.1/bin, namely the ones I use.  That is,

   $ cd ~/bin/Linux
   $ ls -l | grep vim
   lrwxrwxrwx  1 garyjohn fw   3 Nov 14  2005 gvim -> vim
   lrwxrwxrwx  1 garyjohn fw   3 Nov 14  2005 gvimdiff -> vim
   lrwxrwxrwx  1 garyjohn fw   3 Sep 23  2005 vi -> vim
   lrwxrwxrwx  1 garyjohn fw   3 Sep 23  2005 view -> vim
   lrwxrwxrwx  1 garyjohn fw  40 May 17 18:45 vim -> /home/garyjohn/src/Linux/vim-7.1/bin/vim
   lrwxrwxrwx  1 garyjohn fw   3 Sep 23  2005 vimdiff -> vim

That makes it really easy to update and to test different versions
of vim with only a change to one symbolic link.

But that's just a matter of taste.  The point is that however you
choose to install it, it's easy to build and maintain your own vim
installation without having to bother, or be bothered by, your system
administrator.

  I went looking for release notes for vim, but the announcements I found 
  didn't go into detail about what bugs were fixed in which version. Can 
  someone point me in the right direction?

Go to the vim home page, vim.sf.net, click on the link to 
"Documentation", then "help files online", then "main help file", and 
finally, "version7.txt".  Or you can just go to that page directly,

http://vimdoc.sourceforge.net/htmldoc/version7.html

This describes all the changes from version 6 to version 7, 
including bug fixes.

Regards,
Gary

-- 
Gary Johnson | Agilent Technologies
[EMAIL PROTECTED] | Mobile Broadband Division
 | Spokane, Washington, USA


Re: A performance question

2007-05-22 Thread fREW




Another thing that might help with speed, mentioned on the list a month or
so ago, is the following script, aimed specifically at making editing of
large files faster:
http://www.vim.org/scripts/script.php?script_id=1506.
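
A rough, untested sketch of the general idea behind such a script (not its
actual code; the ~100 MB threshold here is just an assumption for
illustration, and note that getfsize() itself returns -2 for files too big
to fit in a Number):

   " for very large files, skip the swap file and the undo history
   autocmd BufReadPre * if getfsize(expand("<afile>")) > 100000000 | setlocal noswapfile | set undolevels=-1 | endif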

-fREW


Re: A performance question

2007-05-22 Thread panshizhu
AFAIK Vim 7 has a different way of handling undo levels.

Have you tried with Vim 6 instead? I have used Vim 6 to edit a 3 GB text
file and could do things within seconds.

--
Sincerely, Pan, Shi Zhu. ext: 2606


Robert Maxwell Robinson [EMAIL PROTECTED] wrote on 2007-05-23 05:59:20:


 :set undolevels=-1 caused my test to run in less than 15 sec, with no
 other options fiddled with.  Thanks Tim, now I have a work-around!

 Now, does having the undo facility available _necessarily_ mean deleting a
 large chunk of a file takes so long, or can that be added to the list of
 desired performance enhancements?

 Max

 On Tue, 22 May 2007, Tim Chase wrote:

  The issue of editing large files comes up occasionally.  A few settings
  can be tweaked to vastly improve performance.  Notably, the 'undolevels'
  setting can be reduced to -1 or 0 for improved performance.  If your
  lines are long, it can also help to disable syntax highlighting as well.
  You can drop in on one such thread here: