Re: speed of file(1)

1999-07-21 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (Peter Jeremy) writes:

> "Leif Neland" [EMAIL PROTECTED] wrote:
>> My 60MHz Pentium, FreeBSD
>>
>> time file /usr/home/leif/vnc-3.3.2r
>> /usr/home/leif/vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated,
>> original filename, last modified: Thu Jan 21 19:23:21 1999
>>
>> real	0m1.237s
>> user	0m0.758s
>> sys	0m0.394s

> I can't believe these figures.

Hmm, a 200 MHz Pentium (MMX), 3.2-RELEASE, everything in cache:

$ /usr/bin/time file twofish.tar.gz
twofish.tar.gz: gzip compressed data, deflated, last modified: Mon Jun 15 02:40:53 1998, os: Unix
0.35 real 0.24 user 0.10 sys

I'd say that considering that things are cached (cpu-bound), it's very
accurately proportional to Leif's time.  Variances can be accounted
for by the slight implementation differences (the MMX version has a
bigger L1 and better branch prediction).

It's also reasonably proportional to a 400 MHz PII (0.09/0.08/0.01 s
real/user/sys running 3.2; 0.06/0.04/0.01 running 2.2.8, BTW).  Considering the
completely different core, this is also quite close to what you might
expect.

> I can't reproduce the complaint using a 64MB PII-266 running -CURRENT -
> there's no evidence of lack of speed, and profiling file(1) doesn't
> show any anomalies.

What are your results, then?


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: speed of file(1)

1999-07-21 Thread Peter Jeremy

Ville-Pertti Keinonen [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] (Peter Jeremy) writes:
>> I can't believe these figures.

Based on the figures below, maybe I was overly hasty in this statement.
The changes between 2.x and 3.x magic files have far more impact than
I would have expected.

> What are your results, then?

All timings with everything cached (although the 386 only has 8MB
which limits the cacheability).  For the 2.2.5 systems, I give timings
with both the 2.2.5 magic and the 4.0 magic (which is the same as
3.2-RELEASE, in /tmp).

i386SX-25 running 2.2.5 (roughly as posted earlier):
% /usr/bin/time file src/Z/dhcp-2.0b1pl26.tar.gz 
src/Z/dhcp-2.0b1pl26.tar.gz: gzip compressed data, deflated, last modified: Thu Jan  1 10:00:00 1970, os: Unix
2.82 real 1.92 user 0.84 sys
% /usr/bin/time file -m /tmp/magic src/Z/dhcp-2.0b1pl26.tar.gz 
src/Z/dhcp-2.0b1pl26.tar.gz: gzip compressed data, deflated, last modified: Thu Jan  1 10:00:00 1970, os: Unix
4.05 real 2.67 user 1.23 sys

486DX2-50 running 2.2.5:
% /usr/bin/time file src/Z/dhcp-3.0-alpha-19990423.tar.gz 
src/Z/dhcp-3.0-alpha-19990423.tar.gz: gzip compressed data, deflated, last modified: Thu Jan  1 10:00:00 1970, os: Unix
1.43 real 0.96 user 0.38 sys
% /usr/bin/time file -m /tmp/magic src/Z/dhcp-3.0-alpha-19990423.tar.gz 
src/Z/dhcp-3.0-alpha-19990423.tar.gz: gzip compressed data, deflated, last modified: Thu Jan  1 10:00:00 1970, os: Unix
2.15 real 1.62 user 0.44 sys

PII-266 running 4.0-CURRENT:
% /usr/bin/time file src/Z/dhcp-1.4.0p6.tar.gz
src/Z/dhcp-1.4.0p6.tar.gz: gzip compressed data, deflated, last modified: Wed Mar  3 20:57:52 1999, os: Unix
0.13 real 0.09 user 0.03 sys
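
Taking the real times above, switching the two 2.2.5 systems from their own magic file to the 4.0 one slows both by a similar factor:

```latex
\frac{4.05\ \mathrm{s}}{2.82\ \mathrm{s}} \approx 1.44 \quad (\text{386SX-25}),
\qquad
\frac{2.15\ \mathrm{s}}{1.43\ \mathrm{s}} \approx 1.50 \quad (\text{486DX2-50})
```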

When I profile file on a slow system (like a 386 or 486), there is an
obvious performance bottleneck:  The problem is the memcpy() invoked
from fgets().  The only solution would seem to be to mmap() magic
and parse it, rather than using fgets() to read it.  This bottleneck
will also be far more obvious on bandwidth-starved systems (like
386SX and 486DX2/4), whereas virtually the whole thing fits into the
L2 cache on my P-II.
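
Peter's mmap() idea could look roughly like the sketch below. It is a minimal illustration, not file(1) code. `count_magic_entries()` is a hypothetical helper that maps the magic file read-only and walks it with memchr(). No line is copied into a separate buffer the way fgets() copies it. A real parser would decode each line in place. This sketch only skips blank lines and `#` comments and counts the rest. Later messages in the thread trace most of the copying to realloc() rather than fgets().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the entries in a magic file by scanning an
 * mmap()ed image of it in place, instead of copying each line out with
 * fgets().  Blank lines and lines starting with '#' are skipped.
 * Returns -1 if the file cannot be opened, stat()ed or mapped.
 */
long count_magic_entries(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return -1;

    long entries = 0;
    const char *p = base, *end = base + len;
    while (p < end) {
        /* Locate the end of this line without copying it anywhere. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *eol = nl ? nl : end;
        if (eol > p && *p != '#')
            entries++;              /* a real parser would decode [p, eol) here */
        p = nl ? nl + 1 : end;
    }

    munmap((void *)base, len);
    return entries;
}
```

Calling `count_magic_entries("/usr/share/misc/magic")` would return the number of non-comment lines in the magic file. It returns -1 if the file cannot be opened or mapped.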

Peter






Re: speed of file(1)

1999-07-21 Thread Peter Edwards

A quick look at the source reveals:

A MAXMAGIS constant in file.h that estimates a limit of 1000 lines in
magic. (The real number is 4802)

An array sized on MAXMAGIS, that is reallocated every ALLOC_INCR lines
of magic once MAXMAGIS is exceeded.

The patch updates MAXMAGIS to 5000 (to give a bit of room to grow)
and makes ALLOC_INCR a variable that is bigger, and doubles every time
it is used, to attenuate the problem if there ever ends up being 1
entries in magic.
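
Here is a minimal sketch of the growth policy described above. It uses a simplified stand-in for `struct magic`. The `entry` and `table` types and the `table_*` functions are hypothetical, not file(1)'s code. As in the patch, the increment starts at 256 and doubles after each realloc(). Only the newly added slots are zeroed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for file(1)'s struct magic; real fields omitted. */
struct entry {
    char desc[64];
};

/* Growable array: the increment starts at 256 and doubles after each realloc(). */
struct table {
    struct entry *v;
    size_t n;        /* entries in use */
    size_t cap;      /* entries allocated */
    size_t incr;     /* next growth step */
    size_t reallocs; /* realloc() calls so far, kept for illustration */
};

void table_init(struct table *t)
{
    t->v = NULL;
    t->n = t->cap = t->reallocs = 0;
    t->incr = 256;
}

/* Return a zeroed slot for one more entry, or NULL if out of memory. */
struct entry *table_next(struct table *t)
{
    if (t->n == t->cap) {
        size_t newcap = t->cap + t->incr;
        struct entry *nv = realloc(t->v, newcap * sizeof *nv);
        if (nv == NULL)
            return NULL;                 /* the old block is still valid */
        /* Zero only the newly added tail, like the memset() in the patch. */
        memset(&nv[t->cap], 0, (newcap - t->cap) * sizeof *nv);
        t->v = nv;
        t->cap = newcap;
        t->incr *= 2;                    /* grow faster next time */
        t->reallocs++;
    }
    return &t->v[t->n++];
}

void table_free(struct table *t)
{
    free(t->v);
    table_init(t);
}
```

Starting from an empty array, this policy needs 5 realloc() calls to hold 4802 entries. A fixed increment of 20 would need 241. Since the patch also raises MAXMAGIS to 5000, the patched file(1) presumably does not reallocate at all for the current 4802-line magic file.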

Results on a 90MHz Pentium:

New version:

time ./file ./file
./file: FreeBSD/i386 compact demand paged dynamically linked executable
not stripped
0.14 real 0.11 user 0.02 sys

Old version:

./file: FreeBSD/i386 compact demand paged dynamically linked executable
not stripped
0.79 real 0.60 user 0.16 sys




--
Peter.



Peter Jeremy wrote:
 
 Ville-Pertti Keinonen [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] (Peter Jeremy) writes:
  I can't believe these figures.
 
 Based on the figures below, maybe I was overly hasty in this statement.
 The changes between 2.x and 3.x magic files have far more impact than
 I would have expected.
 
 What are your results, then?
 
 All timings with everything cached (although the 386 only has 8MB
 which limits the cacheability).  For the 2.2.5 systems, I give timings
 with both the 2.2.5 magic and the 4.0 magic (which is the same as
 3.2-RELEASE, in /tmp).
 
 i386SX-25 running 2.2.5 (roughly as posted earlier):
 % /usr/bin/time file src/Z/dhcp-2.0b1pl26.tar.gz
 src/Z/dhcp-2.0b1pl26.tar.gz: gzip compressed data, deflated, last modified: Thu Jan  
1 10:00:00 1970, os: Unix
 2.82 real 1.92 user 0.84 sys
 % /usr/bin/time file -m /tmp/magic src/Z/dhcp-2.0b1pl26.tar.gz
 src/Z/dhcp-2.0b1pl26.tar.gz: gzip compressed data, deflated, last modified: Thu Jan  
1 10:00:00 1970, os: Unix
 4.05 real 2.67 user 1.23 sys
 
 486DX2-50 running 2.2.5:
 % /usr/bin/time file src/Z/dhcp-3.0-alpha-19990423.tar.gz
 src/Z/dhcp-3.0-alpha-19990423.tar.gz: gzip compressed data, deflated, last modified: 
Thu Jan  1 10:00:00 1970, os: Unix
 1.43 real 0.96 user 0.38 sys
 % /usr/bin/time file -m /tmp/magic src/Z/dhcp-3.0-alpha-19990423.tar.gz
 src/Z/dhcp-3.0-alpha-19990423.tar.gz: gzip compressed data, deflated, last modified: 
Thu Jan  1 10:00:00 1970, os: Unix
 2.15 real 1.62 user 0.44 sys
 
 PII-266 running 4.0-CURRENT:
 % /usr/bin/time file src/Z/dhcp-1.4.0p6.tar.gz
 src/Z/dhcp-1.4.0p6.tar.gz: gzip compressed data, deflated, last modified: Wed Mar  3 
20:57:52 1999, os: Unix
 0.13 real 0.09 user 0.03 sys
 
 When I profile file in a slow system (like a 386 or 486), there is an
 obvious performance bottleneck:  The problem is the memcpy() invoked
 from fgets().  The only solution would seem to be to mmap() magic
 and parse it, rather than using fgets() to read it.  This bottleneck
 will also be far more obvious on bandwidth-starved systems (like
 386SX and 486DX2/4), whereas virtually the whole thing fits into the
 L2 cache on my P-II.
 
 Peter
 

Common subdirectories: file/Magdir and file.new/Magdir
diff -c file/apprentice.c file.new/apprentice.c
*** file/apprentice.c	Wed Jan 28 07:36:21 1998
--- file.new/apprentice.c	Wed Jul 21 12:35:21 1999
***************
*** 50,55 ****
--- 50,56 ----
  static void eatsize	__P((char **));
  
  static int maxmagic = 0;
+ static int alloc_incr = 256;
  
  static int apprentice_1	__P((char *, int));
  
***************
*** 180,188 ****
  	struct magic *m;
  	char *t, *s;
  
- #define ALLOC_INCR	20
  	if (nd+1 >= maxmagic){
! 	    maxmagic += ALLOC_INCR;
  	    if ((magic = (struct magic *) realloc(magic, 
  						  sizeof(struct magic) * 
  						  maxmagic)) == NULL) {
--- 181,188 ----
  	struct magic *m;
  	char *t, *s;
  
  	if (nd+1 >= maxmagic){
! 	    maxmagic += alloc_incr;
  	    if ((magic = (struct magic *) realloc(magic, 
  						  sizeof(struct magic) * 
  						  maxmagic)) == NULL) {
***************
*** 192,198 ****
  		else
  		    exit(1);
  	    }
! 	    memset(&magic[*ndx], 0, sizeof(struct magic) * ALLOC_INCR);
  	}
  	m = &magic[*ndx];
  	m->flag = 0;
--- 192,199 ----
  		else
  		    exit(1);
  	    }
! 	    memset(&magic[*ndx], 0, sizeof(struct magic) * alloc_incr);
! 	    alloc_incr *= 2;
  	}
  	m = &magic[*ndx];
  	m->flag = 0;
diff -c file/file.h file.new/file.h
*** file/file.h	Wed Jul 21 12:37:00 1999
--- file.new/file.h	Wed Jul 21 12:35:40 1999
***************
*** 35,41 ****
  

Re: speed of file(1)

1999-07-21 Thread Matthew Dillon

Nice rundown of the problem!

I presume someone is going to commit this...

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:*** file/apprentice.c  Wed Jan 28 07:36:21 1998
:--- file.new/apprentice.c  Wed Jul 21 12:35:21 1999
:***
:*** 50,55 
:--- 50,56 
:  static void eatsize  __P((char **));
:  
:  static int maxmagic = 0;
:...





RE: speed of file(1)

1999-07-21 Thread Charles Randall

When this gets committed, can it be applied to both the 3.x and 4.x trees?

Thanks,
Charles

-----Original Message-----
From: Peter Edwards [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 21, 1999 5:55 AM
To: Peter Jeremy
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: speed of file(1)


A quick look at the source reveals:

A MAXMAGIS constant in file.h that estimates a limit of 1000 lines in
magic. (The real number is 4802)

An array sized on MAXMAGIS, that is reallocated every ALLOC_INTR lines
of magic once MAXMAGIS is exceeded.

The patch updates MAXMAGIS to 5000 (give a bit of room to grow)
And makes ALLOC_INCR a variable that is bigger, and doubles every time
it is used, to attenuate the problem if there ever ends up being 1
entries in magic.

Results on a 90Mhz Pentium:

new verson

time ./file ./file
./file: FreeBSD/i386 compact demand paged dynamically linked executable
not stripped
0.14 real 0.11 user 0.02 sys

old verson:

./file: FreeBSD/i386 compact demand paged dynamically linked executable
not stripped
0.79 real 0.60 user 0.16 sys




--
Peter.





Re: speed of file(1)

1999-07-21 Thread Peter Jeremy

I wrote:
> Looking at ktrace with MALLOC_OPTIONS=U, it does do a lot of
> realloc()ing (once for every 20 active lines in .../magic) and sbrk()s
> to a maximum size of ~390KB - not really significant.

and in a later message:

> When I profile file in a slow system (like a 386 or 486), there is an
> obvious performance bottleneck:  The problem is the memcpy() invoked
> from fgets().

Peter Edwards [EMAIL PROTECTED] wrote:
> A MAXMAGIS constant in file.h that estimates a limit of 1000 lines in
> magic. (The real number is 4802)
>
> An array sized on MAXMAGIS, that is reallocated every ALLOC_INCR lines
> of magic once MAXMAGIS is exceeded.

That'll teach me to rely on the output from gprof.  I thought that
gprof's claim (repeated by me) that the apprentice_1() -> fgets() ->
memcpy() chain was taking all the time looked dubious, but it was
consistent across several systems (and it was late at night for me).
(Anyone want to adapt the profiling code so that it correctly
apportions time between callers, rather than just using the number of
calls?)
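
For context, gprof samples where time is spent in each function, but it builds the call graph from call counts. It assumes every call to a given function costs the same. A child's time is therefore charged to each caller in proportion to how often that caller invoked it:

```latex
T_{\text{caller}} = T_{\text{memcpy}} \cdot \frac{c_{\text{caller}}}{\sum_j c_j}
```

Take hypothetical round numbers: about 4800 memcpy() calls from fgets() (roughly one per magic line) and about 200 from realloc(). Then fgets() would be charged 4800/5000 = 96% of memcpy()'s time, even if realloc()'s few large copies did most of the real work. These call counts are illustrative assumptions, not measurements.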

My earlier statement about lots of realloc()s, together with the
(accurate) datapoint that memcpy() was very slow, should have led me
to Peter Edwards' fix.  It also explains why the 3.2/4.x magic file
(which has only about 20% more lines) takes 50% longer to start up
(continually reallocing to increase an array size is O(N^2)).
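
The O(N^2) point can be made concrete with a small counting function. This is a model, not a measurement. It assumes one array element per magic line and growth from an empty array. It also assumes every realloc() moves the whole block; an allocator that extends a block in place would copy less. `copies_to_grow()` is a hypothetical name. Under this model, 20% more lines means about 1.2^2 = 1.44 times as much copying. That is in the same ballpark as the 4.05 s versus 2.82 s 386SX timings earlier in the thread.

```c
/*
 * Worst-case number of array elements realloc() must copy while an array
 * grows one element at a time to n elements, assuming every realloc()
 * moves the block.
 *
 * incr0  - first growth step (20 for the old ALLOC_INCR, 256 for the patch)
 * factor - multiply the step by this after each realloc()
 *          (1 = fixed increment as before, 2 = doubling as in the patch)
 */
unsigned long copies_to_grow(unsigned long n, unsigned long incr0,
                             unsigned long factor)
{
    unsigned long cap = 0, incr = incr0, copies = 0;

    for (unsigned long i = 0; i < n; i++) {
        if (i == cap) {          /* array is full: grow it */
            copies += cap;       /* every existing element is moved */
            cap += incr;
            incr *= factor;
        }
    }
    return copies;
}
```

For 4802 entries, this gives 578,400 element copies with a fixed increment of 20. The patch's 256-and-doubling policy gives 6,656.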

Congratulations to Peter Edwards.

Peter





Re: speed of file(1)

1999-07-21 Thread Wes Peters

Matthew Dillon wrote:
>
> Nice rundown of the problem!
>
> I presume someone is going to commit this...

OK, I've got it on freefall, ready to roll, and building on my 3.2-STABLE
system here.  I'll commit it as soon as *I've* seen it work, if somebody
doesn't beat me to the punch.  ;^)

Thanks for the patch, Peter.

-- 
"Where am I, and what am I doing in this handbasket?"

Wes Peters Softweyr LLC
http://softweyr.com/   [EMAIL PROTECTED]





Re: speed of file(1)

1999-07-21 Thread Wes Peters
Matthew Dillon wrote:
>
> Nice rundown of the problem!
>
> I presume someone is going to commit this...
>
> -Matt
> Matthew Dillon
> dil...@backplane.com
>
> :*** file/apprentice.c  Wed Jan 28 07:36:21 1998
> :--- file.new/apprentice.c  Wed Jul 21 12:35:21 1999
> :***
> :*** 50,55 
> :--- 50,56 
> :  static void eatsize  __P((char **));
> :
> :  static int maxmagic = 0;
> :...

I will, if nobody else has.  Are there any other committers following this
thread?

-- 
Where am I, and what am I doing in this handbasket?

Wes Peters Softweyr LLC
http://softweyr.com/   w...@softweyr.com





Re: Sv: speed of file(1)

1999-07-20 Thread Warner Losh

Maybe the P60 is memory starved.  Thrashing would cause this huge
factor of speed difference...

Warner






Re: Sv: speed of file(1)

1999-07-20 Thread Matthew Dillon

:
:Maybe the P60 is memory starved.  Thrashing would cause this huge
:factor of speed difference...
:
:Warner

No, I tested it on my 1G box - there was a very noticeable delay running
'file' on a simple text file.  Something in the file program or in the
data description is causing the file program to eat cpu but I'm not really
interested in spending hours tracking it down.

If someone wants to work on the problem, I recommend editing the magic
file to home in on the cause.  I did a quick 'cut magic file in half'
test and the time went from 0.09 seconds to 0.01 seconds, so I believe
there is something in there that is causing the problem.

-Matt
Matthew Dillon 
dil...@backplane.com





Re: speed of file(1)

1999-07-20 Thread Peter Jeremy
Leif Neland le...@neland.dk wrote:
> My 60MHz Pentium, FreeBSD
>
> time file /usr/home/leif/vnc-3.3.2r
> /usr/home/leif/vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated,
> original filename, last modified: Thu Jan 21 19:23:21 1999
>
> real	0m1.237s
> user	0m0.758s
> sys	0m0.394s

I can't believe these figures.

Matthew Dillon dil...@apollo.backplane.com wrote:
> Someone would have to compare file sources or profile it to figure out 
> what is causing the slowness. 

I can't reproduce the complaint using a 64MB PII-266 running -CURRENT -
there's no evidence of lack of speed, and profiling file(1) doesn't
show any anomalies.

It is somewhat slower on my 8MB 386, but not unreasonably so:
pc0640% time file src/Z/dhcp-2.0b1pl26.tar.gz 
src/Z/dhcp-2.0b1pl26.tar.gz: gzip compressed data, deflated, last modified: Thu Jan  1 10:00:00 1970, os: Unix
file src/Z/dhcp-2.0b1pl26.tar.gz  1.96s user 0.83s system 98% cpu 2.823 total
pc0640% 

Note that this is somewhat more than twice the time Leif claimed for
his P-60 - and a P-60 should be more than twice the speed of a
386SX-25.

Unfortunately, I can't profile it on my 386: It's running 2.x and I
deleted the profiling libraries due to lack of space.  It will happily
run the profiled ELF file(1), but doesn't generate any timing data.

Looking at ktrace with MALLOC_OPTIONS=U, it does do a lot of
realloc()ing (once for every 20 active lines in .../magic) and sbrk()s
to a maximum size of ~390KB - not really significant.

All I can think of is that Leif has a problem with his P-60 system.

Peter





speed of file(1)

1999-07-19 Thread Leif Neland

While trying to port amavis, the virus scanner for mail
(http://aachalon.de/AMaViS/amavis-0.2.0-pre4.tar.gz), I noticed it used
file(1) several times for each file, and that this took a rather long time,
causing bb to report red for high CPU load each time I collected a batch of mail.

So I compared it with a Linux box:

My 60MHz Pentium, FreeBSD

time file /usr/home/leif/vnc-3.3.2r
/usr/home/leif/vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated,
original filename, last modified: Thu Jan 21 19:23:21 1999

real	0m1.237s
user	0m0.758s
sys	0m0.394s

133MHz Pentium II, Linux

time  file vnc-3.3.2r3_unixsrc.tgz
vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated, original filename,
last modified: Thu Jan 21 19:23:21 1999, os: Unix

real	0m0.036s
user	0m0.010s
sys	0m0.030s

While I realise 60MHz is less than 133MHz, a factor of 34 difference in real
time seems suspect.
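
Here is a quick comparison, assuming run time should scale roughly with clock speed. The measured gap is far larger than the clock difference alone would explain:

```latex
\frac{1.237\ \mathrm{s}}{0.036\ \mathrm{s}} \approx 34
\qquad\text{versus}\qquad
\frac{133\ \mathrm{MHz}}{60\ \mathrm{MHz}} \approx 2.2
```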

The magic file is different, but almost the same size.

Why is FreeBSD's file so much slower?

Leif







Re: speed of file(1)

1999-07-19 Thread Matthew Dillon

Check the size of the magic files on your FreeBSD and Linux boxen.
file was never really designed to be efficient.  FreeBSD's magic
file is /usr/share/misc/magic - around 164K.

-Matt

:
:While trying to port amavis, the virusscanner for mail,
: http://aachalon.de/AMaViS/amavis-0.2.0-pre4.tar.gz ) I noticed it used the
:file(1) several times for each file, and it took rather long time, causing
:bb to report red for high CPU-load each time I collected a batch of mail.
:
:So I compared it with a Linux box:
:
:My 60MHz Pentium, FreeBSD
:
:time file /usr/home/leif/vnc-3.3.2r
:/usr/home/leif/vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated,
:original filename, last modified: Thu Jan 21 19:23:21 1999
:
:real0m1.237s
:user0m0.758s
:sys 0m0.394s
:
:133MHz Pentium II, Linux
:
:time  file vnc-3.3.2r3_unixsrc.tgz
:vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated, original filename,
:last modified: Thu Jan 21 19:23:21 1999, os: Unix
:
:real0m0.036s
:user0m0.010s
:sys 0m0.030s
:
:While I realise 60MHz is less than 133MHz, a factor 34 in difference of real
:time seems suspect.
:
:The magic file is different, but almost the same size.
:
:Why is FreeBSD's file so much slower?
:
:Leif
:
:
:
:
:

Matthew Dillon 
[EMAIL PROTECTED]





Sv: speed of file(1)

1999-07-19 Thread Leif Neland

From: Matthew Dillon [EMAIL PROTECTED]

 Check the size of the magic files on your FreeBSD and Linux boxen.
 file was never really designed to be efficient.  FreeBSD's magic
 file is /usr/share/misc/magic - around 164K.
 
 -Matt
 
 :
  :
 :The magic file is different, but almost the same size.
 :
Leif



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



speed of file(1)

1999-07-19 Thread Leif Neland
While trying to port amavis, the virusscanner for mail,
 http://aachalon.de/AMaViS/amavis-0.2.0-pre4.tar.gz ) I noticed it used the
file(1) several times for each file, and it took rather long time, causing
bb to report red for high CPU-load each time I collected a batch of mail.

So I compared it with a Linux box:

My 60MHz Pentium, FreeBSD

time file /usr/home/leif/vnc-3.3.2r
/usr/home/leif/vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated,
original filename, last modified: Thu Jan 21 19:23:21 1999

real0m1.237s
user0m0.758s
sys 0m0.394s

133MHz Pentium II, Linux

time  file vnc-3.3.2r3_unixsrc.tgz
vnc-3.3.2r3_unixsrc.tgz: gzip compressed data, deflated, original filename,
last modified: Thu Jan 21 19:23:21 1999, os: Unix

real0m0.036s
user0m0.010s
sys 0m0.030s

While I realise 60MHz is less than 133MHz, a factor 34 in difference of real
time seems suspect.

The magic file is different, but almost the same size.

Why is FreeBSD's file so much slower?

Leif







Re: speed of file(1)

1999-07-19 Thread Wes Peters
Matthew Dillon wrote:
 
 Check the size of the magic files on your FreeBSD and Linux boxen.
 file was never really designed to be efficient.  FreeBSD's magic
 file is /usr/share/misc/magic - around 164K.

The Linux one is 169350 bytes, 4891 lines.  The FreeBSD 3.1 magic file is
164223 bytes, 4802 lines.
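To repeat this size comparison on another machine, a minimal sketch is below. It reports the byte and line counts that wc(1) would give, plus a count of non-blank, non-comment lines as a rough measure of how many magic tests the file holds. It assumes the magic(5) convention that lines starting with '#' are comments; the default path /usr/share/misc/magic is the one Matt gives, and other systems may keep the file elsewhere. The function name magic_stats is hypothetical.

```python
from pathlib import Path


def magic_stats(path="/usr/share/misc/magic"):
    """Return (bytes, lines, entries) for a magic(5) file.

    bytes and lines match what `wc -c` and `wc -l` report. entries counts
    lines that are neither blank nor '#' comments (assumed comment syntax),
    a rough proxy for the number of tests file(1) has to parse.
    """
    data = Path(path).read_bytes()
    lines = data.count(b"\n")  # wc -l counts newline characters
    entries = sum(
        1
        for line in data.splitlines()
        if line.strip() and not line.lstrip().startswith(b"#")
    )
    return len(data), lines, entries


if __name__ == "__main__":
    import sys

    # Only runs when given paths, e.g. the FreeBSD and Linux magic files.
    for target in sys.argv[1:]:
        size, nlines, nentries = magic_stats(target)
        print(f"{target}: {size} bytes, {nlines} lines, {nentries} entries")
```

For example, `python3 magic_stats.py /usr/share/misc/magic` (script name hypothetical). Near-identical sizes, as here, suggest the difference lies in the program rather than in the amount of magic data.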

 Leif Neland asked:
 
 :While trying to port amavis (the virus scanner for mail,
 :http://aachalon.de/AMaViS/amavis-0.2.0-pre4.tar.gz), I noticed it used
 :file(1) several times for each file, and it took a rather long time, causing

This begs the question: why?  Can't the program cache the results of file(1)
instead of calling it multiple times?

Premature optimization is the root of all evil.
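Wes's caching suggestion can be sketched as a small memoizing wrapper: run file(1) once per file and reuse the answer until the file changes. This is a hypothetical helper (FileTypeCache, run_file), not anything amavis is known to provide. Keying on path, size and modification time is an assumption chosen so that a rewritten file gets examined again; the runner is injectable so the caching logic can be exercised without file(1) installed.

```python
import os
import subprocess


def run_file(path):
    """Invoke file(1) once and return its one-line description."""
    result = subprocess.run(
        ["file", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


class FileTypeCache:
    """Hypothetical memoizing wrapper around file(1).

    Results are keyed on (absolute path, size, mtime), so a file that is
    rewritten is run through file(1) again, while an unchanged file costs
    only an os.stat() call.
    """

    def __init__(self, runner=run_file):
        self._runner = runner  # injectable, e.g. a fake for testing
        self._cache = {}

    def lookup(self, path):
        st = os.stat(path)
        key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
        if key not in self._cache:
            self._cache[key] = self._runner(path)
        return self._cache[key]
```

Entries for old versions of a file are never evicted, so a long-running scanner would want to bound the dictionary; and a file rewritten with the same size within one mtime tick would be served a stale answer.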


-- 
Where am I, and what am I doing in this handbasket?

Wes Peters Softweyr LLC
http://softweyr.com/   w...@softweyr.com





Re: speed of file(1)

1999-07-19 Thread Matthew Dillon

:
:The Linux one is 169350 bytes, 4891 lines.  The FreeBSD 3.1 magic file is
:164223 bytes, 4802 lines.
:
: Leif Neland asked:
: 
: :While trying to port amavis (the virus scanner for mail,
: :http://aachalon.de/AMaViS/amavis-0.2.0-pre4.tar.gz), I noticed it used
: :file(1) several times for each file, and it took a rather long time, causing
:
:This begs the question: why?  Can't the program cache the results of file(1)
:instead of calling it multiple times?
:
:Premature optimization is the root of all evil.
:
:
:-- 
:Where am I, and what am I doing in this handbasket?
:
:Wes Peters Softweyr LLC
:http://softweyr.com/   w...@softweyr.com

Someone would have to compare the FreeBSD and Linux file(1) sources, or
profile file(1), to figure out what is causing the slowness.
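Before profiling, a repeatable timing of the same command on each machine helps confirm where the gap is. The sketch below is a timing harness, not a profiler: it runs a command several times and reports the mean and best wall-clock time, much like repeating time(1) by hand. The names time_command and timecmd.py are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return (mean, best) wall-clock seconds.

    Output is discarded so terminal speed does not skew the numbers.
    The best of several runs approximates a warm-cache measurement.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), min(samples)


if __name__ == "__main__" and len(sys.argv) > 1:
    mean, best = time_command(sys.argv[1:])
    print(f"mean {mean:.3f}s  best {best:.3f}s over 5 runs")
```

For example, `python3 timecmd.py file vnc-3.3.2r3_unixsrc.tgz` on each machine. Repeating the run with file(1)'s -m option pointed at the other system's magic file would help separate the cost of the magic data from the cost of the program itself.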

-Matt
Matthew Dillon 
dil...@backplane.com

