On Wed, Dec 19, 2007 at 12:34:02AM +1100, Rick Welykochy wrote:
> >No sir! But shell usually wins.
>
> On my 1 GHz / 1 GB powerbook, the python one-liner
> I just submitted runs 5 x faster than the original.
I think C usually wins; the version below is about 25 times faster than
the Python version (both reading the file from the disk cache).
[EMAIL PROTECTED]:~$ ls -lh /tmp/randomcommas
-rw-r--r-- 1 ianw ianw 65M 2007-12-19 14:30 /tmp/randomcommas
[EMAIL PROTECTED]:~$ /usr/bin/time ./comma < /tmp/randomcommas
commas: 1287100
0.07user 0.04system 0:00.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+126minor)pagefaults 0swaps
[EMAIL PROTECTED]:~$ /usr/bin/time python -Sc "import sys; print sum(l.count(',') for l in sys.stdin)" < /tmp/randomcommas
1287100
2.68user 0.13system 0:02.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8659minor)pagefaults 0swaps
I'd guess the Python version is spending that time doing some extra
copying, because it causes a lot more page faults (8659 minor faults
against 126 for the C version) and is really cache unfriendly.
Python
Instructions retired per L1 data cache access: 11.03
Instructions retired per L2 data cache access: 24.16
C
Instructions retired per L1 data cache access: 6.01
Instructions retired per L2 data cache access: 366.92
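To put the copying guess in C terms: the one-liner walks the input a
line at a time, so each line is pulled out into its own buffer before it
is counted. Below is a rough, untimed C analogue of that access pattern.
It is only a sketch, not what the interpreter does internally, and the
name count_commas_by_line and the 4096-byte line buffer are made up for
illustration.

```c
#include <stdio.h>

/*
 * Hypothetical line-at-a-time counter: fgets() copies each line (or
 * each 4095-byte piece of a longer line) out of the stdio buffer into
 * `line` before we scan it, so every byte is moved one extra time.
 * Assumes text input: an embedded NUL byte would cut a line short.
 */
unsigned long count_commas_by_line(FILE *fp)
{
    char line[4096];
    unsigned long count = 0;
    const char *p;

    while (fgets(line, sizeof line, fp) != NULL) {
        for (p = line; *p != '\0'; p++)
            if (*p == ',')
                count++;
    }
    return count;
}
```

Calling count_commas_by_line(stdin) gives the same total as scanning
fixed-size chunks; the only difference is the extra pass through the
line buffer.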
-i
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
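#define CHUNK 16384

char buf[CHUNK];

int main(int argc, char *argv[])
{
    unsigned long count = 0;
    ssize_t len;
    int fd = 0;    /* default to stdin */
    int i;

    /* an optional file name may be given instead of stdin */
    if (argc != 1)
        fd = open(argv[1], O_RDONLY);

    if (fd == -1) {
        fprintf(stderr, "open: %s\n", strerror(errno));
        exit(1);
    }

    /* read() returns 0 at EOF and -1 on error; stop on either */
    while ((len = read(fd, buf, CHUNK)) > 0) {
        for (i = 0; i < len; i++)
            if (buf[i] == ',')
                count++;
    }

    if (len == -1) {
        fprintf(stderr, "read: %s\n", strerror(errno));
        exit(1);
    }

    printf("commas: %lu\n", count);
    return 0;
}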
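To check the counting loop without a 65M test file, the inner loop can
be pulled out into a function and run on a string in memory. This is a
minimal sketch: count_commas and count_commas_memchr are made-up names,
and the memchr() variant is only an alternative way to write the scan,
with no timings behind it.

```c
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time count, the same inner loop as the program above. */
unsigned long count_commas(const char *buf, size_t len)
{
    unsigned long count = 0;
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == ',')
            count++;
    return count;
}

/* Same result, but libc's memchr() does the scanning. */
unsigned long count_commas_memchr(const char *buf, size_t len)
{
    unsigned long count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end && (p = memchr(p, ',', (size_t)(end - p))) != NULL) {
        count++;
        p++;    /* step past the comma just found */
    }
    return count;
}
```

For example, count_commas("a,b,,c", 6) and count_commas_memchr("a,b,,c", 6)
both return 3. In the main loop the call would be
count += count_commas(buf, (size_t)len) after each successful read().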
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html