Re: [SLUG] Tuesday afternoon shell command optimisation party!
This one time, at band camp, Alex Samad wrote: On Tue, Dec 18, 2007 at 11:57:39PM +1100, Jamie Wilkinson wrote: This one time, at band camp, Jeff Waugh wrote: quote who=Robert Thorsby sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Something like the following might be close: awk 'BEGIN{FS=,}{$0~,$:i=i+NF?i=i+NF-1}END{print(i)}' input.txt Close in what sense, the syntax error, the length, or the output? ;-) Why *are* you using the g option to sed's search and replace? g - global (?) means to do it more than one time on the line Right, yes, but I didn't see why he wanted to do a global search and replace on non commas originally, I read that character class incorrectly the first time :) -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
This one time, at band camp, Norman Gaywood wrote:
> On Wed, Dec 19, 2007 at 12:46:51PM +1100, Scott Ragen wrote:
> > [EMAIL PROTECTED] wrote on 19/12/2007 11:34:30 AM:
> > > Norman Gaywood wrote:
> > > > perl -00 -ne 'print tr/,//' input.txt
> > >
> > > I nominate the perl soln as the winner so far: runs like a bat out of
> > > hell and is the most easy to understand. And the shortest in source
> > > code size.
> >
> > I have to disagree. Whilst it may be fast, it's not 100% correct.
> > Most of the time it would probably work, but if there are any blank
> > lines, it outputs the current count, and starts again.
> > Consider the following file contents:
> >
> > --file contents--
> > this,is,the,first,line
> > this,is,the,second
> >
> > the,above,was,a,blank,line
> >
> > and,another,blank,line
> > --end file contents--
> >
> > On Jeff's original command:
> > sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
> > 15
> >
> > The perl command:
> > perl -00 -ne 'print tr/,//' input.txt
> > 753
>
> You are correct. I misread the perlrun man page. -00 means paragraph
> mode. I wanted slurp mode, which is the slightly uglier -0777.
>
> So the perl solution should be:
>
> perl -0777 -ne 'print tr/,//' input.txt
>
> There is also the slightly shorter, tending to perl ugly instead of
> perl neat:
>
> perl -0777 -pe '$_=tr/,//' input.txt

If that's more readable and reducing human cost over people cost, I quit.
Re: [SLUG] Tuesday afternoon shell command optimisation party!
This one time, at band camp, Jeff Waugh wrote: quote who=Jamie Wilkinson sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Tuesday afternoon shell optimisation party! You want to count the total number of characters in a file, not including newlines, that are on lines that don't start with a comma. That's an... interesting... reading of the command. Clearly this should be a Tuesday afternoon only activity rather than a Tuesday-almost-Wednesday after a few drinks activity. ;-) Thanks for the concession :) -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
This one time, at band camp, Ian Wienand wrote:
> I'd guess the Python version is spending that time doing some extra
> copying, because it causes a lot of page faults and is really cache
> unfriendly.
>
> Python
>   Instructions retired per L1 data cache access: 11.03
>   Instructions retired per L2 data cache access: 24.16
>
> C
>   Instructions retired per L1 data cache access: 6.01
>   Instructions retired per L2 data cache access: 366.92

I love Ian's posts only because he includes L1 and L2 cache hits in every one. If only you would share the command that gave you these numbers.
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Thu, Dec 20, 2007 at 09:22:49PM +1100, Jamie Wilkinson wrote:
> I love Ian's posts only because he includes L1 and L2 cache hits in every
> one. If only you would share the command that gave you these numbers.

The numbers come from the CPU performance counters. I use the perfmon tools [1] to get at these. This project should (one day) make it into the standard kernel as the interface to the performance monitoring units provided by current CPUs.

One extremely compelling reason to use Itanium is the excellent HP Caliper [2]. It monitors a range of useful metrics and presents them in a report, which is much easier than measuring them all by hand and manually correlating. AFAIK there isn't anything like this for x86 so far, so you're left to get familiar with [3].

-i

[1] http://perfmon2.sourceforge.net/
[2] http://www.hp.com/go/caliper
[3] http://www.intel.com/design/processor/manuals/253669.pdf
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Thu, Dec 20, 2007 at 09:18:06PM +1100, Jamie Wilkinson wrote:
> This one time, at band camp, Alex Samad wrote:
> > On Tue, Dec 18, 2007 at 11:57:39PM +1100, Jamie Wilkinson wrote:
> > > This one time, at band camp, Jeff Waugh wrote:
> > > > quote who=Robert Thorsby
> > > > > > sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
> > > > >
> > > > > Something like the following might be close:
> > > > > awk 'BEGIN{FS=,}{$0~,$:i=i+NF?i=i+NF-1}END{print(i)}' input.txt
> > > >
> > > > Close in what sense, the syntax error, the length, or the output? ;-)
> > >
> > > Why *are* you using the g option to sed's search and replace?
> >
> > g - global (?) means to do it more than one time on the line
>
> Right, yes, but I didn't see why he wanted to do a global search and replace
> on non commas originally, I read that character class incorrectly the first
> time :)

Start with the line

abcde,bedgf,kasdas,ccc,

and remove everything that is not a comma. Then you can use wc -c (or wc -m) to count the number of bytes (or chars) that are left.
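To make the sed step above concrete, here is a minimal Python sketch of what the original pipeline appears to do: delete every run of non-comma characters on each line, drop the newlines, and count what remains. The function name `count_commas_like_pipeline` and the sample text are made up for illustration; only the regular expression is taken from the sed command.

```python
import re


def count_commas_like_pipeline(text):
    """Mimic: sed 's#[^,]*##g' | tr -d '\\n' | wc -m (illustrative only)."""
    total = 0
    for line in text.splitlines():
        # sed step: delete every run of characters that are not commas
        only_commas = re.sub(r"[^,]*", "", line)
        # tr/wc step: the newline is already gone, so count what remains
        total += len(only_commas)
    return total


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the thread
    sample = "abcde,bedgf,kasdas,ccc\nno commas on this line\n,leading,and,trailing,\n"
    print(count_commas_like_pipeline(sample))
```

The `g` flag matters in the same way the loop over matches does here: without it, only the first run of non-commas on each line would be removed.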
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Tue, 2007-12-18 at 17:55 +1100, Peter Hardy wrote:
> I've got a sixpack of beer for a working PostScript variant. :-)

drop this in a file called count.ps

%!PS
/count {
    0
    {
        currentfile read
        { (,) 0 get eq { 1 add } if }
        { 20 string cvs print (\n) print stop }
        ifelse
    } loop
} def
count

and then use the following command

cat count.ps input.txt | gs -

and, yes, I tested it before posting.

Cascade Premium, please.

Regards
Peter Miller [EMAIL PROTECTED]
/\/\* http://miller.emu.id.au/pmiller/

PGP public key ID: 1024D/D0EDB64D
fingerprint = AD0A C5DF C426 4F03 5D53 2BDB 18D8 A4E2 D0ED B64D
See http://www.keyserver.net or any PGP keyserver for public key.

DRM doesn't anger consumers, content owners abusing DRM angers consumers.
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Dec 18, 2007 5:21 PM, Peter Hardy [EMAIL PROTECTED] wrote:
> On Tue, 2007-12-18 at 16:09 +1100, Jeff Waugh wrote:
> > Here's a starting point. What's a more optimal way to perform this task? :-)
> >
> > sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
> >
> > Tuesday afternoon shell optimisation party!
>
> How do you want it optimised?

Well, for readability, speed and code size this handy BF program is easily the winner:

,[[-+++][--]+[[-]-][-+],].

Keep in mind it's also optimised for usability, so the output is the ascii value of the number rather than the number itself. It only handles up to 255 commas (unless you have a unicode BF interpreter.)

HTH,
Sam
Re: [SLUG] Tuesday afternoon shell command optimisation party!
This one time, at band camp, Jeff Waugh wrote: Hi all, Here's a starting point. What's a more optimal way to perform this task? :-) sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Tuesday afternoon shell optimisation party! You want to count the total number of characters in a file, not including newlines, that are on lines that don't start with a comma. Does it have to be in shell? :-) -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
This one time, at band camp, Jeff Waugh wrote: quote who=Robert Thorsby sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Something like the following might be close: awk 'BEGIN{FS=,}{$0~,$:i=i+NF?i=i+NF-1}END{print(i)}' input.txt Close in what sense, the syntax error, the length, or the output? ;-) Why *are* you using the g option to sed's search and replace? Oh I see, I misread your caret. Nevermind. -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
This one time, at band camp, Peter Miller wrote: Cascade Premium, please. Zing! -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
quote who=Jamie Wilkinson sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Tuesday afternoon shell optimisation party! You want to count the total number of characters in a file, not including newlines, that are on lines that don't start with a comma. That's an... interesting... reading of the command. Clearly this should be a Tuesday afternoon only activity rather than a Tuesday-almost-Wednesday after a few drinks activity. ;-) Does it have to be in shell? :-) No sir! But shell usually wins. - Jeff -- GNOME.conf.au 2008: Melbourne, Australia http://live.gnome.org/Melbourne2008 ASCII stupid question, get a stupid ANSI. -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Jeff Waugh wrote: Does it have to be in shell? :-) No sir! But shell usually wins. On my 1 GHz / 1 GB powerbook, the python one-liner I just submitted runs 5 x faster than the original. But what does 'shell' really mean? python,perl, etc. are external to the shell as surely as sed,tr,wc. cheers rick -- _ Rick Welykochy || Praxis Services People who enjoy eating sausage and obey the law should not watch either being made. -- Otto von Bismarck -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Jamie Wilkinson wrote:
> This one time, at band camp, Jeff Waugh wrote:
> > Hi all,
> >
> > Here's a starting point. What's a more optimal way to perform this task? :-)
> >
> > sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
> >
> > Tuesday afternoon shell optimisation party!
>
> You want to count the total number of characters in a file, not including
> newlines, that are on lines that don't start with a comma.
>
> Does it have to be in shell? :-)

How about this. You want to count the total number of commas in a file.

python -c "import sys; print (''.join(sys.stdin.readlines())).count(',')" < input.txt

cheers
rickw

--
_
Rick Welykochy || Praxis Services

People who enjoy eating sausage and obey the law should not watch either
being made.
     -- Otto von Bismarck
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Jeff Waugh wrote: Traditionally it has been quick command line combos, but there are always offerings in more conventional and esoteric forms. Such as brainf*ck this time. I enjoyed the bf and postscript the best so far. Any INTERCAL skilz out there? cheers rickw -- _ Rick Welykochy || Praxis Services People who enjoy eating sausage and obey the law should not watch either being made. -- Otto von Bismarck -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
quote who=Rick Welykochy Jeff Waugh wrote: Does it have to be in shell? :-) No sir! But shell usually wins. On my 1 GHz / 1 GB powerbook, the python one-liner I just submitted runs 5 x faster than the original. Ah yes, well there are different definitions of optimisation, and all are valid in this game. I was referring to command line length and cleverness though, which is probably a bit exclusive. ;-) But what does 'shell' really mean? python,perl, etc. are external to the shell as surely as sed,tr,wc. Traditionally it has been quick command line combos, but there are always offerings in more conventional and esoteric forms. Such as brainf*ck this time. I posted this as a result of a joke on another mailing list about an amazing abuse of commas from a particular poster, but also because we haven't had one of these threads for ages. It was always fun when Herbert Xu or Gus Lees would smack everyone about with a clever use of cut(1). ;-) - Jeff -- linux.conf.au 2008: Melbourne, Australiahttp://lca2008.linux.org.au/ We are peaking sexually when they are peaking. And two peaks makes a hell of a good mount. - SMH -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Rick Welykochy wrote:
> Jamie Wilkinson wrote:
> > This one time, at band camp, Jeff Waugh wrote:
> > > Hi all,
> > >
> > > Here's a starting point. What's a more optimal way to perform this task? :-)
> > >
> > > sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
> > >
> > > Tuesday afternoon shell optimisation party!
> >
> > You want to count the total number of characters in a file, not including
> > newlines, that are on lines that don't start with a comma.
> >
> > Does it have to be in shell? :-)
>
> How about this. You want to count the total number of commas in a file.
>
> python -c "import sys; print (''.join(sys.stdin.readlines())).count(',')" < input.txt

You can simplify that slightly:

python -Sc "import sys; print sys.stdin.read().count(',')" < input.txt

This way is faster, too :)

If you want to avoid reading the whole file into memory:

python -Sc "import sys; print sum(l.count(',') for l in sys.stdin)" < input.txt

-Andrew.
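For readers on a current interpreter, the two approaches above (read everything at once, or iterate line by line) might look like the following in Python 3. This is only a sketch: the function names `count_slurp` and `count_streaming` are invented here, and `io.StringIO` stands in for standard input.

```python
import io


def count_slurp(stream):
    """Read the whole input at once, like sys.stdin.read().count(',')."""
    return stream.read().count(",")


def count_streaming(stream):
    """Iterate line by line so only one line is held in memory at a time."""
    return sum(line.count(",") for line in stream)


if __name__ == "__main__":
    data = "a,b,c\n\nd,e\n"  # hypothetical sample input
    print(count_slurp(io.StringIO(data)), count_streaming(io.StringIO(data)))
```

Both functions return the same total; the streaming one simply trades a single large read for many small ones.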
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Andrew Bennetts wrote: If you want to avoid reading the whole file into memory: python -Sc import sys; print sum(l.count(',') for l in sys.stdin) input.txt Yes. That's much better. Slurping the input into memory doesn't scale well. And they said TIMTOWTDI applied only to perl. cheers rickw -- _ Rick Welykochy || Praxis Services People who enjoy eating sausage and obey the law should not watch either being made. -- Otto von Bismarck -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Tue, Dec 18, 2007 at 11:57:39PM +1100, Jamie Wilkinson wrote: This one time, at band camp, Jeff Waugh wrote: quote who=Robert Thorsby sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Something like the following might be close: awk 'BEGIN{FS=,}{$0~,$:i=i+NF?i=i+NF-1}END{print(i)}' input.txt Close in what sense, the syntax error, the length, or the output? ;-) Why *are* you using the g option to sed's search and replace? g - global (?) means to do it more than one time on the line Oh I see, I misread your caret. Nevermind. -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html signature.asc Description: Digital signature -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Dec 18, 2007 4:47 PM, Jeff Waugh [EMAIL PROTECTED] wrote:
> <quote who="Martin Visser">
>
> > perl -e 'while(<>){$a+=s/[,]//g};print "$a\n"' input.txt
> >
> > Do I win??
>
> Oddly, perl very rarely wins these. ;-)

This must come close:

perl -00 -ne 'print tr/,//' input.txt

--
Norman Gaywood, Systems Administrator
University of New England, Armidale, NSW 2351, Australia
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Norman Gaywood wrote:
> > Oddly, perl very rarely wins these. ;-)
>
> This must come close:
>
> perl -00 -ne 'print tr/,//' input.txt

Timing test: say the above takes time 1.0, then the following takes time 0.46 ...

python -Sc "import sys; print sys.stdin.read().count(',')"

Both are much, much faster than a sequence of pipes and low-level shell utils. And easier to comprehend.

IMHO, the most important part of optimisation these days is comprehensibility: people power costs a lot more than CPU and disk power.

I nominate the perl soln as the winner so far: runs like a bat out of hell and is the most easy to understand. And the shortest in source code size.

cheers
rickw

--
_
Rick Welykochy || Praxis Services

People who enjoy eating sausage and obey the law should not watch either
being made.
     -- Otto von Bismarck
Re: [SLUG] Tuesday afternoon shell command optimisation party!
[EMAIL PROTECTED] wrote on 19/12/2007 11:34:30 AM:
> Norman Gaywood wrote:
> > > Oddly, perl very rarely wins these. ;-)
> >
> > This must come close:
> >
> > perl -00 -ne 'print tr/,//' input.txt
>
> Timing test: say the above takes time 1.0, then the following takes
> time 0.46 ...
>
> python -Sc "import sys; print sys.stdin.read().count(',')"
>
> Both are much, much faster than a sequence of pipes and low-level shell
> utils. And easier to comprehend.
>
> IMHO, the most important part of optimisation these days is
> comprehensibility: people power costs a lot more than CPU and disk power.
>
> I nominate the perl soln as the winner so far: runs like a bat out of
> hell and is the most easy to understand. And the shortest in source
> code size.

I have to disagree. Whilst it may be fast, it's not 100% correct.
Most of the time it would probably work, but if there are any blank lines, it outputs the current count, and starts again.
Consider the following file contents:

--file contents--
this,is,the,first,line
this,is,the,second

the,above,was,a,blank,line

and,another,blank,line
--end file contents--

On Jeff's original command:
sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
15

The perl command:
perl -00 -ne 'print tr/,//' input.txt
753

I'm not skilled enough with perl to correct this, but if the numbers could easily be added together it would work...

Cheers,
Scott
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Wed, Dec 19, 2007 at 12:46:51PM +1100, Scott Ragen wrote:
> [EMAIL PROTECTED] wrote on 19/12/2007 11:34:30 AM:
> > Norman Gaywood wrote:
> > > perl -00 -ne 'print tr/,//' input.txt
> >
> > I nominate the perl soln as the winner so far: runs like a bat out of
> > hell and is the most easy to understand. And the shortest in source
> > code size.
>
> I have to disagree. Whilst it may be fast, it's not 100% correct.
> Most of the time it would probably work, but if there are any blank
> lines, it outputs the current count, and starts again.
> Consider the following file contents:
>
> --file contents--
> this,is,the,first,line
> this,is,the,second
>
> the,above,was,a,blank,line
>
> and,another,blank,line
> --end file contents--
>
> On Jeff's original command:
> sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
> 15
>
> The perl command:
> perl -00 -ne 'print tr/,//' input.txt
> 753

You are correct. I misread the perlrun man page. -00 means paragraph mode. I wanted slurp mode, which is the slightly uglier -0777.

So the perl solution should be:

perl -0777 -ne 'print tr/,//' input.txt

There is also the slightly shorter, tending to perl ugly instead of perl neat:

perl -0777 -pe '$_=tr/,//' input.txt

--
Norman Gaywood, Systems Administrator
University of New England, Armidale, NSW 2351, Australia

[EMAIL PROTECTED]            Phone: +61 (0)2 6773 3337
http://mcs.une.edu.au/~norm    Fax: +61 (0)2 6773 3312

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
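The difference between paragraph mode and slurp mode can be imitated in a few lines of Python. This is a rough sketch, not a description of how perl works internally: the function names are invented, and the sample text assumes Scott's file has one blank line before each of the last two lines, which is the layout consistent with the 753 and 15 outputs reported in the thread.

```python
import re

# Assumed layout of Scott's sample file: blank lines before the 3rd and 4th lines
SAMPLE = (
    "this,is,the,first,line\n"
    "this,is,the,second\n"
    "\n"
    "the,above,was,a,blank,line\n"
    "\n"
    "and,another,blank,line\n"
)


def paragraph_mode_counts(text):
    """Roughly like perl -00: one record per blank-line-separated paragraph."""
    records = [r for r in re.split(r"\n{2,}", text) if r]
    return [r.count(",") for r in records]


def slurp_mode_count(text):
    """Roughly like perl -0777: the whole input is a single record."""
    return text.count(",")


if __name__ == "__main__":
    # Per-paragraph counts printed back to back, with no separator between them
    print("".join(str(n) for n in paragraph_mode_counts(SAMPLE)))
    print(slurp_mode_count(SAMPLE))
```

Printing one count per paragraph with nothing between them is what makes three separate numbers look like a single large one.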
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Scott Ragen wrote:
> I have to disagree. Whilst it may be fast, it's not 100% correct.
> Most of the time it would probably work, but if there are any blank
> lines, it outputs the current count, and starts again.
> Consider the following file contents:
>
> --file contents--
> this,is,the,first,line
> this,is,the,second
>
> the,above,was,a,blank,line
>
> and,another,blank,line
> --end file contents--
>
> On Jeff's original command:
> sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
> 15
>
> The perl command:
> perl -00 -ne 'print tr/,//' input.txt
> 753

Change the input line separator to octal 0777:

perl -0777 -ne 'print tr/,//' input.txt
15

cheers
rickw

--
_
Rick Welykochy || Praxis Services

People who enjoy eating sausage and obey the law should not watch either
being made.
     -- Otto von Bismarck
Re: [SLUG] Tuesday afternoon shell command optimisation party!
Norman Gaywood wrote: There is also the slightly shorter, tending to perl ugly instead of perl neat: perl -0777 -pe '$_=tr/,//' input.txt Let's get rid of one character: perl -0777 -pe '$_=y/,//' input.txt cheers rickw -- _ Rick Welykochy || Praxis Services People who enjoy eating sausage and obey the law should not watch either being made. -- Otto von Bismarck -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Wed, Dec 19, 2007 at 12:34:02AM +1100, Rick Welykochy wrote:
> Jeff Waugh wrote:
> > > Does it have to be in shell? :-)
> >
> > No sir! But shell usually wins.
>
> On my 1 GHz / 1 GB powerbook, the python one-liner I just submitted
> runs 5 x faster than the original.
>
> But what does 'shell' really mean? python,perl, etc. are external to
> the shell as surely as sed,tr,wc.

Here's a pure shell version (without the overrated quality of being correct).
It works by using comma as a field separator then counting the number of fields.
Main problem: trailing fields non-empty lines still count, but empty lines don't!

IFS=,
while read
do
    set -- $REPLY
    n=$(($n+$#))
done
echo $n

$ echo ,,, | ./count.sh
3
$ echo ,,, | ./count.sh
7
$ echo 'cow,dog,fish,' | ./count.sh
3

so far so good but:

$ echo 'cow,dog,fish' | ./count.sh
3
$ echo 'cow dog fish' | ./count.sh
3

Poo!

Could almost certainly be fixed with one or more of test -n, shift and expr but that's enough for me.
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Wed, Dec 19, 2007 at 12:34:02AM +1100, Rick Welykochy wrote:
> > No sir! But shell usually wins.
>
> On my 1 GHz / 1 GB powerbook, the python one-liner I just submitted
> runs 5 x faster than the original.

I think C usually wins; the version below is 25 times faster than the python version (from disk cache).

[EMAIL PROTECTED]:~$ ls -lh /tmp/randomcommas
-rw-r--r-- 1 ianw ianw 65M 2007-12-19 14:30 /tmp/randomcommas

[EMAIL PROTECTED]:~$ /usr/bin/time ./comma /tmp/randomcommas
commas: 1287100
0.07user 0.04system 0:00.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+126minor)pagefaults 0swaps

[EMAIL PROTECTED]:~$ /usr/bin/time python -Sc "import sys; print sum(l.count(',') for l in sys.stdin)" < /tmp/randomcommas
1287100
2.68user 0.13system 0:02.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8659minor)pagefaults 0swaps

I'd guess the Python version is spending that time doing some extra copying, because it causes a lot of page faults and is really cache unfriendly.

Python
  Instructions retired per L1 data cache access: 11.03
  Instructions retired per L2 data cache access: 24.16

C
  Instructions retired per L1 data cache access: 6.01
  Instructions retired per L2 data cache access: 366.92

-i

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>

#define CHUNK 16384

char buf[CHUNK];

int main(int argc, char *argv[])
{
        unsigned long count = 0;
        ssize_t len;
        int fd = 0;     /* read stdin unless a file name is given */

        if (argc != 1)
                fd = open(argv[1], O_RDONLY);

        if (fd == -1) {
                printf("blah: %s\n", strerror(errno));
                exit(-1);
        }

        /* stop at end of file (0) and also on a read error (-1) */
        while ( (len = read(fd, buf, CHUNK)) > 0 ) {
                int i;
                for (i = 0; i < len; i++)
                        if (buf[i] == ',')
                                count++;
        }

        printf("commas: %lu\n", count);
        return 0;
}
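The C program's approach (read fixed-size blocks and scan each one) can be sketched in Python as well, which may help when comparing it with the line-by-line one-liner. This is an illustration only and makes no claim about relative speed; the function name `count_commas_chunked` is invented, and `io.BytesIO` stands in for a real file opened in binary mode.

```python
import io

CHUNK = 16384  # same block size as the C program


def count_commas_chunked(stream, chunk_size=CHUNK):
    """Count b',' bytes by reading fixed-size blocks from a binary stream."""
    count = 0
    while True:
        block = stream.read(chunk_size)
        if not block:  # an empty bytes object signals end of input
            break
        count += block.count(b",")
    return count


if __name__ == "__main__":
    # Hypothetical sample data; a tiny chunk size forces several reads
    print(count_commas_chunked(io.BytesIO(b"a,b,,c\n"), chunk_size=2))
```

Because a comma is a single byte, a match can never straddle a block boundary, so no state needs to be carried between reads.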
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Wed, Dec 19, 2007 at 02:50:55PM +1100, Ian Wienand wrote:
> [ C version ]

Here's one in lex; ripped off from the flex info page.
I'd be interested in its performance compared to straight C.
No doubt worse, just curious how much worse.

$ cat count.l
        int num_commas = 0;
%%
,       ++num_commas;
.|\n    {}
%%
main()
{
        yylex();
        printf( "# of commas = %d\n", num_commas );
        return 0;
}
$ lex -t count.l > count.c
$ cc count.c -lfl -o count
$ echo dsfsdf | ./count
# of commas = 4
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Wed, Dec 19, 2007 at 02:51:34PM +1100, Matthew Hannigan wrote:
> Here's one in lex; ripped off from the flex info page.
> I'd be interested in its performance compared to straight C.
> No doubt worse, just curious how much worse.

Similar to the Python version:

[EMAIL PROTECTED]:/tmp$ /usr/bin/time ./count < ./randomcommas
# of commas = 1287100
2.57user 0.02system 0:02.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+138minor)pagefaults 0swaps

I'm not even going to guess at what that version is actually doing! A quick look says this one is CPU bound, compared to Python which is memory bound.

Lex
  % Cycles lost due to GR/load dependency stalls (lower is better): 0.31

Python
  % Cycles lost due to GR/load dependency stalls (lower is better): 46.25

The Python version spends a lot of time sitting around waiting for data to come from the cache/memory (load dependency stalls). The Lex version doesn't, so the extra time can be attributed to CPU work.

-i
[SLUG] Tuesday afternoon shell command optimisation party!
Hi all, Here's a starting point. What's a more optimal way to perform this task? :-) sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Tuesday afternoon shell optimisation party! Thanks, - Jeff -- linux.conf.au 2008: Melbourne, Australiahttp://lca2008.linux.org.au/ It will test your head. And your mind. And your brain, too. - Jack Black, School of Rock -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
$quoted_author = Jeff Waugh ; Here's a starting point. What's a more optimal way to perform this task? :-) the first question was what is the task trying to achieve? :) sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m you appear to be counting the number of fields in a csv file but unless there is a trailing comma each line worth of data will be 'off by 1' cheers marty -- My Everest is not in Nepal, She's sleeping in the bedroom second right down the hall. Ed Hillary couldn't crack this nut, He'd be hiding in the lounge room with the rest of us. My Everest - Lazy Susan -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
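The 'off by 1' point can be checked with a small Python sketch. It assumes naive comma-separated lines (no quoted fields) and treats a single trailing comma as a terminator rather than as the start of an empty field; the function name `commas_vs_fields` and the sample lines are made up for illustration.

```python
def commas_vs_fields(line):
    """Return (number of commas, number of fields) for one naive CSV line."""
    commas = line.count(",")
    # Treat a single trailing comma as a terminator, not an extra empty field
    body = line[:-1] if line.endswith(",") else line
    fields = len(body.split(","))
    return commas, fields


if __name__ == "__main__":
    for sample in ("cow,dog,fish", "cow,dog,fish,"):
        print(sample, commas_vs_fields(sample))
```

Without a trailing comma the comma count is one less than the field count; with one, the two agree.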
Re: [SLUG] Tuesday afternoon shell command optimisation party!
quote who=Martin Barry $quoted_author = Jeff Waugh ; Here's a starting point. What's a more optimal way to perform this task? :-) the first question was what is the task trying to achieve? :) That's part of the challenge. sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m you appear to be counting the number of fields in a csv file but unless there is a trailing comma each line worth of data will be 'off by 1' Ah, an interesting thought, but no, it's more of a blunt instrument than that. ;-) - Jeff -- linux.conf.au 2008: Melbourne, Australiahttp://lca2008.linux.org.au/ In addition to these ample facilities, there exists a powerful configuration tool called gcc. - Elliot Hughes, author of lwm -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
perl -e 'while(<>){$a+=s/[,]//g};print "$a\n"' input.txt

Do I win??

On Dec 18, 2007 4:09 PM, Jeff Waugh [EMAIL PROTECTED] wrote:
> Hi all,
>
> Here's a starting point. What's a more optimal way to perform this task? :-)
>
> sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
>
> Tuesday afternoon shell optimisation party!
>
> Thanks,
>
> - Jeff
>
> --
> linux.conf.au 2008: Melbourne, Australia    http://lca2008.linux.org.au/
>
> It will test your head. And your mind. And your brain, too. - Jack
> Black, School of Rock

--
Regards, Martin

Martin Visser
Re: [SLUG] Tuesday afternoon shell command optimisation party!
[EMAIL PROTECTED] wrote on 18/12/2007 04:09:15 PM:
> Hi all,
>
> Here's a starting point. What's a more optimal way to perform this task? :-)
>
> sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m

Not the most graceful, but the following seems to work:

grep -o ',' input.txt | wc -l

Cheers,
Scott
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On 18/12/07 16:32:03, Jeff Waugh wrote: That's part of the challenge. sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m Something like the following might be close: awk 'BEGIN{FS=,}{$0~,$:i=i+NF?i=i+NF-1}END{print(i)}' input.txt Robert Thorsby Old timers will tell you what a pain unstable was during the new testament transition. -- Jon Corbet on Debian's KJV packages -- SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/ Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Dec 18, 2007 4:35 PM, Martin Visser [EMAIL PROTECTED] wrote:
> perl -e 'while(<>){$a+=s/[,]//g};print "$a\n"' input.txt
>
> Do I win??

No. Ironically, your solution is 3 characters longer. :-)

Lindsay

--
http://slug.org.au/ (the Sydney Linux Users Group)
http://holmwood.id.au/~lindsay/ (me)
Re: [SLUG] Tuesday afternoon shell command optimisation party!
<quote who="Martin Visser">
    perl -e 'while(<>){$a+=s/[,]//g};print "$a\n"' input.txt
Do I win??

Oddly, perl very rarely wins these. ;-)
- Jeff
--
linux.conf.au 2008: Melbourne, Australia http://lca2008.linux.org.au/
Odd is good by the way. I knew normal in high school and normal hates me. - Mary Gardiner
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On 18/12/2007, at 4:42 PM, Scott Ragen wrote:
[EMAIL PROTECTED] wrote on 18/12/2007 04:09:15 PM:
Hi all,
Here's a starting point. What's a more optimal way to perform this task? :-)
    sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
Not the most graceful, but the following seems to work:
    grep -o ',' input.txt |wc -l

Assuming we're using GNU grep we can leave the pipe off:
    grep -c -o ',' input.txt
The '-c' will count the number of matching lines thus negating the need for |wc -l. Similar to the whole 'cat file | grep' argument.
-- James
Re: [SLUG] Tuesday afternoon shell command optimisation party!
<quote who="Scott Ragen">
    grep -o ',' input.txt |wc -l

Oh nice, grep -o! Clever! :-)
- Jeff
--
GNOME.conf.au 2008: Melbourne, Australia http://live.gnome.org/Melbourne2008
People keep asking me why we aren't married, and he says, 'Every time I am about to ask you, you do something annoying'. - Kate Beckinsale
Re: [SLUG] Tuesday afternoon shell command optimisation party!
<quote who="Robert Thorsby">
    sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
Something like the following might be close:
    awk 'BEGIN{FS=","}{$0~",$":i=i+NF?i=i+NF-1}END{print(i)}' input.txt

Close in what sense, the syntax error, the length, or the output? ;-)
- Jeff
--
GNOME.conf.au 2008: Melbourne, Australia http://live.gnome.org/Melbourne2008
I used the word 'infrastructure' when describing her cooking style... and she didn't speak to me for a week.
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On 18/12/07 16:45:18, Robert Thorsby wrote:
Something like the following might be close:
    awk 'BEGIN{FS=","}{$0~",$":i=i+NF?i=i+NF-1}END{print(i)}' input.txt

Oops, I transposed the : and the ? in the conditional. Just shows what you can do when fingers outpace brain.
Robert Thorsby
Research is what I'm doing when I don't know what I'm doing. -- von Braun
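
For reference, a single-process awk count can be written without the conditional at all. The following is only a sketch, not Robert's corrected command: it assumes a comma field separator, so every non-empty line contributes NF-1 commas, and it skips empty lines (where NF is 0). The temporary sample file is made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf 'a,b,c\n\nno commas\nd,e,\n' > "$sample"

# With FS set to a comma, a non-empty line holds NF-1 commas.
# The NF pattern skips empty lines, where NF-1 would be -1.
# "n + 0" forces numeric output even if no line matched.
count=$(awk -F, 'NF { n += NF - 1 } END { print n + 0 }' "$sample")

echo "$count"
rm -f "$sample"
```

A trailing comma produces an empty last field, so a line such as "d,e," still counts two commas.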
Re: [SLUG] Tuesday afternoon shell command optimisation party!
<quote who="James Gray">
Not the most graceful, but the following seems to work:
    grep -o ',' input.txt |wc -l
Assuming we're using GNU grep we can leave the pipe off:
    grep -c -o ',' input.txt

Hmm, unfortunately the -c misinterprets the count due to a weird interaction between -c and -o. I wonder if this should be regarded as a bug in GNU grep?
    $ grep -o ',' input.txt | wc -l
    19
    $ grep -c -o ',' input.txt
    10
- Jeff
--
GNOME.conf.au 2008: Melbourne, Australia http://live.gnome.org/Melbourne2008
I get my kicks above the .sigline, sunshine.
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Dec 18, 2007 4:53 PM, Jeff Waugh [EMAIL PROTECTED] wrote:
<quote who="James Gray">
Not the most graceful, but the following seems to work:
    grep -o ',' input.txt |wc -l
Assuming we're using GNU grep we can leave the pipe off:
    grep -c -o ',' input.txt
Hmm, unfortunately the -c misinterprets the count due to a weird interaction between -c and -o. I wonder if this should be regarded as a bug in GNU grep?

Not a bug at all! grep -c only counts the number of matching lines, not the number of occurrences of a pattern in a line.
Lindsay
--
http://slug.org.au/ (the Sydney Linux Users Group)
http://holmwood.id.au/~lindsay/ (me)
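
The difference Lindsay describes is easy to reproduce on a small file. This is a sketch assuming a grep that supports -o (GNU grep does); the sample data is invented for the illustration.

```shell
# Hypothetical sample: two lines with commas, one without.
sample=$(mktemp)
printf 'a,b,c\nno commas here\nd,e\n' > "$sample"

# -o prints every match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o ',' "$sample" | wc -l)

# -c counts lines that contain at least one match.
matching_lines=$(grep -c ',' "$sample")

# Adding -o does not change what -c counts: still matching lines.
with_both=$(grep -c -o ',' "$sample")

echo "occurrences=$occurrences lines=$matching_lines c_and_o=$with_both"
rm -f "$sample"
```

Here the file holds three commas spread over two lines, which is the same kind of gap as the 19 versus 10 that Jeff saw.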
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On 18/12/07 16:50:17, Jeff Waugh wrote:
Something like the following might be close:
    awk 'BEGIN{FS=","}{$0~",$":i=i+NF?i=i+NF-1}END{print(i)}' input.txt
Close in what sense, the syntax error, the length, or the output? ;-)
- Jeff

Syntax error granted -- just keeping you on your toes :-) But, at least, it only uses one process (what ain't perl).
Robert
Testing? What's that? If it compiles, it is good, if it boots up it is perfect. -- Linus Torvalds
Re: [SLUG] Tuesday afternoon shell command optimisation party!
<quote who="Lindsay Holmwood">
Not a bug at all! grep -c only counts the number of matching lines, not the number of occurrences of a pattern in a line.

But when you stick -o in there... Hrrrmmm... I even switched their position on the command line to see if that changed the output. Total cargo cult and voodoo, but anyway. ;-)
- Jeff
--
GNOME.conf.au 2008: Melbourne, Australia http://live.gnome.org/Melbourne2008
Everything I knew about TCP/IP I had downloaded the same day I started hacking the net code. - Alan Cox
RE: [SLUG] Tuesday afternoon shell command optimisation party!
Here's a starting point. What's a more optimal way to perform this task? :-)
    sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m

For starters, remount the partition containing input.txt with the noatime option and disable trackerd. :)
Then, change the '*' to a '\+' in your regex. This saved about 30% CPU time on a 2Mb sample.
- Rog
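
A quick way to check that Rog's change leaves the result untouched is to run both forms on the same input. This is only a sketch: it assumes GNU sed, since '\+' is a GNU extension in basic regular expressions, and the sample data is invented. It does not measure the CPU saving.

```shell
# Hypothetical sample, including an empty field between two commas.
sample=$(mktemp)
printf 'a,b,c\nd,,e\n' > "$sample"

# Original form: '*' also matches the empty string at every position.
star=$(sed 's#[^,]*##g' "$sample" | tr -d '\n' | wc -m)

# Rog's form: '\+' (GNU sed) only matches runs of at least one non-comma.
plus=$(sed 's#[^,]\+##g' "$sample" | tr -d '\n' | wc -m)

echo "star=$star plus=$plus"
rm -f "$sample"
```

Both variables should hold the same comma count. Any speed difference would come from the '*' form also attempting empty matches, which this sketch does not time.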
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Tue, 2007-12-18 at 16:09 +1100, Jeff Waugh wrote:
Here's a starting point. What's a more optimal way to perform this task? :-)
    sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
Tuesday afternoon shell optimisation party!

How do you want it optimised? grep -o is the most readable. But the fastest I've found so far is
    cat input.txt | tr -d '\n' | tr ',' '\n' | wc -l
--
Pete
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Dec 18, 2007 4:35 PM, Martin Visser [EMAIL PROTECTED] wrote:
    perl -e 'while(<>){$a+=s/[,]//g};print "$a\n"' input.txt

Ruby version:
    ruby -e "p IO.read('input.txt').count(',')"
Lindsay
--
http://slug.org.au/ (the Sydney Linux Users Group)
http://holmwood.id.au/~lindsay/ (me)
Re: [SLUG] Tuesday afternoon shell command optimisation party!
[EMAIL PROTECTED] wrote on 18/12/2007 05:21:35 PM:
On Tue, 2007-12-18 at 16:09 +1100, Jeff Waugh wrote:
Here's a starting point. What's a more optimal way to perform this task? :-)
    sed 's#[^,]*##g' input.txt | tr -d '\n' | wc -m
Tuesday afternoon shell optimisation party!
How do you want it optimised? grep -o is the most readable. But the fastest I've found so far is
    cat input.txt | tr -d '\n' | tr ',' '\n' | wc -l

This seems to work too:
    cat input.txt |tr -dC ',' |wc -c
Cheers,
Scott
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On Tue, 2007-12-18 at 17:05 +1100, Lindsay Holmwood wrote:
On Dec 18, 2007 4:35 PM, Martin Visser [EMAIL PROTECTED] wrote:
    perl -e 'while(<>){$a+=s/[,]//g};print "$a\n"' input.txt
Ruby version:

I've got a sixpack of beer for a working PostScript variant. :-)
--
Pete
Re: [SLUG] Tuesday afternoon shell command optimisation party!
<quote who="Peter Hardy">
How do you want it optimised?

It doesn't matter either way -- almost all claims in these threads are educational in some form or another! :-)
- Jeff
--
GNOME.conf.au 2008: Melbourne, Australia http://live.gnome.org/Melbourne2008
Blessed are the cracked, for they let in the light. - Spike Milligan
Re: [SLUG] Tuesday afternoon shell command optimisation party!
On 18/12/07 17:33:25, Scott Ragen wrote:
This seems to work too:
    cat input.txt |tr -dC ',' |wc -c

Use redirection to eliminate cat
    tr -dC ',' < input.txt | wc -c
Robert Thorsby
Let me know if you don't receive this message. -- email signature tagline
Re: [SLUG] Tuesday afternoon shell command optimisation party!
    tr -dc ',' < input.txt | wc -c
will count the number of commas in the input file.
Peter C
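
The tr approach can be tried on a small file that includes a blank line, the case that tripped up paragraph-mode perl elsewhere in the thread. This is a sketch with invented sample data. Note that tr reads standard input only, which is why the file is supplied by redirection.

```shell
# Hypothetical sample containing a blank line.
sample=$(mktemp)
printf 'this,is,the,first,line\nthis,is,the,second\n\nthe,above,was,a,blank,line\n' > "$sample"

# -d deletes, -c complements the set: delete everything that is not a comma.
# wc -c then counts the remaining bytes, one per comma.
count=$(tr -dc ',' < "$sample" | wc -c)

echo "$count"
rm -f "$sample"
```

Because newlines are deleted along with everything else, blank lines have no effect on the total.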