Re: [Chicken-users] Chicken vs Perl
2011/9/20 Sascha Ziemann:
>
> $ dd if=/dev/zero bs=1M count=100 | od -xv | cat.scm > /dev/null
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 36.9156 s, 2.8 MB/s
>
> With cat.scm being this:
>
> #! /usr/local/bin/csi -s
>
> (let next-line ((line (read-line)))
>   (if (not (eof-object? line))
>       (begin
>         (printf "~a\n" line)
>         (next-line (read-line)))))

printf seems to be quite slow. display and newline perform much better:

104857600 bytes (105 MB) copied, 14.7021 s, 7.1 MB/s

Gambit-C isn't better either:

104857600 bytes (105 MB) copied, 18.9284 s, 5.5 MB/s

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users

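The display/newline variant of cat.scm is not shown in the thread. A minimal sketch of what it might look like follows. The name copy-lines and the explicit port arguments are assumptions added here so the loop can be tried on string ports; read-line is assumed to be available by default as in CHICKEN 4 (CHICKEN 5 would need (import (chicken io)) first).

```scheme
;; Copy every line from the port IN to the port OUT, writing with
;; display and newline instead of printf.
;; copy-lines is a hypothetical name, not part of CHICKEN.
(define (copy-lines in out)
  (let next-line ((line (read-line in)))
    (if (not (eof-object? line))
        (begin
          (display line out)
          (newline out)
          (next-line (read-line in))))))
```

Used as a filter like cat.scm, it would be called as (copy-lines (current-input-port) (current-output-port)). The timing difference reported above was measured on the poster's machine and may not carry over to other versions or platforms.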
Re: [Chicken-users] Chicken vs Perl
2011/9/20 Daishi Kato:
>
> My guess is that read-line is slower than <> in perl.
> (I think <> is so optimized in perl.)

Yes, this is one reason. I tried this:

$ dd if=/dev/zero bs=1M count=100 | od -xv | cat > /dev/null
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 7.69538 s, 13.6 MB/s

$ dd if=/dev/zero bs=1M count=100 | od -xv | perl -pe 'print $_;' > /dev/null
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 8.1591 s, 12.9 MB/s

$ dd if=/dev/zero bs=1M count=100 | od -xv | cat.scm > /dev/null
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 36.9156 s, 2.8 MB/s

With cat.scm being this:

#! /usr/local/bin/csi -s

(let next-line ((line (read-line)))
  (if (not (eof-object? line))
      (begin
        (printf "~a\n" line)
        (next-line (read-line)))))

But it is only about 5 times slower and not 30 times like my original
program.

Re: [Chicken-users] Chicken vs Perl
On Tue, Sep 20, 2011 at 03:54:32PM +0200, Sascha Ziemann wrote:
> 2011/9/20 Christian Kellermann:
> >
> > You can add -profile to csc's options. If you need any eggs and
> > want those profiled too, recompile them also with -profile.
>
> How to do that?
>
> I have installed them with chicken-install. As far as I can see there
> are no options to specify compilation options.

The environment variable CSC_OPTIONS controls how csc works, even when
invoked through chicken-install, so just do

$ CSC_OPTIONS=-profile chicken-install [SOME-EGG]

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                -- Donald Knuth

Re: [Chicken-users] Chicken vs Perl
Hi, Sascha!

> So the questions are:
>
> - What is wrong with the Chicken code?

Nothing.

> - Why is there no difference between csi and csc?

Because most work is done in regex handling and library code. Your loop
doesn't take a significant part of the processing time.

Note that Perl has been tuned beyond comprehension for jobs like this,
tuned in dark and evil ways that we don't want to know about. CHICKEN's
regular expression library, on the other hand, is very portable and
high-level code, and we (and the original author in particular) have not
yet had the time to start seriously tuning it. There are many, many
opportunities for doing so.

cheers,
felix

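One way to check how much of the run time is spent in regex matching would be to swap the regexp for a hand-written scanner and time both versions. The sketch below is not from the thread: it replaces the regex with a plain character scan that is meant to match the same pattern, href="(http://[^"/?]+)(["/?].*) ignoring case. All names in it (href-prefix, prefix-at?, stop-char?, downcase, extract-domains) are hypothetical, and it uses only R5RS procedures.

```scheme
;; Hypothetical regex-free scanner for the pattern
;;   href="(http://[^"/?]+)(["/?].*)    (case-insensitive)
;; Returns the lower-cased "http://host" parts found in LINE, in order.

(define href-prefix "href=\"http://")
(define prefix-length (string-length href-prefix))
(define url-offset 6)                   ; length of href=" before the URL

;; Does HREF-PREFIX occur in S at index I, ignoring case?
(define (prefix-at? s i)
  (and (<= (+ i prefix-length) (string-length s))
       (string-ci=? href-prefix (substring s i (+ i prefix-length)))))

;; Characters that end the host part: " / ?
(define (stop-char? c)
  (or (char=? c #\") (char=? c #\/) (char=? c #\?)))

(define (downcase s)
  (list->string (map char-downcase (string->list s))))

(define (extract-domains line)
  (let ((len (string-length line)))
    (let scan ((i 0) (acc '()))
      (cond ((>= i len) (reverse acc))
            ((prefix-at? line i)
             (let ((host (+ i prefix-length)))
               (let find-end ((j host))
                 (cond ((>= j len) (reverse acc))     ; no terminator left on the line
                       ((stop-char? (string-ref line j))
                        (if (> j host)
                            (scan j (cons (downcase (substring line (+ i url-offset) j))
                                          acc))
                            (scan (+ i 1) acc)))      ; empty host, keep scanning
                       (else (find-end (+ j 1)))))))
            (else (scan (+ i 1) acc))))))
```

In the original program, each string returned by extract-domains would be stored with hash-table-set! in place of the string-search loop. Whether this is actually faster than irregex on a given data set is something only a measurement can tell.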
Re: [Chicken-users] for-each and mismatching list lengths
Christian Kellermann scripsit:

> SRFI-1's map allows this so (use srfi-1) if you need it.

For what it's worth, R7RS changes map and for-each to work like SRFI 1,
so I would advise fixing this.

--
John Cowan    co...@ccil.org    http://ccil.org/~cowan
Assent may be registered by a signature, a handshake, or a click of a
computer mouse transmitted across the invisible ether of the Internet.
Formality is not a requisite; any sign, symbol or action, or even
willful inaction, as long as it is unequivocally referable to the
promise, may create a contract.       --Specht v. Netscape

Re: [Chicken-users] Chicken vs Perl
2011/9/20 Christian Kellermann:
>
> You can add -profile to csc's options. If you need any eggs and
> want those profiled too, recompile them also with -profile.

How to do that?

I have installed them with chicken-install. As far as I can see there
are no options to specify compilation options.

Re: [Chicken-users] Chicken vs Perl
At Tue, 20 Sep 2011 15:18:30 +0200, Peter Bex wrote:
>
> On Tue, Sep 20, 2011 at 10:08:16PM +0900, Daishi Kato wrote:
> > Hi,
> >
> > My situation is pretty similar to yours, meaning I used to use Perl
> > and later started using Chicken for my job.
> >
> > Running your scripts on my machine produced a similar result
> > (about 10 times difference).
> >
> > -unsafe option in csc-4.6.0 didn't work (no change).
> > -unsafe-libraries in csc-4.0.0 did work (a little faster),
> > but it's not available in csc-4.6.0 (does anybody know why?).
> >
> > I also tried with csc-4.7.0, and guess what, it's a little slower
> > (at least on my test data. I partially crawled wiki.call-cc.org).
> > Peter, how could this happen?
>
> This probably depends on the nature of your regex. We made the
> tradeoff that large consecutive ranges of characters are stored
> more efficiently as a range instead of as separate characters.
> This means that if you are using a regex with many separate chars
> it could be slightly slower.
>
> In some cases regexes can't be compiled to DFA but need to use
> backtracking, which is comparatively slow. That's not the case
> in Sascha's regex (I checked), but might be the reason it's slow
> for you.

I would like to note that I used the script that Sascha posted.
Let me try with a somewhat larger test set.

% wget -r -l2 http://wiki.call-cc.org/manual/index
% du -s wiki.call-cc.org
2528    wiki.call-cc.org
% time /usr/local/chicken-4.7.0/bin/csi -s a.scm wiki.call-cc.org > /dev/null
/usr/local/chicken-4.7.0/bin/csi -s a.scm wiki.call-cc.org > /dev/null  3.88s user 0.16s system 85% cpu 4.708 total
% time /usr/local/chicken-4.6.0/bin/csi -s a.scm wiki.call-cc.org > /dev/null
/usr/local/chicken-4.6.0/bin/csi -s a.scm wiki.call-cc.org > /dev/null  3.13s user 0.13s system 86% cpu 3.771 total
% grep regexp a.scm
(define href (regexp "href=\"(http://[^\"/?]+)([\"/?].*)" #t))
% uname -r -v -o
2.6.39-2-686-pae #1 SMP Tue Jul 5 03:48:49 UTC 2011 GNU/Linux

Best,
Daishi

Re: [Chicken-users] Chicken vs Perl
On Tue, Sep 20, 2011 at 10:08:16PM +0900, Daishi Kato wrote:
> Hi,
>
> My situation is pretty similar to yours, meaning I used to use Perl
> and later started using Chicken for my job.
>
> Running your scripts on my machine produced a similar result
> (about 10 times difference).
>
> -unsafe option in csc-4.6.0 didn't work (no change).
> -unsafe-libraries in csc-4.0.0 did work (a little faster),
> but it's not available in csc-4.6.0 (does anybody know why?).
>
> I also tried with csc-4.7.0, and guess what, it's a little slower
> (at least on my test data. I partially crawled wiki.call-cc.org).
> Peter, how could this happen?

This probably depends on the nature of your regex. We made the
tradeoff that large consecutive ranges of characters are stored
more efficiently as a range instead of as separate characters.
This means that if you are using a regex with many separate chars
it could be slightly slower.

In some cases regexes can't be compiled to DFA but need to use
backtracking, which is comparatively slow. That's not the case
in Sascha's regex (I checked), but might be the reason it's slow
for you.

Cheers,
Peter

Re: [Chicken-users] Chicken vs Perl
2011/9/20 Alan Post:
>
> It looks like you have a copy-and-paste error here?

Yes, it looks like it. But this should be paste-error bulletproof:

$ for EXT in .pl .scm "" ; do file ../../bin/grep-domains$EXT ; time ../../bin/grep-domains$EXT | md5sum ; done
../../bin/grep-domains.pl: a /usr/bin/perl script text executable
03dce8cb0dc986f0188df99c0bb23f24  -

real    0m1.835s
user    0m1.628s
sys     0m0.156s
../../bin/grep-domains.scm: a /usr/local/bin/csi -s script text executable
03dce8cb0dc986f0188df99c0bb23f24  -

real    1m34.311s
user    1m30.014s
sys     0m0.768s
../../bin/grep-domains: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
03dce8cb0dc986f0188df99c0bb23f24  -

real    1m0.481s
user    0m58.332s
sys     0m0.600s

Re: [Chicken-users] Chicken vs Perl
Hi,

My situation is pretty similar to yours, meaning I used to use Perl
and later started using Chicken for my job.

Running your scripts on my machine produced a similar result
(about 10 times difference).

-unsafe option in csc-4.6.0 didn't work (no change).
-unsafe-libraries in csc-4.0.0 did work (a little faster),
but it's not available in csc-4.6.0 (does anybody know why?).

I also tried with csc-4.7.0, and guess what, it's a little slower
(at least on my test data. I partially crawled wiki.call-cc.org).
Peter, how could this happen?

My guess is that read-line is slower than <> in perl.
(I think <> is so optimized in perl.)

This is just my guess and there's no guarantee,
but how about comparing with using read-all in chicken and
$/=undef in perl?

Best,
Daishi

At Tue, 20 Sep 2011 14:11:41 +0200, Sascha Ziemann wrote:
>
> I tried to use Chicken for a job I would normally use Perl for to find
> out whether Chicken might be a useful alternative.
>
> The job is: go through a web site mirror and report a unique list of
> all domains from all hrefs.
>
> This is my Perl version:
>
> #! /usr/bin/perl
>
> use warnings;
> use strict;
> use File::Find;
>
> my $dir = $ARGV[0] || '.';
> my @files;
> my %urls;
>
> find ({wanted => sub { push @files, $_ if -f $_; },
>        no_chdir => 1}, $dir);
>
> foreach my $file (@files) {
>     open (HTML, $file) || die "Can not open file '$file'";
>     while (<HTML>) {
>         while (/href="(http:\/\/[^"\/?]+)(["\/?].*)/i) {
>             $urls{lc $1} = 1;
>             $_ = $2; } }
>     close (HTML); }
>
> foreach my $url (sort keys %urls) {
>     print $url, "\n"; }
>
> The Perl version takes about two seconds for my test tree:
>
> real 0m1.810s
> user 0m1.664s
> sys 0m0.140s
>
> And this is my Chicken version:
>
> #! /usr/local/bin/csi -s
>
> (require-extension posix regex srfi-69)
>
> (define dir (let ((args (command-line-arguments)))
>               (if (pair? args)
>                   (car args)
>                   ".")))
> (define files (find-files dir regular-file?))
> (define urls (make-hash-table))
> (define href (regexp "href=\"(http://[^\"/?]+)([\"/?].*)" #t))
>
> (for-each
>  (lambda (filename)
>    (with-input-from-file filename
>      (lambda ()
>        (let next-line ((line (read-line)))
>          (if (not (eof-object? line))
>              (let next-href ((found (string-search href line)))
>                (if found
>                    (begin
>                      (hash-table-set! urls (string-downcase (cadr found)) #t)
>                      (next-href (string-search href (caddr found))))
>                    (next-line (read-line)))))))))
>  files)
>
> (for-each
>  (lambda (arg)
>    (printf "~a\n" arg))
>  (sort (hash-table-keys urls) string<?))
>
> And now hold on tight! It takes more than one minute for the same data:
>
> real 1m16.540s
> user 1m14.849s
> sys 0m0.664s
>
> And there is almost no significant performance boost by compiling it:
>
> real 0m1.810s
> user 0m1.664s
> sys 0m0.140s
>
> So the questions are:
>
> - What is wrong with the Chicken code?
> - How can I profile the code?
> - Why is there no difference between csi and csc?

Re: [Chicken-users] Chicken vs Perl
2011/9/20 Peter Bex:
>
> Also, you didn't say which site it was. The testset itself may also be
> an important factor.

aldi.us

About 187 megs of html, gif, jpg, swf and pdf.

Re: [Chicken-users] Chicken vs Perl
* Alan Post [110920 14:31]:
> It looks like you have a copy-and-paste error here? It would appear
> that compiling the code makes it precisely as fast as perl. ;-)

Heh, well spotted.

--
Who can (make) the muddy water (clear)? Let it be still, and it will
gradually become clear. Who can secure the condition of rest? Let
movement go on, and the condition of rest will gradually arise.
 -- Lao Tse.

Re: [Chicken-users] Chicken vs Perl
2011/9/20 Peter Bex:
> The most important question is: which version of Chicken is this?
> There have been massive optimizations done to irregex (the regex
> engine used in Chicken) between 4.6.0 and 4.7.0

csi -version reports this:

Version 4.7.0
linux-unix-gnu-x86-64 [ 64bit manyargs dload ptables ]

I compiled it a few days ago on a 64 bit Debian Lenny.

Re: [Chicken-users] Chicken vs Perl
On Tue, Sep 20, 2011 at 02:11:41PM +0200, Sascha Ziemann wrote:
> The Perl version takes about two seconds for my test tree:
>
> real 0m1.810s
> user 0m1.664s
> sys 0m0.140s
>
[snip]
>
> And now hold on tight! It takes more than one minute for the same data:
>
> real 1m16.540s
> user 1m14.849s
> sys 0m0.664s
>
> And there is almost no significant performance boost by compiling it:
>
> real 0m1.810s
> user 0m1.664s
> sys 0m0.140s
>

It looks like you have a copy-and-paste error here? It would appear
that compiling the code makes it precisely as fast as perl. ;-)

-Alan
--
.i ma'a lo bradi cu penmi gi'e du

Re: [Chicken-users] Chicken vs Perl
On Tue, Sep 20, 2011 at 02:11:41PM +0200, Sascha Ziemann wrote:
> The job is: go through a web site mirror and report a unique list of
> all domains from all hrefs.

Also, you didn't say which site it was. The testset itself may also be
an important factor.

Cheers,
Peter

Re: [Chicken-users] Chicken vs Perl
* Sascha Ziemann [110920 14:12]:
> - How can I profile the code?
> - Why is there no difference between csi and csc?

You can add -profile to csc's options. If you need any eggs and
want those profiled too, recompile them also with -profile.

This will place a file PROFILE. in your current directory. With
'chicken-profile' you can then view the results of one run.

Hope this helps,

Christian

Re: [Chicken-users] Chicken vs Perl
On Tue, Sep 20, 2011 at 02:11:41PM +0200, Sascha Ziemann wrote:
> I tried to use Chicken for a job I would normally use Perl for to find
> out whether Chicken might be a useful alternative.

A great test!

> And now hold on tight! It takes more than one minute for the same data:
>
> real 1m16.540s
> user 1m14.849s
> sys 0m0.664s
>
> And there is almost no significant performance boost by compiling it:
>
> real 0m1.810s
> user 0m1.664s
> sys 0m0.140s

The most important question is: which version of Chicken is this?
There have been massive optimizations done to irregex (the regex
engine used in Chicken) between 4.6.0 and 4.7.0

> So the questions are:
>
> - What is wrong with the Chicken code?

At first glance it looks fine.

> - How can I profile the code?

Build it with "csc -profile ...", then run it. It will produce a
profile file which you can read with "chicken-profile".

> - Why is there no difference between csi and csc?

Probably because the inefficiency is in irregex, which is already
compiled; the bottleneck is not in your code, so making it faster
by compiling it won't help.

Cheers,
Peter

[Chicken-users] Chicken vs Perl
I tried to use Chicken for a job I would normally use Perl for to find
out whether Chicken might be a useful alternative.

The job is: go through a web site mirror and report a unique list of
all domains from all hrefs.

This is my Perl version:

#! /usr/bin/perl

use warnings;
use strict;
use File::Find;

my $dir = $ARGV[0] || '.';
my @files;
my %urls;

find ({wanted => sub { push @files, $_ if -f $_; },
       no_chdir => 1}, $dir);

foreach my $file (@files) {
    open (HTML, $file) || die "Can not open file '$file'";
    while (<HTML>) {
        while (/href="(http:\/\/[^"\/?]+)(["\/?].*)/i) {
            $urls{lc $1} = 1;
            $_ = $2; } }
    close (HTML); }

foreach my $url (sort keys %urls) {
    print $url, "\n"; }

The Perl version takes about two seconds for my test tree:

real    0m1.810s
user    0m1.664s
sys     0m0.140s

And this is my Chicken version:

#! /usr/local/bin/csi -s

(require-extension posix regex srfi-69)

(define dir (let ((args (command-line-arguments)))
              (if (pair? args)
                  (car args)
                  ".")))
(define files (find-files dir regular-file?))
(define urls (make-hash-table))
(define href (regexp "href=\"(http://[^\"/?]+)([\"/?].*)" #t))

(for-each
 (lambda (filename)
   (with-input-from-file filename
     (lambda ()
       (let next-line ((line (read-line)))
         (if (not (eof-object? line))
             (let next-href ((found (string-search href line)))
               (if found
                   (begin
                     (hash-table-set! urls (string-downcase (cadr found)) #t)
                     (next-href (string-search href (caddr found))))
                   (next-line (read-line)))))))))
 files)

(for-each
 (lambda (arg)
   (printf "~a\n" arg))
 (sort (hash-table-keys urls) string<?))

And now hold on tight! It takes more than one minute for the same data:

real    1m16.540s
user    1m14.849s
sys     0m0.664s

And there is almost no significant performance boost by compiling it:

real    0m1.810s
user    0m1.664s
sys     0m0.140s

So the questions are:

- What is wrong with the Chicken code?
- How can I profile the code?
- Why is there no difference between csi and csc?

Re: [Chicken-users] for-each and mismatching list lengths
* Sascha Ziemann [110920 12:17]:
> This throws an error:
>
> (for-each (lambda (a b)
>             (printf "~s ~s\n" a b))
>           (list 1 2 3 0)
>           (list 4 5 6))
>
> But this does not:
>
> (for-each (lambda (a b)
>             (printf "~s ~s\n" a b))
>           (list 1 2 3)
>           (list 4 5 6 0))
>
> Is this a bug or feature?
>
> Guile throws out-of-range for both.

R5RS specifies:

If more than one list is given, then they must all be the same length.

If you give it lists of different lengths, the effect is unspecified.

SRFI-1's map allows this so (use srfi-1) if you need it.

Cheers,

Christian

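Since R5RS leaves the unequal-length case unspecified, code that has to behave the same across implementations can avoid relying on it. Below is a minimal sketch in plain R5RS Scheme that always stops at the shortest list. The names for-each-shortest and any-null? are hypothetical; they are not part of CHICKEN or SRFI 1.

```scheme
;; #t if at least one list in LISTS is empty.
(define (any-null? lists)
  (cond ((null? lists) #f)
        ((null? (car lists)) #t)
        (else (any-null? (cdr lists)))))

;; Like for-each, but stops as soon as the shortest list runs out,
;; regardless of the order in which the lists are given.
;; At least one list is required.
(define (for-each-shortest proc first . rest)
  (let loop ((lists (cons first rest)))
    (if (not (any-null? lists))
        (begin
          (apply proc (map car lists))
          (loop (map cdr lists))))))
```

With this helper, both calls from the question would print the three pairs 1 4, 2 5 and 3 6 and then return without an error.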
[Chicken-users] for-each and mismatching list lengths
This throws an error:

(for-each (lambda (a b)
            (printf "~s ~s\n" a b))
          (list 1 2 3 0)
          (list 4 5 6))

But this does not:

(for-each (lambda (a b)
            (printf "~s ~s\n" a b))
          (list 1 2 3)
          (list 4 5 6 0))

Is this a bug or feature?

Guile throws out-of-range for both.

Re: [Chicken-users] Code for parsing multipart/form-data
Thanks, all, for the links. I will look into them.

On Tue, Sep 20, 2011 at 12:40 PM, Peter Bex wrote:

> On Tue, Sep 20, 2011 at 08:49:24AM +0530, Santosh Rajan wrote:
> > Hi,
> >
> > I am looking for chicken code for parsing multipart/form-data. Can anyone
> > point me to the code please? Sure it must be there somewhere, at least in
> > the web server code.
>
> Unfortunately there isn't any currently because it's something nobody
> has needed yet, and doing this properly and elegantly is not easy.
>
> There was a "http-form-posts" egg which used a port of Gauche's MIME
> code for Chicken 3 which is still available from svn at
>
> https://anonym...@code.call-cc.org/svn/chicken-eggs/release/3/http-server-form-posts
>
> And Alex Shinn's "hato" library also includes some MIME handling code
> you might want to use: http://synthcode.com/scheme/hato
> (note that the link to the docs is wrong, it should point to
> http://synthcode.com/scheme/hato/doc/hato-manual.html)
>
> The latest release of the http-client egg (v0.5, released a few days ago)
> contains some hacky code to deal with multipart form posts, though.
>
> Cheers,
> Peter

--
http://about.me/santosh.rajan

Re: [Chicken-users] Code for parsing multipart/form-data
On Tue, Sep 20, 2011 at 4:10 PM, Peter Bex wrote:
>
> And Alex Shinn's "hato" library also includes some MIME handling code
> you might want to use: http://synthcode.com/scheme/hato
> (note that the link to the docs is wrong, it should point to
> http://synthcode.com/scheme/hato/doc/hato-manual.html)

The link from http://synthcode.com/scheme/hato/ is correct.
I don't know where you found the link without the trailing
slash, but if you navigate from the top page all the links
should work.

[I should change the bad link to a redirect though.]

--
Alex

Re: [Chicken-users] Code for parsing multipart/form-data
On Tue, Sep 20, 2011 at 08:49:24AM +0530, Santosh Rajan wrote:
> Hi,
>
> I am looking for chicken code for parsing multipart/form-data. Can anyone
> point me to the code please? Sure it must be there somewhere, at least in
> the web server code.

Unfortunately there isn't any currently because it's something nobody
has needed yet, and doing this properly and elegantly is not easy.

There was a "http-form-posts" egg which used a port of Gauche's MIME
code for Chicken 3 which is still available from svn at

https://anonym...@code.call-cc.org/svn/chicken-eggs/release/3/http-server-form-posts

And Alex Shinn's "hato" library also includes some MIME handling code
you might want to use: http://synthcode.com/scheme/hato
(note that the link to the docs is wrong, it should point to
http://synthcode.com/scheme/hato/doc/hato-manual.html)

The latest release of the http-client egg (v0.5, released a few days ago)
contains some hacky code to deal with multipart form posts, though.

Cheers,
Peter