Re: [fpc-pascal] fast text processing

2007-11-01 Thread Flávio Etrusco
What about the TRegExpr library? http://www.regexpstudio.com/TRegExpr/TRegExpr.html The license is MIT-like... Cheers, Flávio On 10/31/07, Bee [EMAIL PROTECTED] wrote: Well, considering that Perl's or Python's speed greatly relies on the underlying C-implementation of (at least) particular

Re: [fpc-pascal] fast text processing

2007-11-01 Thread Flávio Etrusco
Recent versions of JCL include a pcre header, too. Even a .obj is provided for linking statically :-) -Flávio On 10/31/07, Jeff Pohlmeyer [EMAIL PROTECTED] wrote: the easiest is simply making a FPC header to pcre, it might be useful even. I did that, once upon a time...

Re: [fpc-pascal] fast text processing

2007-11-01 Thread L
No more strlen: http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1432 This doesn't work if you have spaces in front of the tags sometag sometag I'm not sure if the Perl one fails too though. I don't have perl installed and can't test it ;-) A real parser doesn't

Re: [fpc-pascal] fast text processing

2007-11-01 Thread Vincent Snijders
Daniël Mantione wrote: On Wed, 31 Oct 2007, Vincent Snijders wrote: Florian Klaempfl wrote: Vincent Snijders wrote: Why not SetLength(s,i)? StrLen is _very_ expensive. I don't see a way how another #0 can be before. No more strlen:

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
Or even better, give a clear problem description. TASKS: First, count the number of words in the document. Second, count the number of unique words in the document. INPUT: The document uses an HTML-like format for storing articles. Here's the format: DOC (contains an
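The two tasks above can be sketched roughly as follows. This is only an illustration under stated assumptions: the real document format is cut off in the snippet, so the sample text, the rule that a word is a run of ASCII letters, and the program name are all hypothetical, and no tag handling is attempted.

```pascal
program wordcount_sketch;
{$MODE OBJFPC} {$H+}
uses
  Classes, SysUtils;

const
  // Hypothetical sample input; the real document format is not shown in full.
  Sample = 'the quick brown fox jumps over the lazy dog the end';

var
  Unique: TStringList;
  WordCount, i, StartPos: Integer;
begin
  Unique := TStringList.Create;
  try
    Unique.Sorted := True;          // sorted list, so each Add does a binary search
    Unique.Duplicates := dupIgnore; // adding a word that is already present is skipped
    WordCount := 0;
    i := 1;
    while i <= Length(Sample) do
    begin
      // Skip anything that is not an ASCII letter (assumed word separator).
      while (i <= Length(Sample)) and not (Sample[i] in ['a'..'z', 'A'..'Z']) do
        Inc(i);
      StartPos := i;
      // Consume one run of letters.
      while (i <= Length(Sample)) and (Sample[i] in ['a'..'z', 'A'..'Z']) do
        Inc(i);
      if i > StartPos then
      begin
        Inc(WordCount);
        Unique.Add(LowerCase(Copy(Sample, StartPos, i - StartPos)));
      end;
    end;
    WriteLn('Word count: ', WordCount);
    WriteLn('Unique word count: ', Unique.Count);
  finally
    Unique.Free;
  end;
end.
```

A sorted TStringList keeps the sketch within the standard units; it is not necessarily what any poster in the thread used.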

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Bee wrote: TStrings is meant for GUI purposes. Its design and implementation are not optimized. Then it's our task to optimize it more (and more) so it could be as fast as Perl. What I meant here is using FPC's standard or default classes/units. ;) I don't count any third-party

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
TStrings is meant for GUI purposes. Its design and implementation are not optimized. Then it's our task to optimize it more (and more) so it could be as fast as Perl. What I meant here is using FPC's standard or default classes/units. ;) I don't count any third-party classes/units/libraries or

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Marco van de Voort
I had never used Perl before, until someone showed me that Perl is very fast for text processing (using its powerful regex), despite being an interpreted language. It even beat Delphi and FPC, though both are compiled languages. A Perl program of a few lines is almost two times faster than a few pages

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Marco van de Voort
TStrings is meant for GUI purposes. Its design and implementation are not optimized. Then it's our task to optimize it more (and more) so it could be as fast as Perl. What I meant here is using FPC's standard or default classes/units. ;) I don't count any third-party classes/units/libraries

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Bee wrote: If you just want to show off, the easiest is simply making a FPC header to pcre, it might be useful even. Making a header on top of libraries written in other languages shows that FPC lacks powerful units/libraries. For some particular cases maybe it's alright.

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Jonas Maebe
On 31 Oct 2007, at 10:40, Bee wrote: If you just want to show off, the easiest is simply making a FPC header to pcre, it might be useful even. Making a header on top of libraries written in other languages shows that FPC lacks powerful units/libraries. It merely means

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
If you just want to show off, the easiest is simply making a FPC header to pcre, it might be useful even. Making a header on top of libraries written in other languages shows that FPC lacks powerful units/libraries. For some particular cases maybe it's alright. For example

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
To give you a head start, check out: http://svn.freepascal.org/svn/fpc/trunk/packages/base/regexpr/ You read my mind! I've just been thinking about using fpc's regex unit. :D It needs testing and fixing. How does it compare with the regex that comes with FPC 2.0.4? I still can't upgrade to fpc

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Jonas Maebe
On 31 Oct 2007, at 11:00, Bee wrote: It merely means you don't want to waste time on rewriting a perfectly good library simply because of abstract language purity reasons. It is completely irrelevant in what language a library is written if it works well with your program and does what

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
It merely means you don't want to waste time on rewriting a perfectly good library simply because of abstract language purity reasons. It is completely irrelevant in what language a library is written if it works well with your program and does what you want. Yes. If C is very good, I won't

[fpc-pascal] fast text processing

2007-10-31 Thread Jeff Pohlmeyer
Heck, I'm not even a programmer, and this kludge is about 25% faster than your perl script on my machine program koleski; {$MODE OBJFPC} {$H+} uses classes, strings; var f: text; s:ansistring; wc:longint=0; wl, ul:TStringList; i,n:LongInt; begin assign(f, 'Koleksi.dat');

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
Heck, I'm not even a programmer, and this kludge is about 25% faster than your perl script on my machine Nope. It's still more or less twice as slow. :-D [EMAIL PROTECTED]:~$ time perl koleksi.perl Word count: 126944 Unique word count: 11793 real 0m0.208s user 0m0.204s sys

[fpc-pascal] fast text processing

2007-10-31 Thread Jeff Pohlmeyer
this kludge is about 25% faster than your perl script on my machine Nope. It's still more or less twice as slow. :-D I guess it depends on the hardware: % time koleksi.pl # perl Word count: 126944 Unique word count: 11793 real 0m1.019s user 0m0.992s sys 0m0.028s % time

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Jeff Pohlmeyer wrote: this kludge is about 25% faster than your perl script on my machine Nope. It's still more or less twice as slow. :-D I guess it depends on the hardware: % time koleksi.pl # perl Word count: 126944 Unique word count: 11793 real 0m1.019s user 0m0.992s sys

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
AMD-K6-700 / SuSE-10.3 / Linux-2.6.22 / perl-5.8.8 / fpc-2.2.0 Probably because of the different fpc version, no? I'm using fpc 2.0.4. However, this is good news. :) -Bee-

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Marco van de Voort
If you just want to show off, the easiest is simply making a FPC header to pcre, it might be useful even. Someone worked on it using Delphi 7 and PCRE. A small optimization was made to the TStringList: it uses CompareText instead of CompareStr for text comparison. It runs faster than

[fpc-pascal] fast text processing

2007-10-31 Thread Jeff Pohlmeyer
the easiest is simply making a FPC header to pcre, it might be useful even. I did that, once upon a time... http://www.hotlinkfiles.com/files/526004_qpvr0/pcre-fpc.tar.gz -Jeff

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
I made it three times as fast on my computer (Windows 2000, fpc 2.3.1, P4 1.5 GHz) using a hashlist for the unique word count. Using a larger textbuf gave an additional 10% speed up: Arrrggg, I hate myself for not being able to upgrade to fpc v.2.2.0! I can't find TFPStringHashTable on fpc
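The two ideas mentioned above, a hash table for the unique words and a larger text buffer, might look something like the sketch below. It assumes TFPStringHashTable from the contnrs unit (the class named in the reply) and the standard SetTextBuf procedure; the file name, the 64 KB buffer size, and the one-word-per-line input are hypothetical choices made only to keep the example self-contained. The pastebin code itself is not shown in the thread, so this is not a reconstruction of it.

```pascal
program hash_sketch;
{$MODE OBJFPC} {$H+}
uses
  SysUtils, contnrs;

const
  FileName = 'sample_words.txt'; // hypothetical file name, one word per line

var
  F: Text;
  Buf: array[0..65535] of Byte;  // assumed size; larger than the default text buffer
  Seen: TFPStringHashTable;
  W: string;
  WordCount: Integer;
begin
  // Create a tiny input file so the sketch can run on its own.
  Assign(F, FileName);
  Rewrite(F);
  WriteLn(F, 'alpha');
  WriteLn(F, 'beta');
  WriteLn(F, 'alpha');
  Close(F);

  Seen := TFPStringHashTable.Create;
  try
    WordCount := 0;
    Assign(F, FileName);
    SetTextBuf(F, Buf);           // set after Assign and before Reset
    Reset(F);
    while not Eof(F) do
    begin
      ReadLn(F, W);
      if W = '' then
        Continue;
      Inc(WordCount);
      if Seen.Find(W) = nil then  // Add raises on a duplicate key, so check first
        Seen.Add(W, '');
    end;
    Close(F);
    WriteLn('Word count: ', WordCount);
    WriteLn('Unique word count: ', Seen.Count);
  finally
    Seen.Free;
  end;
  DeleteFile(FileName);
end.
```

The hash lookup avoids the repeated string comparisons of a sorted list, which is presumably where most of the reported speedup comes from.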

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Graeme Geldenhuys
On 31/10/2007, Bee [EMAIL PROTECTED] wrote: Vincent said it was 3 times faster. I expected the result would be about 0.10s. Or am I wrong? Maybe that's machine dependent. I'll try the one without the hash table to see the difference. Otherwise, let's just compare the sys time and say it's

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Bee wrote: [EMAIL PROTECTED]:word_parser$ time ./project1 Word count:126944 Unique word count:11793 real 0m0.185s user 0m0.140s sys 0m0.000s [EMAIL PROTECTED]:word_parser$ time perl project1.perl Word count: 126944 Unique word count: 11793 real 0m0.281s user 0m0.244s sys

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Bee wrote: [EMAIL PROTECTED]:word_parser$ time ./project1 Word count:126944 Unique word count:11793 real 0m0.185s user 0m0.140s sys 0m0.000s [EMAIL PROTECTED]:word_parser$ time perl project1.perl Word count: 126944 Unique word count: 11793 real 0m0.281s user 0m0.244s sys

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Graeme Geldenhuys
On 31/10/2007, Vincent Snijders [EMAIL PROTECTED] wrote: Maybe I have a relatively slow computer, so I get more speedup. Keep in mind, that disk time is constant. I'm also not sure if FPC compiler parameters were used. I did. I compiled as: fpc project1.pas Anyway, here is the Hash

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Graeme Geldenhuys
Well done Vincent!! :-) I can confirm your results... [EMAIL PROTECTED]:word_parser$ time ./project1 Word count:126944 Unique word count:11793 real 0m0.185s user 0m0.140s sys 0m0.000s [EMAIL PROTECTED]:word_parser$ time perl project1.perl Word count: 126944 Unique word count: 11793

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
[EMAIL PROTECTED]:word_parser$ time ./project1 Word count:126944 Unique word count:11793 real 0m0.185s user 0m0.140s sys 0m0.000s [EMAIL PROTECTED]:word_parser$ time perl project1.perl Word count: 126944 Unique word count: 11793 real 0m0.281s user 0m0.244s sys 0m0.016s

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Marco van de Voort
On 31/10/2007, Vincent Snijders [EMAIL PROTECTED] wrote: Maybe I have a relatively slow computer, so I get more speedup. Keep in mind, that disk time is constant. I'm also not sure if FPC compiler parameters were used. I did. I compiled as: fpc project1.pas It could be wise to

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
I had never used Perl before, until someone showed me that Perl is very fast for text processing (using its powerful regex), despite being an interpreted language. It even beat Delphi and FPC, though both are compiled languages. A Perl program of a few lines is almost two times faster than a few pages of pascal

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Marco van de Voort
I had never used Perl before, until someone showed me that Perl is very fast for text processing (using its powerful regex), despite being an interpreted language. It even beat Delphi and FPC, though both are compiled languages. A few lines Perl

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Graeme Geldenhuys
On 31/10/2007, Marco van de Voort [EMAIL PROTECTED] wrote: It could be wise to add -O3 for anything considered a benchmark :-) It squeezed another 0.015s out of the time making it even faster. :-) Regards, - Graeme -

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Graeme Geldenhuys
On 31/10/2007, Bee [EMAIL PROTECTED] wrote: Alright everyone. The answers are enough. Now I can say confidently that pascal (fpc v.2.2) is FASTER than Perl, including in text processing. ;) Thank you very much to everyone who was involved in this thread. :) That was fun!!! :-0 So what's the

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Marco van de Voort wrote: I had never used Perl before, until someone showed me that Perl is very fast for text processing (using its powerful regex), despite being an interpreted language. It even beat Delphi and FPC, though both are compiled

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Bee wrote: But people who have seen http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all may have doubted that. Vincent, are we connected or what?! I was about to post the very exact URL! :-D I wonder where fpc would end up in that list, A: if it uses its own

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Henry Vermaak
On 31/10/2007, Graeme Geldenhuys [EMAIL PROTECTED] wrote: That was fun!!! :-0 So what's the task for tomorrow? ;-) for tomorrow the homework is: improve fpc regexp capability and get fpc in the top 5 on the regex-dna shootout ratings ;) Regards, - Graeme - henry

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
Sure, but as Jonas pointed out it is better to use a good library than to write a bad library yourself. And someone would claim that the speed comes from the library (c?), not from pascal. :P It's a LANGUAGE shootout btw, not LIBRARY shootout. Maybe you had forgotten that. ;) -Bee-

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Bee wrote: Sure, but as Jonas pointed out it is better to use a good library than to write a bad library yourself. And someone would claim that the speed comes from the library (c?), not from pascal. :P It's a LANGUAGE shootout btw, not LIBRARY shootout. Maybe you had forgotten that. ;)

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vinzent Hoefler
On Wednesday 31 October 2007 14:19, Bee wrote: Sure, but as Jonas pointed out it is better to use a good library than to write a bad library yourself. And someone would claim that the speed comes from the library (c?), not from pascal. :P It's a LANGUAGE shootout btw, not LIBRARY

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
A. That is not a problem. A couple of other, non-c participants use pcre. I myself would consider that they lack a language advantage of their own. B. The same is done with pidigits. C. It shows that fpc has good interoperability. We could submit more than one program, one using fpc's own

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Bee
Well, considering that Perl's or Python's speed greatly relies on the underlying C-implementation of (at least) particular functionality, all those comparisons basically boil down to C vs. C, anyway. Or don't they? Exactly! That's why I prefer to use fpc's own regexpr unit. :) -Bee-

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Florian Klaempfl
Vincent Snijders wrote: Jeff Pohlmeyer wrote: this kludge is about 25% faster than your perl script on my machine Nope. It's still more or less twice as slow. :-D I guess it depends on the hardware: % time koleksi.pl # perl Word count: 126944 Unique word count: 11793 real

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Florian Klaempfl wrote: Vincent Snijders wrote: Jeff Pohlmeyer wrote: s[i]:=#0; SetLength(s,StrLen(@s[1])); Why not SetLength(s,i)? StrLen is _very_ expensive. I don't see a way how another #0 can be before. That is right, I am working on a version which does not
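The point about StrLen can be illustrated with a small sketch. The string contents and the cut position are hypothetical, and whether the right argument is i or i - 1 depends on surrounding code that the snippet does not show; in this sketch the terminator is written at position i, so the resulting length is i - 1.

```pascal
program setlength_sketch;
{$MODE OBJFPC} {$H+}
uses
  strings;

var
  a, b: AnsiString;
  i: Integer;
begin
  a := 'hello world';
  b := 'hello world';
  i := 6;                        // hypothetical cut position (the space)

  // Pattern quoted in the thread: write a terminator, then rescan for it.
  a[i] := #0;
  SetLength(a, StrLen(PChar(a))); // StrLen walks the buffer until it meets #0

  // Without the scan: the new length is already known from i.
  SetLength(b, i - 1);

  WriteLn(a, ' / ', b);
  WriteLn(a = b);                // both approaches should yield the same string
end.
```

The scan costs time proportional to the string length on every call, while the second form only uses a value that is already at hand.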

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Mattias Gaertner
On Wed, 31 Oct 2007 13:56:04 +0100 (CET) [EMAIL PROTECTED] (Marco van de Voort) wrote: I had never used Perl before, until someone showed me that Perl is very fast for text processing (using its powerful regex), despite being an interpreted

Re: [fpc-pascal] fast text processing

2007-10-31 Thread L
Well, considering that Perl's or Python's speed greatly relies on the underlying C-implementation of (at least) particular functionality, all those comparisons basically boil down to C vs. C, anyway. Or don't they? Exactly! That's why I prefer to use fpc's own regexpr unit. :) -Bee-

Re: [fpc-pascal] fast text processing

2007-10-31 Thread L
Word count: 126944 Unique word count: 11793 real 0m0.281s user 0m0.244s sys 0m0.016s Can someone do a test for 5 minutes of parsing and see if things slow down or speed up for one of the programs? That takes away process load time too.. example: the time it takes to fork

Re: [fpc-pascal] fast text processing

2007-10-31 Thread L
Word count: 126944 Unique word count: 11793 real 0m0.281s user 0m0.244s sys 0m0.016s Can someone do a test for 5 minutes of parsing Rather I mean can someone do a more realistic test such as parsing 1500 files instead of one single file. At least in my line of
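One way to take process start-up out of the measurement, as suggested above, is to time many runs inside a single process. The sketch below is only an illustration: CountWords is a stand-in workload, the sample text and the run count of 1500 (echoing the "1500 files" idea) are hypothetical, and the same string is reused instead of reading real files.

```pascal
program timing_sketch;
{$MODE OBJFPC} {$H+}
uses
  SysUtils, DateUtils;

// Stand-in workload: counts space-separated words in a string.
function CountWords(const S: string): Integer;
var
  i: Integer;
  InWord: Boolean;
begin
  Result := 0;
  InWord := False;
  for i := 1 to Length(S) do
    if S[i] = ' ' then
      InWord := False
    else if not InWord then
    begin
      InWord := True;
      Inc(Result);
    end;
end;

const
  Runs = 1500; // hypothetical number of repetitions

var
  T0: TDateTime;
  r, Total: Integer;
begin
  Total := 0;
  T0 := Now;                     // start timing after the process is already loaded
  for r := 1 to Runs do
    Inc(Total, CountWords('one two three four five'));
  WriteLn('Total words: ', Total);
  WriteLn('Elapsed ms: ', MilliSecondsBetween(Now, T0));
end.
```

With a workload this small the elapsed time will be close to zero; the structure matters more than the numbers, and a real comparison would loop over actual input files.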

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Daniël Mantione
On Wed, 31 Oct 2007, Vincent Snijders wrote: Florian Klaempfl wrote: Vincent Snijders wrote: Why not SetLength(s,i)? StrLen is _very_ expensive. I don't see a way how another #0 can be before. No more strlen: http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1432 One

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Vincent Snijders
Florian Klaempfl wrote: Vincent Snijders wrote: Why not SetLength(s,i)? StrLen is _very_ expensive. I don't see a way how another #0 can be before. No more strlen: http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1432 Vincent

Re: [fpc-pascal] fast text processing

2007-10-31 Thread Graeme Geldenhuys
On 01/11/2007, Vincent Snijders [EMAIL PROTECTED] wrote: No more strlen: http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1432 Wow, that version improved quite a bit from the previous one!! [EMAIL PROTECTED]:word_parser$ time ./project1_fast Word count:126944 Unique word count:11793

[fpc-pascal] fast text processing

2007-10-30 Thread Bee
Hi all, I had never used Perl before, until someone showed me that Perl is very fast for text processing (using its powerful regex), despite being an interpreted language. It even beat Delphi and FPC, though both are compiled languages. A Perl program of a few lines is almost two times faster than a few

Re: [fpc-pascal] fast text processing

2007-10-30 Thread L
Hi all, I had never used Perl before, until someone showed me that Perl is very fast for text processing (using its powerful regex), despite being an interpreted language. It even beat Delphi and FPC, though both are compiled languages. A Perl program of a few lines is almost two times faster than a few

Re: [fpc-pascal] fast text processing

2007-10-30 Thread Bee
Give us a test case (some example source code) and I will beat the living crap out of any perl script. Perl is built using Cee, so anything Perl can do Cee can do better.. which means Pascal can do better or similar. Perl is not written in Perl. In other words, perl is just a wrapper around the

Re: [fpc-pascal] fast text processing

2007-10-30 Thread Vincent Snijders
Bee wrote: The Pascal counterpart turned out almost twice as slow. Though not as simple as Perl, the Pascal code is quite simple and uses only standard fpc units. But, I won't post the code here to not influence your logic. ;) For me it would be better if you posted the pascal program,