Hi David, I'm technical, but unfortunately I have no experience with Perl and the like.
Do you mean that there is a way to insert the Perl lines into the Channel Configuration setup for the channel in Plucker Desktop, or do I have to do this some other, manual way?

The download took about 52 minutes for me, but it's the parsing that took 7-8 hours. I'm using a P3 750 MHz with 256 MB RAM, so I assume the hard disk must be dog slow! Is there a way to prevent the cache from being cleared, so that I can run the Perl commands on the downloaded pages before they get parsed, rather than having to download them multiple times?

Can I run the plucker-build executable from a DOS window to create the PDB using the syntax below?

    plucker-build -H "http://www.godrules.net/library/SAmerican/treasury/treasury.htm" --staybelow="http://www.godrules.net/library/SAmerican/treasury/" --maxdepth=2 --zlib-compression -f pdbfilename

Don't bother if it'll take too long to explain; I'll put up with the text being small. But if it's a straightforward process for a layman like me to run the Perl scripts, I'd really appreciate the steps so I can give it a try. Or if you could point me to a FAQ that would help, I'd appreciate it.

Thanks again for your help.

Paul

> It works fine (although it took about 8 hours to parse and compile!!)

I just parsed it, using the URL you pasted in a follow-up message, and it only took me 52.5 minutes to grab all 1,250 links and parse them into the single PDB. I was on a dual PIII/600 on a fairly slow DSL connection. 8 hours sounds extremely excessive. Are you sure you weren't parsing lots of redundant or offsite links?
I used the following syntax:

    $ time plucker-build \
        -H "http://www.godrules.net/library/SAmerican/treasury/treasury.htm" \
        --staybelow="http://www.godrules.net/library/SAmerican/treasury/" \
        --maxdepth=2 --zlib-compression -f GodRules

    real    52m31.555s
    user    35m20.210s
    sys     0m10.900s

The resulting PDB was:

    -rw-r--r-- 1 hacker users 5137483 2002-12-11 07:01 GodRules.pdb

I see the problem you're seeing, and there definitely are <pre> tags in the source. Look closely at the following URL, just after the closing </h1> that follows the "TSK - GENESIS 1" part:

    http://www.godrules.net/library/SAmerican/treasury/treasurygen1.htm

It looks like:

    <b><h1><i>TSK - GENESIS 1</i></h1><pre>
                                      ^ ding!

...and at the end of the page:

    <!-- God Rules.NET--></pre>
                         ^ ding!

So you can remove those with two quick perl one-liners, but the output may not be what you expect:

    perl -pi.orig -e 's,<pre>,,g' *htm
    perl -pi.orig -e 's,</pre>,,g' *htm

Good luck!

d.

