Hi David,

I'm technical, but unfortunately I have no experience with Perl.

Do you mean that there is a way to insert the Perl lines into the Channel
Configuration setup for the channel in Plucker Desktop, or do I have to do
this some other, manual way?

The download took about 52 minutes for me, but it was the parsing that took
7-8 hours. I'm using a P3 750 MHz with 256 MB RAM, so I assume the hard disk
must be dog slow! Is there a way to stop the cache from being cleared, so that
I can run the Perl commands on the downloaded pages before they get parsed,
rather than having to download everything multiple times?

Can I run the plucker-build executable from a DOS window to create the PDB
using the syntax below?

plucker-build -H "http://www.godrules.net/library/SAmerican/treasury/treasury.htm" --staybelow="http://www.godrules.net/library/SAmerican/treasury/" --maxdepth=2 --zlib-compression -f pdbfilename

Don't bother if it would take too long to explain; I'll put up with the text
being small. But if it's a straightforward process for a layman like me to
run the Perl scripts, I'd really appreciate the steps so I can give it a try.
Or, if you could point me to a FAQ that would help, I'd appreciate it.

Thanks again for your help.

Paul







> It works fine (although it took about 8 hours to parse and compile!!)

        I just parsed it, using the url you pasted in a follow-up message,
and it only took me 52.5 minutes to grab all 1,250 links and parse them into
the single pdb. I was on a dual PIII/600 on a fairly slow DSL connection. 8
hours sounds extremely excessive. Are you sure you weren't parsing lots of
redundant or offsite links?

        I used the following syntax:

$ time plucker-build \
-H "http://www.godrules.net/library/SAmerican/treasury/treasury.htm" \
--staybelow="http://www.godrules.net/library/SAmerican/treasury/"    \
--maxdepth=2 --zlib-compression -f GodRules

real    52m31.555s
user    35m20.210s
sys     0m10.900s

        The resulting PDB was:

-rw-r--r--    1 hacker     users      5137483 2002-12-11 07:01 GodRules.pdb

        I see the problem you're seeing, and there definitely are <pre>
tags in the source. Look closely at the following url, just after the
closing </h1> after the "TSK - GENESIS 1" part:

http://www.godrules.net/library/SAmerican/treasury/treasurygen1.htm

        Looks like:

<b><h1><i>TSK - GENESIS 1</i></h1><pre>
                                  ^ ding!

        ..and at the end of the page:

<!-- God Rules.NET--></pre>
                     ^ ding!

        So you can remove those with two quick perl one-liners, but the
output may not be what you expect:

        perl -pi.orig -e 's,<pre>,,g' *htm
        perl -pi.orig -e 's,</pre>,,g' *htm

        Good luck!


d.
