http://www.microsoft.com/BackStage/bkst_column_13.htm

It is quite configurable and extensible.  (Note: it doesn't run on Windows 98.)
-- Ian

-----Original Message-----
From: Jim MacDiarmid [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 22, 2001 6:22 AM
To: [EMAIL PROTECTED]
Subject: Re: Looking for a gatherer.


Is there anything like this that would run on a Windows 98 or NT
platform?

> Jim MacDiarmid, Senior Software Engineer
> PACEL Corp.
> 8870 Rixlew Lane
> Manassas, VA 20109
> (703) 257-4759
> FAX:  (703) 361-6706
> www.pacel.com
>
>
>
> -----Original Message-----
> From: Simon Wilkinson [SMTP:[EMAIL PROTECTED]]
> Sent: Sunday, January 14, 2001 4:37 PM
> To:   [EMAIL PROTECTED]
> Subject:      Re: Looking for a gatherer.
>
> > I am looking for a spider/gatherer with the following characteristics:
> >     * Enables control of the crawling process by URL
> > substring/regexp and HTML context of the link.
> >     * Enables control of the gathering (i.e. saving) process by URL
> > substring/regexp, MIME type, other header information and ideally by
> > some predicates on the HTML source.
> >     * Some way to save page/document metadata, ideally in a database.
> >     * Freeware, shareware or otherwise inexpensive would be nice.
>
> You might like to take a look at Harvest-NG, which is free software
> (http://webharvest.sourceforge.net/ng). It will allow all of what you
> detail above. It saves the metadata in a Perl DBM database; some work
> has been done, but not completed, on working with the DBI interface
> to a remote database. You may find that some knowledge of Perl is
> helpful in adapting it exactly to your needs (much use is made of Perl
> regular expressions in the pattern matching, for instance).
>
> Cheers,
>
> Simon.
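
[Editor's note: the kind of regexp-driven crawl/save filtering described above can be sketched roughly as follows. This is an illustrative sketch only, not Harvest-NG's actual API (Harvest-NG is Perl; Python is used here for brevity), and the patterns and MIME types are assumed examples.]

```python
import re

# Hypothetical rules, in the spirit of the requirements above:
# crawl and save decisions driven by URL regexps and MIME types.
CRAWL_PATTERN = re.compile(r"^https?://[^/]*example\.org/")  # assumed site scope
SAVE_PATTERN = re.compile(r"/(docs|papers)/")                # assumed path filter
SAVE_MIME_TYPES = {"text/html", "application/pdf"}

def should_crawl(url):
    """Follow a link only if its URL matches the crawl pattern."""
    return bool(CRAWL_PATTERN.match(url))

def should_save(url, mime_type):
    """Gather (save) a document only if both the URL pattern
    and the MIME type from the response headers pass."""
    return bool(SAVE_PATTERN.search(url)) and mime_type in SAVE_MIME_TYPES
```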