Re: Crawling just one particular page from a host

Erlend Garåsen Tue, 14 May 2013 05:06:49 -0700

On 14.05.13 13.49, Karl Wright wrote:

Hi Erlend,


"Hosts matching seeds" means that if the domain (in this case
www.ibsen.uio.no <http://www.ibsen.uio.no>) is mentioned in a seed, a
page with the same domain will be included in the crawl if there is
nothing else that excludes it.  So it sounds like it is working as designed.

Yes, you are right. I'm just trying to find a simple way to crawl just the starting page of a host and nothing else, i.e.:

http://www.ibsen.uio.no/forside.xhtml

I tried placing this in the "Include in crawl" box:
http://www\.ibsen\.uio\.no/forside\.xhtml$

Still, it includes everything else from that host unless I write a lot of exclude regexp rules.
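
As a sanity check of the expression on its own, here is a minimal sketch in plain Python. It is not the crawler's matching code: the unanchored search, the is_included helper and the two extra URLs are assumptions made up for illustration. It only shows which URLs the pattern itself accepts, so the extra pages would have to be admitted by some other setting rather than by this expression.

```python
import re

# The include expression from this message, copied verbatim.
INCLUDE = re.compile(r"http://www\.ibsen\.uio\.no/forside\.xhtml$")


def is_included(url: str) -> bool:
    """Return True if the include expression matches the URL.

    Assumption: the pattern is applied as an unanchored search;
    the crawler's real matching rules may differ.
    """
    return INCLUDE.search(url) is not None


# Hypothetical URLs on the same host, for illustration only.
candidates = [
    "http://www.ibsen.uio.no/forside.xhtml",
    "http://www.ibsen.uio.no/forside.xhtml?x=1",
    "http://www.ibsen.uio.no/other.xhtml",
]

# Map each candidate URL to whether the expression accepts it.
results = {url: is_included(url) for url in candidates}

for url, included in results.items():
    print(url, "->", "included" if included else "not matched")
```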


Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
