On 14.05.13 13.49, Karl Wright wrote:
> Hi Erlend,
> "Hosts matching seeds" means that if the domain (in this case
> www.ibsen.uio.no) is mentioned in a seed, a page with the same domain
> will be included in the crawl if there is nothing else that excludes
> it. So it sounds like it is working as designed.

Yes, you are right. I'm just trying to find a simple way to crawl only
the starting page of a host and nothing else, i.e.:
http://www.ibsen.uio.no/forside.xhtml
I tried entering this in the "Include in crawl" box:
http://www\.ibsen\.uio\.no/forside\.xhtml$
It still includes everything else from that host unless I write a lot
of exclude regexp rules.
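
To show what I expected from that pattern, here is a minimal sketch of how it behaves on its own. It assumes the include box applies the regexp as an unanchored search against the full URL, which is my assumption and not something I have confirmed. The function name is_included and the second and third test URLs are hypothetical, made up for illustration only.

```python
import re

# The pattern exactly as entered in the include box.
INCLUDE = re.compile(r"http://www\.ibsen\.uio\.no/forside\.xhtml$")


def is_included(url: str) -> bool:
    """Return True if the include pattern is found anywhere in the URL.

    Assumption: the crawler uses unanchored "search" semantics.
    """
    return INCLUDE.search(url) is not None


if __name__ == "__main__":
    urls = [
        "http://www.ibsen.uio.no/forside.xhtml",      # the start page
        "http://www.ibsen.uio.no/om.xhtml",           # hypothetical other page
        "http://www.ibsen.uio.no/forside.xhtml?x=1",  # hypothetical query variant
    ]
    for url in urls:
        print(url, is_included(url))
```

Under that assumption only the start page matches, so the pattern by itself should be selective enough; the other pages would have to be admitted by some other setting.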
Erlend
--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050