You can set a hopcount filter - that should do it.

Karl

On Tue, May 14, 2013 at 8:06 AM, Erlend Garåsen <[email protected]> wrote:

> On 14.05.13 13.49, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> "Hosts matching seeds" means that if the domain (in this case
>> www.ibsen.uio.no) is mentioned in a seed, a page with the same
>> domain will be included in the crawl if there is nothing else
>> that excludes it. So it sounds like it is working as designed.
>
> Yes, you are right. I'm just trying to find a simple way to crawl just the
> starting page of a host and nothing else, i.e.:
> http://www.ibsen.uio.no/forside.xhtml
>
> I tried to place this in the include in crawl box:
> http://www\.ibsen\.uio\.no/forside\.xhtml$
>
> Still it will include everything else from that host unless I write a lot
> of exclude reg exp rules.
>
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
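
For anyone who wants to sanity-check the include expression quoted above outside the crawler, here is a minimal sketch in Python. It assumes the "**" in the archived message is mail-formatting residue and not part of the pattern, and it uses Python's `re` module, which may not behave exactly like the connector's own regex matching. The helper name `is_start_page` and the non-matching test URLs are hypothetical, chosen only for illustration.

```python
import re

# Include pattern from the message, with the "**" residue removed (assumption).
START_PAGE = re.compile(r"http://www\.ibsen\.uio\.no/forside\.xhtml$")


def is_start_page(url: str) -> bool:
    """Return True only when the entire URL is the start page."""
    # fullmatch requires the pattern to cover the whole string, so longer
    # URLs on the same host (other paths, query strings) are rejected.
    return START_PAGE.fullmatch(url) is not None


if __name__ == "__main__":
    # The second and third URLs are made-up examples of other pages on the host.
    for candidate in (
        "http://www.ibsen.uio.no/forside.xhtml",
        "http://www.ibsen.uio.no/other.xhtml",
        "http://www.ibsen.uio.no/forside.xhtml?lang=en",
    ):
        print(candidate, is_start_page(candidate))
```

If the expression behaves as expected here, the pattern itself is probably not what lets the other pages in, which is consistent with the "hosts matching seeds" explanation and with limiting the crawl by hop count instead.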
