> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: subcollection
> Date: Tue, 30 Sep 2008 08:55:35 +0000
>
>
> Hi,
>
> I'm trying to get subcollections working in nutch 1.0-dev, and have crawled
> our intranet with the subcollection.xml configured as below. However when I
> submit a query to search.jsp eg,
>
> subcollection:im database
Duh! I realised what I was doing wrong. I was literally typing
subcollection: instead of my subcollection name in the query, e.g. I
should have typed im:database
>
> I don't get any results (as opposed to submitting this without
> subcollection:im)
>
> Is this configured wrongly? I realise that subcollection.xml doesn't do regular
> expressions, but I wasn't sure if I could just put in part of the url, or had
> to put in the full stem pattern e.g. http://planet.somedomain.com/level1/
>
> Thanks,
> Ed.
>
> <subcollections>
> <subcollection>
> <name>default</name>
> <id>default</id>
> <whitelist>
> </whitelist>
> <blacklist>
>
> planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
> /aptprop.nsf/Content/Americas+
> /aptprop.nsf/Content/AB+CityFlyer+
> /aptprop.nsf/Content/CityFlyer+
> /im/barch/
> /im/dms/
> /im/tech/
> </blacklist>
> </subcollection>
>
> <subcollection>
> <name>im</name>
> <id>im</id>
> <whitelist>
> planet.somedomain.com/general/aptrix/aptim.nsf/
> planet.somedomain.com/im/barch/
> planet.somedomain.com/im/dms/
> planet.somedomain.com/im/tech/
> </whitelist>
> <blacklist />
> </subcollection>
>
> <subcollection>
> <name>news</name>
> <id>news</id>
> <whitelist>
>
> planet.somedomain.com/general/aptrix/bani.nsf/Content/Weekly+news
> </whitelist>
> <blacklist />
> </subcollection>
>
> </subcollections>
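
On the partial-URL question, here is a minimal sketch of how the whitelist and blacklist entries above would behave, assuming the plugin does plain substring matching against the URL and checks the blacklist first. Both of those are assumptions and are not confirmed anywhere in this thread. The function names and the two test URLs are hypothetical, made up for illustration.

```python
def parse_patterns(text):
    """Split a <whitelist>/<blacklist> body into trimmed, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def in_subcollection(url, whitelist, blacklist):
    """Assumed rule (not confirmed by the thread): a blacklist substring
    hit excludes the URL; otherwise any whitelist substring hit includes it."""
    if any(pattern in url for pattern in blacklist):
        return False
    return any(pattern in url for pattern in whitelist)


# Whitelist of the "im" subcollection, copied from the config above.
IM_WHITELIST = parse_patterns("""
planet.somedomain.com/general/aptrix/aptim.nsf/
planet.somedomain.com/im/barch/
planet.somedomain.com/im/dms/
planet.somedomain.com/im/tech/
""")

# Hypothetical URLs, used only to exercise the sketch.
print(in_subcollection("http://planet.somedomain.com/im/dms/index.html", IM_WHITELIST, []))
print(in_subcollection("http://planet.somedomain.com/hr/index.html", IM_WHITELIST, []))
```

If matching really is by substring, an entry such as /im/dms/ without the host would match just as well as the full stem, but it would also match that path on any other host in the crawl.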
>