Indeed, you are correct. 

Thanks.
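
For the archives, here is a rough sketch of the two changes discussed
below. The query field is subcollection rather than collection:

    subcollection:wiki loudon

The plugin also has to be listed in plugin.includes in nutch-site.xml.
The value shown is only an assumption (a typical plugin list with
subcollection appended); start from whatever your nutch-default.xml
already has and add subcollection to it:

    <property>
      <name>plugin.includes</name>
      <!-- assumed plugin list, for illustration only; keep your own and append |subcollection -->
      <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|subcollection</value>
    </property>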


jay jiang wrote:

> Shouldn't that be subcollection:wiki instead?   Also I assumed you had
> subcollection added to plugin.includes in the config file
> (nutch-site.xml).
>
> Andrew Libby wrote:
>
>> I've applied the patch in the ticket linked to below.  I browsed the
>> patch to try to figure out how to use this plugin, and I'm having
>> trouble getting it working.
>> Before I get into the details, if someone has a source of information
>> describing how nutch starts up and initializes plugins, so that I can
>> get a feel for whether this patch is even being used properly in the
>> system, I'd very much appreciate it.
>>
>> ----
>>
>> Here's what I did:
>>
>> Added patches with patch -p0 < subcollection.2.path
>>
>> Compiled tarball with ant tar
>>
>> Extracted tarball in my runtime location with tar -zxvpf -
>> nutch-0.8-dev.tar.gz
>>
>> Created urls/urls.txt containing my site name
>> (http://www.philadelphiariders.com/)
>>
>> Edited crawl-urlfilter.xml to accept aforementioned site name
>>
>> Edited subcollections.xml and added the following:
>>
>>    <subcollection>
>>        <name>wiki</name>
>>        <id>wiki</id>
>>        <whitelist>http://www.philadelphiariders.com/wiki</whitelist>
>>        <blacklist />
>>    </subcollection>
>>
>>    <subcollection>
>>        <name>moto-web</name>
>>        <id>moto-web</id>
>>        <whitelist>http://www.philadelphiariders.com/c/dmoz</whitelist>
>>        <blacklist />
>>    </subcollection>
>>
>>    <subcollection>
>>        <name>gallery</name>
>>        <id>gallery</id>
>>        <whitelist>http://www.philadelphiariders.com/gallery</whitelist>
>>        <blacklist />
>>    </subcollection>
>>
>> Crawled/ indexed my site with ./bin/nutch crawl urls -dir ../nutch-index
>>
>> When I start tomcat and do some test searching, I get links from the
>> wiki area without a collection field added to the query.  But if I run
>> a query like:
>>
>> collection:wiki loudon
>>
>> Which should return documents, I get none. Additionally, if I simply
>> query
>> collection:wiki, I get no hits.
>>
>> If anyone has any ideas, I'll be very grateful.
>>
>>
>> Zaheed Haque wrote:
>>
>>> Maybe this could help you..
>>>
>>> http://issues.apache.org/jira/browse/NUTCH-201
>>>
>>> Cheers


-- 
Andrew Libby                                  
[EMAIL PROTECTED]
http://philadelphiariders.com/




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
