I checked the patch for subcollections
(http://issues.apache.org/jira/browse/NUTCH-201), although I had assumed it
was already included in the latest public release, 0.8.1.

Compared to the current source code, the patch appears to have an extra file
(which doesn't exist in version 0.8.1):

src/plugin/subcollection/src/java/org/apache/nutch/util/DomUtil.java 

Could this be the reason "collection" is not being updated on re-indexing?


Sami Siren-2 wrote:
> 
> liv wrote:
>> - I reindex the db: delete folder "indexes", run the command:
>> 
>> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>> 
>> - then I inspect the resulting db with luke again
>> 
>> Unfortunately nothing has changed. Maybe I am missing something... Please
>> tell me if you see anything wrong.
> 
> If you did exactly those steps then what happens is that the
> subcollections.xml is read from inside the .job file. You need to
> rebuild the .job to put new file inside of it.
> 
> simply do "ant" and rerun indexing and it should work as expected.
> 
> --
>  Sami Siren
> 
> 
> 
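
For anyone following along, the rebuild-and-reindex steps described above
might look roughly like the sketch below. It assumes you run it from the Nutch
source root, that the default "ant" target repackages the edited
subcollections.xml into the .job file (as Sami suggests), and that the crawl
data lives under crawl/ as in my earlier command. The run and reindex helper
names and the DRY_RUN variable are my own inventions for illustration, not
part of Nutch.

```shell
#!/bin/sh
# Sketch only: rebuild the .job so the edited subcollections.xml is picked up,
# then rebuild the index. Intended to be run from the Nutch source root.
# Set DRY_RUN=1 to print the commands instead of executing them.

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper wrapping the three steps from the thread.
reindex() {
    # Rebuild with ant so the .job file contains the new configuration.
    run ant || return 1
    # Delete the old "indexes" folder before indexing again.
    run rm -rf crawl/indexes || return 1
    # Re-run indexing over all segments (paths assumed from the thread).
    run bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
}
```

Sourcing the file and calling "reindex" with DRY_RUN=1 set only prints the
three commands, which is a cheap way to check the paths before touching the
existing index.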

-- 
View this message in context: 
http://www.nabble.com/subcollections-tf2821188.html#a7947722
Sent from the Nutch - User mailing list archive at Nabble.com.
