Unfortunately my Java knowledge is too poor to debug this one. However, I
doubt that the file "subcollections.xml" from inside the nutch-xxx.job is
used. This is because the file nutch-xxx.job is old: it is dated from the
day I made the Nutch installation.
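
One way to check the timestamp argument, as a minimal sketch: the paths
below are assumptions (the location and exact name of the .job depend on
the build), and is_newer is a hypothetical helper name, not part of Nutch.

```shell
#!/bin/sh
# is_newer A B: succeeds when file A was modified more recently than file B.
# Hypothetical helper for illustration; not part of Nutch.
is_newer() {
    [ "$1" -nt "$2" ]
}

# Assumed paths; adjust them to the actual installation.
CONF_FILE="conf/subcollections.xml"
JOB_FILE="build/nutch-xxx.job"

# Only compare when both files exist, so the script is safe to run anywhere.
if [ -e "$CONF_FILE" ] && [ -e "$JOB_FILE" ]; then
    if is_newer "$CONF_FILE" "$JOB_FILE"; then
        echo "conf copy is newer: the .job cannot contain the latest edits"
    else
        echo "the .job is at least as new as the conf copy"
    fi
fi
```

If the conf copy turns out to be newer, any edits made to it after the
.job was built cannot be inside the .job.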

Initially I use the file "subcollections.xml" from inside the folder "conf".
I add 2 collections named "test1" and "test2", I fetch/index, then I inspect
the db with Luke, and I see the values "test1" and "test2".

Then I change the same file "subcollections.xml", rename the collections to
"new1" and "new2", reindex and inspect the contents. I still see the same
values "test1" and "test2".

If the file from inside the .job had been used, then I would expect to see
the collections from the original installation, not the ones I defined at
step 1.

Maybe you can point me to another .job file... Or maybe the
subcollections.xml is not re-parsed when indexing... Unfortunately I cannot
trace/debug it.
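
Which copy is actually packaged could also be checked directly, without
tracing the Java code. The sketch below assumes that the .job is an
ordinary zip/jar archive, that "subcollections.xml" sits at its root, and
that the unzip and cmp tools are installed; job_copy_matches is a
hypothetical helper name and the .job path is a placeholder.

```shell
#!/bin/sh
# job_copy_matches JOB FILE NAME: succeeds when the entry NAME stored in the
# archive JOB is byte-identical to the file FILE on disk.
# Hypothetical helper; assumes the .job is a zip/jar archive.
job_copy_matches() {
    # unzip -p writes the entry to stdout; cmp -s compares it with FILE silently.
    unzip -p "$1" "$3" | cmp -s - "$2"
}

# Assumed path; adjust it to the actual installation.
JOB_FILE="build/nutch-xxx.job"

if [ -e "$JOB_FILE" ]; then
    if job_copy_matches "$JOB_FILE" conf/subcollections.xml subcollections.xml; then
        echo "the .job contains the same subcollections.xml as conf/"
    else
        echo "the .job copy differs from conf/subcollections.xml (or is missing)"
    fi
fi
```

If the two copies differ, comparing each of them with the values shown in
Luke would tell which one the indexer actually read.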


Sami Siren-2 wrote:
> 
> liv wrote:
>> - I reindex the db: delete folder "indexes", run the command:
>> 
>> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>> 
>> - then I inspect the resulting db with luke again
>> 
>> Unfortunately nothing has changed. Maybe I am missing something... Please
>> tell me if you see anything wrong.
> 
> If you did exactly those steps then what happens is that the
> subcollections.xml is read from inside the .job file. You need to
> rebuild the .job to put new file inside of it.
> 
> simply do "ant" and rerun indexing and it should work as expected.
> 
> --
>  Sami Siren
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/subcollections-tf2821188.html#a7929866
Sent from the Nutch - User mailing list archive at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
