Unfortunately my Java knowledge is too poor to debug this one. However, I doubt that the file "subcollections.xml" from inside nutch-xxx.job is used, because the nutch-xxx.job file is old: it is dated the day I made the Nutch installation.
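One way to test this doubt without debugging any Java is to look inside the .job archive directly. The following is only a minimal sketch in Python, assuming the .job file is an ordinary zip/jar archive. The function name `compare_job_config` and any paths passed to it are hypothetical and not part of Nutch.

```python
import zipfile
from pathlib import Path


def compare_job_config(job_path, conf_path, entry="subcollections.xml"):
    """Compare a config file packed in a .job archive with the copy on disk.

    Returns a dict with the archived entry's name, its timestamp, and
    whether its bytes match the file at conf_path. Returns None if the
    archive does not contain the entry.
    """
    with zipfile.ZipFile(job_path) as job:
        # Accept the entry at the archive root or inside any subfolder.
        matches = [name for name in job.namelist()
                   if name == entry or name.endswith("/" + entry)]
        if not matches:
            return None
        info = job.getinfo(matches[0])
        packed = job.read(matches[0])
    return {
        "entry": matches[0],
        # Timestamp stored in the archive: (year, month, day, hour, minute, second)
        "packed_on": info.date_time,
        "same_as_conf": packed == Path(conf_path).read_bytes(),
    }
```

Calling it with the real path of the .job file and "conf/subcollections.xml" would show when the packed copy was added and whether it still matches the edited file. A result of None would mean the archive holds no such file at all.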
Initially I used the file "subcollections.xml" from inside the "conf" folder. I added 2 collections named "test1" and "test2", fetched/indexed, then inspected the db with Luke, and I saw the values "test1" and "test2". Then I changed the same file "subcollections.xml", renamed the collections to "new1" and "new2", reindexed and inspected the contents. I still see the same values "test1" and "test2".

If the file from inside the .job had been used, then I shouldn't have seen the initial collections at all, the ones I defined at step 1, since the .job is older than they are. Maybe you can point me to another .job file... Or maybe subcollections.xml is not re-parsed when indexing... Unfortunately I cannot trace/debug it.

Sami Siren-2 wrote:
>
> liv wrote:
>> - I reindex the db: delete folder "indexes", run the command:
>>
>> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>>
>> - then I inspect the resulting db with luke again
>>
>> Unfortunately nothing has changed. Maybe I am missing something... Please
>> tell me if you see anything wrong.
>
> If you did exactly those steps then what happens is that the
> subcollections.xml is read from inside the .job file. You need to
> rebuild the .job to put new file inside of it.
>
> simply do "ant" and rerun indexing and it should work as expected.
>
> --
> Sami Siren
>

--
View this message in context: http://www.nabble.com/subcollections-tf2821188.html#a7929866
Sent from the Nutch - User mailing list archive at Nabble.com.
