I intend to use Nutch with a fairly complex structure of subcollections. I ran some tests, and storage and search perform as expected. However, there is one aspect I may have overlooked, and I cannot find an answer to it.
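To make the question concrete, here is a rough Python model of what I expect the subcollection step to do. In this model, each page's subcollection names come only from its URL and the whitelist/blacklist entries in the definitions file, so already-crawled URLs would pick up new names once the definitions change. This is not Nutch code. The XML layout, the substring matching rule and the example URLs are all my assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical definitions, in the whitelist/blacklist layout I assume
# the subcollections XML file uses. NEW_DEFS adds a "blogs" collection.
OLD_DEFS = """<subcollections>
  <subcollection>
    <name>news</name><id>news</id>
    <whitelist>http://news.example.com/</whitelist>
    <blacklist/>
  </subcollection>
</subcollections>"""

NEW_DEFS = """<subcollections>
  <subcollection>
    <name>news</name><id>news</id>
    <whitelist>http://news.example.com/</whitelist>
    <blacklist/>
  </subcollection>
  <subcollection>
    <name>blogs</name><id>blogs</id>
    <whitelist>http://blog.example.org/</whitelist>
    <blacklist/>
  </subcollection>
</subcollections>"""


def parse_defs(xml_text):
    """Return a list of (name, whitelist, blacklist) tuples."""
    defs = []
    for sc in ET.fromstring(xml_text).findall("subcollection"):
        name = sc.findtext("name", default="").strip()
        # Assumption: entries are separated by whitespace or newlines.
        white = (sc.findtext("whitelist") or "").split()
        black = (sc.findtext("blacklist") or "").split()
        defs.append((name, white, black))
    return defs


def collections_for(url, defs):
    """Names of all subcollections whose rules accept this URL.

    Assumption: an entry matches if it occurs anywhere in the URL,
    and a blacklist hit excludes the URL from that subcollection.
    """
    names = []
    for name, white, black in defs:
        if any(b in url for b in black):
            continue
        if any(w in url for w in white):
            names.append(name)
    return names


# URLs that were already fetched; nothing here is refetched.
crawled = ["http://news.example.com/a.html", "http://blog.example.org/post/1"]

for defs_xml in (OLD_DEFS, NEW_DEFS):
    defs = parse_defs(defs_xml)
    print({url: collections_for(url, defs) for url in crawled})
```

If subcollection assignment really worked like this at indexing time, reindexing the existing segments with the new definitions should be enough, without refetching anything.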
How, and at which stage, are subcollections added to the index structure? I plan to crawl frequently, adding new sites to the existing repository and merging/reindexing as needed. However, if I need to change the subcollection structure (i.e. add a site to a newly created subcollection), I don't want to recrawl that site. I hope this can be done using only the existing crawled data.

I tried the following: fetch and index some sites using subcollection1.xml, then switch to subcollection2.xml and reindex. Nothing changes in terms of collections. The same happens if I run updatedb and then reindex.

Can you please point me to something to look at? I am fairly poor at Java, and I hope I won't need to change the source code. That said, having to somehow "inject" the new names into the subcollection field would seem rather peculiar.

Thanks

--
View this message in context: http://www.nabble.com/subcollections-tf2821188.html#a7874307
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
