Thank you, Andrzej and Thomas. Your answers make sense. Re-crawling is not a problem because I had not crawled much yet and am still in development; I had about 700,000 URLs. I worry about directory structure and file formats because I plan to crawl 100M URLs, and there is some manual post-crawl work that I would not like to repeat each time a new version of Nutch is released.
I also ask because I have read that people have crawled and indexed hundreds of millions of pages using Nutch. I assumed they must have used Nutch 0.7.1 or an earlier version, so I want to know how they would migrate to the Nutch 0.8 production release.

Thanks again to both of you.

Bipin Parmar

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> TDLN wrote:
> > Unfortunately this is only feasible with *a lot* of custom code.
> > Probably you will be done sooner refetching and indexing your pages.
>
> I confirm. Theoretically, you probably could use some classloader
> tricks to load both versions of classes and libraries, and then use
> other (temporary container) classes loaded from the parent classloader
> to transfer the data. But it would be a LOT of pain to code it well and
> reliably ...
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
