Thank you, Andrzej and Thomas. Your answers make sense. Re-crawling is not a
problem, because I had not crawled much yet (about 700,000 URLs) and am still
in development. I worry about the directory structure and file formats because
I plan to crawl 100M URLs, and the crawl will be followed by some manual work
that I would not like to repeat each time a new version of Nutch is released.

I also ask because I have read that people have crawled and indexed hundreds
of millions of pages using Nutch. I assume they must have used Nutch 0.7.1 or
an earlier version, so I would like to know how they would migrate to the
Nutch 0.8 production release.

Thanks again to both of you.

Bipin Parmar

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> TDLN wrote:
> > Unfortunately this is only feasible with *a lot* of custom code.
> > You will probably be done sooner by refetching and indexing your pages.
> 
> I confirm. Theoretically, you could probably use some classloader tricks
> to load both versions of the classes and libraries, and then use other
> (temporary container) classes loaded from the parent classloader to
> transfer the data. But it would be a LOT of pain to code it well and
> reliably...
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
