I would say that with any (open source) software <= version 1.0 you
should *plan* for change.

Rgrds, Thomas




On 6/23/06, Bipin Parmar <[EMAIL PROTECTED]> wrote:
> Thank you, Andrzej and Thomas. Your answers make
> sense. Re-crawling is not a problem because I had not
> crawled a lot anyway and am still in development. I
> had about 700,000 urls. I worry about directory
> structure & file formats because I plan to crawl 100M
> urls and have some manual effort after crawling that I
> would not like to repeat as new versions of nutch are
> released.
>
> I also ask the question because I read that people
> have crawled/indexed hundreds of millions of pages
> using Nutch. I assumed that they must have used Nutch
> 0.7.1 or an earlier version, and therefore I want to
> know how they would migrate to the Nutch 0.8
> production release.
>
> Thanks again to both of you.
>
> Bipin Parmar
>
> --- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> > TDLN wrote:
> > > Unfortunately this is only feasible with *a lot* of custom code.
> > > Probably you will be done sooner refetching and indexing your pages.
> >
> > I confirm. Theoretically, you probably could use some classloader
> > tricks to load both versions of classes and libraries, and then use
> > other (temporary container) classes loaded from the parent classloader
> > to transfer the data. But it would be a LOT of pain to code it well and
> > reliably ...
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >  ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
> >
>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
