>> you could do a quick hack in 0.8 to "fetch" the pages from your 0.7
>> crawl, using a modified fetcher.
>
> what do you mean? Do I have to modify the fetcher code by myself?

Yes, you'd have to modify the 0.8 fetcher code (or rather create your own
plug-in) that uses a Nutch 0.7 search setup to get at all of the previously
fetched content.

-- Ken

>Ken Krugler wrote:
>>
>>>It's really a sad news for me. I must spend a lot of time on fetching it
>>>again.
>>
>> If it's only just HTML, then you could do a quick hack in 0.8 to
>> "fetch" the pages from your 0.7 crawl, using a modified fetcher. You
>> wouldn't have all of the header info, but if everything is text/html
>> then you might be OK.
>>
>> -- Ken
>>
>>>Andrzej Bialecki wrote:
>>>>
>>>> King Kong wrote:
>>>>> I had fetched about 3Gbytes pages in Nutch-0.7.2 .
>>>>> Now, I want to move it to Nutch-0.8, How can I do it ?
>>>>
>>>> Unfortunately, the data is not portable between these versions. The
>>>> only thing you could do to preserve your webdb is to dump it into a
>>>> text file, and then inject into a 0.8 crawldb. As for the segments,
>>>> you will have to refetch them.
>>>>
>>>> --
>>>> Best regards,
>>>> Andrzej Bialecki
>>>> Information Retrieval, Semantic Web
>>>> Embedded Unix, System Integration
>>>> http://www.sigram.com Contact: info at sigram dot com
>
>--
>View this message in context:
>http://www.nabble.com/How-does-Nutch-0.7.2-data-upgrade-to-0.8--tf2151013.html#a5949225
>Sent from the Nutch - User forum at Nabble.com.

--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
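
For the webdb half of this (dump it to a text file, then inject into a 0.8 crawldb), a minimal sketch of the conversion step might look like the script below. It assumes the text dump contains one "URL: ..." line per page, which should be checked against the actual dump; the function names and file paths are hypothetical and not part of Nutch. The output is a plain seed list with one URL per line, the kind of file the 0.8 injector reads from a URL directory. Only the URLs are carried over this way, not the fetched content.

```python
import re
from typing import Iterable, Iterator

# Assumption: each page record in the text dump has a line such as
# "URL: http://example.com/". Adjust the pattern to match your dump.
URL_LINE = re.compile(r"^\s*URL:\s*(\S+)")


def extract_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct URL found in the dump, in first-seen order."""
    seen = set()
    for line in lines:
        match = URL_LINE.match(line)
        if match:
            url = match.group(1)
            if url not in seen:
                seen.add(url)
                yield url


def write_seed_file(dump_path: str, seed_path: str) -> int:
    """Write one URL per line to seed_path and return how many were written."""
    count = 0
    with open(dump_path, encoding="utf-8", errors="replace") as dump, \
            open(seed_path, "w", encoding="utf-8") as seeds:
        for url in extract_urls(dump):
            seeds.write(url + "\n")
            count += 1
    return count
```

Calling write_seed_file("webdb-dump.txt", "urls/seed.txt") (both paths hypothetical) would leave a seed file that can then be injected into a fresh 0.8 crawldb.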
