>> you could do a quick hack in 0.8 to
>> "fetch" the pages from your 0.7 crawl, using a modified fetcher.

What do you mean? Do I have to modify the fetcher code myself?


Ken Krugler wrote:
> 
>>That's really sad news for me. I will have to spend a lot of time fetching it
>>again.
> 
> If it's only just HTML, then you could do a quick hack in 0.8 to 
> "fetch" the pages from your 0.7 crawl, using a modified fetcher. You 
> wouldn't have all of the header info, but if everything is text/html 
> then you might be OK.
> 
> -- Ken
> 
> 
>>Andrzej Bialecki wrote:
>>>
>>>  King Kong wrote:
>>>>  I fetched about 3 GB of pages with Nutch 0.7.2.
>>>>  Now I want to move them to Nutch 0.8. How can I do that?
>>>>  
>>>
>>>  Unfortunately, the data is not portable between these versions. The
>>>  only thing you could do to preserve your webdb is to dump it into a text
>>>  file, and then inject into a 0.8 crawldb. As for the segments, you will
>>>  have to refetch them.
>>>
>>>  --
>>>  Best regards,
>>>  Andrzej Bialecki     <><
>>>   ___. ___ ___ ___ _ _   __________________________________
>>>  [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>>>  ___|||__||  \|  ||  |  Embedded Unix, System Integration
>>>  http://www.sigram.com  Contact: info at sigram dot com
> 
> -- 
> Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "Find Code, Find Answers"
> 
> 
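
For the webdb part of Andrzej's suggestion (dump to a text file, then inject into a 0.8 crawldb), the step in between could look roughly like the sketch below. It assumes the text dump contains one "URL: <url>" line per page; that format, the function names, and the file paths are assumptions made for illustration and are not taken from this thread. The resulting seed file would then be given to the 0.8 inject step.

```python
import re
from pathlib import Path


def extract_urls(dump_text):
    """Return unique URLs, in first-seen order, from lines shaped like 'URL: <url>'.

    The 'URL: ' line format is an assumption about the text dump.
    """
    seen = set()
    urls = []
    for line in dump_text.splitlines():
        match = re.match(r"\s*URL:\s*(\S+)", line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            urls.append(match.group(1))
    return urls


def write_seed_file(urls, seed_path):
    """Write one URL per line to a plain-text seed file."""
    path = Path(seed_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(url + "\n" for url in urls), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical paths: webdb-dump.txt is the 0.7 text dump,
    # urls/seed.txt is the seed list to inject into the 0.8 crawldb.
    dump = Path("webdb-dump.txt")
    if dump.exists():
        text = dump.read_text(encoding="utf-8", errors="replace")
        write_seed_file(extract_urls(text), "urls/seed.txt")
```

This only carries over the list of known URLs. As noted above, the segments themselves still have to be refetched (or "fetched" from the old crawl with a modified fetcher, as Ken suggests).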

-- 
View this message in context: 
http://www.nabble.com/How-does-Nutch-0.7.2-data-upgrade-to-0.8--tf2151013.html#a5949225
Sent from the Nutch - User forum at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general