It should be possible to merge the CrawlDbs, but not that way. "current" is a hard-wired subdirectory inside every CrawlDb, so it must not appear in the arguments. A correct call passes the CrawlDb directories themselves:

  nutch mergedb <output> crawldb1/ crawldb2/
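
To illustrate the point about "current", here is a minimal dry-run sketch. The helper name crawldb_root and the paths merged_crawldb, crawldb1 and crawldb2 are made up for illustration; the script only prints the mergedb call and does not run Nutch.

```shell
#!/bin/sh
# Hypothetical helper: strip one trailing slash and a trailing "/current"
# from a path, so that the CrawlDb directory itself is passed to mergedb.
crawldb_root() {
  p=${1%/}
  p=${p%/current}
  printf '%s\n' "$p"
}

# Illustrative paths only; replace them with your own CrawlDbs.
out=merged_crawldb
db1=$(crawldb_root crawldb1/current)
db2=$(crawldb_root crawldb2/)

# Dry run: print the command instead of executing it.
merge_cmd="nutch mergedb $out $db1 $db2"
echo "$merge_cmd"
```

Once the printed command looks right, it can be run by hand. Writing to a new output directory leaves both input CrawlDbs untouched.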

I understand you may have lost a lot of data, but again:

> Assumed crawling is continued the missed data will be crawled again
> (or already has been crawled again because it happened 3 days ago).

But that also depends on how you run the crawl.

First, you should check whether entries are really lost. If so, you had better run the update job again. The segment to update the CrawlDb with should still be there.

The update job took 1.5h, that's a lot. What is your -topN? If it's large, reduce it so that one cycle finishes within a few hours. Then, if a job fails, the loss is tolerable: just run it again.

On 07/08/2013 10:49 PM, eakarsu wrote:
> Sebastian,
>
> The hadoop job result page does not render properly. There was nothing wrong
> for updatedb job.
>
> Can we merge current and 624730206 folders with command?
>
> nutch mergedb <output_crawldb> 160milyonurls/crawldb/current
> 160milyonurls/crawldb/624730206
>
>
> User: hduser
> JobName: crawldb 160milyonurls/crawldb
> JobConf:
> hdfs://summitdev1:54310/media/sdb/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201307050940_0002/job.xml
> Job-ACLs: All users are allowed
> Submitted At: 5-Jul-2013 22:14:37
> Launched At: 5-Jul-2013 22:14:38 (0sec)
> Finished At: 5-Jul-2013 23:55:11 (1hrs, 40mins, 33sec)
> Status: SUCCESS
> Failure Info:
> Analyse This Job
>
> Kind     Total  Successful  Failed  Killed  Start Time            Finish Time
> Setup        1           1       0       0  5-Jul-2013 22:15:31   5-Jul-2013 22:15:32 (1sec)
> Map       3043        3043       0       0  5-Jul-2013 22:14:41   5-Jul-2013 23:16:56 (1hrs, 2mins, 14sec)
> Reduce      40          40       0       0  5-Jul-2013 22:18:25   5-Jul-2013 23:55:35 (1hrs, 37mins, 10sec)
> Cleanup      1           1       0       0  5-Jul-2013 23:55:10   5-Jul-2013 23:55:11 (1sec)
>
> (Total = successful + failed + killed tasks)
>
> <http://lucene.472066.n3.nabble.com/file/n4076369/Capture.jpg>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/crawldb-contents-tp4076345p4076369.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

