Hello,

It appears that I omitted -dir when writing my previous message, but I had actually typed -dir in my console.
However, I have found out that I need to run nutch parse on /home/crawl/segments/12345 before updating the db.

By the way: what exactly is a segment, and how is data stored under it? I think it is a Hadoop format.

Best Regards,
-C.B.

On Fri, Jul 8, 2011 at 11:00 PM, lewis john mcgibbney wrote:
> Hi C.B.,
>
> It looks like you may have simply missed the '-dir' when you were specifying
> your crawldb directory to be updated from the fetched segment. Have a look
> here [1]
>
> Can you please try and post your results.
>
> [1] http://wiki.apache.org/nutch/bin/nutch_updatedb
>
> On Fri, Jul 8, 2011 at 5:06 PM, Cam Bazz wrote:
>
>> Hello,
>>
>> I tried to crawl manually, only a list of urls. I have issued the
>> following commands:
>>
>> bin/nutch inject /home/crawl/crawldb /home/urls
>>
>> bin/nutch generate /home/crawl/crawldb /home/crawl/segments
>>
>> bin/nutch fetch /home/crawl/segments/123456789
>>
>> bin/nutch updatedb /home/crawl/crawldb /home/crawl/segments/123456789
>> -noAdditions
>>
>> however for the last command: it skips the segment 12345789 saying it
>> is an invalid segment?
>>
>> This is exactly what I need (the -noAdditions flag) but it will not
>> updatedb. What might have done wrong?
>>
>> Best Regards,
>> -C.B.
>>
>
> --
> *Lewis*
>
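
For reference, the fetch, parse, updatedb order discussed in this thread can be sketched as a small shell function. This is only a sketch under assumptions: the names NUTCH, CRAWLDB, DRY_RUN, run and update_from_segment are made up for illustration and are not part of Nutch, and the paths are the ones used above. It passes the segment path directly to updatedb; the -dir form takes the parent segments directory instead.

```shell
#!/bin/sh
# Sketch only: NUTCH, CRAWLDB, DRY_RUN, run and update_from_segment are
# illustrative names, not part of Nutch. Paths are the ones from this thread.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-/home/crawl/crawldb}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Fetch one generated segment, parse it, then update the crawldb from it.
# Each step runs only if the previous one succeeded.
update_from_segment() {
  segment="$1"
  run "$NUTCH" fetch "$segment" &&
  run "$NUTCH" parse "$segment" &&
  run "$NUTCH" updatedb "$CRAWLDB" "$segment" -noAdditions
}
```

With DRY_RUN=1 set, calling update_from_segment /home/crawl/segments/123456789 only prints the three commands, which is a cheap way to check the order before running them for real.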

