I should also mention that I'm running nutch version 0.9.
On 7/2/07, Lyndon Maydwell <[EMAIL PROTECTED]> wrote:
Hi,

I'm a new user to nutch and am wondering about seeding the database by running a crawl with a very shallow depth, then growing the database every time the periodic update script is done.

I have two scripts that I'm currently using, but I'm not sure if the update script is actually adding searchable data. The initial crawl script is doing a great job, and I can verify that it is working by using the search app that comes with nutch, but my maintenance script doesn't seem to be adding any results, although it throws no errors.

Below are the two small scripts. Am I missing any simple errors?

-- initial crawl script << END1 --
#!/bin/sh
./../bin/nutch crawl urls -dir crawl -depth 2 -topN 10000
END1

-- updater script << END2 --
first="crawl"
second="100000"
../bin/nutch generate $first/crawldb $first/segments -topN $second
segment=`ls -d $first/segments/* | tail -1 | grep "[a-zA-Z0-9/]*"`
../bin/nutch fetch $segment
../bin/nutch updatedb $first/crawldb $segment
rm -r $first/indexes
../bin/nutch invertlinks $first/linkdb $first/segments/*
../bin/nutch index $first/indexes $first/crawldb $first/linkdb $first/segments/*
END2
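
One way to sanity-check the segment-selection line in the updater is to try it in isolation. The sketch below is only an illustration. It uses a throwaway temporary directory and two made-up segment names in place of a real crawl/segments directory, and it assumes segment directories are named by timestamp so that the last entry in sorted order is the newest. The trailing grep from the updater is left out because the pattern "[a-zA-Z0-9/]*" can match an empty string and therefore passes every line through unchanged.

```shell
#!/bin/sh
# Hypothetical scratch layout standing in for crawl/segments;
# the two segment names below are made up for illustration.
first=$(mktemp -d)
mkdir -p "$first/segments/20070701120000" "$first/segments/20070702130000"

# ls sorts its output, so with timestamp-style names the last
# line is the most recently generated segment.
segment=$(ls -d "$first"/segments/* | tail -1)

# Stop early if no segment exists, rather than passing an empty
# argument on to the fetch and updatedb steps.
if [ -z "$segment" ] || [ ! -d "$segment" ]; then
    echo "no segment found under $first/segments" >&2
    exit 1
fi

echo "newest segment: $segment"
```

Adding the same emptiness check to the updater would make it fail loudly when the generate step produces no new segment, instead of completing without errors.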
