In my hacking, I inadvertently lost a segment.  What happened?  How in
the heck did I manage to do something so stupid?
 
Well, when I started another round of fetching, I ran a generate command
and specified /crawldb/segements/* as the segments directory, and it created
my new segment under the directory of the last segment I fetched.  I deleted
it, when I probably should have moved it up one level instead, but I wasn't
sure it had the right stuff, and I was a little impatient, so I blasted it.
 
Now when I run generate, it produces 0 records.  Is there any way to
recover?
 
I think this was because on my first generate, I specified topN=134513,
the number of records I have in my crawldb.  I don't know if that was
right either, but I figured "What the heck".
 
My goal is to get Nutch to crawl every linked page there is within a
limited list of URLs.  It's so simple.  I am not sure if topN=<the number
of records in your crawldb> is the right way to do it.
When you fetch another round, and you want to get the documents you
haven't fetched yet plus the documents the prior fetches failed on,
wouldn't you generate a fetchlist off of every record you have in the
crawldb?
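To make the question concrete, the cycle I have in mind would leave off
topN entirely so that generate picks up everything that is due for fetching.
Roughly (placeholder paths again, and assuming the standard
generate/fetch/updatedb commands):

    bin/nutch generate crawl/crawldb crawl/segments
    s=`ls -d crawl/segments/2* | tail -1`   # newest segment
    bin/nutch fetch $s
    bin/nutch updatedb crawl/crawldb $s

then repeat until generate produces nothing new.  Is that the right way to
pick up the unfetched URLs plus the ones that failed last round, instead of
setting topN to the size of the crawldb?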
 
I feel like I am getting close.

Richard Braman
mailto:[EMAIL PROTECTED]
561.748.4002 (voice) 

http://www.taxcodesoftware.org
Free Open Source Tax Software

 
