Can I tell generate to generate a fetch based on the status field in
MySQL? I wish to index only rows with status 1 (meaning not yet fetched)
and parse only those until they are all done. This would be a great help.
Cheers
Shane.
On 27/03/14 13:15, Shane Wood wrote:
Could someone comment on what these fields do when using Nutch and
MySQL?
Or is there a web page where this information is already available?
Thanks
id
headers
text
status
markers
parseStatus
modifiedTime <---- this is always NULL. Any idea why?
prevModifiedTime <---- this is always NULL. Any idea why?
score
typ
batchId
baseUrl
content
title
reprUrl
fetchInterval
prevFetchTime
inlinks
prevSignature
outlinks
fetchTime
retriesSinceFetch
protocolStatus
signature
metadata