Please feel free to add this to the wiki, as it is a question that will undoubtedly arise again in the future.
Lewis

On Sat, Jul 16, 2011 at 12:37 PM, Gabriele Kahlout <[email protected]> wrote:

> On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney <[email protected]> wrote:
>
> > Hi Gabriele,
> >
> > At first this seems like a plausible argument,
>
> Indeed, I think it could be a FAQ. Shall I add it to the Nutch wiki?
>
> > however my question concerns what Nutch would do if we wished to
> > change the Solr core to index to?
> >
> > If we removed this functionality from the crawldb there would be no
> > way to determine what Nutch was to fetch and what it wasn't.
>
> Indeed, you confirm my thought.
>
> > > crawled, the fetch status, and the date. This data is maintained beyond
> > > fetch so that pages may be re-crawled, after a re-crawling period.
> > > At the same time Solr maintains an inverted index of all the fetched pages.
> > > It'd seem more efficient if Nutch relied on the index instead of
> > > maintaining its own crawldb, to avoid storing the same URL twice.
> > > [BUT THAT'S JUST A KEY/ID, NOT WASTE AT ALL, WOULD ALSO END UP THE SAME IN SOLR]
> > >
> > > --
> > > Regards,
> > > K. Gabriele
> > >
> > > --- unchanged since 20/9/10 ---
> > > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > > receipt within 48 hours then I don't resend the email.
> > > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> > > time(x) < Now + 48h) ⇒ ¬resend(I, this).
> > >
> > > If an email is sent by a sender that is not a trusted contact or the email
> > > does not contain a valid code then the email is not received. A valid code
> > > starts with a hyphen and ends with "X".
> > > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > > L(-[a-z]+[0-9]X)).
> >
> > --
> > *Lewis*
>
> --
> Regards,
> K. Gabriele

--
*Lewis*

