Please feel free to add this to the wiki as it is a question that will
undoubtedly arise in the future.

Lewis

On Sat, Jul 16, 2011 at 12:37 PM, Gabriele Kahlout <[email protected]> wrote:

> On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney <[email protected]> wrote:
>
> > Hi Gabriele,
> >
> > At first this seems like a plausible argument,
>
>
> Indeed, I think it could be a FAQ. Shall I add it to the Nutch wiki?
>
>
> > however my question concerns
> > what Nutch would do if we wished to change the Solr core we index to?
> >
> > If we removed this functionality from the crawldb there would be no way to
> > determine what Nutch was to fetch and what it wasn't.
> >
>
> Indeed, you confirm my thought.
>
> >
> > > crawled, the fetch status, and the date. This data is maintained beyond
> > > > fetch so that pages may be re-crawled after a re-crawling period.
> > > > At the same time Solr maintains an inverted index of all the fetched
> > > > pages.
> > > > It'd seem more efficient if Nutch relied on the index instead of
> > > > maintaining its own crawldb, to avoid storing the same URL twice.
> > > > [BUT THAT'S JUST A KEY/ID, NOT WASTE AT ALL, WOULD ALSO END UP THE SAME
> > > > IN SOLR]
> > >
> > > --
> > > Regards,
> > > K. Gabriele
> > >
> > > --- unchanged since 20/9/10 ---
> > > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > > receipt within 48 hours then I don't resend the email.
> > > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> > > time(x) < Now + 48h) ⇒ ¬resend(I, this).
> > >
> > > If an email is sent by a sender that is not a trusted contact or the email
> > > does not contain a valid code then the email is not received. A valid code
> > > starts with a hyphen and ends with "X".
> > > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > > L(-[a-z]+[0-9]X)).
> >
> >
> >
> >
> >
> >
> > --
> > *Lewis*
> >
>
>
>
> --
> Regards,
> K. Gabriele
>
>



-- 
*Lewis*
