You specify the URLs in a seed file (a plain-text file with one URL per line). See "Option 2. Bootstrapping from an initial seed list" in the Nutch Tutorial <http://wiki.apache.org/nutch/NutchTutorial>.
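A minimal sketch of what that looks like, assuming a Nutch 1.x install. The directory name urls, the file name seed.txt, the crawldb path and the two URLs are placeholders for illustration, not values from your setup. The inject step is left as a comment because it has to be run from your own Nutch directory.

```shell
# Create a seed directory (hypothetical name) holding the URL list.
mkdir -p urls

# One URL per line; these two are placeholder examples.
cat > urls/seed.txt <<'EOF'
http://nutch.apache.org/
http://example.com/
EOF

# Then, from the Nutch install directory, inject the seeds into the crawldb.
# The crawldb path below is an assumption; use your own:
#   bin/nutch inject crawl/crawldb urls
```

Note that Nutch only fetches URLs that pass its URL filters, so if a seed URL never shows up in the crawl, the filter configuration is worth checking.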
On Sat, May 14, 2011 at 11:15 AM, jeffersonzhou <[email protected]> wrote:

> Please help!!
>
> From: jeffersonzhou [mailto:[email protected]]
> Sent: Saturday, May 14, 2011 4:36 PM
> To: '[email protected]'
> Subject: how to force nutch to crawl specific urls?
>
> Hi, as shown in the subject, how can I force nutch to crawl/fetch certain
> urls? Thanks

--
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).

