You list the URLs in a seed file (a plain text file, one URL per line); see
"Option 2. Bootstrapping from an initial seed list" in the Nutch Tutorial
<http://wiki.apache.org/nutch/NutchTutorial>.
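
A minimal sketch of what that looks like, assuming a Nutch 1.x layout where
you run commands from the Nutch home directory. The directory name urls, the
file name seed.txt, the crawl/crawldb path and the two example URLs are
placeholders of mine, not anything from your setup, and the inject command is
as I recall it from the tutorial, so check it against your version:

```shell
#!/bin/sh
# Create a seed directory holding a plain text file, one URL per line.
# "urls" and "seed.txt" are placeholder names chosen for this sketch.
mkdir -p urls
printf '%s\n' \
  'http://nutch.apache.org/' \
  'http://example.com/' > urls/seed.txt

# Inject the seed list into the crawl database. This step only runs when
# bin/nutch is present, i.e. when the script is started from a Nutch
# install; "crawl/crawldb" is an assumed crawldb location.
if [ -x bin/nutch ]; then
  bin/nutch inject crawl/crawldb urls
fi
```

If a seed URL still does not get fetched, check that it is not being excluded
by the URL filter rules in your conf directory.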


On Sat, May 14, 2011 at 11:15 AM, jeffersonzhou <[email protected]> wrote:

> Please help!!
>
>
>
> From: jeffersonzhou [mailto:[email protected]]
> Sent: Saturday, May 14, 2011 4:36 PM
> To: '[email protected]'
> Subject: how to force nutch to crawl specific urls?
>
>
>
> Hi, as shown in the subject, how can I force Nutch to crawl/fetch certain
> URLs? Thanks
>
>


-- 
Regards,
K. Gabriele
