On Thu, Mar 24, 2011 at 1:36 PM, McGibbney, Lewis John <
[email protected]> wrote:

> Hi Gabriele,
>
> Out of curiosity, how large is your crawl job? How many URLs are you
> fetching on each increment? Is it a continuous crawl job?
>

I guess the -topN 1 triggered your interest. I was fetching only one local
page for testing. Now I'm testing a crawl of Simple Wikipedia with -topN
100. I'm also trying to figure out whether my $3 represents the depth of
the crawl or not.
It certainly does if the number of URLs is <= -topN, but for what I'm
trying to do (incremental crawling) I'd like all injected URLs to be
fetched, in topN increments, rather than have the crawl start fetching URLs
found in the previous iteration's topN URLs.
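
To make the arithmetic concrete, here is a rough sketch of how many
iterations it would take to get through all injected URLs in topN
increments. It assumes each iteration fetches at most topN URLs and that no
newly discovered URLs take a slot from the injected ones, which is exactly
the behaviour I'm not sure I'm getting. The function name rounds_needed is
made up for illustration; it is not a Nutch command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch): number of iterations needed to
# fetch every injected URL when each iteration takes at most topN of them.
# Assumption: URLs discovered along the way never compete for a slot.
rounds_needed() {
    local injected=$1 topn=$2
    if (( topn <= 0 )); then
        echo "topN must be positive" >&2
        return 1
    fi
    # Integer ceiling of injected / topn.
    echo $(( (injected + topn - 1) / topn ))
}

rounds_needed 250 100   # prints 3
```

Under that assumption, $3 would only equal the crawl depth when it is also
at least this number of rounds; otherwise some injected URLs stay unfetched.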



>
> Lewis
> ________________________________________
> From: Gabriele Kahlout [[email protected]]
> Sent: 24 March 2011 12:30
> To: [email protected]
> Cc: [email protected]; Claudio Martella; [email protected]
> Subject: Re: Index while crawling
>
> This seems to work.
>
> i=0
> while true;
> do
>    if [[ $i -ge $3 ]]
>



-- 
Regards,
K. Gabriele
