On Thu, Mar 24, 2011 at 1:36 PM, McGibbney, Lewis John <[email protected]> wrote:

> Hi Gabriele,
>
> Out of curiosity, how large is your crawl job? How many URLs are you
> fetching on each increment? Is it a continuous crawl job?

I guess the -topN 1 caught your interest. I was fetching only one local page, for testing. Now I'm testing a crawl of Simple Wikipedia with -topN 100.

I'm also trying to figure out whether my $3 represents the crawl depth or not. It certainly does if the total number of URLs never exceeds topN. But for what I'm trying to do (incremental crawling), I'd like all injected URLs to be fetched in topN increments, rather than having each iteration move on to fetching URLs found among the previous iteration's topN URLs.

> Lewis
> ________________________________________
> From: Gabriele Kahlout [[email protected]]
> Sent: 24 March 2011 12:30
> To: [email protected]
> Cc: [email protected]; Claudio Martella; [email protected]
> Subject: Re: Index while crawling
>
> This seems to work.
>
> i=0
> while true;
> do
> if [[ $i -ge $3 ]]

--
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
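A rough way to reason about the "topN increments" question above, as a sketch under stated assumptions: if the crawldb held only the injected URLs, draining N of them at -topN per generate/fetch round takes ceil(N / topN) rounds, so a loop bounded by $3 would need at least that many iterations before it could reach URLs discovered along the way. This assumes the loop drives Nutch's generate/fetch cycle; in practice the generator selects the top-scoring URLs from the whole crawldb, so newly discovered URLs can compete with injected ones, which this sketch ignores. `rounds_needed` is a hypothetical helper, not a Nutch command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): number of -topN rounds needed to
# fetch a given count of injected URLs, ASSUMING no newly discovered URLs
# outrank them when each round's fetch list is generated.
rounds_needed() {
  local injected=$1   # number of injected URLs (assumed known)
  local topn=$2       # value that would be passed to -topN
  if [ "$topn" -le 0 ]; then
    echo "topN must be positive" >&2
    return 1
  fi
  # Integer ceiling division: ceil(injected / topn)
  echo $(( (injected + topn - 1) / topn ))
}

# Example: 250 injected URLs fetched 100 at a time need 3 rounds,
# so a loop that stops when i reaches $3 would need $3 >= 3.
rounds_needed 250 100
```

Under these assumptions, $3 behaves like a crawl depth only when each round's candidates fit within topN; otherwise it is simply a cap on the number of rounds.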

