Hi,

That is neither the Solr interface nor a Solr index.
Have you tried sending the Nutch segments to Solr with the bin/nutch solrindex
command while crawling?
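
As a rough sketch of what indexing after each crawl round could look like: the Python below only builds the command lines for one round and prints them, it does not run anything. The argument order follows the Nutch 1.x usage strings as I understand them (solrindex <solr url> <crawldb> <linkdb> <segment>), so check it against your version. The function name, the paths, the Solr URL and the segment name are placeholders I made up for illustration.

```python
from typing import List


def round_commands(segment: str,
                   crawldb: str = "crawl/crawldb",
                   linkdb: str = "crawl/linkdb",
                   solr_url: str = "http://localhost:8983/solr/",
                   nutch: str = "bin/nutch") -> List[List[str]]:
    """Build the commands for one crawl round, ending with a Solr push.

    `segment` is the directory that `bin/nutch generate` created for this
    round. All default paths and the Solr URL are assumed placeholders.
    """
    return [
        [nutch, "fetch", segment],
        # Assumption: parsing is a separate step; drop this entry if your
        # fetcher is configured to parse while fetching.
        [nutch, "parse", segment],
        [nutch, "updatedb", crawldb, segment],
        [nutch, "invertlinks", linkdb, segment],
        # Send only this round's segment to Solr, so the index grows
        # while the crawl is still going.
        [nutch, "solrindex", solr_url, crawldb, linkdb, segment],
    ]


if __name__ == "__main__":
    # Hypothetical segment name, for illustration only.
    for cmd in round_commands("crawl/segments/20110325092400"):
        print(" ".join(cmd))
```

In a real loop you would run generate first, pick up the newest directory under crawl/segments, and pass it in as `segment`.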

Alex.


-----Original Message-----
From: Gabriele Kahlout <[email protected]>
To: user <[email protected]>
Cc: McGibbney, Lewis John <[email protected]>
Sent: Fri, Mar 25, 2011 9:24 am
Subject: Re: Index while crawling


On Fri, Mar 25, 2011 at 3:44 PM, McGibbney, Lewis John <[email protected]> wrote:



> Hi Gabriele,
>
> Would it be worth making this script available on the wiki with an
> explanation of exactly what its purpose is, what it does, and a use case?
>



I'd be very happy to contribute it. How do I go about it?



>
> When I get a chance I will try it out using Solr as the indexing mechanism.
>



I'm not sure why I would use the Solr index, since the results are already
viewable through the Solr interface [1 <http://screencast.com/t/E38ITblZia>].
Anyway, I've gone on modifying the script to try option 2, and will report the
new findings and the script soon.





> Thank you for this
>
> Lewis
> ________________________________________
> From: Gabriele Kahlout [[email protected]]
> Sent: 24 March 2011 15:33
> To: [email protected]
> Cc: McGibbney, Lewis John; [email protected]
> Subject: Re: Index while crawling
>

> It is indeed this way. I guess my options would be:
>
> 1. Use a scoring plugin that assigns links a lower score than the
> initial score, so that with -topN the urls from the urls list are
> retrieved before links added to the db after fetching. My understanding is
> that the OpicScoringFilter right now assigns 0 to start with, so all urls
> are equal and the hashtable works more like a LIFO; hence links are crawled
> before urls in the list.
>
> 2. Include inject in the loop and make the number of urls in the file equal
> to topN, so that one iteration is enough for all urls, and then inject
> again. Once the whole list has been fetched this way (with depth=0), one can
> iterate for depth if desired. I guess this solution is also known as
> merging crawls.
>
> I'll be trying option 2. Meanwhile I've changed the script to the attached.

>

>








-- 
Regards,
K. Gabriele







 
