Re: crawl and update one url already in crawldb

Markus Jelsma Thu, 22 Mar 2012 05:57:51 -0700


On Thursday 22 March 2012 13:53:02 webdev1977 wrote:
> I have created an application that can detect when files are
> created/modified/deleted in one of our Windows Share drives and I would
> like to know if it is possible upon notification of this to crawl just a
> single URL in the crawldb?
>


The easiest way would be to use the FreeGenerator tool (bin/nutch freegen) to
generate a segment from a plain text file of seed URLs, much like the injector.
That segment can then be included with the other segments when you update the
crawldb.
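A minimal sketch of that workflow, assuming a Nutch 1.x installation driven from the command line. The paths (crawl/crawldb, crawl/freesegments, the seed directory) and all helper names are hypothetical. The script only builds the freegen, fetch, parse and updatedb command lines and, by default, prints them instead of running them. It also assumes the main crawl loop can add the extra segment to its own updatedb call, so that only one process ever writes the crawldb.

```python
import subprocess
from pathlib import Path

# Hypothetical layout; adjust to your own crawl directory.
NUTCH = "bin/nutch"
CRAWLDB = "crawl/crawldb"
FREE_SEGMENTS = "crawl/freesegments"  # kept apart from the main loop's segments


def write_seed_dir(urls, seed_dir):
    """Write the changed URLs, one per line, into a plain text file in seed_dir."""
    seed_dir = Path(seed_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)
    (seed_dir / "urls.txt").write_text(
        "".join(url.strip() + "\n" for url in urls), encoding="utf-8"
    )
    return seed_dir


def freegen_cmd(seed_dir, segments_dir=FREE_SEGMENTS):
    """Generate a new segment straight from the seed directory."""
    return [NUTCH, "freegen", str(seed_dir), str(segments_dir)]


def newest_segment(segments_dir=FREE_SEGMENTS):
    """Segment names are yyyyMMddHHmmss timestamps, so the largest name is the newest."""
    names = sorted(p.name for p in Path(segments_dir).iterdir() if p.is_dir())
    return str(Path(segments_dir) / names[-1]) if names else None


def fetch_parse_cmds(segment):
    """Fetch and parse the freshly generated segment."""
    return [[NUTCH, "fetch", segment], [NUTCH, "parse", segment]]


def updatedb_cmd(crawldb, *segments):
    """updatedb accepts several segments, so the free segment can join the
    main loop's segment in a single crawldb update."""
    return [NUTCH, "updatedb", crawldb, *segments]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

On a change notification you would write the changed URL with write_seed_dir, run freegen_cmd, look up the new segment with newest_segment, run fetch_parse_cmds on it, and hand that segment to the main loop's next updatedb_cmd alongside its own segment. Keeping the free segments in their own directory means newest_segment cannot pick up a segment the main loop generated at the same moment.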

> I think it is possible to run individual new crawls for each url with the
> goal of merging the linkdbs and crawldbs at some point (once a night). But
> I wonder if there is a more efficient way of doing this. The other
> obstacle is that the main crawldb is part of a continuous looping crawl
> that technically could never end (unless I force it to).  Would it be an
> issue to update a database that could potentially be locked at any point
> in time?
> 
> Thanks!
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/crawl-and-update-one-url-already-in-crawldb-tp3848358p3848358.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
