On 9/21/06, Jacob Brunson <[EMAIL PROTECTED]> wrote:
> On 9/21/06, Gianni Parini <[EMAIL PROTECTED]> wrote:
> >         -Is it possible to have an automatic recrawling? have i got to write
> > my own application by myself? I need an application running in
> > background that re-crawl my intranet site 2-3 times a week..
>
> On the nutch wiki you will find an intranet recrawl script.  That
> probably will work for you.  However, I think the script has a problem
> with duplicating segment data during the mergesegs step, but I've
> asked about it here and haven't had any confirmations.
>
Well, I can confirm that my index grew from ~1.5 GB to ~5 GB after (if I
remember correctly) 2 recrawls.
The recrawl script doesn't solve the problem I was after anyway: it only
indexes pages according to the time of the last crawl, rather than
crawling everything, checking whether the content has a newer
modification/creation date, and indexing only that (typical intranet
scenario). But I'm running like a madman in the opposite direction of
the topic: please ignore me. :)
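
For illustration only, the modification-date check described above could be
sketched as follows. This is plain Python, not Nutch code or anything the
recrawl script does; the function name `select_changed`, the two dictionaries,
and the sample URLs are all hypothetical.

```python
from datetime import datetime, timezone


def select_changed(pages, last_indexed):
    """Return the URLs that need (re)indexing, sorted alphabetically.

    pages:        URL -> modification time seen during the current crawl
    last_indexed: URL -> modification time recorded when the page was last indexed
    """
    changed = []
    for url, modified in pages.items():
        previous = last_indexed.get(url)
        # Index pages that were never indexed, or whose date is newer.
        if previous is None or modified > previous:
            changed.append(url)
    return sorted(changed)


def utc(year, month, day):
    """Small helper to build timezone-aware sample dates."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical sample data: one unchanged page, one updated page, one new page.
last_indexed = {
    "http://intranet.example/a.html": utc(2006, 9, 1),
    "http://intranet.example/b.html": utc(2006, 9, 1),
}
pages = {
    "http://intranet.example/a.html": utc(2006, 9, 1),
    "http://intranet.example/b.html": utc(2006, 9, 20),
    "http://intranet.example/c.html": utc(2006, 9, 21),
}

to_index = select_changed(pages, last_indexed)
```

With this sample data, only `b.html` (updated) and `c.html` (new) are selected;
`a.html` is skipped because its date has not changed.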

t.n.a.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
