On 9/21/06, Jacob Brunson <[EMAIL PROTECTED]> wrote:
> On 9/21/06, Gianni Parini <[EMAIL PROTECTED]> wrote:
> > -Is it possible to have an automatic recrawling? have i got to write
> > my own application by myself? I need an application running in
> > background that re-crawl my intranet site 2-3 times a week..
>
> On the nutch wiki you will find an intranet recrawl script. That
> probably will work for you. However, I think the script has a problem
> with duplicating segment data during the mergesegs step, but I've
> asked about it here and haven't had any confirmations.
>

Well, I can confirm my index grew to ~5 GB from ~1.5 GB after (if I
remember correctly) 2 recrawls. It doesn't solve the problem I was
after anyway, as it only indexes pages according to the time of the
last crawl, rather than crawling everything, checking whether the new
content has a newer modification/creation date, and indexing only that
(typical intranet scenario). But I'm running like a madman in the
opposite direction of the topic: please ignore me. :)
t.n.a.

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
