Further to this... if you are really struggling for time, please check out the
older scripts, which now live in the archive section of our wiki.

They will provide some basis for you to pick up from where Markus'
references sent you.

On Wed, Oct 5, 2011 at 12:44 AM, Markus Jelsma
<[email protected]> wrote:

> Check the wiki, it's there:
>
> http://wiki.apache.org/nutch/NutchTutorial
> http://wiki.apache.org/nutch/CommandLineOptions
> http://wiki.apache.org/nutch/FAQ
>
> The configuration explains a lot as well:
>
> http://svn.apache.org/viewvc/nutch/trunk/conf/nutch-default.xml?view=markup
>
>
> > I'm having a hard time figuring out how to get a simple crawl working for
> > 4 websites we'd like to add to an existing Solr index.
> >
> > It seems like the requirements are pretty basic:
> >
> > - 4 websites
> > - Recrawl every however often (weekly? daily?)
> > - Update existing Solr index that a Drupal installation is also updating
> > - Remove pages that 404 that existed previously
> >
> > The Drupal part is all working, the Drupal and Nutch-crawled pages both
> > come up and work correctly when doing a search on the website.
> >
> > So what I need help with is figuring out a crawl script that will update
> > the index and also remove deleted pages.
> >
> > I've been searching for quite some time, but none of the scripts that I've
> > found seem to be updated to work with Nutch 1.3 correctly, and none of
> > them remove the pages that 404 from the index.
> >
> > Can anyone offer any suggestions?
> >
> > Thanks!
> >
> > -Karl
>



-- 
*Lewis*
