Re: Website (crawler for) indexing

Dominique Bejean Fri, 07 Sep 2012 08:26:59 -0700

Maybe you can take a look at Crawl-Anywhere, which has an administration web interface, a Solr indexer, and a search web application.


www.crawl-anywhere.com


Regards.

Dominique

On 05/09/12 17:05, Lochschmied, Alexander wrote:

This may be a bit off topic: How do you index an existing website and control 
the data going into the index?

We already have Java code to process the HTML (or XHTML) and turn it into a 
SolrJ Document (removing tags and other things we do not want in the index). We 
use SolrJ for indexing.
So I guess the question is essentially which Java crawler could be useful.
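
To make the question concrete, here is a rough sketch of the loop such a crawler would run: fetch a page, hand the HTML to a callback, and queue same-host links that have not been seen yet. All names (SiteCrawler, extractLinks, crawl, httpFetch) are hypothetical, and the handler callback is only a placeholder for the existing HTML-to-SolrJ code mentioned above. It assumes Java 11+ for java.net.http, pulls links out with a regular expression rather than a real HTML parser, and ignores robots.txt and politeness delays, so treat it as an illustration of the control flow and not as a substitute for a full crawler.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal same-host crawler; not taken from any existing library.
class SiteCrawler {
    // Captures the href value of <a> tags, dropping any "#fragment" part.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"'#]+)[^\"']*[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns absolute links found in html that point to the same host as base.
    static List<URI> extractLinks(URI base, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                if (base.getHost() != null
                        && base.getHost().equalsIgnoreCase(resolved.getHost())) {
                    links.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: skip it.
            }
        }
        return links;
    }

    // Breadth-first crawl from start, processing at most maxPages pages.
    // fetcher returns the page body, or null if the page could not be loaded.
    // handler is where the existing HTML-to-document code would be called.
    static List<URI> crawl(URI start, int maxPages,
                           Function<URI, String> fetcher,
                           BiConsumer<URI, String> handler) {
        Set<URI> seen = new HashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        List<URI> visited = new ArrayList<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI page = queue.poll();
            String html = fetcher.apply(page);
            if (html == null) {
                continue;
            }
            visited.add(page);
            handler.accept(page, html);
            for (URI link : extractLinks(page, html)) {
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    // One possible fetcher, based on the JDK's built-in HTTP client.
    static String httpFetch(URI uri) {
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

A call such as SiteCrawler.crawl(startUri, 500, SiteCrawler::httpFetch, (uri, html) -> { ... }) would then pass each fetched page to the existing processing and SolrJ indexing code inside the callback. The start URI should include a path (at least a trailing "/"), since relative links are resolved against it.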

We used to use wget on the command line in our publishing process, but we no 
longer want to do that.

Thanks,
Alexander
