RE: [ANNOUNCE] Web Crawler

Thumuluri, Sai Wed, 02 Mar 2011 06:05:06 -0800

Dominique, does your crawler support NTLM2 authentication? We have content
under SiteMinder, which uses NTLM2, and that is posing challenges with Nutch.


-----Original Message-----
From: Dominique Bejean [mailto:dominique.bej...@eolya.fr] 
Sent: Wednesday, March 02, 2011 6:22 AM
To: solr-user@lucene.apache.org
Subject: Re: [ANNOUNCE] Web Crawler

Aditya,

The crawler is not open source and won't be in the near future. However,
I have to change the license, because it can be used for any personal or
commercial project.

Sincerely,

Dominique

On 02/03/11 10:02, findbestopensource wrote:
> Hello Dominique Bejean,
>
> Good job.
>
> We have identified almost 8 open source web crawlers
> (http://www.findbestopensource.com/tagged/webcrawler). I don't know how
> different yours would be from the rest.
>
> Your license states that it is not open source but that it is free for
> personal use.
>
> Regards
> Aditya
> www.findbestopensource.com <http://www.findbestopensource.com>
>
>
> On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean 
> <dominique.bej...@eolya.fr <mailto:dominique.bej...@eolya.fr>> wrote:
>
>     Hi,
>
>     I would like to announce Crawl Anywhere. Crawl Anywhere is a Java
>     web crawler. It includes:
>
>       * a crawler
>       * a document processing pipeline
>       * a Solr indexer
>
>     The crawler has a web administration interface for managing the web
>     sites to be crawled. Each web site crawl can be configured with many
>     parameters (not all mandatory):
>
>       * number of items crawled simultaneously per site
>       * recrawl period rules based on item type (HTML, PDF, ...)
>       * item type inclusion / exclusion rules
>       * item path inclusion / exclusion / strategy rules
>       * max depth
>       * web site authentication
>       * language
>       * country
>       * tags
>       * collections
>       * ...
>
>     The pipeline includes various ready-to-use stages (text
>     extraction, language detection, a Solr-ready XML index writer, ...).
>
>     Everything is highly configurable and extensible, either by
>     scripting or by Java coding.
>
>     With scripting, you can help the crawler handle JavaScript links,
>     or help the pipeline extract the relevant title and clean up HTML
>     pages (remove menus, headers, footers, ...).
>
>     With Java coding, you can develop your own pipeline stages.
>
>     The Crawl Anywhere web site provides good explanations and
>     screenshots. Everything is documented in a wiki.
>
>     The current version is 1.1.4. You can download and try it out
>     from here: www.crawl-anywhere.com <http://www.crawl-anywhere.com>
>
>
>     Regards
>
>     Dominique
>
>
