Re: [ANNOUNCE] Web Crawler

Rajesh Nikam Wed, 22 May 2013 22:39:22 -0700

Hi,

Crawl-Anywhere seems to be using old versions of Java, Tomcat, etc.


http://www.crawl-anywhere.com/installation-v300/

Will it work with newer versions of the required software?

Is there an updated installation guide available?

Thanks
Rajesh





On Wed, May 22, 2013 at 6:48 PM, Dominique Bejean <dominique.bej...@eolya.fr> wrote:

> Hi,
>
> Crawl-Anywhere is now open source: https://github.com/bejean/crawl-anywhere
>
> Best regards.
>
>
> On 2 Mar 2011 at 10:02, findbestopensource wrote:
>
>> Hello Dominique Bejean,
>>
>> Good job.
>>
>> We identified almost 8 open source web crawlers:
>> http://www.findbestopensource.com/tagged/webcrawler
>> I don't know how much yours differs from the rest.
>>
>> Your license states that it is not open source, but it is free for
>> personal use.
>>
>> Regards
>> Aditya
>> www.findbestopensource.com
>>
>>
>> On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean
>> <dominique.bej...@eolya.fr> wrote:
>>
>>     Hi,
>>
>>     I would like to announce Crawl-Anywhere. Crawl-Anywhere is a Java
>>     web crawler. It includes:
>>
>>       * a crawler
>>       * a document processing pipeline
>>       * a Solr indexer
>>
>>     The crawler has a web administration interface for managing the web
>>     sites to be crawled. Each web site crawl can be configured with many
>>     parameters (not all mandatory):
>>
>>       * number of simultaneous items crawled by site
>>       * recrawl period rules based on item type (HTML, PDF, ...)
>>       * item type inclusion / exclusion rules
>>       * item path inclusion / exclusion / strategy rules
>>       * max depth
>>       * web site authentication
>>       * language
>>       * country
>>       * tags
>>       * collections
>>       * ...
>>
>>     The pipeline includes various ready-to-use stages (text extraction,
>>     language detection, a writer that produces Solr-ready XML, ...).
>>
>>     Everything is highly configurable and extensible, either by
>>     scripting or by Java coding.
>>
>>     With scripting, you can help the crawler handle JavaScript links, or
>>     help the pipeline extract a relevant title and clean up the HTML
>>     pages (remove menus, headers, footers, ...).
>>
>>     With Java coding, you can develop your own pipeline stages.
>>
>>     The Crawl-Anywhere web site provides good explanations and
>>     screenshots. Everything is documented in a wiki.
>>
>>     The current version is 1.1.4. You can download it and try it out
>>     here: www.crawl-anywhere.com
>>
>>
>>     Regards
>>
>>     Dominique
>>
>>
>>
> --
> Dominique Béjean
> +33 6 08 46 12 43
> skype: dbejean
> www.eolya.fr
> www.crawl-anywhere.com
> www.mysolrserver.com
>
>
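
The announcement quoted above says that custom pipeline stages can be written in Java. As a rough illustration of that idea only, here is a minimal sketch of a pipeline in which each stage transforms a document and hands it to the next one. Every name in it (Stage, TitleCleanupStage, Pipeline, the "title" and "language" fields) is hypothetical and is not taken from the Crawl-Anywhere code base; the real stage API is described in the project's wiki.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stage contract; NOT the actual Crawl-Anywhere API.
interface Stage {
    // Receives a document as field/value pairs and returns the (possibly modified) document.
    Map<String, String> process(Map<String, String> doc);
}

// Example stage: cuts the title at a separator, e.g. drops " | Example Site".
class TitleCleanupStage implements Stage {
    private final String separator;

    TitleCleanupStage(String separator) {
        this.separator = separator;
    }

    @Override
    public Map<String, String> process(Map<String, String> doc) {
        String title = doc.get("title");
        if (title != null) {
            int cut = title.indexOf(separator);
            if (cut > 0) {
                // Keep only the text before the separator.
                doc.put("title", title.substring(0, cut).trim());
            }
        }
        return doc;
    }
}

// Runs the stages in order, feeding each stage's output into the next.
class Pipeline {
    private final List<Stage> stages = new ArrayList<>();

    Pipeline add(Stage stage) {
        stages.add(stage);
        return this;
    }

    Map<String, String> run(Map<String, String> doc) {
        Map<String, String> current = doc;
        for (Stage stage : stages) {
            current = stage.process(current);
        }
        return current;
    }
}

public class PipelineSketch {
    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://www.example.com/page.html");
        doc.put("title", "Product overview | Example Site");

        Pipeline pipeline = new Pipeline()
                .add(new TitleCleanupStage("|"))
                // Stand-in for a language detection stage; it just sets a fixed value.
                .add(d -> { d.put("language", "en"); return d; });

        System.out.println(pipeline.run(doc));
    }
}
```

Modelling a stage as a single-method interface keeps each step independent, so stages such as text extraction or language detection could be added, removed, or reordered without touching the others.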
