Hi, Crawl-Anywhere seems to be using old versions of Java, Tomcat, etc.
http://www.crawl-anywhere.com/installation-v300/

Will it work with newer versions of the required software? Is an updated installation guide available?

Thanks
Rajesh

On Wed, May 22, 2013 at 6:48 PM, Dominique Bejean <dominique.bej...@eolya.fr> wrote:

> Hi,
>
> Crawl-Anywhere is now open-source - https://github.com/bejean/crawl-anywhere
>
> Best regards.
>
> On 02/03/11 10:02, findbestopensource wrote:
>
>> Hello Dominique Bejean,
>>
>> Good job.
>>
>> We identified almost 8 open source web crawlers
>> http://www.findbestopensource.com/tagged/webcrawler
>> I don't know how much yours differs from the rest.
>>
>> Your license states that it is not open source but it is free for
>> personal use.
>>
>> Regards
>> Aditya
>> www.findbestopensource.com
>>
>> On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean
>> <dominique.bej...@eolya.fr> wrote:
>>
>>     Hi,
>>
>>     I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java
>>     web crawler. It includes:
>>
>>       * a crawler
>>       * a document processing pipeline
>>       * a Solr indexer
>>
>>     The crawler has a web administration interface for managing the
>>     web sites to be crawled. Each web site crawl is configured with
>>     many possible parameters (not all mandatory):
>>
>>       * number of simultaneous items crawled by site
>>       * recrawl period rules based on item type (HTML, PDF, ...)
>>       * item type inclusion / exclusion rules
>>       * item path inclusion / exclusion / strategy rules
>>       * max depth
>>       * web site authentication
>>       * language
>>       * country
>>       * tags
>>       * collections
>>       * ...
>>
>>     The pipeline includes various ready-to-use stages (text
>>     extraction, language detection, Solr-ready XML writer, ...).
>>
>>     Everything is highly configurable and extensible, either by
>>     scripting or by Java coding.
>>
>>     With scripting, you can help the crawler handle JavaScript links,
>>     or help the pipeline extract the relevant title and clean up the
>>     HTML pages (remove menus, headers, footers, ...).
>>
>>     With Java coding, you can develop your own pipeline stages.
>>
>>     The Crawl Anywhere web site provides good explanations and
>>     screenshots. Everything is documented in a wiki.
>>
>>     The current version is 1.1.4. You can download and try it out from
>>     here: www.crawl-anywhere.com
>>
>>     Regards
>>
>>     Dominique
>
> --
> Dominique Béjean
> +33 6 08 46 12 43
> skype: dbejean
> www.eolya.fr
> www.crawl-anywhere.com
> www.mysolrserver.com