RE: [ANNOUNCE] Web Crawler

2013-07-15 Thread Ramakrishna
he web-page. Can I do this? I think I'd have to write the rest of the parts (segments, updater, indexer, parser) myself, since I think Html-parser will not work with the existing parts if I replace the fetcher with it. -- View this message in context: http://lucene.472066.n

RE: [ANNOUNCE] Web Crawler

2013-07-15 Thread karl.wright
Lucene does not provide any capabilities for crawling websites. You would have to contact the Nutch project, the ManifoldCF project, or other web crawling projects. As far as bypassing robots.txt, that is a very unethical thing

Re: [ANNOUNCE] Web Crawler

2013-07-15 Thread Jack Krupansky
anybody on this mailing list would engage in such an unethical or unprofessional activity. -- Jack Krupansky -Original Message- From: Ramakrishna Sent: Monday, July 15, 2013 9:13 AM To: java-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler Hi.. I'm trying nutch to

Re: [ANNOUNCE] Web Crawler

2013-07-15 Thread Ramakrishna
else please suggest which crawlers I can use to crawl websites without regard to the robots.txt of that particular site. It's urgent; please reply as soon as possible. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p4078039

Re: [ANNOUNCE] Web Crawler

2011-05-27 Thread Dominique Bejean
Crawl Anywhere? My concern with Crawl Anywhere is that it supports the Solr 1.3 index, not the latest version. Any help on this is really appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p2937762.html Sent from the Lucene - Java Users mailing list archi

Re: [ANNOUNCE] Web Crawler

2011-05-16 Thread abhayd
in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p2947623.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apac

Re: [ANNOUNCE] Web Crawler

2011-05-16 Thread Julien Nioche
> I don't see any activity on the Nutch wiki, so I'm wondering if it's no longer being developed. But most forums say Nutch is the standard for Solr.
Looking at the mail archives is a good clue of whether a project is still alive or not. In the case of Nutch, the project is active as you can see on the l

RE: [ANNOUNCE] Web Crawler

2011-05-15 Thread karl.wright
You might want to look at ManifoldCF also. Karl -Original Message- From: ext abhayd [mailto:ajdabhol...@hotmail.com] Sent: Saturday, May 14, 2011 9:29 AM To: java-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler hi Dominique, I am looking for a crawler to feed solr index

Re: [ANNOUNCE] Web Crawler

2011-05-15 Thread abhayd
/ANNOUNCE-Web-Crawler-tp2607833p2937762.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h

[ANNOUNCE] Web Crawler

2011-03-01 Thread Dominique Bejean
Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes:
* a crawler
* a document processing pipeline
* a solr indexer
The crawler has a web administration in order to manage web sites to be crawled. Each web site crawl is configured with a lo