Re: Reply: Someone Please respond ... Deleting Urls already crawled from the crawlDB

2008-05-05 Thread oddaniel
be specified in arguments. -Original Message- From: oddaniel [mailto:[EMAIL PROTECTED] Sent: May 5, 2008 13:27 To: nutch-user@lucene.apache.org Subject: Someone Please respond ... Deleting Urls already crawled from the crawlDB Guys, I have been trying to get this done for weeks now. No progress

Someone Please respond ... Deleting Urls already crawled from the crawlDB

2008-05-04 Thread oddaniel
Guys, I have been trying to get this done for weeks now. No progress. Someone please help me. I am trying to delete a domain already crawled from my crawldb and index. I have a list of domains already crawled in my index. How do I exclude or delete domains from my crawl output folder? I have

Re: Delete Urls from CrawlsDB

2008-05-02 Thread oddaniel
-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: oddaniel [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Saturday, April 19, 2008 4:20:04 AM Subject: Delete Urls from CrawlsDB Is it possible to remove or delete one of the urls that has

Searching For Images

2008-04-21 Thread oddaniel
Is it possible to index images with Nutch? If so, how can this be done? Any article or sample code would be very helpful. A nudge in the right direction would be fine. Thanks. -- View this message in context: http://www.nabble.com/Searching-For-Images-tp16807326p16807326.html Sent from the

Delete Urls from CrawlsDB

2008-04-19 Thread oddaniel
Is it possible to remove or delete one of the urls that has been crawled from the crawl database? If this is possible, how can it be done? -- View this message in context: http://www.nabble.com/Delete-Urls-from-CrawlsDB-tp16773512p16773512.html Sent from the Nutch - User mailing list archive at

Search for Just PDF documents

2008-04-16 Thread oddaniel
Hi, how can I restrict a Nutch search to PDF documents only? Thanks. Daniel -- View this message in context: http://www.nabble.com/Search-for-Just-PDF-documents-tp16721681p16721681.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: java.io.IOException: No input paths specified in input

2008-04-15 Thread oddaniel
); . . . . oddaniel wrote: I am trying to merge two crawl results. 1. Merge linkdbs - WORKS FINE. 2. Merge crawldbs - WORKS FINE. For some reason I keep getting java.io.IOException: No input paths specified in input when trying to Merge Segments. Can anyone please tell me what could be causing

Merging Two Crawls

2008-04-12 Thread oddaniel
How can I merge two different crawls? Can I do this from within my Java class, and if so, how? I have searched through the forum and all I see are scripts for doing this. I don't have a clue how to get this done from within the actual Java code. A nudge in the right direction would be

java.io.IOException: No input paths specified in input

2008-04-12 Thread oddaniel
I am trying to merge two crawl results. 1. Merge linkdbs - WORKS FINE. 2. Merge crawldbs - WORKS FINE. For some reason I keep getting java.io.IOException: No input paths specified in input when trying to Merge Segments. Can anyone please tell me what could be causing this and how to fix it?