parse-rss test problem

2007-01-25 Thread kauu
I can't test my parse-rss plugin in nutch-0.8.1. I just can't test the default rsstest.rss file. 2007-01-25 17:04:34,703 INFO conf.Configuration (Configuration.java:getConfResourceAsInputStream(340)) - found resource parse-plugins.xml at

Re: Fetcher2

2007-01-25 Thread kauu
Please give us the URL, thanks. On 1/25/07, chee wu [EMAIL PROTECTED] wrote: Just appended the portion for .81 to NUTCH-339 - Original Message - From: Armel T. Nene [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent: Thursday, January 25, 2007 8:06 AM Subject: RE: Fetcher2 Chee,

RE: Fetcher2

2007-01-25 Thread Armel T. Nene
Kauu, The URL for Fetcher2 is: https://issues.apache.org/jira/browse/NUTCH-339 Armel - Armel T. Nene iDNA Solutions Tel: +44 (207) 257 6124 Mobile: +44 (788) 695 0483 http://blog.idna-solutions.com -Original Message- From: kauu

Modified date in crawldb

2007-01-25 Thread Armel T. Nene
Hi guys, I am using Nutch 0.8.2-dev. I have noticed that the crawldb does not actually save the last modified date of files. I have run a crawl on my local file system and the web. When I dumped the content of the crawldb for both crawls, the modified dates of the files were set to 01-Jan-1970

RE: Modified date in crawldb

2007-01-25 Thread Armel T. Nene
Chee, Have you successfully applied Nutch-61 to Nutch 0.8.1? I worked on that version and was able to apply the patch fully, but was not entirely successful in running it with the XML parser plugin. If you have applied it successfully, let me know. Regards, Armel - Armel

thread-safe methods in Nutch

2007-01-25 Thread Armel T. Nene
Hi guys, I know it's me again. I have been testing Nutch robustly lately, and here are some thread issues that I found. I am running version 0.8.2-dev. When Nutch is initially run (either from the script or Ant), it has a default of 10 threads for the fetcher. This is actually good for performance

[jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Brian Whitman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467471 ] Brian Whitman commented on NUTCH-433: - This is still not fixed in the latest nightly --

[jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467478 ] Andrzej Bialecki commented on NUTCH-433: - Nutch and Hadoop are separate projects, with the latter evolving

[jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Brian Whitman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467486 ] Brian Whitman commented on NUTCH-433: - OK, understand. But the nutch nightly should at least include a version of

Re: i18n in nutch home page is misnomer

2007-01-25 Thread Doug Cutting
Teruhiko Kurosaka wrote: I suggest i18n be renamed to l10n, short for localization. Can you please file an issue in Jira for this? Ideally you could even provide a patch. The source for the website is in subversion at: http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/site Forrest

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Doug Cutting
Scott Ganyo (JIRA) wrote: ... since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter static constructor ... FYI, Hadoop no longer does this. Doug

[jira] Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer

2007-01-25 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467491 ] Sami Siren commented on NUTCH-433: -- ok, now it is committed, sorry. java.io.EOFException in newer nightlies in

Re: Modified date in crawldb

2007-01-25 Thread Andrzej Bialecki
Armel T. Nene wrote: Hi guys, I am using Nutch 0.8.2-dev. I have noticed that the crawldb does not actually save the last modified date of files. I have run a crawl on my local file system and the web. When I dumped the content of the crawldb for both crawls, the modified dates of the files were

Re: Next Nutch release

2007-01-25 Thread Doug Cutting
Dennis Kubes wrote: Andrzej Bialecki wrote: I believe that at this point it's crucial to keep the project well-focused (at the moment I think the main focus is on larger installations, and not the small ones), and also to make Nutch attractive to developers as a reusable search engine

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
Hi Doug, So, does this render the patch that I wrote obsolete? Cheers, Chris On 1/25/07 10:08 AM, Doug Cutting [EMAIL PROTECTED] wrote: Scott Ganyo (JIRA) wrote: ... since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Doug Cutting
Chris Mattmann wrote: So, does this render the patch that I wrote obsolete? It's at least out-of-date and perhaps obsolete. A quick read of Fetcher.java looks like there might be a case where a fatal error is logged but the fetcher doesn't exit, in FetcherThread#output(). Doug

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
It's at least out-of-date and perhaps obsolete. A quick read of Fetcher.java looks like there might be a case where a fatal error is logged but the fetcher doesn't exit, in FetcherThread#output(). So this raises an interesting question: People (such as Scott G.) out there -- are you folks

Re: Modified date in crawldb

2007-01-25 Thread chee wu
Armel, Sorry, I haven't tried this patch yet. - Original Message - From: Armel T. Nene [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent: Thursday, January 25, 2007 11:07 PM Subject: RE: Modified date in crawldb Chee, Have you successfully applied Nutch-61 to Nutch 0.8.1? I

parse-rss make them items as different pages

2007-01-25 Thread kauu
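ice and snow. On January 24, workers cleared snow from the runway at Munich Airport in southern Germany. According to reports, a belated snowstorm has for two consecutive days swept across ... </description> <link>http://news.sohu.com/20070125/n247833568.shtml</link> <category>Sohu Focus Photo News</category> <author>[EMAIL PROTECTED]</author> <pubDate>Thu, 25 Jan 2007 11:29:11 +0800</pubDate> <comments>http://comment.news.sohu.com/comment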