[jira] Updated: (NUTCH-162) country code jp is used instead of language code ja for Japanese

2006-10-23 Thread KuroSaka TeruHiko (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-162?page=all ] KuroSaka TeruHiko updated NUTCH-162: It seems many .html files are actually generated by ant target generate-docs in build.xml, and only these four changes are needed to fix this bug: mv

[jira] Commented: (NUTCH-385) Server delay feature conflicts with maxThreadsPerHost

2006-10-23 Thread Ken Krugler (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-385?page=comments#action_12444162 ] Ken Krugler commented on NUTCH-385: There is a middle ground, though we don't know yet how important it is to address. When we crawl partner sites, we

[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-10-23 Thread Rida Benjelloun (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ] Rida Benjelloun updated NUTCH-185: Attachment: parse-xml.zip Hi, The plugin parse-xml has been updated. I have tested it with version 0.8.1. The plugin also fixes the bug related to the

[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-10-23 Thread Rida Benjelloun (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ] Rida Benjelloun updated NUTCH-185: Affects Version/s: 0.8.1, 0.8. XMLParser is configurable xml parser plugin.

outlink extractor finds lots of junk

2006-10-23 Thread AJ Chen
During fetching, OutlinkExtractor.getOutlinks() finds lots of junk, such as the following: rdf:about= xmlns:pdf= http://ns.adobe.com/pdf/1.3/ pdf:Producer pdf:Producer rdf:Description rdf:Description rdf:about= xmlns:xap= http://ns.adobe.com/xap/1.0/ xap:CreatorTool xap:CreatorTool xap:ModifyDate

[jira] Commented: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-10-23 Thread nutch.newbie (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=comments#action_12444205 ] nutch.newbie commented on NUTCH-185: Thank you very much! I will be giving it a go now. Will this plugin be added to the Nutch trunk as a part of distribution?