[jira] Commented: (NUTCH-437) MapFile in Hadoop 0.10.2 has changed, must update references

2007-02-13 Thread [EMAIL PROTECTED] (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472841 ] [EMAIL PROTECTED] commented on NUTCH-437: - +1. I reviewed and applied patch along with a hadoop-0.11.1-core

[jira] Updated: (NUTCH-425) parse-js pollutes anchor text with base URL of source page

2007-01-04 Thread [EMAIL PROTECTED] (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] [EMAIL PROTECTED] updated NUTCH-425: Attachment: nutch425.patch parse-js pollutes anchor text with base URL of source page

[jira] Commented: (NUTCH-425) parse-js pollutes anchor text with base URL of source page

2007-01-04 Thread [EMAIL PROTECTED] (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462291 ] [EMAIL PROTECTED] commented on NUTCH-425: - I took a look at what is passed to parse-js both when called from

[jira] Created: (NUTCH-426) parse-js skips parsing if found URL fails java.net.URL parse

2007-01-04 Thread [EMAIL PROTECTED] (JIRA)
parse-js skips parsing if found URL fails java.net.URL parse Key: NUTCH-426 URL: https://issues.apache.org/jira/browse/NUTCH-426 Project: Nutch Issue Type: Bug

[jira] Updated: (NUTCH-426) parse-js skips parsing if found URL fails java.net.URL parse

2007-01-04 Thread [EMAIL PROTECTED] (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] [EMAIL PROTECTED] updated NUTCH-426: Attachment: nutch426.patch parse-js skips parsing if found URL fails java.net.URL parse

[jira] Commented: (NUTCH-426) parse-js skips parsing if found URL fails java.net.URL parse

2007-01-04 Thread [EMAIL PROTECTED] (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462307 ] [EMAIL PROTECTED] commented on NUTCH-426: - Just attached a patch that catches the MalformedURLException, logs
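
The preview above is cut off after "logs", so the rest of the patch's behavior is not visible here. As a rough illustration only, and not the code from nutch426.patch, catching MalformedURLException for each candidate and moving on might look like the sketch below. The class name LenientUrlParse, the method parseAll, and the choice to keep processing the remaining candidates are all assumptions made for this example.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper; not part of Nutch or of the attached patch.
final class LenientUrlParse {
    private static final Logger LOG =
        Logger.getLogger(LenientUrlParse.class.getName());

    // Parses every candidate string, skipping (and logging) the ones that
    // java.net.URL rejects instead of aborting the whole batch.
    static List<URL> parseAll(List<String> candidates) {
        List<URL> parsed = new ArrayList<>();
        for (String candidate : candidates) {
            try {
                parsed.add(new URL(candidate));
            } catch (MalformedURLException e) {
                // Assumption: one bad URL should not stop the others.
                LOG.fine("Skipping unparsable URL '" + candidate + "': "
                    + e.getMessage());
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<URL> urls = parseAll(Arrays.asList(
            "http://example.com/a", "not a url", "http://example.org/b"));
        System.out.println(urls);
    }
}
```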

[jira] Created: (NUTCH-423) Add other index-basic fields as query plugins

2006-12-28 Thread [EMAIL PROTECTED] (JIRA)
Add other index-basic fields as query plugins - Key: NUTCH-423 URL: http://issues.apache.org/jira/browse/NUTCH-423 Project: Nutch Issue Type: Improvement Components: searcher Affects

[jira] Updated: (NUTCH-423) Add other index-basic fields as query plugins

2006-12-28 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-423?page=all ] [EMAIL PROTECTED] updated NUTCH-423: Attachment: other-index-basic-query-fields.patch Add other index-basic fields as query plugins -

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-20 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars08-v5.patch No, the double call to getLegalXml is not intentional. It's a mistake. Thanks for finding it. I've attached

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-19 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars08-v4.patch v3 mistakenly included debugging code. Attached cleaned up v4. OpenSearchServlet outputs illegal xml

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-16 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars08-v3.patch Version of patch that doesn't ...process the String twice if it contains some illegal characters!. Its name

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-16 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Version: 0.8-dev (was: 0.7) Was version 0.7. Changed 'Affects Version' to 0.8-dev. OpenSearchServlet outputs illegal xml characters

[jira] Created: (NUTCH-269) CrawlDbReducer: OOME because no upper-bound on inlinks count

2006-05-15 Thread [EMAIL PROTECTED] (JIRA)
CrawlDbReducer: OOME because no upper-bound on inlinks count Key: NUTCH-269 URL: http://issues.apache.org/jira/browse/NUTCH-269 Project: Nutch Type: Bug Reporter: [EMAIL PROTECTED] Priority:

[jira] Updated: (NUTCH-269) CrawlDbReducer: OOME because no upper-bound on inlinks count

2006-05-15 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-269?page=all ] [EMAIL PROTECTED] updated NUTCH-269: Attachment: too-many-links.patch Add configurable upper limit to amount of links we'll read. CrawlDbReducer: OOME because no upper-bound on inlinks

[jira] Updated: (NUTCH-269) CrawlDbReducer: OOME because no upper-bound on inlinks count

2006-05-15 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-269?page=all ] [EMAIL PROTECTED] updated NUTCH-269: Attachment: too-many-links2.patch Previous patch is useless. This one actually breaks the loop. CrawlDbReducer: OOME because no upper-bound on
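
The two NUTCH-269 updates describe capping how many links are read and making sure the loop really stops at the cap. A minimal sketch of that idea follows; it is not the code from either patch. The names BoundedCollect, collectUpTo, and maxItems are hypothetical, and where the limit would come from (for example a configuration property) is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; not part of Nutch or of the attached patches.
final class BoundedCollect {
    // Reads at most maxItems values from the iterator. The break is what
    // keeps memory bounded: once the cap is reached nothing more is read.
    static <T> List<T> collectUpTo(Iterator<T> values, int maxItems) {
        List<T> kept = new ArrayList<>();
        while (values.hasNext()) {
            if (kept.size() >= maxItems) {
                break; // stop reading instead of merely skipping the add
            }
            kept.add(values.next());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(collectUpTo(links.iterator(), 3));
    }
}
```

Checking the cap before calling next() means the iterator is never advanced past the limit, so the values that are not kept are also never materialized.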

[jira] Created: (NUTCH-257) Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field

2006-04-28 Thread [EMAIL PROTECTED] (JIRA)
Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field - Key: NUTCH-257 URL: http://issues.apache.org/jira/browse/NUTCH-257 Project: Nutch

[jira] Commented: (NUTCH-257) Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field

2006-04-28 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-257?page=comments#action_12376997 ] [EMAIL PROTECTED] commented on NUTCH-257: - I took a closer look. Turns out Summary is inherently all about rendering HTML (See the different Summary.Fragment

[jira] Commented: (NUTCH-256) Cannot open filename ....index.done.crc

2006-04-28 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-256?page=comments#action_12376999 ] [EMAIL PROTECTED] commented on NUTCH-256: - Works for me. Thanks. Please close as fixed. Cannot open filename index.done.crc

[jira] Created: (NUTCH-256) Cannot open filename ....index.done.crc

2006-04-27 Thread [EMAIL PROTECTED] (JIRA)
Cannot open filename index.done.crc --- Key: NUTCH-256 URL: http://issues.apache.org/jira/browse/NUTCH-256 Project: Nutch Type: Bug Components: indexer Versions: 0.8-dev Reporter: [EMAIL PROTECTED]

[jira] Updated: (NUTCH-256) Cannot open filename ....index.done.crc

2006-04-27 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-256?page=all ] [EMAIL PROTECTED] updated NUTCH-256: Attachment: index.done.crc.patch Ensure creation of companion index.done .crc file Cannot open filename index.done.crc

[jira] Created: (NUTCH-190) ParseUtil drops reason for failed parse

2006-01-26 Thread [EMAIL PROTECTED] (JIRA)
ParseUtil drops reason for failed parse --- Key: NUTCH-190 URL: http://issues.apache.org/jira/browse/NUTCH-190 Project: Nutch Type: Bug Components: fetcher Versions: 0.8-dev Environment: linux Reporter: [EMAIL

[jira] Updated: (NUTCH-190) ParseUtil drops reason for failed parse

2006-01-26 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-190?page=all ] [EMAIL PROTECTED] updated NUTCH-190: Attachment: ParseUtil_drops_failure_reason.patch Attached is a suggested patch against revision 369598. ParseUtil drops reason for failed parse

[jira] Commented: (NUTCH-190) ParseUtil drops reason for failed parse

2006-01-26 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-190?page=comments#action_12364145 ] [EMAIL PROTECTED] commented on NUTCH-190: - Here's an example of failure output after patch is applied: 060126 141413 task_m_bx2ifn Error parsing:

[jira] Commented: (NUTCH-130) Be explicit about target JVM when building (1.4.x?)

2005-11-30 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-130?page=comments#action_12358981 ] [EMAIL PROTECTED] commented on NUTCH-130: - Need to do same for plugin compile: $ /usr/local/bin/svn diff src/plugin/build-plugin.xml Index:

[jira] Created: (NUTCH-130) Be explicit about target JVM when building (1.4.x?)

2005-11-29 Thread [EMAIL PROTECTED] (JIRA)
Be explicit about target JVM when building (1.4.x?) --- Key: NUTCH-130 URL: http://issues.apache.org/jira/browse/NUTCH-130 Project: Nutch Type: Improvement Reporter: [EMAIL PROTECTED] Priority: Minor Below is

[jira] Commented: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-11-10 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=comments#action_12357300 ] [EMAIL PROTECTED] commented on NUTCH-110: - Scrub NUTCH-110-version2.patch. This patch double-encodes certain entities (first by the new toValidXmlText method, second by

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-14 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: NUTCH-110-version2.patch Patch version 2. This patch benefits from discussion held on the nutch dev list. This patch differs from the first

[jira] Created: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-12 Thread [EMAIL PROTECTED] (JIRA)
OpenSearchServlet outputs illegal xml characters Key: NUTCH-110 URL: http://issues.apache.org/jira/browse/NUTCH-110 Project: Nutch Type: Bug Components: searcher Versions: 0.7 Environment: linux, jdk 1.5

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-12 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars.patch Attached patch runs all xml text through a check for bad xml characters. This patch is brutal, dropping silently
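
For readers unfamiliar with the problem, a check for bad XML characters can be sketched as below. This is not the code from fixIllegalXmlChars.patch, whose contents are not shown in this listing. It assumes the XML 1.0 definition of a legal character and a modern JDK (String.codePointAt), and the names XmlCharFilter, isLegalXml, and stripIllegalXml are hypothetical.

```java
// Hypothetical helper; not part of Nutch or of the attached patch.
final class XmlCharFilter {
    // Legal characters per the XML 1.0 "Char" production.
    static boolean isLegalXml(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Silently drops every illegal code point. The input string itself is
    // returned when nothing needs removing, so clean text costs no copy.
    static String stripIllegalXml(String text) {
        StringBuilder out = null; // allocated on the first illegal character
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isLegalXml(cp)) {
                if (out != null) {
                    out.appendCodePoint(cp);
                }
            } else if (out == null) {
                out = new StringBuilder(text.length());
                out.append(text, 0, i); // keep the clean prefix seen so far
            }
            i += Character.charCount(cp);
        }
        return out == null ? text : out.toString();
    }

    public static void main(String[] args) {
        String dirty = "a" + (char) 0 + "b" + (char) 8 + "c";
        System.out.println(stripIllegalXml(dirty));
    }
}
```

A filter like this only removes characters; it does not entity-encode anything, so it should run once, before the escaping of markup characters, to avoid the double processing discussed in the later NUTCH-110 updates.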