[ https://issues.apache.org/jira/browse/NUTCH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472841 ]
[EMAIL PROTECTED] commented on NUTCH-437:
+1. I reviewed and applied patch along with a hadoop-0.11.1-core
[ https://issues.apache.org/jira/browse/NUTCH-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
[EMAIL PROTECTED] updated NUTCH-425:
Attachment: nutch425.patch
parse-js pollutes anchor text with base URL of source page
[ https://issues.apache.org/jira/browse/NUTCH-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462291 ]
[EMAIL PROTECTED] commented on NUTCH-425:
I took a look at what is passed to parse-js both when called from
parse-js skips parsing if found URL fails java.net.URL parse
Key: NUTCH-426
URL: https://issues.apache.org/jira/browse/NUTCH-426
Project: Nutch
Issue Type: Bug
[ https://issues.apache.org/jira/browse/NUTCH-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
[EMAIL PROTECTED] updated NUTCH-426:
Attachment: nutch426.patch
parse-js skips parsing if found URL fails java.net.URL parse
[ https://issues.apache.org/jira/browse/NUTCH-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462307 ]
[EMAIL PROTECTED] commented on NUTCH-426:
Just attached a patch that catches the MalformedURLException, logs
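The fix described for NUTCH-426 (catch the MalformedURLException, log it, keep going) can be sketched as follows. This is a minimal illustration, not the parse-js patch itself: the class `LenientUrlCollector` and method `resolveAll` are hypothetical names, and it assumes candidate link strings have already been extracted from the script. Each candidate is resolved against the page's base URL, and one that `java.net.URL` rejects is skipped instead of aborting the whole parse.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

public class LenientUrlCollector {
    private static final Logger LOG =
        Logger.getLogger(LenientUrlCollector.class.getName());

    // Hypothetical helper (not the parse-js code): resolves each candidate
    // string found in a script against the page's base URL. A candidate that
    // java.net.URL rejects is logged and skipped, so the remaining candidates
    // are still returned instead of the whole parse being abandoned.
    public static List<URL> resolveAll(String baseUrl, List<String> candidates) {
        List<URL> urls = new ArrayList<>();
        URL base;
        try {
            base = new URL(baseUrl);
        } catch (MalformedURLException e) {
            LOG.warning("Malformed base URL '" + baseUrl + "'; no links resolved");
            return urls;
        }
        for (String candidate : candidates) {
            try {
                urls.add(new URL(base, candidate));
            } catch (MalformedURLException e) {
                // e.g. "foo:bar" fails with "unknown protocol: foo"
                LOG.fine("Skipping malformed URL '" + candidate + "': " + e.getMessage());
            }
        }
        return urls;
    }
}
```

For example, resolving `b.html`, `foo:bar` and `/c` against `http://example.com/a/` yields the two valid links and drops only `foo:bar`.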
Add other index-basic fields as query plugins
Key: NUTCH-423
URL: http://issues.apache.org/jira/browse/NUTCH-423
Project: Nutch
Issue Type: Improvement
Components: searcher
Affects
[ http://issues.apache.org/jira/browse/NUTCH-423?page=all ]
[EMAIL PROTECTED] updated NUTCH-423:
Attachment: other-index-basic-query-fields.patch
Add other index-basic fields as query plugins
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
Attachment: fixIllegalXmlChars08-v5.patch
No, the double call to getLegalXml is not intentional. It's a mistake. Thanks for finding it.
I've attached
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
Attachment: fixIllegalXmlChars08-v4.patch
v3 mistakenly included debugging code.
Attached cleaned up v4.
OpenSearchServlet outputs illegal xml
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
Attachment: fixIllegalXmlChars08-v3.patch
Version of the patch that doesn't process the String twice if it contains some
illegal characters! Its name
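Filtering text down to legal XML characters without processing the String twice can be sketched as a single scan with a fast path. This is a minimal sketch, not the attached patch: `XmlCharFilter` and `stripIllegalXmlChars` are hypothetical names (the thread's own method is `getLegalXml`), and the legal ranges are those of the XML 1.0 `Char` production. If the scan finds no illegal character, the original String is returned without copying; otherwise a copy is built that, like the first NUTCH-110 patch, silently drops the illegal characters.

```java
public class XmlCharFilter {
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: scans once for the first illegal code point.
    // Clean input is returned as-is (same instance); otherwise the legal
    // prefix is copied and the rest is filtered, dropping illegal chars.
    // Unpaired surrogates come back from codePointAt as 0xD800-0xDFFF,
    // which is outside the legal ranges, so they are dropped too.
    public static String stripIllegalXmlChars(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isLegalXmlChar(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        if (i == text.length()) {
            return text; // fast path: nothing to fix, no copy made
        }
        StringBuilder sb = new StringBuilder(text.length());
        sb.append(text, 0, i); // legal prefix found by the first scan
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```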
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
Version: 0.8-dev (was: 0.7)
Was version 0.7. Changed 'Affects Version' to 0.8-dev.
OpenSearchServlet outputs illegal xml characters
CrawlDbReducer: OOME because no upper-bound on inlinks count
Key: NUTCH-269
URL: http://issues.apache.org/jira/browse/NUTCH-269
Project: Nutch
Type: Bug
Reporter: [EMAIL PROTECTED]
Priority:
[ http://issues.apache.org/jira/browse/NUTCH-269?page=all ]
[EMAIL PROTECTED] updated NUTCH-269:
Attachment: too-many-links.patch
Add configurable upper limit to amount of links we'll read.
CrawlDbReducer: OOME because no upper-bound on inlinks
[ http://issues.apache.org/jira/browse/NUTCH-269?page=all ]
[EMAIL PROTECTED] updated NUTCH-269:
Attachment: too-many-links2.patch
Previous patch is useless. This one actually breaks the loop.
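The point of the second patch, a configurable cap that actually breaks out of the read loop, can be sketched as below. This is a minimal illustration under stated assumptions: `BoundedInlinks`, `collect` and the `maxInlinks` parameter are hypothetical (the patch's real configuration property is not shown here), and the values iterator stands in for the inlinks a reducer receives for one key. Once the cap is reached the loop stops reading, so memory use no longer grows with the number of inlinks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BoundedInlinks {
    // Hypothetical helper: keeps at most maxInlinks values and then stops
    // reading from the iterator, so the remaining values are never buffered.
    public static <T> List<T> collect(Iterator<T> values, int maxInlinks) {
        List<T> kept = new ArrayList<>();
        while (values.hasNext()) {
            if (kept.size() >= maxInlinks) {
                break; // actually leave the loop once the cap is hit
            }
            kept.add(values.next());
        }
        return kept;
    }
}
```

A cap that is only checked without the `break` would still iterate (and possibly buffer) every value, which is the failure mode the comment above attributes to the first patch.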
CrawlDbReducer: OOME because no upper-bound on
Summary#toString always Entity encodes -- problem for
OpenSearchServlet#description field
Key: NUTCH-257
URL: http://issues.apache.org/jira/browse/NUTCH-257
Project: Nutch
[ http://issues.apache.org/jira/browse/NUTCH-257?page=comments#action_12376997 ]
[EMAIL PROTECTED] commented on NUTCH-257:
I took a closer look. It turns out that Summary is inherently about rendering HTML
(see the different Summary.Fragment
[ http://issues.apache.org/jira/browse/NUTCH-256?page=comments#action_12376999 ]
[EMAIL PROTECTED] commented on NUTCH-256:
Works for me. Thanks. Please close as fixed.
Cannot open filename index.done.crc
Key: NUTCH-256
URL: http://issues.apache.org/jira/browse/NUTCH-256
Project: Nutch
Type: Bug
Components: indexer
Versions: 0.8-dev
Reporter: [EMAIL PROTECTED]
[ http://issues.apache.org/jira/browse/NUTCH-256?page=all ]
[EMAIL PROTECTED] updated NUTCH-256:
Attachment: index.done.crc.patch
Ensure creation of companion index.done .crc file
Cannot open filename index.done.crc
ParseUtil drops reason for failed parse
Key: NUTCH-190
URL: http://issues.apache.org/jira/browse/NUTCH-190
Project: Nutch
Type: Bug
Components: fetcher
Versions: 0.8-dev
Environment: linux
Reporter: [EMAIL
[ http://issues.apache.org/jira/browse/NUTCH-190?page=all ]
[EMAIL PROTECTED] updated NUTCH-190:
Attachment: ParseUtil_drops_failure_reason.patch
Attached is a suggested patch against revision 369598.
ParseUtil drops reason for failed parse
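The idea behind NUTCH-190 can be sketched under an assumption: that `ParseUtil` tries several parsers in turn and, when all of them fail, falls back to a generic empty result, which is where the failure reason is lost. The sketch below keeps the last parser's failure instead. All names here (`ParseChain`, `Result`, `parse`) are hypothetical stand-ins, not Nutch's `Parse`/`ParseStatus` API, and parsers are modeled as plain functions from content to result.

```java
import java.util.List;
import java.util.function.Function;

public class ParseChain {
    // Hypothetical stand-in for a parse result plus its status message.
    public static final class Result {
        public final boolean success;
        public final String message;

        private Result(boolean success, String message) {
            this.success = success;
            this.message = message;
        }

        public static Result ok(String message) { return new Result(true, message); }
        public static Result failed(String reason) { return new Result(false, reason); }
    }

    // Tries each parser in order and returns the first success. If every
    // parser fails, returns the last failure (with its reason) rather than
    // a generic result that says nothing about why parsing failed.
    public static Result parse(List<Function<String, Result>> parsers, String content) {
        Result last = Result.failed("no parser available");
        for (Function<String, Result> parser : parsers) {
            Result r = parser.apply(content);
            if (r != null && r.success) {
                return r;
            }
            if (r != null) {
                last = r; // remember why this parser failed
            }
        }
        return last;
    }
}
```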
[ http://issues.apache.org/jira/browse/NUTCH-190?page=comments#action_12364145 ]
[EMAIL PROTECTED] commented on NUTCH-190:
Here's an example of failure output after patch is applied:
060126 141413 task_m_bx2ifn Error parsing:
[ http://issues.apache.org/jira/browse/NUTCH-130?page=comments#action_12358981 ]
[EMAIL PROTECTED] commented on NUTCH-130:
Need to do same for plugin compile:
$ /usr/local/bin/svn diff src/plugin/build-plugin.xml
Index:
Be explicit about target JVM when building (1.4.x?)
Key: NUTCH-130
URL: http://issues.apache.org/jira/browse/NUTCH-130
Project: Nutch
Type: Improvement
Reporter: [EMAIL PROTECTED]
Priority: Minor
Below is
[ http://issues.apache.org/jira/browse/NUTCH-110?page=comments#action_12357300 ]
[EMAIL PROTECTED] commented on NUTCH-110:
Scrub NUTCH-110-version2.patch. This patch double-encodes certain entities
(first by the new toValidXmlText method, second by
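Double encoding happens whenever text that is already escaped passes through a second escaper: the `&` in `&amp;` is itself escaped again. A minimal sketch of the effect, using a hypothetical `escapeXmlText` helper rather than Nutch's actual `toValidXmlText`:

```java
public class DoubleEscapeDemo {
    // Minimal XML text escaper (illustrative only, not toValidXmlText):
    // replaces the three characters that are special in XML character data.
    public static String escapeXmlText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String once = escapeXmlText("a & b");   // a &amp; b
        String twice = escapeXmlText(once);     // a &amp;amp; b
        System.out.println(once);
        System.out.println(twice);
    }
}
```

A reader decoding the second string gets back `a &amp; b` rather than `a & b`, so escaping should happen exactly once, at the point where the text is written into the XML output.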
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
Attachment: NUTCH-110-version2.patch
Patch version 2. This patch benefits from the discussion held on the nutch dev
list. This patch differs from the first
OpenSearchServlet outputs illegal xml characters
Key: NUTCH-110
URL: http://issues.apache.org/jira/browse/NUTCH-110
Project: Nutch
Type: Bug
Components: searcher
Versions: 0.7
Environment: linux, jdk 1.5
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
Attachment: fixIllegalXmlChars.patch
Attached patch runs all XML text through a check for bad XML characters. This
patch is brutal, silently dropping