[
https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-956:
-
Attachment: solr.patch2
- NPE related to content-type field
- tld field in Solr schema
- string comparison in
[
https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-965:
-
Summary: Skip parsing for truncated documents (was: Parsing takes up 100% CPU)
Skip parsing for truncated
[
https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-965:
-
Attachment: parserJob.patch
In the parser mapper, compare Content-Length header to the size of the content
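The comparison described in this snippet could look roughly like the sketch below. It is not the code from parserJob.patch: the class name TruncationCheck and the method isTruncated are hypothetical, and the sketch assumes the Content-Length header value is available as a string and the fetched content as a byte array.

```java
/** Hypothetical helper illustrating the check; not taken from parserJob.patch. */
final class TruncationCheck {

    private TruncationCheck() {
        // Static utility, not meant to be instantiated.
    }

    /**
     * Returns true when the declared Content-Length is larger than the number
     * of bytes actually stored, which suggests the fetch was cut short.
     * A missing or unparsable header is treated as "not truncated", because
     * nothing can be concluded from it (an assumption of this sketch).
     */
    static boolean isTruncated(String contentLengthHeader, byte[] content) {
        if (contentLengthHeader == null || content == null) {
            return false;
        }
        final long declared;
        try {
            declared = Long.parseLong(contentLengthHeader.trim());
        } catch (NumberFormatException e) {
            return false;
        }
        return declared > content.length;
    }
}
```

A mapper could call such a check before handing a document to the parser and skip the document when it returns true.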
[
https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983125#action_12983125
]
Alexis commented on NUTCH-955:
--
Sorry please disregard the nutch.root first bullet in the
solrindex issues
Key: NUTCH-956
URL: https://issues.apache.org/jira/browse/NUTCH-956
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 2.0
Reporter: Alexis
I ran into a few
[
https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-956:
-
Attachment: solr.patch
Here are the changes:
- Avoid multiple values for id field. (NUTCH-819)
- Allow multiple
Ivy configuration
-
Key: NUTCH-955
URL: https://issues.apache.org/jira/browse/NUTCH-955
Project: Nutch
Issue Type: Improvement
Components: build
Affects Versions: 2.0
Reporter: Alexis
As mentioned
[
https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-955:
-
Attachment: ivy.patch
In the patch, the required dependencies for MySQL and HBase are included in the
Ivy
Content-Length limit, URL filter and few minor issues
-
Key: NUTCH-950
URL: https://issues.apache.org/jira/browse/NUTCH-950
Project: Nutch
Issue Type: Bug
Affects Versions: 2.0
[
https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-950:
-
Attachment: nutch4.patch
Content-Length limit, URL filter and few minor issues
[
https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexis updated NUTCH-899:
-
Attachment: httpContentLimit.patch
We stick with the default gora schema for the MySQL backend, which says
[
https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970336#action_12970336
]
Alexis commented on NUTCH-899:
--
I ran into the exact same issue, with MySQL. The blob column