Support for the x-robots-tag HTTP Header
Key: NUTCH-1257
URL: https://issues.apache.org/jira/browse/NUTCH-1257
Project: Nutch
Issue Type: New Feature
Components: fetcher
MoreIndexingFilter should be able to read Content-Type from both parse metadata
and content metadata
Key: NUTCH-1258
URL:
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1258:
-
Attachment: NUTCH-1258-1.5-1.patch
Patch for 1.5. Adds configuration to read from contentmeta,
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192992#comment-13192992
]
Markus Jelsma commented on NUTCH-1258:
--
Comments? Tested and things work as expected,
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192997#comment-13192997
]
Julien Nioche commented on NUTCH-1258:
--
What about using a similar mechanism for the
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193001#comment-13193001
]
Markus Jelsma commented on NUTCH-1258:
--
That may be a good idea indeed but we need to
[
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193007#comment-13193007
]
Ferdy Galema commented on NUTCH-1086:
-
Seems like a JVM bug, perhaps you could
[
https://issues.apache.org/jira/browse/NUTCH-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193019#comment-13193019
]
Markus Jelsma commented on NUTCH-1258:
--
Ah, the Content-Type detected by Tika is
[
https://issues.apache.org/jira/browse/NUTCH-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193030#comment-13193030
]
Markus Jelsma commented on NUTCH-1259:
--
A solution would be to prevent the type to be
[
https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193031#comment-13193031
]
Oleg Kalnichevski commented on NUTCH-1086:
--
For what it is worth to you,
[
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193042#comment-13193042
]
Ferdy Galema commented on NUTCH-1253:
-
Hi,
Looking at the revision history it seems
[
https://issues.apache.org/jira/browse/NUTCH-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1252:
-
Fix Version/s: 1.5
Thanks. Marked for 1.5, keeping it on the radar.
[
https://issues.apache.org/jira/browse/NUTCH-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1256:
-
Attachment: NUTCH-1256-1.5-1.patch
Patch introduces new parameter with two mandatory arguments.
[
https://issues.apache.org/jira/browse/NUTCH-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1252:
-
Thanks. Marked for 1.5, keeping it on the radar.
SegmentReader -get shows wrong
[
https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193115#comment-13193115
]
Sebastian Nagel commented on NUTCH-1113:
I had a look at the attached segment