Hi All,
I am new to Apache Nutch. I am using it to crawl some sites, filter them, and get
content based on words rather than on URLs. For example:
1. I have to crawl only those sites whose content (text) contains words like 'shop' or 'product'.
If these words do not appear, the crawl should not go any further.
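One way to express requirement 1 is to check the parsed text of each page for the keywords and drop that page's outlinks when none is found, so the crawl does not expand past it. The following is a minimal, standalone Java sketch of that decision only. `KeywordGate`, `containsAnyKeyword` and `outlinksToFollow` are hypothetical names, not Nutch APIs. In a real Nutch setup the check would still have to be wired into a plugin (a parse filter or a scoring filter are plausible extension points), which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: gates outlinks on words in the page text.
public class KeywordGate {

    // True if any keyword occurs as a whole word in the text (case-insensitive).
    public static boolean containsAnyKeyword(String text, Set<String> keywords) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            // \b word boundaries keep "shop" from matching inside "workshop".
            Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(keyword.toLowerCase(Locale.ROOT)) + "\\b");
            if (p.matcher(lower).find()) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy of the page's outlinks if its text passes the keyword check,
    // otherwise an empty list so the crawl stops expanding from this page.
    public static List<String> outlinksToFollow(
            String text, List<String> outlinks, Set<String> keywords) {
        if (containsAnyKeyword(text, keywords)) {
            return new ArrayList<>(outlinks);
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("shop", "product");
        List<String> links = List.of("http://example.com/a", "http://example.com/b");
        // Page mentions "product": both outlinks are kept.
        System.out.println(outlinksToFollow("Browse our product catalogue", links, keywords));
        // "workshop" is not a whole-word match for "shop": no outlinks are kept.
        System.out.println(outlinksToFollow("Join our workshop", links, keywords));
    }
}
```

Whole-word, case-insensitive matching is a design choice here: it avoids false hits such as "workshop", but it also misses plurals like "shops" unless they are added to the keyword set.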
Markus Jelsma created NUTCH-2419:
Summary: Domain blacklist URL filter does not respect command-line
override for file
Key: NUTCH-2419
URL: https://issues.apache.org/jira/browse/NUTCH-2419
Project:
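The ticket summary describes a precedence problem: a rules file given on the command line should win over the configured default. The following is a minimal, hypothetical Java illustration of that precedence combined with a suffix-based domain check. It is not the patch attached to NUTCH-2419 and not Nutch's plugin code; the class name, method names and the default file name are assumptions. It only mirrors the Nutch URL filter convention of returning the URL to accept it and null to reject it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Nutch code: domain blacklist with an overridable rules file.
public class DomainBlacklistSketch {

    // An explicit command-line override wins; otherwise the configured file is used.
    public static String resolveRulesFile(String commandLineOverride, String configuredFile) {
        if (commandLineOverride != null && !commandLineOverride.isEmpty()) {
            return commandLineOverride;
        }
        return configuredFile;
    }

    private final Set<String> blockedDomains = new HashSet<>();

    public DomainBlacklistSketch(Set<String> domains) {
        for (String d : domains) {
            blockedDomains.add(d.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the URL if accepted, or null if its host (or a parent domain) is blocked.
    public String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // design choice: unparsable URLs are rejected
        }
        if (host == null) {
            return null;
        }
        // Check "shop.example.com", then "example.com", then "com".
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (blockedDomains.contains(candidate)) {
                return null;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                break;
            }
            candidate = candidate.substring(dot + 1);
        }
        return url;
    }

    public static void main(String[] args) {
        // First program argument stands in for a command-line override (assumption).
        String override = args.length > 0 ? args[0] : null;
        // Example default file name, assumed for illustration.
        System.out.println("rules file: " + resolveRulesFile(override, "domainblacklist-urlfilter.txt"));
        DomainBlacklistSketch f = new DomainBlacklistSketch(Set.of("example.com"));
        System.out.println(f.filter("http://shop.example.com/page"));
        System.out.println(f.filter("http://nutch.apache.org/"));
    }
}
```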
[ https://issues.apache.org/jira/browse/NUTCH-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154937#comment-16154937 ]
ASF GitHub Bot commented on NUTCH-2375:
---
Omkar20895 commented on issue #188: NUTCH-2375 Upgrade the
[ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2419:
-
Attachment: NUTCH-2419.patch
Patch for trunk!
> Domain blacklist URL filter does not respect
[ https://issues.apache.org/jira/browse/NUTCH-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2417:
-
Attachment: NUTCH-2417.patch
Patch for trunk!
> Support for variable fetch delay via
[ https://issues.apache.org/jira/browse/NUTCH-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155138#comment-16155138 ]
Markus Jelsma commented on NUTCH-2417:
--
No patch, wrong ticket!
> Support for variable fetch delay
[ https://issues.apache.org/jira/browse/NUTCH-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-2417:
-
Attachment: (was: NUTCH-2417.patch)
> Support for variable fetch delay via FreeGenerator
>
[ https://issues.apache.org/jira/browse/NUTCH-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156020#comment-16156020 ]
ASF GitHub Bot commented on NUTCH-2375:
---
lewismc commented on issue #188: NUTCH-2375 Upgrade the
Hi user@ and dev@,
As part of the Nutch Google Summer of Code effort this year, Omkar Reddy
and I have been working persistently throughout the summer months on the
Hadoop MapReduce API upgrade, i.e. NUTCH-2375 Upgrade the code base from
org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].