[
https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-546:
Attachment: NUTCH-546-validator-plugin_v1.patch
Here is a patch that removes UrlValidator code from
[
https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525419
]
Doğacan Güney commented on NUTCH-524:
-
Hi Ian and Daniel,
Have you tried the max.threads.per.host option? Or are you
[
https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525452
]
Emmanuel Joke commented on NUTCH-548:
-
My mistake, you're right; I was using the crawl command to run my test,
[
https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525475
]
Andrzej Bialecki commented on NUTCH-530:
-
I'm still against this patch, exactly because we are not sure how
Hello everybody,
I'm working on a project that is essentially a searchable database for
academic citations at the University of Pittsburgh. One of our
searching requirements was to be able to break the search results into
sections--in order to do this, I implemented something similar to
Google's
Hi,
I have noticed that Nutch treats img/@src as an outlink. I suppose that in many
cases people do not want to treat an image as an outlink; at least I don't.
The same goes for script/@src. However, there seems to be no way to limit
outlink tags. The DOMContentUtils.getOutlinks() takes links
Bug
---
Key: NUTCH-549
URL: https://issues.apache.org/jira/browse/NUTCH-549
Project: Nutch
Issue Type: Bug
Reporter: crossany