[jira] Updated: (NUTCH-546) file URL are filtered out by the crawler

2007-09-06 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-546: Attachment: NUTCH-546-validator-plugin_v1.patch Here is a patch that removes UrlValidator code from

[jira] Commented: (NUTCH-524) Generate Problem with Single Node

2007-09-06 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525419 ] Doğacan Güney commented on NUTCH-524: Hi Ian and Daniel, have you tried the max.threads.per.host option? Or are you

[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat

2007-09-06 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525452 ] Emmanuel Joke commented on NUTCH-548: My mistake, you're right. I was using the crawl command to run my test,

[jira] Commented: (NUTCH-530) Add a combiner to improve performance on updatedb

2007-09-06 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525475 ] Andrzej Bialecki commented on NUTCH-530: I'm still against this patch, exactly because we are not sure how

Labeling URLs a-la Google

2007-09-06 Thread Jeff Maki
Hello everybody, I'm working on a project that is essentially a searchable database for academic citations at the University of Pittsburgh. One of our search requirements was to break the search results into sections. To do this, I implemented something similar to Google's

Limiting outlink tags.

2007-09-06 Thread Marcin Okraszewski
Hi, I have noticed that Nutch considers img/@src an outlink. I suppose that in many cases people do not want to treat an image as an outlink. At least I don't. The same goes for script/@src. But it seems there is no way to limit outlink tags. The DOMContentUtils.getOutlinks() takes links

[jira] Created: (NUTCH-549) Bug

2007-09-06 Thread crossany (JIRA)
Bug. Key: NUTCH-549; URL: https://issues.apache.org/jira/browse/NUTCH-549; Project: Nutch; Issue Type: Bug; Reporter: crossany