See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/74/
--
[...truncated 322 lines...]
A site/tutorial8.pdf
A site/skin
A site/skin/menu.js
A site/skin/getMenu.js
A site/skin/skinconf.xsl
A
Hi all,
I am new to Nutch. I would like to crawl a particular site and keep only the
results whose URLs match the following pattern. I don't want to list other URLs
from the crawled site.
Site to be crawled, e.g.: www.example.com
^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$
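One way to check what the pattern above actually accepts is to run it against a few sample URLs with plain java.util.regex, outside of Nutch. This is only a sketch: the class name UrlPatternCheck and the sample URLs are made up for illustration and are not from the original post. Two things stand out when the pattern is read as a Java regex: `\(` and `\)` match literal parentheses rather than forming groups, and the dot in `example.com` is unescaped, so it matches any character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: tests URLs against the posted pattern.
public class UrlPatternCheck {
    // The pattern exactly as posted; backslashes are doubled for the Java string literal.
    static final Pattern POSTED = Pattern.compile(
        "^http://([a-z0-9]*\\.)example.com/([a-zA-Z]*)-\\([a-z0-9]*\\)-.*-\\([0-9]*-[A-Za-z0-9]*\\)\\.html$");

    // Returns true if the URL matches the posted pattern.
    static boolean accepts(String url) {
        return POSTED.matcher(url).find();
    }

    public static void main(String[] args) {
        // Made-up sample URLs, chosen only to exercise the pattern.
        String[] samples = {
            "http://www.example.com/news-(abc123)-title-(2007-xyz).html", // literal parentheses
            "http://www.example.com/news-abc123-title-2007-xyz.html",     // no parentheses
            "http://example.com/news-(abc123)-title-(2007-xyz).html"      // no subdomain
        };
        for (String url : samples) {
            System.out.println(accepts(url) + "  " + url);
        }
    }
}
```

If the real URLs do not contain parentheses, the `\(` and `\)` escapes would probably need to become plain `(` and `)`, and `example.com` would be safer written as `example\.com`. That is a guess about the intent, since the post does not show a sample URL.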
Extend URLFilters to support different filtering chains
---
Key: NUTCH-477
URL: https://issues.apache.org/jira/browse/NUTCH-477
Project: Nutch
Issue Type: Improvement
Affects Versions:
[ https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-477:
Attachment: urlfilters.patch
This patch implements suggested changes.
Extend URLFilters