Build failed in Hudson: Nutch-Nightly #74

2007-05-03 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/74/ -- [...truncated 322 lines...] A site/tutorial8.pdf A site/skin A site/skin/menu.js A site/skin/getMenu.js A site/skin/skinconf.xsl A

Nutch - Filtering (REGEX)

2007-05-03 Thread simon_ece
Hi all, I am new to Nutch. I would like to crawl a particular site and get only the results that match the following pattern. I don't want to list other URLs from the crawled site. Site to be crawled, e.g.: www.example.com ^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$
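
A quick way to see what the posted pattern accepts is to run it against a few sample URLs. The sketch below uses Python's re module only as a convenience (Nutch itself is Java, but the constructs in this pattern should behave the same way in both). The sample URLs, the accepts() helper and the VARIANT pattern are made up for illustration and are not from the original post. VARIANT is just one guess at the intent, assuming the parentheses were meant as groups rather than literal characters.

```python
import re

# The pattern exactly as posted. Here \( and \) match literal parentheses in
# the URL, and the unescaped "." in "example.com" matches any one character.
POSTED = re.compile(
    r"^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$"
)

# Hypothetical variant (an assumption about the intent, not the poster's rule):
# parentheses act as groups, the dot is escaped, the subdomain is optional.
VARIANT = re.compile(
    r"^http://([a-z0-9]*\.)*example\.com/([a-zA-Z]*)-([a-z0-9]*)-.*-([0-9]*-[A-Za-z0-9]*)\.html$"
)


def accepts(pattern, url):
    """Return True if the URL matches the pattern, i.e. would be kept."""
    return pattern.search(url) is not None


# Made-up sample URLs.
WITH_PARENS = "http://www.example.com/news-(abc1)-some-title-(2007-x1).html"
WITHOUT_PARENS = "http://www.example.com/news-abc1-some-title-2007-x1.html"
NO_SUBDOMAIN = "http://example.com/news-abc1-some-title-2007-x1.html"

if __name__ == "__main__":
    for name, pattern in (("posted", POSTED), ("variant", VARIANT)):
        for url in (WITH_PARENS, WITHOUT_PARENS, NO_SUBDOMAIN):
            print(name, accepts(pattern, url), url)
```

As written, the posted pattern only keeps URLs that contain literal "(" and ")" characters and a dot-terminated prefix before "example". If the site's URLs do not look like that, nothing will pass the filter.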

[jira] Created: (NUTCH-477) Extend URLFilters to support different filtering chains

2007-05-03 Thread Andrzej Bialecki (JIRA)
Extend URLFilters to support different filtering chains. Key: NUTCH-477; URL: https://issues.apache.org/jira/browse/NUTCH-477; Project: Nutch; Issue Type: Improvement; Affects Versions:

[jira] Updated: (NUTCH-477) Extend URLFilters to support different filtering chains

2007-05-03 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-477: Attachment: urlfilters.patch This patch implements suggested changes. Extend URLFilters