Filter fetching by mime type

2008-02-26 Thread Nynodata Development Team
Hello, I'm crawling sites that have mime types that I don't want to fetch, although the URLs themselves don't have any distinguishing pattern, so I can't use the regex URL filter to skip these URLs. As far as I know, there is presently no way to filter fetched content by mime type. E.g. How can

[jira] Updated: (NUTCH-615) Redirected URL are fetched without setting any FetchInterval

2008-02-26 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-615: Attachment: NUTCH-615.patch Redirected URL are fetched without setting any FetchInterval

[jira] Created: (NUTCH-615) Redirected URL are fetched without setting any FetchInterval

2008-02-26 Thread Emmanuel Joke (JIRA)
Redirected URL are fetched without setting any FetchInterval Key: NUTCH-615 URL: https://issues.apache.org/jira/browse/NUTCH-615 Project: Nutch Issue Type: Bug

[jira] Updated: (NUTCH-616) Reset Fetch Retry counter when fetch is successful

2008-02-26 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-616: Attachment: NUTCH-616.patch Patch provided Reset Fetch Retry counter when fetch is successful

[jira] Created: (NUTCH-616) Reset Fetch Retry counter when fetch is successful

2008-02-26 Thread Emmanuel Joke (JIRA)
Reset Fetch Retry counter when fetch is successful -- Key: NUTCH-616 URL: https://issues.apache.org/jira/browse/NUTCH-616 Project: Nutch Issue Type: Bug Affects Versions: 1.0.0

[jira] Updated: (NUTCH-614) Order Inlinks by OPIC score of parent page

2008-02-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-614: --- Attachment: NUTCH-614-2-20080226.patch Very, very messy patch. This is a first cut at both allowing

Build failed in Hudson: Nutch-trunk #371

2008-02-26 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/371/changes -- [...truncated 2055 lines...] AU src/plugin/parse-html/plugin.xml AU src/plugin/parse-html/build.xml A src/plugin/protocol-httpclient A

Re: Build failed in Hudson: Nutch-trunk #371

2008-02-26 Thread Nigel Daley
Sorry, ignore this. I'm trying to fix the whoami test problem. Nige On Feb 26, 2008, at 5:06 PM, Apache Hudson Server wrote: See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/371/changes -- [...truncated 2055 lines...] AU

Build failed in Hudson: Nutch-trunk #372

2008-02-26 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/372/changes -- [...truncated 4603 lines...] copy-generated-lib: [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex init:

Build failed in Hudson: Nutch-trunk #373

2008-02-26 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/373/changes -- [...truncated 4603 lines...] copy-generated-lib: [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex init:

Build failed in Hudson: Nutch-trunk #374

2008-02-26 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/374/changes -- [...truncated 4605 lines...] copy-generated-lib: [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex init: