PowerPoint Parsing Exception

2009-03-12 Thread Bullard, Luke
Hi, I'm using Nutch 0.9 to crawl part of my intranet, and am getting the following when attempting to parse ppt files: 2009-03-11 16:30:47,000 ERROR mspowerpoint.ContentReaderListener - extractClientTextBoxes java.lang.ArrayIndexOutOfBoundsException: -55133188 at

[jira] Created: (NUTCH-718) urlfilter-subnets plugin

2009-03-12 Thread Dmitry Lihachev (JIRA)
urlfilter-subnets plugin Key: NUTCH-718 URL: https://issues.apache.org/jira/browse/NUTCH-718 Project: Nutch Issue Type: New Feature Reporter: Dmitry Lihachev Priority: Minor This plugin

Re: planning for nutch-1.0-rc1

2009-03-12 Thread Bartosz Gadzimski
Hello Dennis, We've been trying your new framework and indexer, and everything looks better now. But we can't understand what the output of the last command (FieldIndexer) should be. We have: u...@kubuntu:~/nutch-1.0$ ls crawl/indexes/part-0/ index.done segments_1 segments.gen

[jira] Updated: (NUTCH-718) urlfilter-subnets plugin

2009-03-12 Thread Dmitry Lihachev (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Lihachev updated NUTCH-718: -- Attachment: NUTCH-718_urlfilter_subnets.patch {code} cd nutch-trunk patch -p0

[jira] Created: (NUTCH-720) site: search operator with no query term

2009-03-12 Thread Frank McCown (JIRA)
site: search operator with no query term Key: NUTCH-720 URL: https://issues.apache.org/jira/browse/NUTCH-720 Project: Nutch Issue Type: Improvement Affects Versions: 1.1 Reporter: