Hi,
I'm using Nutch 0.9 to crawl part of my intranet, and am getting the
following when attempting to parse ppt files:
2009-03-11 16:30:47,000 ERROR mspowerpoint.ContentReaderListener - extractClientTextBoxes
java.lang.ArrayIndexOutOfBoundsException: -55133188
at
urlfilter-subnets plugin
Key: NUTCH-718
URL: https://issues.apache.org/jira/browse/NUTCH-718
Project: Nutch
Issue Type: New Feature
Reporter: Dmitry Lihachev
Priority: Minor
This plugin
Hello Dennis,
We've been trying your new framework and indexer, and everything looks
better now. But we can't understand what the output of the last
command (FieldIndexer) should be.
We have:
u...@kubuntu:~/nutch-1.0$ ls crawl/indexes/part-0/
index.done segments_1 segments.gen
[ https://issues.apache.org/jira/browse/NUTCH-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Lihachev updated NUTCH-718:
--
Attachment: NUTCH-718_urlfilter_subnets.patch
{code}
cd nutch-trunk
patch -p0
site: search operator with no query term
Key: NUTCH-720
URL: https://issues.apache.org/jira/browse/NUTCH-720
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.1
Reporter: