Hello Jon, and sorry for the late response.

> I'd appreciate any thoughts. Perhaps something for parser policy. I've
> traced the source code a bit and nothing jumped out at me...
There are some known issues with the parser policy (i.e. ParserFactory), and we are actively working on them. I don't understand why the parse-ext plugin is called in your case, when it should be the parse-pdf or parse-html plugin.

Here's a workaround: if you don't need the parse-ext plugin (which performs parsing using external commands), simply remove it and everything should work.

Could you please send me your /usr/local/nutch/plugins/parse-ext/plugin.xml file so that I can check whether something is wrong in it?

Regards,

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
