I've got a few very large (upwards of 3 MB) XML files I'm trying to index, and
I'm having trouble. Previously I'd had trouble with the fetch; now that seems
to be okay, but due to the size of the files the parse takes much too long.
Is there a good way to optimize this that I'm missing?
The parse that is producing the timeouts happens early in the process.
To my knowledge, there are no Nutch settings that make parsing faster or change
its behaviour; it all comes down to the parser implementation.
Try increasing your parser.timeout setting.
On Wednesday 26 October 2011 16:45:33
Increasing parser.timeout to 3600 got me what I needed. I only have a few files
this huge, so I'll live with that.
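
For anyone finding this later, the override would look roughly like the
fragment below. This is a sketch that assumes the usual setup where local
overrides go in conf/nutch-site.xml; the description text is my own wording,
and 3600 is simply the value that worked for my files. The value is in seconds.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <name>parser.timeout</name>
    <!-- Seconds allowed for parsing one document before it is given up on. -->
    <value>3600</value>
    <description>Raised so that a few very large XML files can finish parsing.</description>
  </property>
</configuration>
```

Note that this applies to every document in the crawl, so a badly broken
file can now hold up a parse task for up to an hour before it is skipped.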
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, October 26, 2011 10:55 AM
To: user@nutch.apache.org
Subject: Re: Extremely long