Extremely long parsing of large XML files (Was RE: Good workaround for timeout?)

2011-10-26 Thread Chip Calhoun
I've got a few very large (upwards of 3 MB) XML files I'm trying to index, and I'm having trouble. Previously I'd had trouble with the fetch; now that seems to be okay, but due to the size of the files the parse takes much too long. Is there a good way to optimize this that I'm missing? Is

Re: Extremely long parsing of large XML files (Was RE: Good workaround for timeout?)

2011-10-26 Thread Markus Jelsma
The actual parse, which is what's producing the timeouts, happens early in the process. There are, to my knowledge, no Nutch settings to make this faster or change its behaviour; it's all about the parser implementation. Try increasing your parser.timeout setting.
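As far as I know, Nutch applies parser.timeout to each document's parse. The value is in seconds and is overridden in conf/nutch-site.xml. A document whose parse runs past the limit is treated as a failure and skipped. The general pattern is to run the parse in a worker thread and wait for it with a deadline. The plain-Java sketch below illustrates that pattern. It is a hypothetical illustration only: `SlowParser` and `parseWithTimeout` are invented names, not Nutch classes, and Nutch's own implementation may differ in its details.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseTimeoutSketch {
    // Hypothetical stand-in for a parser (not a Nutch interface).
    public interface SlowParser {
        String parse(String content) throws Exception;
    }

    /**
     * Runs one parse in a worker thread and waits at most `timeout`.
     * Returns null if the parse fails or does not finish in time, so the
     * caller can record the document as failed and move on.
     */
    public static String parseWithTimeout(SlowParser parser, String content,
                                          long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<String> task = () -> parser.parse(content);
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the worker; a parser that ignores interrupts keeps running.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            // The parser itself threw an exception.
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulates a slow parse (~200 ms), standing in for a very large XML file.
        SlowParser slow = content -> {
            Thread.sleep(200);
            return "parsed:" + content.length();
        };
        // Limit too short: the parse is abandoned and null is returned.
        System.out.println(parseWithTimeout(slow, "<doc/>", 50, TimeUnit.MILLISECONDS));
        // Generous limit: the parse completes.
        System.out.println(parseWithTimeout(slow, "<doc/>", 2, TimeUnit.SECONDS));
    }
}
```

Raising the limit does not make the parser any faster. It only lets slow documents finish instead of being dropped. The cost is that a pathological document can now hold a parsing thread for up to the full timeout.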

RE: Extremely long parsing of large XML files (Was RE: Good workaround for timeout?)

2011-10-26 Thread Chip Calhoun
Increasing parser.timeout to 3600 got me what I needed. I only have a few files this huge, so I'll live with that.