Re: how are CSV/TXT files handled

2012-02-15 Thread remi tassing
Hi, Tika is parsing properly now. I think it was some kind of proxy issue combined with the http.content.limit setting. Thanks! Remi On Fri, Feb 10, 2012 at 11:16 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Remi, Please ensure that your http.content.limit is sufficient, what are you url
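
For anyone hitting the same problem, here is a minimal sketch of raising the fetch limit in conf/nutch-site.xml. The property name http.content.limit is the one discussed in this thread; the value -1 (no truncation) is an assumption chosen for illustration, not the setting Remi reported using. The Nutch default is 65536 bytes, which spreadsheets and other binary documents easily exceed, and a truncated file can then fail in the parser.

```xml
<!-- conf/nutch-site.xml (fragment; goes inside the <configuration> element) -->
<property>
  <name>http.content.limit</name>
  <!-- Assumed value: a negative number disables truncation. -->
  <value>-1</value>
</property>
```

A large positive byte count works as well if unlimited downloads are not acceptable for the crawl.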

Re: how are CSV/TXT files handled

2012-02-08 Thread remi tassing
Ok, I just did (it's great, but I had been reluctant because recompiling always gives me errors). However, I'm still getting a similar error:

$ bin/nutch parsechecker http://URL
fetching: http://URL
parsing: http://URL
contentType: application/ms-excel
- Url ---

Re: how are CSV/TXT files handled

2012-02-07 Thread remi tassing
With the nutch parsechecker command I get the following error message: "Error: Could not find or load main class parsechecker". This doesn't sound good! On Tue, Feb 7, 2012 at 9:58 AM, remi tassing tassingr...@gmail.com wrote: The point that made me start thinking is because I got this error

how are CSV/TXT files handled

2012-02-07 Thread remi tassing
Hey guys, I checked the mailing-list archive but couldn't find an answer to this. I think CSV and TXT don't need any kind of parsing, but how are they handled by default? Remi

Re: how are CSV/TXT files handled

2012-02-07 Thread Markus Jelsma
Upgrade to 1.4.

> With the nutch parsechecker command I get the following error message: Error: Could not find or load main class parsechecker, this doesn't sound good! On Tue, Feb 7, 2012 at 9:58 AM, remi tassing tassingr...@gmail.com wrote: The point that made me start thinking is

how are CSV/TXT files handled

2012-02-06 Thread remi tassing
Hey guys, I checked the mailing-list archive but couldn't find an answer to this. I think CSV and TXT don't need any kind of parsing, but how are they handled by default? Remi

Re: how are CSV/TXT files handled

2012-02-06 Thread remi tassing
What got me thinking is this error message:

failed(2,0): Can't retrieve Tika parser for mime-type application/ms-excel

I'm using Nutch-1.2 and my nutch-site.xml has: <property> <name>plugin.includes</name>