Hi Olivier,

If you need more info on the Tika parser implementation for csv, I think you should head over to [email protected] and see what folks there can offer you.
On Thu, Jun 21, 2012 at 8:42 AM, Olivier LEVILLAIN <[email protected]> wrote:

> Nevertheless, my parse-plugins.xml contains:
> <mimeType name="text/rtf">
>   <plugin id="parse-tika" />
> </mimeType>
> <mimeType name="application/rtf">
>   <plugin id="parse-tika" />
> </mimeType>

I thought we were talking about csv? The mime type definitions for csv and rtf are different. That said, the above config looks fine. Even if it were not there, parse-tika would still pick these types up automatically because of the wildcard default settings.

> because nutch complained that first it didn't find text/rtf and the second
> time it didn't find application/rtf

Strange. Tika certainly has an rtf parser implementation.

> I'll take a look at your solution with Tika-CSV-parser but it's a shame it
> does not come out of the box with tika/nutch, but it seems to imply
> recompiling things and so on

AFAIK it's a simple case of obtaining the Tika source, making the necessary changes, compiling the jar, and then putting it on your Nutch classpath. I don't expect it to be too much hassle.

> In the meantime, what would be the simple trick to parse csv as plain text?
> Which *existing* parser should I use?

As you have highlighted, it appears parse-tika is not working out of the box with this mimeType. I suggest you invest half an hour or so and get the github csv parser working.

hth

