RE: CSV Parser in Tika

Allison, Timothy B. Fri, 19 Jun 2015 07:29:11 -0700

Yes, that’s my belief.

As of now, we’re treating them as text files, which can lead to some really 
long, bogus tokens in Lucene/Solr with analyzers that don’t split on commas. ☹
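
To make the token problem concrete, here is a rough plain-Java sketch. It does not use Lucene at all; the class name, the method names, and the sample row are made up for illustration, and the whitespace-only split is only an assumption standing in for an analyzer that doesn't break on commas.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Tika or Lucene.
final class CsvTokenDemo {

    // Stand-in for an analyzer that splits on whitespace only (an assumption).
    static List<String> whitespaceTokens(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // Stand-in for an analyzer that also treats commas as token boundaries.
    static List<String> commaAwareTokens(String line) {
        return Arrays.asList(line.trim().split("[,\\s]+"));
    }
}
```

For a row such as "id,name,city,zip", whitespaceTokens returns the whole row as a single token, while commaAwareTokens returns the four fields separately.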

Detection without a filename would be difficult.
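
As a rough illustration of why, here is a minimal content-only heuristic in plain Java. This is not how Tika detects anything; the class name, the candidate delimiter list, and the "same count on every line" rule are all assumptions made up for this sketch, and it ignores quoted fields entirely.

```java
import java.util.Optional;

// Hypothetical sketch, not Tika code: guess a delimiter from content alone.
final class DelimiterSniffer {

    // Candidate delimiters are an assumption for illustration.
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    // Returns the candidate that appears the same positive number of times on
    // every non-blank line, preferring the one with the highest per-line count.
    static Optional<Character> guess(String text) {
        String[] lines = text.split("\\R");
        Character best = null;
        int bestCount = 0;
        for (char c : CANDIDATES) {
            int perLine = consistentCount(lines, c);
            if (perLine > bestCount) {
                bestCount = perLine;
                best = c;
            }
        }
        return Optional.ofNullable(best);
    }

    // Per-line count of c if it is identical and positive across at least two
    // non-blank lines; otherwise 0. Quoting and escaping are not handled.
    private static int consistentCount(String[] lines, char c) {
        int expected = -1;
        int seen = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int n = 0;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == c) {
                    n++;
                }
            }
            if (expected == -1) {
                expected = n;
            } else if (n != expected) {
                return 0;
            }
            seen++;
        }
        return (seen >= 2 && expected > 0) ? expected : 0;
    }
}
```

On "a;b;c\n1;2;3" this guesses ';', but it also guesses ',' for two lines of ordinary prose that each happen to contain one comma ("Hello, world\nGoodbye, all"), which is exactly the kind of false positive that makes filename-free detection hard.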

From: lewis john mcgibbney [mailto:[email protected]]
Sent: Friday, June 19, 2015 9:59 AM
To: [email protected]
Subject: CSV Parser in Tika

Hi Folks,
Am I correct in saying that we can't detect CSV in Tika?
We import commons-csv in tika-parsers/pom.xml; however, I don't see a csv 
package or a registered parser.
Also, when I use the webapp I get the following for a test csv file with 
semicolon ';' separators:

Content-Encoding: ISO-8859-1
Content-Length: 217
Content-Type: text/plain; charset=ISO-8859-1
X-Parsed-By: org.apache.tika.parser.DefaultParser
resourceName: test-semicolon.csv

Any comments please?
Thanks
Lewis
