A quick update: stripping out the carriage returns '\r' before passing the text to Ruby's CSV.parse does the trick -- nothing else is needed.
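
For anyone who wants to try it, here is a minimal sketch of that fix. It assumes the CSV text from Tika is already in a single string; the helper name parse_tika_csv is made up for illustration and is not part of Tika or of Ruby's CSV library.

```ruby
require 'csv'

# Hypothetical helper (name invented for this example).
# Removes every carriage return so that only "\n" line endings remain,
# then hands the cleaned string to the standard CSV parser.
def parse_tika_csv(s)
  CSV.parse(s.delete("\r"))
end

# Illustrative input with Windows-style "\r\n" line endings.
rows = parse_tika_csv("\"a\",\"b\"\r\n\"c\",\"d\"\r\n")
rows.each { |row| puts row.inspect }
```

Note that String#delete drops every "\r", including any that sit inside a quoted field, so this only suits data where carriage returns carry no meaning.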
On Oct 27, 2012, at 11:54 AM, "Mattmann, Chris A (388J)" <[email protected]> wrote:
> Hey David,
>
> Thanks man for following this up on list and for the blog post -- great work!
> I love TIKA-593 (our REST server) too! :)
>
> Cheers,
> Chris
>
> On Oct 26, 2012, at 2:37 PM, David James wrote:
>
>> I have found no evidence that Tika is the problem. I have found reason
>> to suspect that Ruby 1.9.3.'s CSV is acting funny. This is my
>> work-around for Ruby 1.9.3, maybe it will be useful to someone besides
>> me.
>>
>> class TikaCSV
>>   def self.parse(s)
>>     s.split(/\n(?="[^"])/).reduce([]) { |a, x| a += CSV.parse(x) }
>>   end
>> end
>>
>> I also wrote it up here:
>> http://djwonk.tumblr.com/post/34370338490/visions-of-comma-separated-values
