Tika standalone CLI --text output mostly not working; other output formats are 
fine
------------------------------------------------------------------------------------

                 Key: TIKA-179
                 URL: https://issues.apache.org/jira/browse/TIKA-179
             Project: Tika
          Issue Type: Bug
          Components: cli
    Affects Versions: 0.2, 0.3
         Environment: Java 1.5 (also tried Java 1.6). OS used: Mac OS X, Linux 
(CentOS)
            Reporter: Paul Borgermans


When using the Tika standalone jar in CLI mode after mvn install, the plain 
text output option (-t or --text) produces no output for most of my test 
documents (pdf, doc, ppt, odt). The other output options (xml, html, 
metadata) produce correct output. Activating debug mode (-v) does not produce 
any additional information either.

When using the GUI, dragging and dropping a document does produce the expected 
results, including in the plain text tab/window.

I have rebuilt Tika from svn many times over the past 2 months (clearing the 
.m2 directory each time; latest revision tried: 724002). The CLI --text result 
is always the same: the output is usually missing.

For now, I use the -x output option piped to html2txt as a workaround, but I 
would prefer to use just Tika to convert to plain text (which is then used for 
further indexing in Solr).
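
For anyone who wants the same workaround without html2txt, here is a minimal 
sketch of the second stage of that pipe, using only the JDK's SAX parser. It 
is not Tika code and not a replacement for html2txt. The class name 
XhtmlToText is hypothetical, and the sketch assumes the -x output is 
well-formed XHTML with an unprefixed body element. It keeps only the text 
found inside body and starts a new line after common block elements.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Tika): reads XHTML and emits plain text.
public class XhtmlToText {

    // Elements after which a line break is emitted (an illustrative choice).
    private static final List<String> BLOCKS = Arrays.asList(
            "p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6");

    static String toText(InputStream xhtml) throws Exception {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // 0 while outside <body>; otherwise the nesting depth within it.
            private int bodyDepth = 0;

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if (bodyDepth > 0 || qName.equals("body")) {
                    bodyDepth++;
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (bodyDepth > 0) {
                    bodyDepth--;
                    if (BLOCKS.contains(qName)) {
                        out.append('\n');
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip text in <head> (title, metadata); keep body text only.
                if (bodyDepth > 0) {
                    out.append(ch, start, length);
                }
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Do not fetch an external DTD if the input declares one.
                return new InputSource(new StringReader(""));
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xhtml, handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Reads the XHTML from standard input, e.g. the piped -x output.
        System.out.print(toText(System.in));
    }
}
```

Under those assumptions, it would take the place of html2txt at the end of 
the existing pipe: the Tika CLI is run with -x as before, and its output is 
piped into "java XhtmlToText".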

Thanks




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
