Issues with data files used for testing by TestParsers.
-------------------------------------------------------

                 Key: TIKA-16
                 URL: https://issues.apache.org/jira/browse/TIKA-16
             Project: Tika
          Issue Type: Bug
            Reporter: Keith R. Bennett


The TestParsers class requires that the following test files be available:

testPDF.PDF
testTXT.TXT
testRTF.RTF
textXML.XML
testPPT.PPT
testWORD.doc
testEXCEL.xls
testOO2.odt
testHTML.html

Only the following are provided:

testHTML.html
testRTF.rtf
testTXT.txt
testXML.xml

Issue #1: When specifying the file names in the source code and for the files 
themselves, we should make sure that they are equal case-sensitively.  My 
personal preference is for lower case extensions, but in any case they should 
be consistent.  Therefore, where necessary, we need to rename the file or 
change the source code so that they match.  (This is the case for *.rtf, *.txt, 
*.xml above.)

Issue #2: The missing files need to be added to the repository so that the test 
does not fail.  A minimal file would suffice for the short term, but ultimately 
it would be nice to have files that exercise the parsers to the fullest.

Issue #3: We need to agree on a directory.  The source code currently specifies 
that they should be in src/test/resources/testFiles, but in the respository 
they are in src/test/resources.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to