Sorry, I don't understand what this output is telling me. I.e., these 5 files are Tika's sources... but what's wrong with them?
I thought we were talking about certain emails from the Enron corpus where Tika hits an exception or fails to extract text...

Mike McCandless

http://blog.mikemccandless.com

On Wed, Sep 7, 2011 at 1:04 PM, Steve Aulenbach <[email protected]> wrote:
> Hi Mike,
> Here you go. I ran a quick analysis on revision 1166216 and saw the
> following:
>
> Analysis Summary:
>
> Files: 510
>
> *** Warning *** File(s) Not Found 5:
>
> /tika-parsers/src/main/java/org/apache/tika/detect/ContainerAwareDetector.java
>
> /tika-parsers/src/main/java/org/apache/tika/detect/POIFSContainerDetector.java
>
> /tika-parsers/src/main/java/org/apache/tika/detect/ZipContainerDetector.java
>
> /tika-parsers/src/test/java/org/apache/tika/parser/chm/TestUtils.java
>
> /tika-parsers/target/surefire-reports/TEST-org.apache.tika.parser.chm.TestUtils.xml
>
> Thanks,
> Steve
>
>
> On Wed, Sep 7, 2011 at 6:29 AM, Michael McCandless
> <[email protected]> wrote:
>>
>> On Tue, Sep 6, 2011 at 9:29 PM, Mark Kerzner <[email protected]>
>> wrote:
>>
>> > Is anybody interested in the results of all the testing that
>> > I am doing, and if yes, how should I report my findings?
>>
>> I'm interested! This sounds great....
>>
>> Tika should strive to have no errors on any valid documents... so if
>> you (or anyone) are hitting bugs in Tika/POI/PDFBox/etc., let's
>> characterize them, open issues, and get them fixed :)
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>
>
