Hi Mike,

My mistake. I thought this discussion was taking place on the dev list, not the user list.

*Steve*
On Wed, Sep 7, 2011 at 11:30 AM, Michael McCandless <[email protected]> wrote:

> Sorry, I don't understand what this output is telling me?
>
> Ie these 5 files are Tika's sources.... but, what's wrong with them?
>
> I thought we were talking about certain emails from the Enron corpus
> where Tika hits an exception or fails to extract text...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Sep 7, 2011 at 1:04 PM, Steve Aulenbach <[email protected]>
> wrote:
> > Hi Mike,
> > Here you go. I ran a quick analysis on revision 1166216 and saw the
> > following:
> >
> > Analysis Summary:
> >
> > Files: 510
> >
> > *** Warning *** File(s) Not Found 5:
> >
> > /tika-parsers/src/main/java/org/apache/tika/detect/ContainerAwareDetector.java
> >
> > /tika-parsers/src/main/java/org/apache/tika/detect/POIFSContainerDetector.java
> >
> > /tika-parsers/src/main/java/org/apache/tika/detect/ZipContainerDetector.java
> >
> > /tika-parsers/src/test/java/org/apache/tika/parser/chm/TestUtils.java
> >
> > /tika-parsers/target/surefire-reports/TEST-org.apache.tika.parser.chm.TestUtils.xml
> >
> > Thanks,
> > Steve
> >
> >
> > On Wed, Sep 7, 2011 at 6:29 AM, Michael McCandless
> > <[email protected]> wrote:
> >>
> >> On Tue, Sep 6, 2011 at 9:29 PM, Mark Kerzner <[email protected]>
> >> wrote:
> >>
> >> > Is anybody interested in the results of all the testing that
> >> > I am doing, and if yes, how should I report my findings?
> >>
> >> I'm interested! This sounds great....
> >>
> >> Tika should strive to have no errors on any valid documents... so if
> >> you (or anyone) are hitting bugs in Tika/POI/PDFBox/etc., let's
> >> characterize them, open issues, and get them fixed :)
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
