Hi Mike,

My mistake. I thought this discussion was taking place on the dev list, not
the user list.
*Steve*



On Wed, Sep 7, 2011 at 11:30 AM, Michael McCandless <[email protected]> wrote:

> Sorry, I don't understand what this output is telling me?
>
> I.e., these 5 files are Tika's sources... but what's wrong with them?
>
> I thought we were talking about certain emails from the Enron corpus
> where Tika hits an exception or fails to extract text...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Sep 7, 2011 at 1:04 PM, Steve Aulenbach <[email protected]>
> wrote:
> > Hi Mike,
> > Here you go. I ran a quick analysis on revision 1166216 and saw the
> > following:
> >
> > Analysis Summary:
> >
> > Files: 510
> >
> > *** Warning *** File(s) Not Found 5:
> >
> > /tika-parsers/src/main/java/org/apache/tika/detect/ContainerAwareDetector.java
> >
> > /tika-parsers/src/main/java/org/apache/tika/detect/POIFSContainerDetector.java
> >
> > /tika-parsers/src/main/java/org/apache/tika/detect/ZipContainerDetector.java
> >
> > /tika-parsers/src/test/java/org/apache/tika/parser/chm/TestUtils.java
> >
> > /tika-parsers/target/surefire-reports/TEST-org.apache.tika.parser.chm.TestUtils.xml
> >
> > Thanks,
> > Steve
> >
> >
> > On Wed, Sep 7, 2011 at 6:29 AM, Michael McCandless
> > <[email protected]> wrote:
> >>
> >> On Tue, Sep 6, 2011 at 9:29 PM, Mark Kerzner <[email protected]>
> >> wrote:
> >>
> >> > Is anybody interested in the results of all the testing that
> >> > I am doing, and if yes, how should I report my findings?
> >>
> >> I'm interested!  This sounds great....
> >>
> >> Tika should strive to have no errors on any valid documents... so if
> >> you (or anyone) are hitting bugs in Tika/POI/PDFBox/etc., let's
> >> characterize them, open issues, and get them fixed :)
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >
> >
>
