Premature end of file Exception
-------------------------------

                 Key: TIKA-236
                 URL: https://issues.apache.org/jira/browse/TIKA-236
             Project: Tika
          Issue Type: Bug
    Affects Versions: 0.3
         Environment: Windows / Unix
            Reporter: Karl Heinz Marbaise
            Priority: Critical


I have reduced the problem down to the following:

        @Test
        public void testZipFile() throws IOException, SAXException, TikaException {
                String fileName = "lucene-2.2.0-src.zip";
                FileInputStream fis = new FileInputStream(fileName);
                Metadata metadata = new Metadata();
                metadata.set(Metadata.RESOURCE_NAME_KEY, fileName);
                AutoDetectParser parser = new AutoDetectParser();
                DefaultHandler handler = new BodyContentHandler();
                parser.parse(fis, handler, metadata);
                System.out.println("Handler:" + handler.toString());
        }

and the result of the above is the following:
FAILED: testZipFile
org.xml.sax.SAXParseException: Premature end of file.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:59)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:108)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:78)
        at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:93)
        at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:56)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:108)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:78)
        at com.soebes.supose.scan.ScanZIPDocumentTest.testZipFile(ScanZIPDocumentTest.java:30)
... Removed 22 stack frames

I have checked the ZIP file for errors with 7-Zip and with unzip on the command 
line, but neither reported any. If you need the file, I can attach it, but it is 
about 7 MB in size.
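
The stack trace shows the exception being raised while an entry inside the 
archive is handed to the XML parser, so it may help to know which entries could 
trigger it. One possible explanation, not confirmed here, is an entry with no 
content that is detected as XML. The following is only a sketch under that 
assumption: the class name ZipEntryCheck and the method findEmptyEntries are 
hypothetical helpers, and only java.util.zip from the JDK is used, not Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper, not part of Tika.
public class ZipEntryCheck {

    /**
     * Reads every entry of the archive to its end and returns the names
     * of all non-directory entries that contain zero bytes.
     * A damaged entry would surface as an IOException while reading.
     */
    public static List<String> findEmptyEntries(Path zip) throws IOException {
        List<String> empty = new ArrayList<>();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(zip);
             ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                long size = 0;
                int n;
                // Count the bytes actually readable from this entry.
                while ((n = zis.read(buffer)) != -1) {
                    size += n;
                }
                if (!entry.isDirectory() && size == 0) {
                    empty.add(entry.getName());
                }
            }
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Default file name taken from the test case above; pass another path if needed.
        Path zip = Paths.get(args.length > 0 ? args[0] : "lucene-2.2.0-src.zip");
        for (String name : findEmptyEntries(zip)) {
            System.out.println("Empty entry: " + name);
        }
    }
}
```

If this lists any .xml entries, parsing one of them on its own with the test 
case above would show whether it reproduces the exception.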

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
