+1

Tested on:
Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7
Windows 7: Java 1.7

I also ran Tika 1.5 and 1.6 rc1 against 10,413 docs from govdocs1: a random 
selection of 10,000 docs (all formats) plus all available msoffice-x files. 
With 1.6 rc1 there were several improvements in text extraction for PDFs 
(mostly spacing) and 4 fewer exceptions (2 ppt, 1 doc and 1 pdf).

There was one regression:
http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx 

Stacktrace:
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -369073454
        at java.lang.String.checkBounds(String.java:371)
        at java.lang.String.<init>(String.java:415)
        at org.apache.poi.util.StringUtil.getFromCompressedUnicode(StringUtil.java:114)
        at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:163)
        at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:91)
        at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:63)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:250)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:115)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
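
For what it's worth, here is a small stdlib-only sketch of how a value like 
-369073454 can reach the String constructor. This is an assumption about the 
failure mode, not something checked against the POI source: it supposes that 
a length field in malformed embedded data is read as a signed 32-bit 
little-endian integer and passed on unchecked. The class LengthFieldSketch, 
its methods, and the sample bytes are all hypothetical and made up for 
illustration; the bytes are not taken from 268620.pptx.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in POI or Tika.
final class LengthFieldSketch {

    // Reads a signed 32-bit little-endian integer at the given offset.
    static int readLength(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }

    // Reads a 4-byte length followed by that many ISO-8859-1 bytes.
    // Returns null instead of throwing when the length field is implausible.
    static String readLengthPrefixed(byte[] data, int offset) {
        if (offset < 0 || data.length - offset < 4) {
            return null; // not enough bytes for the length field itself
        }
        int len = readLength(data, offset);
        int start = offset + 4;
        if (len < 0 || len > data.length - start) {
            return null; // negative or longer than the remaining data
        }
        return new String(data, start, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Well-formed record: length 3, then "abc".
        byte[] ok = {3, 0, 0, 0, 'a', 'b', 'c'};
        // Made-up malformed record: the four length bytes decode to a negative int.
        byte[] bad = {(byte) 0xD2, 0x62, 0x00, (byte) 0xEA, 'a', 'b', 'c'};

        System.out.println(readLengthPrefixed(ok, 0));
        System.out.println(readLength(bad, 0));
        System.out.println(readLengthPrefixed(bad, 0));

        try {
            // Without the guard, the negative length goes straight to String.
            new String(bad, 4, readLength(bad, 0), StandardCharsets.ISO_8859_1);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded read failed: " + e.getMessage());
        }
    }
}
```

If that reading is right, a bounds check of this kind before building the 
String would turn the exception into a skipped embedded object rather than a 
failed parse, but that would need to be confirmed against the actual file.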


-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Monday, July 28, 2014 12:22 AM
To: d...@tika.apache.org
Cc: user@tika.apache.org
Subject: [VOTE] Apache Tika 1.6 release candidate #1

Hi Folks,

A candidate for the Tika 1.6 release is available at:

http://people.apache.org/~mattmann/apache-tika-1.6/rc1/


The release candidate is a zip archive of the sources in:

    http://svn.apache.org/repos/asf/tika/tags/1.6/

The SHA1 checksum of the archive is
076ad343be56a540a4c8e395746fa4fda5b5b6d3.

A Maven staging repository is available at:

https://repository.apache.org/content/repositories/orgapachetika-1003/


Please vote on releasing this package as Apache Tika 1.6.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

    [ ] +1 Release this package as Apache Tika 1.6
    [ ] -1 Do not release this package because...

Thank you!

Cheers,
Chris

P.S. Here is my +1!




