[jira] [Updated] (TIKA-2124) IOException "expected number, actual=COSArray{...}" on a valid PDF

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2124: - Description: On the following PDF file that opens with Acrobat:

[jira] [Commented] (TIKA-2118) Misleading exception on a password protected XLS

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590401#comment-15590401 ] Seva Alekseyev commented on TIKA-2118: -- I'll have to do some homework for that. > Misleading

[jira] [Commented] (TIKA-2127) NullPointerException on a valid PPTX

2016-10-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590395#comment-15590395 ] Hudson commented on TIKA-2127: -- SUCCESS: Integrated in Jenkins build tika-2.x #163 (See

[jira] [Commented] (TIKA-2127) NullPointerException on a valid PPTX

2016-10-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590326#comment-15590326 ] Hudson commented on TIKA-2127: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #65 (See

tika-2.x-windows - Build # 65 - Still Failing

2016-10-19 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #65) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/65/ to view the results.

[jira] [Commented] (TIKA-2118) Misleading exception on a password protected XLS

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590311#comment-15590311 ] Tim Allison commented on TIKA-2118: --- Does this happen with non-password-protected XLS version 5 files? >

[jira] [Commented] (TIKA-2121) ClassCastException on a valid PDF (fixed in PDFBox)

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590306#comment-15590306 ] Tim Allison commented on TIKA-2121: --- Y, sorry, again, PDFBox's ExtractText doesn't exercise

[jira] [Commented] (TIKA-2127) NullPointerException on a valid PPTX

2016-10-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590299#comment-15590299 ] Hudson commented on TIKA-2127: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1122 (See

[jira] [Commented] (TIKA-2117) NullPointerException on PDF (fixed in PDFBox)

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590291#comment-15590291 ] Tim Allison commented on TIKA-2117: --- Thank you for checking. I forgot that PDFBox's ExtractText doesn't

[jira] [Resolved] (TIKA-2127) NullPointerException on a valid PPTX

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2127. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Thank you! Fixed. >

[jira] [Resolved] (TIKA-2128) IllegalArgumentException on a valid Word file

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2128. --- Resolution: Duplicate > IllegalArgumentException on a valid Word file >

[jira] [Updated] (TIKA-2129) IllegalArgumentException/"Unknown shape type" on a valid Powerpoint file

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2129: - Description: The following valid Powerpoint file:

[jira] [Created] (TIKA-2129) IllegalArgumentException/"Unknown shape type" on a valid Powerpoint file

2016-10-19 Thread Seva Alekseyev (JIRA)
Seva Alekseyev created TIKA-2129: Summary: IllegalArgumentException/"Unknown shape type" on a valid Powerpoint file Key: TIKA-2129 URL: https://issues.apache.org/jira/browse/TIKA-2129 Project: Tika

[jira] [Commented] (TIKA-2122) Extract all email headers from Outlook .msg files into Metadata

2016-10-19 Thread Chris Knott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590005#comment-15590005 ] Chris Knott commented on TIKA-2122: --- Wow, thanks! Very fast turnaround. > Extract all email headers from

[jira] [Created] (TIKA-2128) IllegalArgumentException on a valid Word file

2016-10-19 Thread Seva Alekseyev (JIRA)
Seva Alekseyev created TIKA-2128: Summary: IllegalArgumentException on a valid Word file Key: TIKA-2128 URL: https://issues.apache.org/jira/browse/TIKA-2128 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-2127) NullPointerException on a valid PPTX

2016-10-19 Thread Seva Alekseyev (JIRA)
Seva Alekseyev created TIKA-2127: Summary: NullPointerException on a valid PPTX Key: TIKA-2127 URL: https://issues.apache.org/jira/browse/TIKA-2127 Project: Tika Issue Type: Bug Affects

[jira] [Updated] (TIKA-2104) Upgrade to a version of POI that fixes common bugs in macro extraction, when available

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2104: -- Description: On TIKA-2069, we found two bugs in POI that prevented the extraction of macros from

[jira] [Created] (TIKA-2126) Pull out more embedded objects in PPT

2016-10-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2126: - Summary: Pull out more embedded objects in PPT Key: TIKA-2126 URL: https://issues.apache.org/jira/browse/TIKA-2126 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-2125) XmlValueOutOfRangeException on a good Word document

2016-10-19 Thread Seva Alekseyev (JIRA)
Seva Alekseyev created TIKA-2125: Summary: XmlValueOutOfRangeException on a good Word document Key: TIKA-2125 URL: https://issues.apache.org/jira/browse/TIKA-2125 Project: Tika Issue Type:

[VOTE] Apache Tika 1.14 Release Candidate #1

2016-10-19 Thread Chris Mattmann
Hi Folks, A first candidate for the Tika 1.14 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in: https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tree;hb=687d7706c9778e4f49f2834a07e5a9d99b23042b The SHA1

[jira] [Updated] (TIKA-2104) Upgrade to a version of POI that fixes common bugs in macro extraction, when available

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2104: -- Description: On TIKA-2069, we found two bugs in POI that prevented the extraction of macros from

[jira] [Updated] (TIKA-2121) ClassCastException on a valid PDF (fixed in PDFBox)

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2121: - Summary: ClassCastException on a valid PDF (fixed in PDFBox) (was: ClassCastException on a valid

[jira] [Updated] (TIKA-2117) NullPointerException on PDF (fixed in PDFBox)

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2117: - Summary: NullPointerException on PDF (fixed in PDFBox) (was: NullPointerException on PDF) >

[jira] [Commented] (TIKA-2117) NullPointerException on PDF (fixed in PDFBox)

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589481#comment-15589481 ] Seva Alekseyev commented on TIKA-2117: -- Doesn't reproduce in PDFBox trunk. > NullPointerException on

[jira] [Updated] (TIKA-2121) ClassCastException on a valid PDF [fixed in PDFBox]

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2121: - Summary: ClassCastException on a valid PDF [fixed in PDFBox] (was: ClassCastException on a valid

[jira] [Commented] (TIKA-2121) ClassCastException on a valid PDF

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589475#comment-15589475 ] Seva Alekseyev commented on TIKA-2121: -- Does not reproduce in the PDFBox trunk. > ClassCastException

[jira] [Updated] (TIKA-2124) IOException "expected number, actual=COSArray{...}" on a valid PDF

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2124: - External issue URL: https://issues.apache.org/jira/browse/PDFBOX-3533?filter=-2 External

[jira] [Updated] (TIKA-2124) IOException "expected number, actual=COSArray{...}" on a valid PDF

2016-10-19 Thread Seva Alekseyev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2124: - Summary: IOException "expected number, actual=COSArray{...}" on a valid PDF (was: IOException ""

[jira] [Created] (TIKA-2124) IOException "" on a valid PDF

2016-10-19 Thread Seva Alekseyev (JIRA)
Seva Alekseyev created TIKA-2124: Summary: IOException "" on a valid PDF Key: TIKA-2124 URL: https://issues.apache.org/jira/browse/TIKA-2124 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2123) CommonsDigester calculates wrong hashes on large files

2016-10-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589252#comment-15589252 ] Hudson commented on TIKA-2123: -- SUCCESS: Integrated in Jenkins build tika-2.x #162 (See

[jira] [Commented] (TIKA-2123) CommonsDigester calculates wrong hashes on large files

2016-10-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589216#comment-15589216 ] Hudson commented on TIKA-2123: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1119 (See

[jira] [Updated] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1329: Fix Version/s: (was: 1.14) 1.15 > Add RecursiveParserWrapper aka

[jira] [Updated] (TIKA-1738) ForkClient does not always delete temporary bootstrap jar

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1738: Fix Version/s: (was: 1.14) 1.15 > ForkClient does not always delete

[jira] [Updated] (TIKA-1724) Create parser for .obo file format.

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1724: Fix Version/s: (was: 1.14) 1.15 > Create parser for .obo file format.

[jira] [Updated] (TIKA-1367) Tika documentation should list tika-parsers parser dependencies

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1367: Fix Version/s: (was: 1.14) 1.15 > Tika documentation should list

[jira] [Updated] (TIKA-1640) Make ExternalParser support aliases for key names in extracted metadata

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1640: Fix Version/s: (was: 1.14) 1.15 > Make ExternalParser support aliases

[jira] [Updated] (TIKA-1390) Create tika-example module

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1390: Fix Version/s: (was: 1.14) 1.15 > Create tika-example module >

[jira] [Updated] (TIKA-1808) Head section closed too eager

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1808: Fix Version/s: (was: 1.14) 1.15 > Head section closed too eager >

[jira] [Updated] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-819: --- Fix Version/s: (was: 1.14) 1.15 > Make Option to Exclude Embedded Files'

[jira] [Updated] (TIKA-1436) improvement to PDFParser

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1436: Fix Version/s: (was: 1.14) 1.15 > improvement to PDFParser >

[jira] [Updated] (TIKA-1705) Update ASM dependency to 5.0.4

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1705: Fix Version/s: (was: 1.14) 1.15 > Update ASM dependency to 5.0.4 >

[jira] [Updated] (TIKA-1840) No way to link slide notes to slide in PPT output.

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1840: Fix Version/s: (was: 1.14) 1.15 > No way to link slide notes to slide

[jira] [Updated] (TIKA-891) Use POST in addition to PUT on method calls in tika-server

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-891: --- Fix Version/s: (was: 1.14) 1.15 > Use POST in addition to PUT on method

[jira] [Updated] (TIKA-1815) Text content from parser is empty when NamedEntityParser is enabled

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1815: Fix Version/s: (was: 1.14) 1.15 > Text content from parser is empty

[jira] [Updated] (TIKA-1456) Visual Sentiment API parser

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1456: Fix Version/s: (was: 1.14) 1.15 > Visual Sentiment API parser >

[jira] [Updated] (TIKA-1328) Translate Metadata and Content

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1328: Fix Version/s: (was: 1.14) 1.15 > Translate Metadata and Content >

[jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1706: Fix Version/s: (was: 1.14) 1.15 > Bring back commons-io to tika-core

[jira] [Updated] (TIKA-1518) Docker with Tika Server

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1518: Fix Version/s: (was: 1.14) 1.15 > Docker with Tika Server >

[jira] [Updated] (TIKA-1800) MediaType#parse does not decode escaped special characters

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1800: Fix Version/s: (was: 1.14) 1.15 > MediaType#parse does not decode

[jira] [Updated] (TIKA-1106) CLAVIN Integration

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1106: Fix Version/s: (was: 1.14) 1.15 > CLAVIN Integration >

[jira] [Updated] (TIKA-1220) Parser implementration for IFC files

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1220: Fix Version/s: (was: 1.14) 1.15 > Parser implementration for IFC

[jira] [Updated] (TIKA-1829) org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:92) NPE

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1829: Fix Version/s: (was: 1.14) 1.15 >

[jira] [Updated] (TIKA-1343) Create a Tika Translator implementation that uses JoshuaDecoder

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1343: Fix Version/s: (was: 1.14) 1.15 > Create a Tika Translator

[jira] [Updated] (TIKA-1577) NetCDF Data Extraction

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1577: Fix Version/s: (was: 1.14) 1.15 > NetCDF Data Extraction >

[jira] [Updated] (TIKA-1308) Support in memory parse mode(don't create temp file): to support run Tika in GAE

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1308: Fix Version/s: (was: 1.14) 1.15 > Support in memory parse mode(don't

[jira] [Updated] (TIKA-1059) Better Handling of InterruptedException in ExternalParser and ExternalEmbedder

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1059: Fix Version/s: (was: 1.14) 1.15 > Better Handling of

[jira] [Updated] (TIKA-2017) Tika Server Cannot handle large files; add option for metadata only

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-2017: Fix Version/s: (was: 1.14) 1.15 > Tika Server Cannot handle large

[jira] [Updated] (TIKA-2016) A parser that combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-2016: Fix Version/s: (was: 1.14) 1.15 > A parser that combines Apache

[jira] [Updated] (TIKA-776) ExifTool Embedder

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-776: --- Fix Version/s: (was: 1.14) 1.15 > ExifTool Embedder >

Re: 1.14?

2016-10-19 Thread Mattmann, Chris A (3010)
OK I’m seriously releasing it right now (see JIRA). Generating RC #1 ☺ Sorry. Email forthcoming in 20

[jira] [Updated] (TIKA-1301) Establish TikaServer on Apache hosted VM

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1301: Fix Version/s: (was: 1.14) 1.15 > Establish TikaServer on Apache

[jira] [Updated] (TIKA-1697) Parser Implementation for AkomaNtoso Legal XML Documents

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1697: Fix Version/s: (was: 1.14) 1.15 > Parser Implementation for

[jira] [Updated] (TIKA-987) Embedded drawing (SHAPE MERGEFORMAT) sometimes not extracted

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-987: --- Fix Version/s: (was: 1.14) 1.15 > Embedded drawing (SHAPE MERGEFORMAT)

[jira] [Updated] (TIKA-1208) Migrate Any23 mime contributions to Tika

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1208: Fix Version/s: (was: 1.14) 1.15 > Migrate Any23 mime contributions to

[jira] [Updated] (TIKA-1395) Create embedded image extraction example

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1395: Fix Version/s: (was: 1.14) 1.15 > Create embedded image extraction

[jira] [Updated] (TIKA-1672) Integrate tika-java7 component

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1672: Fix Version/s: (was: 1.14) 1.15 > Integrate tika-java7 component >

[jira] [Updated] (TIKA-1379) error in Tika().detect for xml files with xades signature

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1379: Fix Version/s: (was: 1.14) 1.15 > error in Tika().detect for xml

[jira] [Updated] (TIKA-1952) Access Date is getting modified while capturing the MetaData information using AutoDetectParser

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1952: Fix Version/s: (was: 1.14) 1.15 > Access Date is getting modified

[jira] [Updated] (TIKA-980) MicrodataContentHandler for Apache Tika

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-980: --- Fix Version/s: (was: 1.14) 1.15 > MicrodataContentHandler for Apache

[jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-539: --- Fix Version/s: (was: 1.14) 1.15 > Encoding detection is too biased by

[jira] [Updated] (TIKA-1108) Represent individual slides in pptx

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1108: Fix Version/s: (was: 1.14) 1.15 > Represent individual slides in pptx

[jira] [Updated] (TIKA-1688) Tika Version in Metadata

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1688: Fix Version/s: (was: 1.14) 1.15 > Tika Version in Metadata >

[jira] [Updated] (TIKA-1295) Make some Dublin Core items multi-valued

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1295: Fix Version/s: (was: 1.14) 1.15 > Make some Dublin Core items

[jira] [Updated] (TIKA-1465) Implement extraction of non-global variables from netCDF3 and netCDF4

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1465: Fix Version/s: (was: 1.14) 1.15 > Implement extraction of non-global

[jira] [Updated] (TIKA-1953) tika-server NullPointerException while processing rtfs

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1953: Fix Version/s: (was: 1.14) 1.15 > tika-server NullPointerException

[jira] [Updated] (TIKA-1616) Tika Parser for GIBS Metadata

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1616: Fix Version/s: (was: 1.14) 1.15 > Tika Parser for GIBS Metadata >

[jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-715: --- Fix Version/s: (was: 1.14) 1.15 > Some parsers produce non-well-formed

[jira] [Updated] (TIKA-1609) Leverage Google's LibPhonenumber for enhanced phone number extraction and metadata modeling

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1609: Fix Version/s: (was: 1.14) 1.15 > Leverage Google's LibPhonenumber

[jira] [Updated] (TIKA-988) We don't extract a placeholder for a Word document embedded in an Excel document

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-988: --- Fix Version/s: (was: 1.14) 1.15 > We don't extract a placeholder for a

[jira] [Updated] (TIKA-1425) Automatic batching of Microsoft service calls

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1425: Fix Version/s: (was: 1.14) 1.15 > Automatic batching of Microsoft

[jira] [Updated] (TIKA-1454) Extracting as HTML loses links in xlsx, ppt, and pptx files

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1454: Fix Version/s: (was: 1.14) (was: 2.0) 1.15 >

[jira] [Updated] (TIKA-1318) Use of Deprecated Word6Extractor.getParagraphText() Method

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1318: Fix Version/s: (was: 1.14) 1.15 > Use of Deprecated

[jira] [Updated] (TIKA-1505) chmparser breaks down when extracting from file of CHM format v3

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1505: Fix Version/s: (was: 1.14) 1.15 > chmparser breaks down when

[jira] [Updated] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-894: --- Fix Version/s: (was: 1.14) 1.15 > Add webapp mode for Tika Server,

[jira] [Updated] (TIKA-1276) Missing embedded dependencies in tika-bundle

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1276: Fix Version/s: (was: 1.14) 1.15 > Missing embedded dependencies in

[jira] [Updated] (TIKA-1417) Create Extract Embedded Images from PDFs Example

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1417: Fix Version/s: (was: 1.14) 1.15 > Create Extract Embedded Images from

[jira] [Updated] (TIKA-1598) Parser Implementation for Streaming Video

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1598: Fix Version/s: (was: 1.14) 1.15 > Parser Implementation for Streaming

[jira] [Updated] (TIKA-774) ExifTool Parser

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-774: --- Fix Version/s: (was: 1.14) 1.15 > ExifTool Parser >

[jira] [Updated] (TIKA-1674) Add example to show how to extract embedded files

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1674: Fix Version/s: (was: 1.14) 1.15 > Add example to show how to extract

[jira] [Updated] (TIKA-1540) New Tika plugin for image based feature extraction using computer vision techniques

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1540: Fix Version/s: (was: 1.14) 1.15 > New Tika plugin for image based

[jira] [Updated] (TIKA-985) Support for HTML5 elements

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-985: --- Fix Version/s: (was: 1.14) 1.15 > Support for HTML5 elements >

[jira] [Updated] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-10-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1607: Fix Version/s: (was: 1.14) 1.15 > Introduce new arbitrary object

[jira] [Commented] (TIKA-2123) CommonsDigester calculates wrong hashes on large files

2016-10-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589142#comment-15589142 ] Hudson commented on TIKA-2123: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #64 (See

tika-2.x-windows - Build # 64 - Still Failing

2016-10-19 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #64) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/64/ to view the results.

[jira] [Commented] (TIKA-2123) CommonsDigester calculates wrong hashes on large files

2016-10-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588996#comment-15588996 ] Tim Allison commented on TIKA-2123: --- Thank you for opening this. I can replicate it. Testing the fix

[jira] [Updated] (TIKA-2123) CommonsDigester calculates wrong hashes on large files

2016-10-19 Thread Yahav Amsalem (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahav Amsalem updated TIKA-2123: Description: When passing more than one algorithm to CommonsDigester constructor and then trying to

[jira] [Updated] (TIKA-2123) CommonsDigester calculates wrong hashes on large files

2016-10-19 Thread Yahav Amsalem (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahav Amsalem updated TIKA-2123: Description: When passing more than one algorithm to CommonsDigester constructor and then trying to

[jira] [Created] (TIKA-2123) CommonsDigester calculates wrong hashes on large files

2016-10-19 Thread Yahav Amsalem (JIRA)
Yahav Amsalem created TIKA-2123: --- Summary: CommonsDigester calculates wrong hashes on large files Key: TIKA-2123 URL: https://issues.apache.org/jira/browse/TIKA-2123 Project: Tika Issue Type: