[GitHub] tika pull request #145: Tika5

2017-01-13 Thread ashutoshvsingh
GitHub user ashutoshvsingh opened a pull request: https://github.com/apache/tika/pull/145 Tika5 change snapshot to 1.0.0 You can merge this pull request into a Git repository by running: $ git pull https://github.com/lafaspot/tika tika5 Alternatively you can review and apply t

[jira] [Commented] (TIKA-2239) Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822344#comment-15822344 ] Tim Allison commented on TIKA-2239: --- Will take a look next week. Thank you for opening t

[jira] [Resolved] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2232. --- Resolution: Fixed Let me know if you'd like different behavior. > Add JBIG2 image parsing support > --

[jira] [Comment Edited] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822253#comment-15822253 ] Tim Allison edited comment on TIKA-2232 at 1/13/17 7:51 PM: Pro

[jira] [Commented] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822253#comment-15822253 ] Tim Allison commented on TIKA-2232: --- Proposed change if jbig2 is not on the classpath: P

[jira] [Updated] (TIKA-2240) MS Write File

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2240: -- Attachment: 746255.doc > MS Write File > - > > Key: TIKA-2240 >

[jira] [Created] (TIKA-2240) MS Write File

2017-01-13 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2240: - Summary: MS Write File Key: TIKA-2240 URL: https://issues.apache.org/jira/browse/TIKA-2240 Project: Tika Issue Type: Improvement Reporter: Tim Allison

[jira] [Reopened] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-2232: --- Reopen to handle jbig2 not on class path > Add JBIG2 image parsing support > -

[jira] [Commented] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822212#comment-15822212 ] Tim Allison commented on TIKA-2232: --- We should be catching that and storing it in a metad

[jira] [Commented] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Nicholas DiPiazza (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822136#comment-15822136 ] Nicholas DiPiazza commented on TIKA-2232: - [~pascal.essiembre] totally obviously w

[jira] [Comment Edited] (TIKA-2232) Add JBIG2 image parsing support

2017-01-13 Thread Nicholas DiPiazza (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822136#comment-15822136 ] Nicholas DiPiazza edited comment on TIKA-2232 at 1/13/17 6:39 PM: ---

[jira] [Updated] (TIKA-2239) Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

2017-01-13 Thread Jorge Spinsanti (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Spinsanti updated TIKA-2239: -- Description: I got an exception to extract text from DOCX due to SAXParseException on Apache POI

[jira] [Updated] (TIKA-2239) Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

2017-01-13 Thread Jorge Spinsanti (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Spinsanti updated TIKA-2239: -- Attachment: tika2239.docx > Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXML

[jira] [Created] (TIKA-2239) Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser

2017-01-13 Thread Jorge Spinsanti (JIRA)
Jorge Spinsanti created TIKA-2239: - Summary: Illegal IOException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser Key: TIKA-2239 URL: https://issues.apache.org/jira/browse/TIKA-2239 Project: Ti

RE: FW: tika-2.x-windows - Build # 94 - Still Failing

2017-01-13 Thread Allison, Timothy B.
Until we get 2.x-windows working (again? Did it ever work?), polling should be either never or hourly for changes in git, I guess, like the others? The build seems to have stopped for now, which is good. I’m not sure why it was running hourly even without a git change… Any help you could offer getting

Re: FW: tika-2.x-windows - Build # 94 - Still Failing

2017-01-13 Thread lewis john mcgibbney
Hi Tim, What do you want to change the polling to? We can make it nightly or something. What do you want? Thanks On Thu, Jan 5, 2017 at 4:35 AM, Allison, Timothy B. wrote: > Lewis, > Looks like our 2.x windows build is still failing. The new behavior, > though, is that Jenkins is trying every

[jira] [Resolved] (TIKA-2238) Add mime detection for embedded MSEquation files

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2238. --- Resolution: Fixed Fix Version/s: 1.15 2.0 > Add mime detection for embedded M

[jira] [Created] (TIKA-2238) Add mime detection for embedded MSEquation files

2017-01-13 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2238: - Summary: Add mime detection for embedded MSEquation files Key: TIKA-2238 URL: https://issues.apache.org/jira/browse/TIKA-2238 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-2181) Upgrade to POI 3.16-beta2 when available

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821796#comment-15821796 ] Tim Allison commented on TIKA-2181: --- Remove NPE check around {{getShapes}} in XSSFExcelEx

[jira] [Comment Edited] (TIKA-2181) Upgrade to POI 3.16-beta2 when available

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821796#comment-15821796 ] Tim Allison edited comment on TIKA-2181 at 1/13/17 1:52 PM: Rem

[jira] [Resolved] (TIKA-2134) Different NullPointerException on a valid Excel file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2134. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Workaround added in Tika for now.

[jira] [Resolved] (TIKA-2216) ArrayIndexOutOfBoundsException on a valid Word file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2216. --- Resolution: Duplicate > ArrayIndexOutOfBoundsException on a valid Word file > -

[jira] [Resolved] (TIKA-2205) IllegalArgumentException on a valid Excel file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2205. --- Resolution: Duplicate > IllegalArgumentException on a valid Excel file > --

[jira] [Comment Edited] (TIKA-2152) NullPointerException on a valid Word file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821755#comment-15821755 ] Tim Allison edited comment on TIKA-2152 at 1/13/17 1:16 PM: Thi

[jira] [Commented] (TIKA-2152) NullPointerException on a valid Word file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821755#comment-15821755 ] Tim Allison commented on TIKA-2152: --- These files are parsed without problem by the new ex

[jira] [Commented] (TIKA-2163) POIXMLException from ClassCastException on a valid Word template

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821756#comment-15821756 ] Tim Allison commented on TIKA-2163: --- This parsed without problem by the new experimental

[jira] [Commented] (TIKA-2147) ClassCastException on a valid Word template

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821754#comment-15821754 ] Tim Allison commented on TIKA-2147: --- These files are parsed without problem by the new ex

[jira] [Resolved] (TIKA-2207) ArrayIndexOutOfBoundsException on a valid Excel file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2207. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Works now. > ArrayIndexOutOfBounds

[jira] [Resolved] (TIKA-2166) TaggedIOException from a ZipException on a valid Word file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2166. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Works now > TaggedIOException from

[jira] [Resolved] (TIKA-2162) "Unknown compression method" on a Powerpoint file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2162. --- Resolution: Fixed Fix Version/s: 1.15 2.0 works now. > "Unknown compression

[jira] [Resolved] (TIKA-2153) TaggedIOException on a valid Powerpoint file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2153. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Works now. > TaggedIOException on

[jira] [Resolved] (TIKA-2136) External file links in PPTX misparsed

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2136. --- Resolution: Fixed Fix Version/s: 1.15 2.0 > External file links in PPTX mispa

[jira] [Resolved] (TIKA-2161) EOFException on a valid Powerpoint file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2161. --- Resolution: Fixed Fix Version/s: 1.15 2.0 > EOFException on a valid Powerpoin

[jira] [Resolved] (TIKA-2164) HSLFException from ZipException "invalid stored block lengths" on a valid Powerpoint file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2164. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Tested with all attached. No excep

[jira] [Resolved] (TIKA-2215) TikaException about "Invalid embedded resource" on a valid PPT file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2215. --- Resolution: Fixed Fix Version/s: 1.15 2.0 resolved with TIKA-2159 > TikaExce

[jira] [Resolved] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2159. --- Resolution: Fixed Fix Version/s: 1.15 2.0 I added TikaCoreProperties.TIKA_MET

[jira] [Resolved] (TIKA-2204) IndexOutOfBoundsException on a valid Powerpoint file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2204. --- Resolution: Fixed fixed with TIKA-2159 > IndexOutOfBoundsException on a valid Powerpoint file > --

[jira] [Updated] (TIKA-2204) IndexOutOfBoundsException on a valid Powerpoint file

2017-01-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2204: -- Fix Version/s: 1.15 2.0 > IndexOutOfBoundsException on a valid Powerpoint file > -