Re: [VOTE] Release Apache Tika 1.17 Candidate #2

2017-12-08 Thread Tim Allison
On Friday, December 8, 2017, 7:43:05 PM EST, Tim Allison wrote: A candidate for the Tika 1.17 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in:

[CANCELLED] Re: [VOTE] Release Apache Tika 1.17 Candidate #1

2017-12-08 Thread Tim Allison
There was a data transfer glitch to nexus. Will respin #2. On Friday, December 8, 2017, 2:51:14 PM EST, Tim Allison wrote: A candidate for the Tika 1.17 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip

Re: 1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Chris Mattmann
RC #2…. On 12/8/17, 2:30 PM, "Allison, Timothy B." wrote: Wait, no that's totally hosed, there's not even a source zip file in: https://repository.apache.org/content/repositories/orgapachetika-1027

RE: 1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Allison, Timothy B.
Wait, no that's totally hosed, there's not even a source zip file in: https://repository.apache.org/content/repositories/orgapachetika-1027 https://repository.apache.org/content/repositories/orgapachetika-1027/org/apache/tika/tika/1.17/ Do I need to respin w rc2? Or is there a way to push to

RE: 1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Allison, Timothy B.
Do we expect only the src to be in nexus, not the jar artifacts (with sigs and digests) for app, server, eval? -Original Message- From: Chris Mattmann [mailto:mattm...@apache.org] Sent: Friday, December 8, 2017 5:07 PM To: dev@tika.apache.org Subject: Re: 1.17 rc1 and two repos in

[jira] [Commented] (TIKA-2523) Regression in ppt parsing -- "typeface can't be null or empty"

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284321#comment-16284321 ] Tim Allison commented on TIKA-2523: --- https://bz.apache.org/bugzilla/show_bug.cgi?id=61881 Turns out this

Re: 1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Chris Mattmann
Hey Tim, probably just upload errors on the first one and so it tried again. No worries. Drop and close the first, and just use the 2nd. Cheers, Chris On 12/8/17, 12:05 PM, "Allison, Timothy B." wrote: Not sure what happened, but two repos were created in Nexus:

[jira] [Updated] (TIKA-2523) Regression in ppt parsing -- "typeface can't be null or empty"

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2523: -- Summary: Regression in ppt parsing -- "typeface can't be null or empty" (was: Regression in ppt

[jira] [Updated] (TIKA-2523) Regression in ppt parsing

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2523: -- Attachment: 802350.ppt triggering file from govdocs1 > Regression in ppt parsing >

[jira] [Created] (TIKA-2523) Regression in ppt parsing

2017-12-08 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2523: - Summary: Regression in ppt parsing Key: TIKA-2523 URL: https://issues.apache.org/jira/browse/TIKA-2523 Project: Tika Issue Type: Bug Reporter: Tim

RE: 1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Allison, Timothy B.
And do I remember correctly that the full distro should be in nexus, not just the source code as we currently have in: https://repository.apache.org/content/repositories/orgapachetika-1027/ Need to respin rc2 on Monday if that's the case. -Original Message- From: Allison, Timothy B.

1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Allison, Timothy B.
Not sure what happened, but two repos were created in Nexus: https://repository.apache.org/content/repositories/orgapachetika-1026/ https://repository.apache.org/content/repositories/orgapachetika-1027/ The first one (1026) failed with checksum problems, and I dropped it. I closed the second one

[VOTE] Release Apache Tika 1.17 Candidate #1

2017-12-08 Thread Tim Allison
A candidate for the Tika 1.17 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/1.17-rc1 The SHA1 checksum of the archive is 37f3cd19051160a8c488b1aa7ff25c3ae515c359. In addition,
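For anyone checking a release candidate before voting, here is a minimal sketch of comparing a downloaded archive against the SHA1 checksum quoted in the vote mail. It is not part of the Tika release tooling. The helper names (sha1_of_file, matches_published) are made up for illustration, and the archive path is whatever file you downloaded from the dist area; the message above does not name it.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a large source archive is never held
        # in memory all at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the local digest with the checksum quoted in the vote mail.

    Hypothetical helper: surrounding whitespace and letter case in the
    published value are ignored before comparing.
    """
    return sha1_of_file(path) == published_hex.strip().lower()
```

Usage would look something like matches_published(path_to_downloaded_zip, "37f3cd19051160a8c488b1aa7ff25c3ae515c359"). A mismatch may point to a damaged download or upload rather than a bad release, so it is worth re-downloading before voting -1.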

[jira] [Commented] (TIKA-2521) SAX-based docx/pptx should start a new line before second paragraph within a cell

2017-12-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284026#comment-16284026 ] Hudson commented on TIKA-2521: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1410 (See

[jira] [Commented] (TIKA-2522) Trivial regression in MSWord parser -- not extracting Encite Add in text any more

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283948#comment-16283948 ] Tim Allison commented on TIKA-2522: --- I think fixing this very minor regression poses more risk than

[jira] [Updated] (TIKA-2522) Trivial regression in MSWord parser -- not extracting Encite Add in text any more

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2522: -- Summary: Trivial regression in MSWord parser -- not extracting Encite Add in text any more (was:

[jira] [Updated] (TIKA-2522) Regression in MSWord parser -- not extracting Encite Add in text any more

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2522: -- Attachment: 508650.doc Example file. We should extract "pharmacology" from this...among other words in

[jira] [Created] (TIKA-2522) Regression in MSWord parser -- not extracting Encite Add in text any more

2017-12-08 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2522: - Summary: Regression in MSWord parser -- not extracting Encite Add in text any more Key: TIKA-2522 URL: https://issues.apache.org/jira/browse/TIKA-2522 Project: Tika

[jira] [Commented] (TIKA-2483) Using PackageParser in ForkParser causes NPE

2017-12-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283922#comment-16283922 ] Hudson commented on TIKA-2483: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1409 (See

[jira] [Resolved] (TIKA-2521) SAX-based docx/pptx should start a new line before second paragraph within a cell

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2521. --- Resolution: Fixed Fix Version/s: 1.17 > SAX-based docx/pptx should start a new line before

[jira] [Created] (TIKA-2521) SAX-based docx/pptx should start a new line before second paragraph within a cell

2017-12-08 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2521: - Summary: SAX-based docx/pptx should start a new line before second paragraph within a cell Key: TIKA-2521 URL: https://issues.apache.org/jira/browse/TIKA-2521 Project:

[jira] [Commented] (TIKA-2519) Issue parsing multiple CHM files concurrently

2017-12-08 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283830#comment-16283830 ] Hudson commented on TIKA-2519: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1408 (See

[jira] [Resolved] (TIKA-2483) Using PackageParser in ForkParser causes NPE

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2483. --- Resolution: Fixed Fix Version/s: 1.17 > Using PackageParser in ForkParser causes NPE >

[jira] [Comment Edited] (TIKA-2483) Using PackageParser in ForkParser causes NPE

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283780#comment-16283780 ] Tim Allison edited comment on TIKA-2483 at 12/8/17 4:38 PM: Regression tests in

Re: Tika 1.17?

2017-12-08 Thread Luís Filipe Nassif
Yes, Tim, I saw all these reporting artifacts, I agree they are good things. 2017-12-08 14:32 GMT-02:00 Allison, Timothy B. : > Thank you, Luís. I’ve finally had a chance to take a look. As exceptions > go, the PPT is the most eye-opening. I don’t know how I didn’t catch >

RE: Tika 1.17?

2017-12-08 Thread Allison, Timothy B.
Thank you, Luís. I’ve finally had a chance to take a look. As exceptions go, the PPT is the most eye-opening. I don’t know how I didn’t catch those…ugh. There are a bunch more exceptions for zerobyte file exceptions in attachments, but this is a good thing, because now we can figure out if

[jira] [Comment Edited] (TIKA-2483) Using PackageParser in ForkParser causes NPE

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283780#comment-16283780 ] Tim Allison edited comment on TIKA-2483 at 12/8/17 4:26 PM: Regression tests in

[jira] [Comment Edited] (TIKA-2483) Using PackageParser in ForkParser causes NPE

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283780#comment-16283780 ] Tim Allison edited comment on TIKA-2483 at 12/8/17 4:15 PM: Regression tests in

[jira] [Commented] (TIKA-2483) Using PackageParser in ForkParser causes NPE

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283780#comment-16283780 ] Tim Allison commented on TIKA-2483: --- Regression tests in prep for 1.17 show that we need to add quite a

[jira] [Created] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2017-12-08 Thread Vincent van Donselaar (JIRA)
Vincent van Donselaar created TIKA-2520: --- Summary: OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request Key: TIKA-2520 URL:

[jira] [Commented] (TIKA-2519) Issue parsing multiple CHM files concurrently

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283679#comment-16283679 ] Tim Allison commented on TIKA-2519: --- Thank you [~esaunders]! > Issue parsing multiple CHM files

[jira] [Resolved] (TIKA-2519) Issue parsing multiple CHM files concurrently

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2519. --- Resolution: Fixed Fix Version/s: 1.17 > Issue parsing multiple CHM files concurrently >

[jira] [Comment Edited] (TIKA-2519) Issue parsing multiple CHM files concurrently

2017-12-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281225#comment-16281225 ] Tim Allison edited comment on TIKA-2519 at 12/8/17 2:51 PM: Thank you for