[jira] [Commented] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503458#comment-15503458 ] Luis Filipe Nassif commented on TIKA-2082: -- Sorry Tim, I did not see TIKA-2051 > Upgrade to PDFBox

[jira] [Commented] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503467#comment-15503467 ] Tim Allison commented on TIKA-2082: --- No need to apologize whatsoever. Thank you for the ping! > Upgrade

[jira] [Created] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2082: Summary: Upgrade to PDFBox 2.0.3 Key: TIKA-2082 URL: https://issues.apache.org/jira/browse/TIKA-2082 Project: Tika Issue Type: Improvement

[jira] [Resolved] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2082. --- Resolution: Duplicate Fix Version/s: 2.0 Building locally now before I commit (should be 10-15

[jira] [Resolved] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2045. --- Resolution: Fixed Upgraded to PDFBox 2.0.3. > TIKA crashes / runs out of memory on simple PDF >

[jira] [Updated] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2045: -- Fix Version/s: 1.14 2.0 > TIKA crashes / runs out of memory on simple PDF >

[jira] [Resolved] (TIKA-2051) Upgrade to PDFBox 2.0.3 when available

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2051. --- Resolution: Fixed Fix Version/s: 1.14 2.0 > Upgrade to PDFBox 2.0.3 when

[jira] [Commented] (TIKA-2015) MAPIMessage String fileName constructor leaves file open

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503577#comment-15503577 ] Tim Allison commented on TIKA-2015: --- Doh. Typo in commit message. Should have been TIKA-2051. >

tika-2.x-windows - Build # 48 - Still Failing

2016-09-19 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #48) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/48/ to view the results.

[jira] [Commented] (TIKA-2015) MAPIMessage String fileName constructor leaves file open

2016-09-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503616#comment-15503616 ] Hudson commented on TIKA-2015: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #48 (See

Re: Plans for the first Tika 2.0 release

2016-09-19 Thread Bob Paulin
Hi, I think it's a good thing to discuss. I know there are other features that are targeted for 2.0. Do we have a general sense of where those features are at? My concern is we have been dual maintaining 2 branches for about 9 months. I think the longer we do this the more risk there is

[jira] [Commented] (TIKA-2051) Upgrade to PDFBox 2.0.3 when available

2016-09-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503579#comment-15503579 ] Hudson commented on TIKA-2051: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1102 (See

[jira] [Commented] (TIKA-2015) MAPIMessage String fileName constructor leaves file open

2016-09-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503535#comment-15503535 ] Hudson commented on TIKA-2015: -- SUCCESS: Integrated in Jenkins build tika-2.x #144 (See

[jira] [Created] (TIKA-2083) Tika 2.0 - Audit master branch against 2.x branch

2016-09-19 Thread Bob Paulin (JIRA)
Bob Paulin created TIKA-2083: Summary: Tika 2.0 - Audit master branch against 2.x branch Key: TIKA-2083 URL: https://issues.apache.org/jira/browse/TIKA-2083 Project: Tika Issue Type: Task

RE: Plans for the first Tika 2.0 release

2016-09-19 Thread Allison, Timothy B.
Bob, As always, thank you for driving 2.0! > My concern is we have been dual maintaining 2 branches for about 9 months. I think the longer we do this the more risk there is that we miss something. Agreed. I think we're already missing a few things. > Would it make sense to at least put

[jira] [Created] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2084: - Summary: Create resettable OutputStream to support "backoff on exception" strategy Key: TIKA-2084 URL: https://issues.apache.org/jira/browse/TIKA-2084 Project: Tika

Re: Plans for the first Tika 2.0 release

2016-09-19 Thread Bob Paulin
Thanks Tim! Replies in line. - Bob On 9/19/2016 12:33 PM, Allison, Timothy B. wrote: Bob, As always, thank you for driving 2.0! My concern is we have been dual maintaining 2 branches for about 9 months. I think the longer we do this the more risk there is that we miss something.

[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2016-09-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504023#comment-15504023 ] Nick Burch commented on TIKA-1997: -- Running your file through the openssl tool {{ asn1parse }}, it shows

RE: Plans for the first Tika 2.0 release

2016-09-19 Thread Allison, Timothy B.
>> 1) Implement various strategies for chaining multiple parsers against individual files. Much of this has been implemented, but what's holding us up on this one (I think?) is a resettable outputstream. > I think we need a JIRA for this. Are there any existing design ideas on how this

[jira] [Updated] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1607: -- Issue Type: Sub-task (was: Improvement) Parent: TIKA-2085 > Introduce new arbitrary object

[jira] [Updated] (TIKA-1974) Tika 2.0 - remove deprecated metadata properties

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1974: -- Issue Type: Sub-task (was: Task) Parent: TIKA-2085 > Tika 2.0 - remove deprecated metadata

RE: Plans for the first Tika 2.0 release

2016-09-19 Thread Allison, Timothy B.
> Should we create a tika-2_0-blocker label to differentiate from regular "blockers"? How about a single master issue: TIKA-2085. What else do we need to add?

[jira] [Updated] (TIKA-1509) Create configurable strategies for composite parsers

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1509: -- Issue Type: Sub-task (was: Improvement) Parent: TIKA-2085 > Create configurable strategies for

[jira] [Updated] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2084: -- Issue Type: New Feature (was: Sub-task) Parent: (was: TIKA-1509) > Create resettable

[jira] [Created] (TIKA-2085) Tika 2.0 -- Overarching task list for what we need to do before 2.0

2016-09-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2085: - Summary: Tika 2.0 -- Overarching task list for what we need to do before 2.0 Key: TIKA-2085 URL: https://issues.apache.org/jira/browse/TIKA-2085 Project: Tika

[jira] [Updated] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2084: -- Description: If we want a backoff on exception strategy, "try xmlparser, if that fails, try the
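
A rough sketch of the "backoff on exception" idea described in TIKA-2084, not the design proposed in the issue. It assumes output is buffered in memory so that a failed attempt can be discarded before the next one runs. The names ResettableOutputStream, Attempt, and Backoff are hypothetical and are not part of Tika's API; Attempt merely stands in for a parser that writes to a stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical class: holds output in memory until it is explicitly committed.
class ResettableOutputStream extends OutputStream {
    private final OutputStream target;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    ResettableOutputStream(OutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) {
        buffer.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buffer.write(b, off, len);
    }

    // Discards everything written since the last commit.
    void reset() {
        buffer.reset();
    }

    // Copies the buffered bytes to the target, then clears the buffer.
    void commit() throws IOException {
        buffer.writeTo(target);
        buffer.reset();
        target.flush();
    }
}

// Hypothetical stand-in for "a parser that writes its output to a stream".
interface Attempt {
    void run(OutputStream out) throws Exception;
}

class Backoff {
    // Runs the attempts in order; returns the index of the first one that
    // succeeds, or -1 if all of them throw. Output from failed attempts
    // never reaches the target stream.
    static int runWithBackoff(List<Attempt> attempts, OutputStream target) {
        ResettableOutputStream out = new ResettableOutputStream(target);
        for (int i = 0; i < attempts.size(); i++) {
            try {
                attempts.get(i).run(out);
            } catch (Exception e) {
                out.reset(); // throw away the partial output and fall back
                continue;
            }
            try {
                out.commit();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return i;
        }
        return -1;
    }
}
```

Buffering everything in memory keeps the sketch short, but it would not suit very large outputs; a real implementation might spool to a temporary file or make the reset optional.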

[jira] [Updated] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2084: -- Description: If we want a backoff on exception strategy, "try xmlparser, if that fails, try the

[jira] [Commented] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504592#comment-15504592 ] Tim Allison commented on TIKA-2084: --- Good point. Thank you. > Create resettable OutputStream to support

[jira] [Updated] (TIKA-2083) Tika 2.0 - Audit master branch against 2.x branch

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2083: -- Issue Type: Sub-task (was: Task) Parent: TIKA-2085 > Tika 2.0 - Audit master branch against 2.x

[jira] [Commented] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504321#comment-15504321 ] Luis Filipe Nassif commented on TIKA-2084: -- I think the reset could be optional, because some

Re: Plans for the first Tika 2.0 release

2016-09-19 Thread Bob Paulin
I think that could work! I've also created a custom filter that might help https://issues.apache.org/jira/browse/TIKA-2083?filter=12338448 Logic is as follows: project = TIKA AND affectedVersion = 2.0 AND priority >= Blocker AND status != Closed AND status != Fixed - Bob On 9/19/2016

[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents

2016-09-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504643#comment-15504643 ] Tim Allison commented on TIKA-2069: --- Just realized that we might want to handle extraction of Actions

Plans for the first Tika 2.0 release

2016-09-19 Thread Sergey Beryozkin
Hi All Back in May I updated one of our CXF demos on the master 3.2 branch to depend on Tika 2.0 SNAPSHOT to verify the new module system works well. It is feasible that CXF 3.2.0 may be released by the end of the year or early next year. As far as Tika 2.0 dependencies are concerned it will