[jira] [Resolved] (TIKA-1687) Upgrade xerial.org's sqlite-jdbc to 3.8.10.1

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1687. --- Resolution: Fixed r1691302 Upgrade xerial.org's sqlite-jdbc to 3.8.10.1

[jira] [Created] (TIKA-1687) Upgrade xerial.org's sqlite-jdbc to 3.8.10.1

2015-07-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1687: - Summary: Upgrade xerial.org's sqlite-jdbc to 3.8.10.1 Key: TIKA-1687 URL: https://issues.apache.org/jira/browse/TIKA-1687 Project: Tika Issue Type: Task

[jira] [Resolved] (TIKA-1684) Clean up metadata properties in Jackcess parser

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1684. --- Resolution: Fixed r1691297 Clean up metadata properties in Jackcess parser

[jira] [Assigned] (TIKA-1684) Clean up metadata properties in Jackcess parser

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1684: - Assignee: Tim Allison Clean up metadata properties in Jackcess parser

[jira] [Created] (TIKA-1685) Clean up deprecated components

2015-07-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1685: - Summary: Clean up deprecated components Key: TIKA-1685 URL: https://issues.apache.org/jira/browse/TIKA-1685 Project: Tika Issue Type: Task Reporter:

[jira] [Resolved] (TIKA-1685) Clean up some deprecated components

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1685. --- Resolution: Fixed r1691299 Clean up some deprecated components

[jira] [Commented] (TIKA-1686) Upgrade metadata-extractor to 2.8.1

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629046#comment-14629046 ] Tim Allison commented on TIKA-1686: --- We're getting a test failure: {noformat}

[jira] [Created] (TIKA-1686) Upgrade metadata-extractor to 2.8.1

2015-07-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1686: - Summary: Upgrade metadata-extractor to 2.8.1 Key: TIKA-1686 URL: https://issues.apache.org/jira/browse/TIKA-1686 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-1679) Parse PDF file page by page

2015-07-15 Thread Raymond Wu (JIRA)
Raymond Wu created TIKA-1679: Summary: Parse PDF file page by page Key: TIKA-1679 URL: https://issues.apache.org/jira/browse/TIKA-1679 Project: Tika Issue Type: Improvement Components:

[jira] [Updated] (TIKA-1682) Add formatting for values in Jackcess

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1682: -- Attachment: formats.xlsx This includes a dump of various formats by data type found in ~3k mdb files

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627956#comment-14627956 ] Tim Allison commented on TIKA-1678: --- Related:

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Andrew Jackson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627960#comment-14627960 ] Andrew Jackson commented on TIKA-1678: -- As far as I can tell, the PDF spec seems to
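For context on TIKA-1678: in PDF 1.7 (ISO 32000-1), a text string such as the document /Title is encoded either in PDFDocEncoding or in UTF-16BE, and the UTF-16BE form is identified by a leading byte order mark FE FF. The following is a minimal sketch of that check, not Tika's or PDFBox's actual code. The class `PdfTextStringDecoder` is hypothetical, and ISO-8859-1 stands in for PDFDocEncoding as a simplification; the two differ for some code points.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Tika or PDFBox): decodes the raw bytes of a
 * PDF text string, e.g. the /Title entry of the document information dictionary.
 */
class PdfTextStringDecoder {

    /** Returns true if the bytes start with the UTF-16BE byte order mark FE FF. */
    static boolean hasUtf16BeBom(byte[] raw) {
        return raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF;
    }

    static String decode(byte[] raw) {
        if (hasUtf16BeBom(raw)) {
            // Skip the two BOM bytes and decode the remainder as UTF-16BE.
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Simplification (assumption): ISO-8859-1 is used as a stand-in for
        // PDFDocEncoding; they agree on printable ASCII but not on every byte.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf16Title = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x48, 0x00, 0x69};
        byte[] docEncodedTitle = {0x48, 0x69};
        System.out.println(decode(utf16Title));      // prints "Hi"
        System.out.println(decode(docEncodedTitle)); // prints "Hi"
    }
}
```

A parser that skips the BOM check would decode the UTF-16BE bytes above one byte at a time, yielding a title with stray control characters instead of "Hi".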

[jira] [Commented] (TIKA-1679) Parse PDF file page by page

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627891#comment-14627891 ] Tim Allison commented on TIKA-1679: --- To confirm, is the problem that an exception is

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627897#comment-14627897 ] Tim Allison edited comment on TIKA-1678 at 7/15/15 11:05 AM: -

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627897#comment-14627897 ] Tim Allison commented on TIKA-1678: --- @Andrew Jackson, good to hear from you! Y, the

[jira] [Comment Edited] (TIKA-1679) Parse PDF file page by page

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627891#comment-14627891 ] Tim Allison edited comment on TIKA-1679 at 7/15/15 10:48 AM: -

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627897#comment-14627897 ] Tim Allison edited comment on TIKA-1678 at 7/15/15 11:07 AM: -

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627897#comment-14627897 ] Tim Allison edited comment on TIKA-1678 at 7/15/15 11:08 AM: -

[jira] [Comment Edited] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627956#comment-14627956 ] Tim Allison edited comment on TIKA-1678 at 7/15/15 12:20 PM: -

[jira] [Created] (TIKA-1683) Add encryption support to Jackcess parser

2015-07-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1683: - Summary: Add encryption support to Jackcess parser Key: TIKA-1683 URL: https://issues.apache.org/jira/browse/TIKA-1683 Project: Tika Issue Type: New Feature

bouncy castle version

2015-07-15 Thread Allison, Timothy B.
All, I just noticed that we're importing bcprov-jdk15on version 1.52. PDFBox and POI are still using 1.51. Do we have any dependencies that are using 1.52? Should we try to request a bump to 1.52 from PDFBox and POI? I'm asking jackcess to move to at least bcprov-jdk15on, and I'd like

[jira] [Commented] (TIKA-1683) Add encryption support to Jackcess parser

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628585#comment-14628585 ] Tim Allison commented on TIKA-1683: --- At this point we have a version clash on

[jira] [Commented] (TIKA-1588) Upgrade to PDFBox 1.8.10 when available

2015-07-15 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628890#comment-14628890 ] Tilman Hausherr commented on TIKA-1588: --- The weird thing is that I can't find any

[jira] [Created] (TIKA-1682) Add formatting for values in Jackcess

2015-07-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1682: - Summary: Add formatting for values in Jackcess Key: TIKA-1682 URL: https://issues.apache.org/jira/browse/TIKA-1682 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1588) Upgrade to PDFBox 1.8.10 when available

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628970#comment-14628970 ] Tim Allison commented on TIKA-1588: --- Interesting. This must be another case of the

[jira] [Created] (TIKA-1684) Clean up metadata properties in Jackcess parser

2015-07-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1684: - Summary: Clean up metadata properties in Jackcess parser Key: TIKA-1684 URL: https://issues.apache.org/jira/browse/TIKA-1684 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1678) PDF metadata extraction fails to spot UTF-16 encoded title

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627948#comment-14627948 ] Tim Allison commented on TIKA-1678: --- Shouldn't have taken me this long, but, isn't that

[jira] [Created] (TIKA-1680) Add configuration layer to configure, Parsers default configurable properties.

2015-07-15 Thread Mario Costa (JIRA)
Mario Costa created TIKA-1680: - Summary: Add configuration layer to configure, Parsers default configurable properties. Key: TIKA-1680 URL: https://issues.apache.org/jira/browse/TIKA-1680 Project: Tika

[jira] [Comment Edited] (TIKA-1588) Upgrade to PDFBox 1.8.10 when available

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628102#comment-14628102 ] Tim Allison edited comment on TIKA-1588 at 7/15/15 2:13 PM:

[jira] [Updated] (TIKA-1588) Upgrade to PDFBox 1.8.10 when available

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1588: -- Attachment: reports_1_8_9_vs_1_8_10.zip Current version of reports attached comparing PDFBox 1.8.9 vs

[jira] [Created] (TIKA-1681) Fix file opening in Jackcess to enable read only for v1997 files

2015-07-15 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1681: - Summary: Fix file opening in Jackcess to enable read only for v1997 files Key: TIKA-1681 URL: https://issues.apache.org/jira/browse/TIKA-1681 Project: Tika Issue

[jira] [Updated] (TIKA-1681) Fix file opening in Jackcess to enable read only for v1997 files

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1681: -- Description: We need to make a small modification in how we're opening mdb files with Jackcess to set

[jira] [Assigned] (TIKA-1681) Fix file opening in Jackcess to enable read only for v1997 files

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1681: - Assignee: Tim Allison Fix file opening in Jackcess to enable read only for v1997 files

[jira] [Commented] (TIKA-1680) Add configuration layer to configure, Parsers default configurable properties.

2015-07-15 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628162#comment-14628162 ] Tim Allison commented on TIKA-1680: --- If we implemented TIKA-1508, would that accomplish

[jira] [Comment Edited] (TIKA-1679) Parse PDF file page by page

2015-07-15 Thread Raymond Wu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628196#comment-14628196 ] Raymond Wu edited comment on TIKA-1679 at 7/15/15 3:05 PM: --- I

[jira] [Commented] (TIKA-1679) Parse PDF file page by page

2015-07-15 Thread Raymond Wu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628196#comment-14628196 ] Raymond Wu commented on TIKA-1679: -- I have split this PDF file to 5 files to make sure