[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365799#comment-14365799 ] Tim Allison commented on TIKA-1575: --- I've kicked off a single-threaded batch run of 1.8.9

[jira] [Updated] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1575: -- Attachment: 005937_1_8_9-SNAPSHOT.pdf.json Corrupted characters where monitoring should be. Given that

[jira] [Comment Edited] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368449#comment-14368449 ] Tim Allison edited comment on TIKA-1575 at 3/19/15 3:35 AM:

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368449#comment-14368449 ] Tim Allison commented on TIKA-1575: --- From manual review... Based on the More_in_A

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368427#comment-14368427 ] Tim Allison commented on TIKA-1575: --- I'm not sure the differences we're seeing are in

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371667#comment-14371667 ] Tim Allison commented on TIKA-1575: --- +1 Therefore, we have another improvement with

[jira] [Comment Edited] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371565#comment-14371565 ] Tim Allison edited comment on TIKA-1575 at 3/20/15 4:33 PM: Hi

[jira] [Updated] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1575: -- Attachment: reports_1_8_9_multithread_vs_single.zip I ran 1.8.9 single threaded and compared the output

[jira] [Resolved] (TIKA-1553) Let's add a mock parser to be used in testing parser drivers and wrappers

2015-03-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1553. --- Resolution: Fixed r1664635 Let's add a mock parser to be used in testing parser drivers and wrappers

[jira] [Reopened] (TIKA-1553) Let's add an evil parser to be used in testing parser drivers

2015-03-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1553: --- I'd like to make this more general, rename it to MockParser and move it into tika-core tests...we could

[jira] [Updated] (TIKA-1553) Let's add an mock parser to be used in testing parser drivers and wrappers

2015-03-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1553: -- Summary: Let's add an mock parser to be used in testing parser drivers and wrappers (was: Let's add an

[jira] [Updated] (TIKA-1553) Let's add a mock parser to be used in testing parser drivers and wrappers

2015-03-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1553: -- Summary: Let's add a mock parser to be used in testing parser drivers and wrappers (was: Let's add an

[jira] [Comment Edited] (TIKA-1553) Let's add a mock parser to be used in testing parser drivers and wrappers

2015-03-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350726#comment-14350726 ] Tim Allison edited comment on TIKA-1553 at 3/6/15 7:10 PM: --- Thank

[jira] [Created] (TIKA-1566) Try to migrate current Tika code around PDFBox 1.8.x from JempBox to XMPBox

2015-03-06 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1566: - Summary: Try to migrate current Tika code around PDFBox 1.8.x from JempBox to XMPBox Key: TIKA-1566 URL: https://issues.apache.org/jira/browse/TIKA-1566 Project: Tika

[jira] [Assigned] (TIKA-1566) Try to migrate current Tika code around PDFBox 1.8.x from JempBox to XMPBox

2015-03-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1566: - Assignee: Tim Allison Try to migrate current Tika code around PDFBox 1.8.x from JempBox to

[jira] [Created] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-13 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1575: - Summary: Upgrade to PDFBox 1.8.9 when available Key: TIKA-1575 URL: https://issues.apache.org/jira/browse/TIKA-1575 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1575: -- Attachment: PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip

[jira] [Comment Edited] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360492#comment-14360492 ] Tim Allison edited comment on TIKA-1575 at 3/13/15 3:47 PM:

[jira] [Updated] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1575: -- Attachment: 10-814_Appendix B_v3.pdf Form clutter...This was embedded inside 776568. With PDFBox 1.8.8,

[jira] [Updated] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1575: -- Attachment: diffs_1_8_9_multithread_vs_single_thread.xlsx When I loosen the restriction to report all

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2015-03-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377966#comment-14377966 ] Tim Allison commented on TIKA-1489: --- To close the loop on this, there are 2,784 pdfs out

[jira] [Commented] (TIKA-1440) Auto-Paragraph numbers not extracted from Word Document

2015-03-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380393#comment-14380393 ] Tim Allison commented on TIKA-1440: --- Able to post a mock-up document and expected output?

[jira] [Commented] (TIKA-1440) Auto-Paragraph numbers not extracted from Word Document

2015-03-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380496#comment-14380496 ] Tim Allison commented on TIKA-1440: --- Apologies if this message crosses your attachments

[jira] [Assigned] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1584: - Assignee: Tim Allison Tika 1.7 possible regression (nested attachment files not getting parsed)

[jira] [Updated] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1584: -- Priority: Blocker (was: Major) Tika 1.7 possible regression (nested attachment files not getting

[jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385477#comment-14385477 ] Tim Allison commented on TIKA-1584: --- IMHO this is major enough for a fix asap. Whether

[jira] [Comment Edited] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385463#comment-14385463 ] Tim Allison edited comment on TIKA-1584 at 3/28/15 6:53 PM:

[jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385463#comment-14385463 ] Tim Allison commented on TIKA-1584: --- Just checked svn. That's a major regression added in

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386562#comment-14386562 ] Tim Allison commented on TIKA-1511: --- Thank you, [~thetaphi]. I was aware of about half

[jira] [Resolved] (TIKA-1511) Create a parser for SQLite3

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1511. --- Resolution: Fixed r1670069. Removed provided in parsers' pom. Happy to revisit this if there are

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386697#comment-14386697 ] Tim Allison commented on TIKA-1512: --- Temporary fix ignoring tests and excluding test docs

[jira] [Resolved] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1584. --- Resolution: Fixed Fix Version/s: 1.8 r1670095. Thank you, [~rtulloh], for raising this issue!

[jira] [Comment Edited] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343665#comment-14343665 ] Tim Allison edited comment on TIKA-944 at 3/30/15 11:41 AM: Some

[jira] [Created] (TIKA-1588) Upgrade to PDFBox 1.8.10 when available

2015-03-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1588: - Summary: Upgrade to PDFBox 1.8.10 when available Key: TIKA-1588 URL: https://issues.apache.org/jira/browse/TIKA-1588 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1585) Create Example Website with Form Submission

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387826#comment-14387826 ] Tim Allison commented on TIKA-1585: --- Done. Let me know if it works before we shutdown

[jira] [Commented] (TIKA-1590) A particular PDF seems to trigger an infinite loop when being converted to HTML

2015-04-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390469#comment-14390469 ] Tim Allison commented on TIKA-1590: --- Not that this is needed, but I doubly confirmed that

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385773#comment-14385773 ] Tim Allison commented on TIKA-1511: --- Any objections to including xerial with app and

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385782#comment-14385782 ] Tim Allison commented on TIKA-1511: --- [~thetaphi], will there be any problems for Solr if

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385700#comment-14385700 ] Tim Allison commented on TIKA-1575: --- +1 Upgrade to PDFBox 1.8.9 when available

[jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385440#comment-14385440 ] Tim Allison commented on TIKA-1584: --- Able to attach example triggering doc? By same

[jira] [Comment Edited] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385440#comment-14385440 ] Tim Allison edited comment on TIKA-1584 at 3/28/15 6:08 PM:

[jira] [Commented] (TIKA-1321) Add experimental Stax/Streaming XWPF/docx extractor

2015-03-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375778#comment-14375778 ] Tim Allison commented on TIKA-1321: --- Y, but it isn't ready for primetime/committing yet.

[jira] [Comment Edited] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343665#comment-14343665 ] Tim Allison edited comment on TIKA-944 at 3/2/15 9:53 PM: -- Some

[jira] [Comment Edited] (TIKA-964) Ability to specify bind address

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343765#comment-14343765 ] Tim Allison edited comment on TIKA-964 at 3/2/15 9:10 PM: -- As a

[jira] [Resolved] (TIKA-964) Ability to specify bind address

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-964. -- Resolution: Won't Fix Let's move users of tika-app's server to Apache CXF's JAX-RS server in the

[jira] [Comment Edited] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343665#comment-14343665 ] Tim Allison edited comment on TIKA-944 at 3/2/15 9:53 PM: -- Some

[jira] [Commented] (TIKA-964) Ability to specify bind address

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343765#comment-14343765 ] Tim Allison commented on TIKA-964: -- As a newbie to JAX-RS a few months ago(?), I was

[jira] [Commented] (TIKA-758) Address TODOs when we upgrade to next PDFBox release

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343717#comment-14343717 ] Tim Allison commented on TIKA-758: -- And then I remembered

[jira] [Commented] (TIKA-1301) Establish TikaServer on Apache hosted VM

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343743#comment-14343743 ] Tim Allison commented on TIKA-1301: --- Moved to a new server: 162.242.228.174:9998

[jira] [Comment Edited] (TIKA-1301) Establish TikaServer on Apache hosted VM

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343743#comment-14343743 ] Tim Allison edited comment on TIKA-1301 at 3/2/15 8:52 PM: --- Moved

[jira] [Commented] (TIKA-456) Support timeouts for parsers

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343193#comment-14343193 ] Tim Allison commented on TIKA-456: -- [~tpalsulich], until you pinged me on this, I regret

[jira] [Commented] (TIKA-1559) SecureContentHandler.SecureSAXException is not serializable

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343261#comment-14343261 ] Tim Allison commented on TIKA-1559: --- [~sashkap], thank you for raising this issue and

[jira] [Updated] (TIKA-1489) PDF Text extraction without permission

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1489: -- Attachment: testPDF_no_extract_yes_accessibility_owner_user.pdf

[jira] [Commented] (TIKA-955) Unable to extract Track Changes metadata from a microsoft word document

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343631#comment-14343631 ] Tim Allison commented on TIKA-955: -- Let's leave this one open. It is on my list to get to

[jira] [Commented] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343665#comment-14343665 ] Tim Allison commented on TIKA-944: -- There's a slight disconnect in how we handle extraction

[jira] [Commented] (TIKA-758) Address TODOs when we upgrade to next PDFBox release

2015-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343672#comment-14343672 ] Tim Allison commented on TIKA-758: -- Y, we should be good. Please remove or let me know if

[jira] [Resolved] (TIKA-1489) PDF Text extraction without permission

2015-03-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1489. --- Resolution: Fixed Fix Version/s: 1.8 r1663764 PDF Text extraction without permission

[jira] [Commented] (TIKA-1038) Parsing PDF with StackOverlowError

2015-03-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347382#comment-14347382 ] Tim Allison commented on TIKA-1038: --- [~tilman], I hadn't been, but now am. Thank you.

[jira] [Commented] (TIKA-1038) Parsing PDF with StackOverlowError

2015-03-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347426#comment-14347426 ] Tim Allison commented on TIKA-1038: --- Couldn't help it. Y, [~tpalsulich], that looks good

[jira] [Comment Edited] (TIKA-1038) Parsing PDF with StackOverlowError

2015-03-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347426#comment-14347426 ] Tim Allison edited comment on TIKA-1038 at 3/4/15 8:46 PM: ---

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-03-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348047#comment-14348047 ] Tim Allison commented on TIKA-1330: --- Posted patch to review board

[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333266#comment-14333266 ] Tim Allison commented on TIKA-1558: --- I agree with Nick that I'd prefer to migrate more

[jira] [Commented] (TIKA-1560) OutOfMemoryError analyzinig specific file

2015-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337097#comment-14337097 ] Tim Allison commented on TIKA-1560: --- Well, that's exciting! Thank you for reporting this

[jira] [Comment Edited] (TIKA-1560) OutOfMemoryError analyzinig specific file

2015-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337112#comment-14337112 ] Tim Allison edited comment on TIKA-1560 at 2/25/15 8:22 PM:

[jira] [Commented] (TIKA-1560) OutOfMemoryError analyzinig specific file

2015-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337112#comment-14337112 ] Tim Allison commented on TIKA-1560: --- When I just tested this file with Tika 1.8-SNAPSHOT

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338538#comment-14338538 ] Tim Allison commented on TIKA-1509: --- To confirm I understand, is the goal/use case of

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-02-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339180#comment-14339180 ] Tim Allison commented on TIKA-1509: --- I think we agree; I did raise a second and unrelated

[jira] [Resolved] (TIKA-1323) Improve exception reporting in JAX-RS server

2015-02-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1323. --- Resolution: Fixed r1661193 Commandline option -includeStack will enable this behavior. I centralized

[jira] [Resolved] (TIKA-1556) Clean up whitespace in tika-server

2015-02-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1556. --- Resolution: Fixed r1661200. Clean up whitespace in tika-server --

[jira] [Created] (TIKA-1556) Clean up whitespace in tika-server

2015-02-20 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1556: - Summary: Clean up whitespace in tika-server Key: TIKA-1556 URL: https://issues.apache.org/jira/browse/TIKA-1556 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-1323) Improve exception reporting in JAX-RS server

2015-02-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329426#comment-14329426 ] Tim Allison commented on TIKA-1323: --- now running with this option on TIKA-1301's server:

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-03-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383118#comment-14383118 ] Tim Allison commented on TIKA-1512: --- I looked at a handful of docs from govdocs that have

[jira] [Updated] (TIKA-1512) WordParser fails on many Word files

2015-03-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1512: -- Attachment: 046839.doc 040044.doc docs referenced WordParser fails on many Word files

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-03-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383162#comment-14383162 ] Tim Allison commented on TIKA-1512: --- For the few I tried, the short links do work in the

[jira] [Resolved] (TIKA-1330) Add robust tika-batch code

2015-03-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1330. --- Resolution: Fixed added in r1668673. The tests for the new module add 2.5 minutes to my build. Sorry!

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-03-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382093#comment-14382093 ] Tim Allison commented on TIKA-1512: --- Thank you, [~Genstr] for this info, do you happen to

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289304#comment-14289304 ] Tim Allison commented on TIKA-1529: --- I just fixed issues BasicContentHandlerFactoryTest

[jira] [Updated] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1529: -- Description: [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, and he

[jira] [Comment Edited] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289304#comment-14289304 ] Tim Allison edited comment on TIKA-1529 at 1/23/15 2:41 PM: I

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289449#comment-14289449 ] Tim Allison commented on TIKA-1529: --- Agreed on US-ASCII, but aren't there illegal

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289369#comment-14289369 ] Tim Allison commented on TIKA-1529: --- [~grossws], for the following in

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289410#comment-14289410 ] Tim Allison commented on TIKA-1529: --- Great! I'm still making mods and creating a static

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289362#comment-14289362 ] Tim Allison commented on TIKA-1529: --- Makes sense. I'll try to fix the causes for failure

[jira] [Commented] (TIKA-1518) Docker with Tika Server

2015-01-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298002#comment-14298002 ] Tim Allison commented on TIKA-1518: --- [~tpalsulich], y, the server was initially intended

[jira] [Created] (TIKA-1605) Fix potential NPEs in Throwable.getMessage().XYZ()

2015-04-14 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1605: - Summary: Fix potential NPEs in Throwable.getMessage().XYZ() Key: TIKA-1605 URL: https://issues.apache.org/jira/browse/TIKA-1605 Project: Tika Issue Type: Bug

[jira] [Resolved] (TIKA-1605) Fix potential NPEs in Throwable.getMessage().XYZ()

2015-04-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1605. --- Resolution: Fixed r1673406 Fix potential NPEs in Throwable.getMessage().XYZ()

[jira] [Updated] (TIKA-1605) Fix potential NPEs in Throwable.getMessage().XYZ()

2015-04-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1605: -- Fix Version/s: 1.9 Fix potential NPEs in Throwable.getMessage().XYZ()

[jira] [Resolved] (TIKA-1611) Allow RecursiveParserWrapper to catch exceptions from embedded documents

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1611. --- Resolution: Fixed r1675159. Nothing like testing to see behavior, rather than assumptions. :( Allow

[jira] [Updated] (TIKA-1611) Allow RecursiveParserWrapper to catch exceptions from embedded documents

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1611: -- Description: While parsing embedded documents, currently, if a parser hits an

[jira] [Commented] (TIKA-1612) Exceptions getting image data in PPT files

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505335#comment-14505335 ] Tim Allison commented on TIKA-1612: --- Not sure how we want to fix this. To make this

[jira] [Updated] (TIKA-1611) Allow RecursiveParserWrapper to catch exceptions from embedded documents

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1611: -- Description: While parsing embedded documents, currently, if a parser hits an Exception, the Exception

[jira] [Created] (TIKA-1612) Exceptions getting image data in PPT files

2015-04-21 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1612: - Summary: Exceptions getting image data in PPT files Key: TIKA-1612 URL: https://issues.apache.org/jira/browse/TIKA-1612 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2015-04-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502716#comment-14502716 ] Tim Allison commented on TIKA-1513: --- Hi [~iryndin], I wanted to check in to see how the

[jira] [Resolved] (TIKA-1511) Create a parser for SQLite3

2015-04-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1511. --- Resolution: Fixed moved dependencies to provided in r1674800. Create a parser for SQLite3

[jira] [Commented] (TIKA-1501) Fix the disabled Tika Bundle OSGi related unit tests

2015-04-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502776#comment-14502776 ] Tim Allison commented on TIKA-1501: --- Ah, ok. This is the first time I've really looked

[jira] [Commented] (TIKA-1315) Basic list support in WordExtractor

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505008#comment-14505008 ] Tim Allison commented on TIKA-1315: --- Ha. Ok, but your patch is really well done. Let me

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505092#comment-14505092 ] Tim Allison commented on TIKA-1513: --- Completely agree. Only 2,386 files. This is the

[jira] [Resolved] (TIKA-1501) Fix the disabled Tika Bundle OSGi related unit tests

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1501. --- Resolution: Fixed Fix Version/s: 1.9 r1675121. Thank you, [~bobpaulin]! Fix the disabled

[jira] [Created] (TIKA-1611) Allow RecursiveParserWrapper to catch exceptions from embedded documents

2015-04-21 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1611: - Summary: Allow RecursiveParserWrapper to catch exceptions from embedded documents Key: TIKA-1611 URL: https://issues.apache.org/jira/browse/TIKA-1611 Project: Tika

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2015-04-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505006#comment-14505006 ] Tim Allison commented on TIKA-1513: --- Y, I was concerned by that generally. Are you
