[ANNOUNCE] Apache Tika 1.4 Released

2013-07-02 Thread Chris Mattmann
The Apache Tika project is pleased to announce the release of Apache Tika 1.4. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs. Apache Tika is a toolkit for detecting

[jira] [Created] (TIKA-1139) Modify Tika-1129 to test against a local file

2013-07-02 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1139: - Summary: Modify Tika-1129 to test against a local file Key: TIKA-1139 URL: https://issues.apache.org/jira/browse/TIKA-1139 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1139) Modify Tika-1129 to test against a local file

2013-07-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1139: -- Attachment: TIKA-1139.patch.tar.gz Patch attached. Modify Tika-1129 to test against a

[jira] [Created] (TIKA-1140) Better table representation, cell spanning in Word Extractor

2013-07-02 Thread Denis Kildishev (JIRA)
Denis Kildishev created TIKA-1140: - Summary: Better table representation, cell spanning in Word Extractor Key: TIKA-1140 URL: https://issues.apache.org/jira/browse/TIKA-1140 Project: Tika

[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-07-02 Thread Thomas Mortagne (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697726#comment-13697726 ] Thomas Mortagne commented on TIKA-1053: --- Are you sure about depending on

[jira] [Comment Edited] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-07-02 Thread Thomas Mortagne (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697726#comment-13697726 ] Thomas Mortagne edited comment on TIKA-1053 at 7/2/13 12:54 PM:

[jira] [Updated] (TIKA-973) PDF form data isn't included in extracted content.

2013-07-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-973: - Attachment: TIKA-973.patch.tar.gz Middle-road change made. The alternate name is an attribute and partial

[jira] [Commented] (TIKA-998) How to handle row span and Colspan in parsing xls or xlsx files

2013-07-02 Thread Himanshu Agrawal (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697807#comment-13697807 ] Himanshu Agrawal commented on TIKA-998: --- anything at all ? How to

[jira] [Commented] (TIKA-998) How to handle row span and Colspan in parsing xls or xlsx files

2013-07-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697818#comment-13697818 ] Nick Burch commented on TIKA-998: - You would need to add extra logic to the parser to detect

[jira] [Resolved] (TIKA-1130) .docx text extract leaves out some portions of text

2013-07-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1130. -- Resolution: Fixed Fix Version/s: 1.5 Thanks for the patch Tim, applied in r1498968.

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-07-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698009#comment-13698009 ] Tim Allison commented on TIKA-1130: --- That was fast. Thank you! .docx

[jira] [Created] (TIKA-1141) javascript files that contain html are detected as text/html

2013-07-02 Thread David Hara (JIRA)
David Hara created TIKA-1141: Summary: javascript files that contain html are detected as text/html Key: TIKA-1141 URL: https://issues.apache.org/jira/browse/TIKA-1141 Project: Tika Issue Type: