[jira] [Updated] (TIKA-1194) Missing text from MS Word (DOC) file

2015-03-18 Thread Tomas Safarik (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Safarik updated TIKA-1194: Attachment: apache-tika-1.5.patch Just for information. Patch of our changes that workarounds the

[jira] [Commented] (TIKA-456) Support timeouts for parsers

2015-03-18 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367172#comment-14367172 ] Ken Krugler commented on TIKA-456: Re killing a thread - yes, that's not possible to do

[jira] [Comment Edited] (TIKA-456) Support timeouts for parsers

2015-03-18 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367172#comment-14367172 ] Ken Krugler edited comment on TIKA-456 at 3/18/15 2:18 PM: Re

[jira] [Comment Edited] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368449#comment-14368449 ] Tim Allison edited comment on TIKA-1575 at 3/19/15 3:35 AM:

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368449#comment-14368449 ] Tim Allison commented on TIKA-1575: From manual review... Based on the More_in_A

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368427#comment-14368427 ] Tim Allison commented on TIKA-1575: I'm not sure the differences we're seeing are in

[jira] [Assigned] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site

2015-03-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned TIKA-1365: Assignee: Chris A. Mattmann Incorrectly MimeType detection for Apache Lucene web

[jira] [Created] (TIKA-1578) Add file type description to HDFParsers

2015-03-18 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1578: Summary: Add file type description to HDFParsers Key: TIKA-1578 URL: https://issues.apache.org/jira/browse/TIKA-1578 Project: Tika Issue Type: Improvement

[jira] [Created] (TIKA-1579) Add file type to NetCDFParser

2015-03-18 Thread Ann Burgess (JIRA)
Ann Burgess created TIKA-1579: Summary: Add file type to NetCDFParser Key: TIKA-1579 URL: https://issues.apache.org/jira/browse/TIKA-1579 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-18 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367483#comment-14367483 ] Maruan Sahyoun commented on TIKA-1575: Let me know your outcome. The changes might

[jira] [Commented] (TIKA-1098) not able to parse pdfs/docs/ppts using 1.1 tika parser

2015-03-18 Thread Andreas Lehmkühler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367950#comment-14367950 ] Andreas Lehmkühler commented on TIKA-1098: The parser stumbles upon a malformed