[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-02-01 Thread Thamme Gowda N (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127061#comment-15127061 ] Thamme Gowda N commented on TIKA-1816: -- [~talli...@mitre.org] Sure, I will have a look. Correct me if

[jira] [Commented] (TIKA-1841) Different XML output structure for PPT and PPTX

2016-02-01 Thread Sam H (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126248#comment-15126248 ] Sam H commented on TIKA-1841: - Hi [~gagravarr], There has been no reaction to this issue in the past 6 days.

[jira] [Commented] (TIKA-1843) Tika parser for SEG-Y files and new MIME type application/segy

2016-02-01 Thread Giovanni Usai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126272#comment-15126272 ] Giovanni Usai commented on TIKA-1843: - Hi Nick, Sigrun owner has merged my modifications, so we can go

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126477#comment-15126477 ] Ian Williams commented on TIKA-1845: Tim - thanks for confirming what's in the attachment and for the

[jira] [Commented] (TIKA-1841) Different XML output structure for PPT and PPTX

2016-02-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126532#comment-15126532 ] Nick Burch commented on TIKA-1841: -- Ideally we would break out the header and footer into separate

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126479#comment-15126479 ] Tim Allison commented on TIKA-1845: --- There are two problems that this file reveals. 1) The

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126506#comment-15126506 ] Tim Allison commented on TIKA-1845: --- my failure on TIKA-1010 to set mime correctly. > Unable to extract

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Williams updated TIKA-1845: --- Attachment: example-that-fails.rtf > Unable to extract content from certain RTFs using tika-server

[jira] [Assigned] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1845: - Assignee: Tim Allison > Unable to extract content from certain RTFs using tika-server versions

[jira] [Resolved] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1830. --- Resolution: Fixed Assignee: Tim Allison [~thetaphi], I'm sorry I didn't get this into 1.12. I'd

[jira] [Updated] (TIKA-1830) Upgrade to PDFBox 1.8.11 when available

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1830: -- Fix Version/s: 1.13 > Upgrade to PDFBox 1.8.11 when available > ---

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127009#comment-15127009 ] Tim Allison commented on TIKA-1816: --- [~thammegowda], if you have a chance, would you be willing to try

[jira] [Created] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
Ian Williams created TIKA-1845: -- Summary: Unable to extract content from certain RTFs using tika-server versions since 1.5 Key: TIKA-1845 URL: https://issues.apache.org/jira/browse/TIKA-1845 Project:

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126320#comment-15126320 ] Tim Allison commented on TIKA-1845: --- From the stacktrace, this looks to be related to TIKA-1010. Will

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Williams updated TIKA-1845: --- Description: I have some patient letters that are RTF documents. When I extract the text from these

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Williams updated TIKA-1845: --- Description: I have some patient letters that are RTF documents. When I extract the text from these

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Williams updated TIKA-1845: --- Attachment: (was: test-anonymised-letter.rtf) > Unable to extract content from certain RTFs using

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126317#comment-15126317 ] Nick Burch commented on TIKA-1845: -- Near the top of the jira page are some buttons, please hit "More" then

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126354#comment-15126354 ] Ian Williams commented on TIKA-1845: I've deleted the attachment for the time being - sorry. Please

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126375#comment-15126375 ] Ian Williams commented on TIKA-1845: Just being cautious because I don't want to share anything in a

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Williams updated TIKA-1845: --- Description: I have some patient letters that are RTF documents. When I extract the text from these

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Williams updated TIKA-1845: --- Attachment: test-anonymised-letter.rtf > Unable to extract content from certain RTFs using tika-server

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126340#comment-15126340 ] Ian Williams commented on TIKA-1845: OK - thanks. I've attached the file now. > Unable to extract

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126366#comment-15126366 ] Tim Allison commented on TIKA-1845: --- Scooped it from evernote. Let me know if I should srm it. > Unable

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126429#comment-15126429 ] Tim Allison commented on TIKA-1845: --- Looks like there is no trouble with the tika-app with straight

[jira] [Commented] (TIKA-1843) Tika parser for SEG-Y files and new MIME type application/segy

2016-02-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126450#comment-15126450 ] Nick Burch commented on TIKA-1843: -- Ideally you'd work with the Sigrun owner to have them do it - it's