[jira] [Created] (TIKA-2703) Error indexing a xlsx file

2018-08-01 Thread Mario Bisonti (JIRA)
Mario Bisonti created TIKA-2703: --- Summary: Error indexing a xlsx file Key: TIKA-2703 URL: https://issues.apache.org/jira/browse/TIKA-2703 Project: Tika Issue Type: Bug Environment:

[jira] [Commented] (TIKA-2703) Error indexing a xlsx file

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565393#comment-16565393 ] Tim Allison commented on TIKA-2703: --- In general: * Using Solr's integration of Tika is not recommended

[jira] [Commented] (TIKA-2703) Error indexing a xlsx file

2018-08-01 Thread Mario Bisonti (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565432#comment-16565432 ] Mario Bisonti commented on TIKA-2703: - Hello. Yes, I would like you to take a look at my file

[jira] [Commented] (TIKA-2703) Error indexing a xlsx file

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565464#comment-16565464 ] Tim Allison commented on TIKA-2703: --- Can’t promise I’ll be able to do anything: tallison [at] apache

FW: Tika DjVu?

2018-08-01 Thread Chris Mattmann
From: KamilD Date: Tuesday, July 31, 2018 at 11:37 PM To: "dev-ow...@tika.apache.org" Subject: Tika DjVu? Hello, I'm trying to use Tika for DjVu, but there is a problem. With app version 1.14 I get an empty result, but with version 1.18 I get: C:\Users\>java -jar

[jira] [Commented] (TIKA-2700) The HTML parser should parse the contents of the title tag as raw text, not HTML

2018-08-01 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565053#comment-16565053 ] Gerard Bouchar commented on TIKA-2700: -- Maybe the

[jira] [Commented] (TIKA-2701) Text is not extracted properly from WMF files

2018-08-01 Thread Grigoriy Alekseev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565069#comment-16565069 ] Grigoriy Alekseev commented on TIKA-2701: - Will create a pull request. > Text is not extracted

[jira] [Created] (TIKA-2701) Text is not extracted properly from WMF files

2018-08-01 Thread Grigoriy Alekseev (JIRA)
Grigoriy Alekseev created TIKA-2701: --- Summary: Text is not extracted properly from WMF files Key: TIKA-2701 URL: https://issues.apache.org/jira/browse/TIKA-2701 Project: Tika Issue Type:

[jira] [Created] (TIKA-2702) Different behavior between TIKA and pdfbox

2018-08-01 Thread Lior (JIRA)
Lior created TIKA-2702: -- Summary: Different behavior between TIKA and pdfbox Key: TIKA-2702 URL: https://issues.apache.org/jira/browse/TIKA-2702 Project: Tika Issue Type: Bug Components: app

[jira] [Commented] (TIKA-2703) Error indexing a xlsx file

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565620#comment-16565620 ] Tim Allison commented on TIKA-2703: --- To turn off extraction of charts, you have to construct a

[jira] [Commented] (TIKA-2701) Text is not extracted properly from WMF files

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565701#comment-16565701 ] Tim Allison commented on TIKA-2701: --- +1 cannot describe the joy this brings me that someone cares about

[jira] [Commented] (TIKA-2701) Text is not extracted properly from WMF files

2018-08-01 Thread Grigoriy Alekseev (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566370#comment-16566370 ] Grigoriy Alekseev commented on TIKA-2701: - [~talli...@apache.org], my pleasure :) > Text is not

[jira] [Commented] (TIKA-2701) Text is not extracted properly from WMF files

2018-08-01 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566365#comment-16566365 ] ASF GitHub Bot commented on TIKA-2701: -- grigoriy opened a new pull request #245: fix for TIKA-2701

[jira] [Commented] (TIKA-2702) Different behavior between TIKA and pdfbox

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565254#comment-16565254 ] Tim Allison commented on TIKA-2702: --- If you want to prevent extraction of hyperlinks, we could

[jira] [Updated] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2552: -- Attachment: TIKA-2552_--_first_draft.patch > Upgrade to POI 4.0.0 when available >

[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565276#comment-16565276 ] Tim Allison commented on TIKA-2552: --- First draft patch attached. This is a minimal upgrade that relies

[jira] [Commented] (TIKA-2702) Different behavior between TIKA and pdfbox

2018-08-01 Thread Lior (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565288#comment-16565288 ] Lior commented on TIKA-2702: I thought that TIKA is using PDFBox, so I expected to get the same result I

[jira] [Assigned] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2552: - Assignee: Tim Allison > Upgrade to POI 4.0.0 when available >

[jira] [Updated] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-08-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2552: -- Priority: Major (was: Minor) > Upgrade to POI 4.0.0 when available >