Jorge Spinsanti created TIKA-3021:
-
Summary: Upgrade to PDFBOX 2.0.18
Key: TIKA-3021
URL: https://issues.apache.org/jira/browse/TIKA-3021
Project: Tika
Issue Type: Improvement
Affects
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992760#comment-16992760
]
Jorge Spinsanti commented on TIKA-3005:
---
[~tallison] thanks a lot for your comments & documentation.
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992688#comment-16992688
]
Jorge Spinsanti commented on TIKA-3005:
---
Often, we deal with "corrupt" files. However, in these
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992650#comment-16992650
]
Jorge Spinsanti commented on TIKA-3005:
---
Any news? Remember that I attached 3 more files with the same
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990044#comment-16990044
]
Jorge Spinsanti commented on TIKA-3005:
---
Thanks for the link.
I forgot a detail in my original
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989982#comment-16989982
]
Jorge Spinsanti edited comment on TIKA-3005 at 12/6/19 5:06 PM:
I can add
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-3005:
--
Attachment: file3.pdf
file2.pdf
file1.pdf
> Unintelligible text
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989982#comment-16989982
]
Jorge Spinsanti commented on TIKA-3005:
---
I can add 3 files with the same text extraction. Please, can
[
https://issues.apache.org/jira/browse/TIKA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-3005:
--
Attachment: resume_4.pdf
> Unintelligible text content from PDF file
>
Jorge Spinsanti created TIKA-3005:
-
Summary: Unintelligible text content from PDF file
Key: TIKA-3005
URL: https://issues.apache.org/jira/browse/TIKA-3005
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858901#comment-16858901
]
Jorge Spinsanti commented on TIKA-2835:
---
Currently, pdfbox-2.0.15 is available. See:
[
https://issues.apache.org/jira/browse/TIKA-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858540#comment-16858540
]
Jorge Spinsanti commented on TIKA-2834:
---
Hi all,
PDFBox reported a vulnerability in version 2.0.14.
[
https://issues.apache.org/jira/browse/TIKA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2842:
--
Description:
I got the following exception when trying to convert a PDF file to TXT:
[
https://issues.apache.org/jira/browse/TIKA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801659#comment-16801659
]
Jorge Spinsanti commented on TIKA-2842:
---
Related: https://issues.apache.org/jira/browse/PDFBOX-4495
[
https://issues.apache.org/jira/browse/TIKA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799314#comment-16799314
]
Jorge Spinsanti commented on TIKA-2842:
---
I can't attach the file to reproduce it due to
[
https://issues.apache.org/jira/browse/TIKA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2842:
--
Description:
I got the following exception when trying to convert a PDF file to TXT:
Jorge Spinsanti created TIKA-2842:
-
Summary: Expected number, actual=COSFloat
Key: TIKA-2842
URL: https://issues.apache.org/jira/browse/TIKA-2842
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070522#comment-16070522
]
Jorge Spinsanti commented on TIKA-2405:
---
Sure, we can give more details about the use of Tika :D
>
[
https://issues.apache.org/jira/browse/TIKA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070465#comment-16070465
]
Jorge Spinsanti commented on TIKA-2407:
---
[
https://issues.apache.org/jira/browse/TIKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070455#comment-16070455
]
Jorge Spinsanti commented on TIKA-2406:
---
IMHO, bad inputs (corrupt files) should be managed more
[
https://issues.apache.org/jira/browse/TIKA-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070440#comment-16070440
]
Jorge Spinsanti commented on TIKA-2404:
---
Yes, you are right again. We are applying your suggestion
[
https://issues.apache.org/jira/browse/TIKA-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070381#comment-16070381
]
Jorge Spinsanti commented on TIKA-2408:
---
Thanks a lot! The issue is not reproducible using SAX-based
[
https://issues.apache.org/jira/browse/TIKA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070379#comment-16070379
]
Jorge Spinsanti commented on TIKA-2405:
---
Thanks! As you commented, the issue is not reproduced with
[
https://issues.apache.org/jira/browse/TIKA-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2408:
--
Comment: was deleted
(was: Thank you for your reply.
Yes, I need help with tika-config.xml
[
https://issues.apache.org/jira/browse/TIKA-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070322#comment-16070322
]
Jorge Spinsanti commented on TIKA-2408:
---
Thank you for your reply.
Yes, I need help with
[
https://issues.apache.org/jira/browse/TIKA-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2408:
--
Attachment: ZipException.docx
> ZipException in text extraction from DOCX file
>
Jorge Spinsanti created TIKA-2408:
-
Summary: ZipException in text extraction from DOCX file
Key: TIKA-2408
URL: https://issues.apache.org/jira/browse/TIKA-2408
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070069#comment-16070069
]
Jorge Spinsanti commented on TIKA-2407:
---
Issue created in PDFBox project:
[
https://issues.apache.org/jira/browse/TIKA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2407:
--
Attachment: IOException.pdf
> Tika crashed while parsing corrupt PDF
>
Jorge Spinsanti created TIKA-2407:
-
Summary: Tika crashed while parsing corrupt PDF
Key: TIKA-2407
URL: https://issues.apache.org/jira/browse/TIKA-2407
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2407:
--
Description:
Tika throws an exception when trying to parse a corrupt PDF file to extract text
[
https://issues.apache.org/jira/browse/TIKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2406:
--
Description:
I got an IllegalArgumentException during text extraction from a PDF file (attached):
[
https://issues.apache.org/jira/browse/TIKA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2406:
--
Attachment: IllegalArgumentException.pdf
> IllegalArgumentException in text extraction from PDF
Jorge Spinsanti created TIKA-2406:
-
Summary: IllegalArgumentException in text extraction from PDF file
Key: TIKA-2406
URL: https://issues.apache.org/jira/browse/TIKA-2406
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2405:
--
Attachment: SAXParseException.docx
> SAXParseException in text extraction from DOCX file
>
[
https://issues.apache.org/jira/browse/TIKA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2405:
--
Description:
I got a SAXParseException during text extraction from a DOCX file (see attachment):
{code}
Jorge Spinsanti created TIKA-2405:
-
Summary: SAXParseException in text extraction from DOCX file
Key: TIKA-2405
URL: https://issues.apache.org/jira/browse/TIKA-2405
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2404:
--
Description:
I got an XMLException when trying to extract text from a DOCX file (see attached
file):
[
https://issues.apache.org/jira/browse/TIKA-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2404:
--
Description:
I got an XMLException when trying to extract text from a DOCX file:
{code}
Caused by:
[
https://issues.apache.org/jira/browse/TIKA-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2404:
--
Attachment: XmlException.docx
> XMLException in DOCX->TXT conversion
>
[
https://issues.apache.org/jira/browse/TIKA-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839709#comment-15839709
]
Jorge Spinsanti commented on TIKA-2251:
---
{quote}
Would your preference be to catch+log this exception
[
https://issues.apache.org/jira/browse/TIKA-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2251:
--
Attachment: ZipException.docx
> TIKA-198 due to java.util.zip.ZipException: invalid
Jorge Spinsanti created TIKA-2251:
-
Summary: TIKA-198 due to java.util.zip.ZipException: invalid
literal/lengths set
Key: TIKA-2251
URL: https://issues.apache.org/jira/browse/TIKA-2251
Project: Tika
Jorge Spinsanti created TIKA-2239:
-
Summary: Illegal IOException from
org.apache.tika.parser.microsoft.ooxml.OOXMLParser
Key: TIKA-2239
URL: https://issues.apache.org/jira/browse/TIKA-2239
Project:
[
https://issues.apache.org/jira/browse/TIKA-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795207#comment-15795207
]
Jorge Spinsanti commented on TIKA-2229:
---
Great. Thanks!
> NullPointerException at
>
[
https://issues.apache.org/jira/browse/TIKA-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2229:
--
Attachment: NPEatXWPFListManager#getFormattedNumber.docx
File to reproduce the bug.
>
[
https://issues.apache.org/jira/browse/TIKA-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-2229:
--
Labels: OOXML (was: )
> NullPointerException at
>
Jorge Spinsanti created TIKA-2229:
-
Summary: NullPointerException at
org.apache.tika.parser.microsoft.ooxml.XWPFListManager.getFormattedNumber(XWPFListManager.java:64)
Key: TIKA-2229
URL:
[
https://issues.apache.org/jira/browse/TIKA-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15778791#comment-15778791
]
Jorge Spinsanti commented on TIKA-2094:
---
Is it a workaround or a solution for Tika users? POI is a
[
https://issues.apache.org/jira/browse/TIKA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770044#comment-15770044
]
Jorge Spinsanti commented on TIKA-2225:
---
I created an issue on POI too:
Jorge Spinsanti created TIKA-2225:
-
Summary: Parse DOCX file due to NullPointerException on POI code
Key: TIKA-2225
URL: https://issues.apache.org/jira/browse/TIKA-2225
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134049#comment-15134049
]
Jorge Spinsanti commented on TIKA-1836:
---
Great news! Thanks for helping.
> Convertion DOC->TXT
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107212#comment-15107212
]
Jorge Spinsanti edited comment on TIKA-1836 at 1/19/16 7:08 PM:
POI issue
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107212#comment-15107212
]
Jorge Spinsanti commented on TIKA-1836:
---
The POI issue was reported on 2014-08-22. Perhaps if TIKA needs
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106919#comment-15106919
]
Jorge Spinsanti edited comment on TIKA-1836 at 1/19/16 7:04 PM:
POI is a
Jorge Spinsanti created TIKA-1836:
-
Summary: Convertion DOC->TXT failed due to POI issue
Key: TIKA-1836
URL: https://issues.apache.org/jira/browse/TIKA-1836
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-1836:
--
Component/s: parser
> Convertion DOC->TXT failed due to POI issue
>
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Spinsanti updated TIKA-1836:
--
Attachment: test.doc
File used to find the issue.
> Convertion DOC->TXT failed due to POI issue
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106919#comment-15106919
]
Jorge Spinsanti commented on TIKA-1836:
---
POI is a dependency of TIKA. I think TIKA can be evaluate to