[
https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135122#comment-17135122
]
Cristian Vat commented on TIKA-3008:
Opened PR with handling for basic use-cases and sample documents
[
https://issues.apache.org/jira/browse/TIKA-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043776#comment-17043776
]
Cristian Vat commented on TIKA-2837:
I guess this could be closed?
It serves as documentation, if
[
https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993842#comment-16993842
]
Cristian Vat commented on TIKA-3008:
Added parser test and sample documents to my branch.
Seems to
[
https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993418#comment-16993418
]
Cristian Vat commented on TIKA-3008:
Work-in-progress branch at
[
https://issues.apache.org/jira/browse/TIKA-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-2837:
---
Description:
I got a StackOverflowError while parsing a large PDF file using
ToHTMLContentHandler.
Cristian Vat created TIKA-2837:
--
Summary: Performance/Stability problem in ToHTMLContentHandler
Key: TIKA-2837
URL: https://issues.apache.org/jira/browse/TIKA-2837
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087367#comment-13087367
]
Cristian Vat commented on TIKA-683:
---
Thanks Mike for looking into the issues. I also know
[
https://issues.apache.org/jira/browse/TIKA-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080449#comment-13080449
]
Cristian Vat commented on TIKA-632:
---
Tika uses RTFEditorKit from javax.swing.text.rtf for
[
https://issues.apache.org/jira/browse/TIKA-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080470#comment-13080470
]
Cristian Vat commented on TIKA-666:
---
I checked the error in more detail, mostly to check
[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080488#comment-13080488
]
Cristian Vat commented on TIKA-683:
---
I managed to take the original file and slim it down
[
https://issues.apache.org/jira/browse/TIKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036337#comment-13036337
]
Cristian Vat commented on TIKA-642:
---
For the example file it seems like there's only extra
[
https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995489#comment-12995489
]
Cristian Vat commented on TIKA-469:
---
Possible this is a problem with the PDF or
[
https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995489#comment-12995489
]
Cristian Vat edited comment on TIKA-469 at 2/16/11 8:05 PM:
[
https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995516#comment-12995516
]
Cristian Vat commented on TIKA-469:
---
Also tested the Word file parsing and it looks ok.
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921110#action_12921110
]
Cristian Vat commented on TIKA-422:
---
Anyone mind looking over the patch so far?
It seems to
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-422:
--
Attachment: RTFParser.patch
Attached updated version of the patch.
Changed isUnicode to include
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920365#action_12920365
]
Cristian Vat commented on TIKA-422:
---
Clarification:
The previous patch added some extra