[jira] [Commented] (TIKA-3008) Word Doc/Docx Formatting Extraction - Superscript/Subscript

2020-06-14 Thread Cristian Vat (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135122#comment-17135122 ] Cristian Vat commented on TIKA-3008: Opened PR with handling for basic use-cases and sample documents

[jira] [Commented] (TIKA-2837) Performance/Stability problem in ToHTMLContentHandler

2020-02-24 Thread Cristian Vat (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043776#comment-17043776 ] Cristian Vat commented on TIKA-2837: I guess this could be closed? It serves as documentation, if

[jira] [Commented] (TIKA-3008) Word Doc/Docx Formatting Extraction - Superscript/Subscript

2019-12-11 Thread Cristian Vat (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993842#comment-16993842 ] Cristian Vat commented on TIKA-3008: Added parser test and sample documents to my branch. Seems to

[jira] [Commented] (TIKA-3008) Word Doc/Docx Formatting Extraction - Superscript/Subscript

2019-12-11 Thread Cristian Vat (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993418#comment-16993418 ] Cristian Vat commented on TIKA-3008: Work-in-progress branch at

[jira] [Updated] (TIKA-2837) Performance/Stability problem in ToHTMLContentHandler

2019-03-02 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristian Vat updated TIKA-2837: --- Description: I got a StackOverflowError while parsing a large PDF file using ToHTMLContentHandler.

[jira] [Created] (TIKA-2837) Performance/Stability problem in ToHTMLContentHandler

2019-03-02 Thread Cristian Vat (JIRA)
Cristian Vat created TIKA-2837: -- Summary: Performance/Stability problem in ToHTMLContentHandler Key: TIKA-2837 URL: https://issues.apache.org/jira/browse/TIKA-2837 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-683) RTF Parser issues with non european characters

2011-08-18 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087367#comment-13087367 ] Cristian Vat commented on TIKA-683: --- Thanks Mike for looking into the issues. I also know

[jira] [Commented] (TIKA-632) Rtf parsing ignores links

2011-08-06 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080449#comment-13080449 ] Cristian Vat commented on TIKA-632: --- Tika uses RTFEditorKit from javax.swing.text.rtf for

[jira] [Commented] (TIKA-666) Unable to extract content from RTF files

2011-08-06 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080470#comment-13080470 ] Cristian Vat commented on TIKA-666: --- I checked the error in more detail, mostly to check

[jira] [Commented] (TIKA-683) RTF Parser issues with non european characters

2011-08-06 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080488#comment-13080488 ] Cristian Vat commented on TIKA-683: --- I managed to take the original file and slim it down

[jira] [Commented] (TIKA-642) Few of RTF files not extracting properly

2011-05-19 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036337#comment-13036337 ] Cristian Vat commented on TIKA-642: --- For the example file it seems like there's only extra

[jira] Commented: (TIKA-469) The Parser is not correctly outputting Arabic text documents

2011-02-16 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995489#comment-12995489 ] Cristian Vat commented on TIKA-469: --- Possible this is a problem with the PDF or

[jira] Issue Comment Edited: (TIKA-469) The Parser is not correctly outputting Arabic text documents

2011-02-16 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995489#comment-12995489 ] Cristian Vat edited comment on TIKA-469 at 2/16/11 8:05 PM:

[jira] Commented: (TIKA-469) The Parser is not correctly outputting Arabic text documents

2011-02-16 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995516#comment-12995516 ] Cristian Vat commented on TIKA-469: --- Also tested the Word file parsing and it looks ok.

[jira] Commented: (TIKA-422) Wrong charset conversion in some RTF documents.

2010-10-14 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921110#action_12921110 ] Cristian Vat commented on TIKA-422: --- Anyone mind looking over the patch so far? It seems to

[jira] Updated: (TIKA-422) Wrong charset conversion in some RTF documents.

2010-10-12 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristian Vat updated TIKA-422: -- Attachment: RTFParser.patch Attached updated version of the patch. Changed isUnicode to include

[jira] Commented: (TIKA-422) Wrong charset conversion in some RTF documents.

2010-10-12 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920365#action_12920365 ] Cristian Vat commented on TIKA-422: --- Clarification: The previous patch added some extra