[jira] [Comment Edited] (TIKA-2892) ForkParser deadlock when InputStreamResource catches/returns IOException

2019-06-06 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857643#comment-16857643 ] Luis Filipe Nassif edited comment on TIKA-2892 at 6/6/19 1:36 PM: -- I am

[jira] [Comment Edited] (TIKA-2892) ForkParser deadlock when InputStreamResource catches/returns IOException

2019-06-06 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857643#comment-16857643 ] Luis Filipe Nassif edited comment on TIKA-2892 at 6/6/19 1:35 PM: -- I am

[jira] [Commented] (TIKA-2892) ForkParser deadlock when InputStreamResource catches/returns IOException

2019-06-06 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857643#comment-16857643 ] Luis Filipe Nassif commented on TIKA-2892: -- I am not able to push the fix to github. Could

[jira] [Created] (TIKA-2892) ForkParser deadlock when InputStreamResource catches/returns IOException

2019-06-05 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2892: Summary: ForkParser deadlock when InputStreamResource catches/returns IOException Key: TIKA-2892 URL: https://issues.apache.org/jira/browse/TIKA-2892

[jira] [Commented] (TIKA-2883) Text not extracted from RTF files

2019-05-29 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851030#comment-16851030 ] Luis Filipe Nassif commented on TIKA-2883: -- Thank you, [~talli...@apache.org]! Worked with all

[jira] [Commented] (TIKA-2883) Text not extracted from RTF files

2019-05-29 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850952#comment-16850952 ] Luis Filipe Nassif commented on TIKA-2883: -- Great! Will test your fix with my other files when it

[jira] [Comment Edited] (TIKA-2883) Text not extracted from RTF files

2019-05-29 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850930#comment-16850930 ] Luis Filipe Nassif edited comment on TIKA-2883 at 5/29/19 3:12 PM: ---

[jira] [Commented] (TIKA-2883) Text not extracted from RTF files

2019-05-29 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850930#comment-16850930 ] Luis Filipe Nassif commented on TIKA-2883: -- Thanks for taking a look, [~talli...@apache.org]!

[jira] [Commented] (TIKA-2883) Text not extracted from RTF files

2019-05-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850186#comment-16850186 ] Luis Filipe Nassif commented on TIKA-2883: --

[jira] [Commented] (TIKA-2883) Text not extracted from RTF files

2019-05-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850180#comment-16850180 ] Luis Filipe Nassif commented on TIKA-2883: -- hum... tried to extract embedded items from rtf, no

[jira] [Updated] (TIKA-2883) Text not extracted from RTF files

2019-05-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2883: - Affects Version/s: 1.20 1.19.1 > Text not extracted from RTF files

[jira] [Updated] (TIKA-2883) Text not extracted from RTF files

2019-05-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2883: - Description: I have a number of RTF files (extracted from PST email bodies) whose text is

[jira] [Created] (TIKA-2883) Text not extracted from RTF files

2019-05-28 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2883: Summary: Text not extracted from RTF files Key: TIKA-2883 URL: https://issues.apache.org/jira/browse/TIKA-2883 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2814) Extracted content of EML file contains words like "FONT-SIZE: 9pt; FONT-FAMILY: arial"

2019-01-16 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744361#comment-16744361 ] Luis Filipe Nassif commented on TIKA-2814: -- Sorry, I disagree with changing to text/plain by default.

[jira] [Commented] (TIKA-2765) Regression extracting text from corrupted docx files

2018-12-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724891#comment-16724891 ] Luis Filipe Nassif commented on TIKA-2765: -- Thank you, guys! I am on vacation now and can not

[jira] [Commented] (TIKA-2550) ToTextHandler includes element content

2018-12-03 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707113#comment-16707113 ] Luis Filipe Nassif commented on TIKA-2550: -- Sorry for late reply, [~talli...@apache.org]. Will it

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-22 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696034#comment-16696034 ] Luis Filipe Nassif commented on TIKA-2749: -- I don't do that. I thought you questioned if doing

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-21 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695351#comment-16695351 ] Luis Filipe Nassif commented on TIKA-2749: -- Hi [~rleir]. Sorry, I meant our main goal when OCRing

[jira] [Commented] (TIKA-2785) Switch parent/child IPC to mmap file from stdout/stderr in tika-server

2018-11-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692508#comment-16692508 ] Luis Filipe Nassif commented on TIKA-2785: -- Hi [~talli...@apache.org], just to confirm ForkParser

[jira] [Commented] (TIKA-2765) Regression extracting text from corrupted docx files

2018-11-05 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675281#comment-16675281 ] Luis Filipe Nassif commented on TIKA-2765: -- POI-62886 created. Thanks [~talli...@apache.org] and

[jira] [Commented] (TIKA-2765) Regression extracting text from corrupted docx files

2018-10-24 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662332#comment-16662332 ] Luis Filipe Nassif commented on TIKA-2765: -- This error is thrown in about 2% of docx files from

[jira] [Commented] (TIKA-2765) Regression extracting text from corrupted docx files

2018-10-24 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662312#comment-16662312 ] Luis Filipe Nassif commented on TIKA-2765: -- [~talli...@apache.org], [~gagravarr], poi-4.0.0

[jira] [Commented] (TIKA-2765) Regression extracting text from corrupted docx files

2018-10-24 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662307#comment-16662307 ] Luis Filipe Nassif commented on TIKA-2765: -- Example file attached. > Regression extracting text

[jira] [Updated] (TIKA-2765) Regression extracting text from corrupted docx files

2018-10-24 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2765: - Attachment: DX IMPORTADORA E EXPORTADORA LTDA.docx > Regression extracting text from

[jira] [Created] (TIKA-2765) Regression extracting text from corrupted docx files

2018-10-24 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2765: Summary: Regression extracting text from corrupted docx files Key: TIKA-2765 URL: https://issues.apache.org/jira/browse/TIKA-2765 Project: Tika

[jira] [Comment Edited] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-10-06 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640724#comment-16640724 ] Luis Filipe Nassif edited comment on TIKA-2749 at 10/6/18 1:39 PM: --- Hi

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-10-06 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640724#comment-16640724 ] Luis Filipe Nassif commented on TIKA-2749: -- Hi [~talli...@apache.org], Yes, currently we run ocr

[jira] [Commented] (TIKA-2473) PCX and DCX image support

2018-10-02 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636013#comment-16636013 ] Luis Filipe Nassif commented on TIKA-2473: -- Hi [~mcaruanagalizia], I think jbig2 is handled

[jira] [Commented] (TIKA-2671) HtmlEncodingDetector doesnt take provided metadata into account

2018-06-15 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514326#comment-16514326 ] Luis Filipe Nassif commented on TIKA-2671: -- Hasn't step 2 caused problems in the past? >

[jira] [Commented] (TIKA-2653) Allow users to specify a directory of jars for classloading in ForkParser

2018-05-26 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491866#comment-16491866 ] Luis Filipe Nassif commented on TIKA-2653: -- +1! I will try to take a look, Tim, but unfortunately

[jira] [Commented] (TIKA-2646) Tika parse["content"] returns jumbled text across cells of a table in a pdf

2018-05-22 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486646#comment-16486646 ] Luis Filipe Nassif commented on TIKA-2646: -- It does not maintain table structures, but have you

[jira] [Commented] (TIKA-2624) Rendering PDFs for OCR with Tesseract uses different DPI than claimed

2018-04-02 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423041#comment-16423041 ] Luis Filipe Nassif commented on TIKA-2624: -- Wow, that is a major bug, not sure when it was

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-04-02 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422231#comment-16422231 ] Luis Filipe Nassif commented on TIKA-2620: -- Hi [~tilman]. When printing PDFs to images before OCR,

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-30 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420423#comment-16420423 ] Luis Filipe Nassif commented on TIKA-2620: -- Maybe we should add another option to allow

[jira] [Resolved] (TIKA-879) Detection problem: message/rfc822 file is detected as text/plain.

2018-03-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif resolved TIKA-879. - Resolution: Duplicate Fix Version/s: 1.18 2.0 Fixed by commits

[jira] [Commented] (TIKA-2338) Change Scope of Jai-ImageIO-Core dependency

2018-03-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392036#comment-16392036 ] Luis Filipe Nassif commented on TIKA-2338: -- Sorry and thank you [~talli...@mitre.org]! > Change

[jira] [Commented] (TIKA-2603) application/x-iso9660-image extraktion

2018-03-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391700#comment-16391700 ] Luis Filipe Nassif commented on TIKA-2603: -- Currently I have a parser via sevenzipjbinding, it

[jira] [Resolved] (TIKA-2338) Change Scope of Jai-ImageIO-Core dependency

2018-03-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif resolved TIKA-2338. -- Resolution: Fixed Fix Version/s: (was: 1.17) 1.18 > Change

[jira] [Assigned] (TIKA-2338) Change Scope of Jai-ImageIO-Core dependency

2018-03-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif reassigned TIKA-2338: Assignee: Luis Filipe Nassif > Change Scope of Jai-ImageIO-Core dependency >

[jira] [Resolved] (TIKA-2568) Full encrypted 7Z file not detected as such

2018-03-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif resolved TIKA-2568. -- Resolution: Fixed > Full encrypted 7Z file not detected as such >

[jira] [Updated] (TIKA-2568) Full encrypted 7Z file not detected as such

2018-03-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2568: - Fix Version/s: 1.18 2.0 > Full encrypted 7Z file not detected as such >

[jira] [Commented] (TIKA-2594) Mail detected as application/xhtml+xml

2018-03-07 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390028#comment-16390028 ] Luis Filipe Nassif commented on TIKA-2594: -- We have used that magic restricted to 0:1000 for a

[jira] [Commented] (TIKA-1466) Enable overriding of mimetype glob pattern definitions

2018-03-07 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389938#comment-16389938 ] Luis Filipe Nassif commented on TIKA-1466: -- I thought about logging any custom-mimetype override

[jira] [Commented] (TIKA-2585) TikaInputStream support for resetting via a factory of InputStreams

2018-02-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380765#comment-16380765 ] Luis Filipe Nassif commented on TIKA-2585: -- Hi [~gagravarr], I don't know. I think we can create

[jira] [Commented] (TIKA-2591) Some tiffs (Big Endian with fax compression) are showing up as x-tarr

2018-02-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380159#comment-16380159 ] Luis Filipe Nassif commented on TIKA-2591: -- Hum, sorry. The higher the number, the higher the priority.

[jira] [Commented] (TIKA-2591) Some tiffs (Big Endian with fax compression) are showing up as x-tarr

2018-02-27 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379402#comment-16379402 ] Luis Filipe Nassif commented on TIKA-2591: -- If we increase the magic priority of tiff to be

[jira] [Commented] (TIKA-1466) Enable overriding of mimetype glob pattern definitions

2018-02-26 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377797#comment-16377797 ] Luis Filipe Nassif commented on TIKA-1466: -- We have hit this again. We encountered some MTS videos

[jira] [Commented] (TIKA-2578) Mails not recognized when unknown X-headers are present

2018-02-26 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377788#comment-16377788 ] Luis Filipe Nassif commented on TIKA-2578: -- Hi [~talli...@mitre.org] I do not like too much

[jira] [Commented] (TIKA-2568) Full encrypted 7Z file not detected as such

2018-02-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357142#comment-16357142 ] Luis Filipe Nassif commented on TIKA-2568: -- I am not able to assign this to me. I need some

[jira] [Created] (TIKA-2568) Full encrypted 7Z file not detected as such

2018-02-08 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2568: Summary: Full encrypted 7Z file not detected as such Key: TIKA-2568 URL: https://issues.apache.org/jira/browse/TIKA-2568 Project: Tika Issue Type:

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2018-02-02 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350504#comment-16350504 ] Luis Filipe Nassif commented on TIKA-1599: -- Hi [~talli...@mitre.org], Moving to DOM could lead to

[jira] [Commented] (TIKA-2546) com.pff:java-libpst is branch EOL

2018-01-12 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324009#comment-16324009 ] Luis Filipe Nassif commented on TIKA-2546: -- That version has Outlook OST 2013 format support,

[jira] [Comment Edited] (TIKA-2471) Tab-prefixed message body lines in Mbox interpreted as headers

2017-10-17 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208769#comment-16208769 ] Luis Filipe Nassif edited comment on TIKA-2471 at 10/18/17 3:47 AM: Hi

[jira] [Commented] (TIKA-2471) Tab-prefixed message body lines in Mbox interpreted as headers

2017-10-17 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208775#comment-16208775 ] Luis Filipe Nassif commented on TIKA-2471: -- Also, the tracking metadata feature was added before

[jira] [Commented] (TIKA-2471) Tab-prefixed message body lines in Mbox interpreted as headers

2017-10-17 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208769#comment-16208769 ] Luis Filipe Nassif commented on TIKA-2471: -- Hi Matthew, If I remember correctly, some headers

[jira] [Commented] (TIKA-2478) MBOX import includes redundant copies of the text

2017-10-17 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208562#comment-16208562 ] Luis Filipe Nassif commented on TIKA-2478: -- Robert, related to your last suggestion, I think

[jira] [Commented] (TIKA-2478) MBOX import includes redundant copies of the text

2017-10-17 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208554#comment-16208554 ] Luis Filipe Nassif commented on TIKA-2478: -- Although I have seen in the past emls with very

[jira] [Commented] (TIKA-2469) False positives with x-ms-owner detection

2017-10-13 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203625#comment-16203625 ] Luis Filipe Nassif commented on TIKA-2469: -- Thanks [~talli...@apache.org]! > False positives with

[jira] [Updated] (TIKA-2469) False positives with x-ms-owner detection

2017-09-18 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2469: - Description: Attached windows system files are incorrectly detected as

[jira] [Updated] (TIKA-2469) False positives with x-ms-owner detection

2017-09-18 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2469: - Attachment: x86_microsoft-windows-i..tional-codepage-870_31bf38.nls_c0c54318

[jira] [Created] (TIKA-2469) False positives with x-ms-owner detection

2017-09-18 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2469: Summary: False positives with x-ms-owner detection Key: TIKA-2469 URL: https://issues.apache.org/jira/browse/TIKA-2469 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2460) Possibility to add custom-mimetypes.xml (and/or also other configuration files) from location outside classpath

2017-09-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159012#comment-16159012 ] Luis Filipe Nassif commented on TIKA-2460: -- Looks like I have no permission to close issues not

[jira] [Commented] (TIKA-2460) Possibility to add custom-mimetypes.xml (and/or also other configuration files) from location outside classpath

2017-09-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158709#comment-16158709 ] Luis Filipe Nassif commented on TIKA-2460: -- For sure! Will do the adjustments. Thanks! >

[jira] [Updated] (TIKA-2460) Possibility to add custom-mimetypes.xml (and/or also other configuration files) from location outside classpath

2017-09-08 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2460: - Attachment: TIKA-2460.patch Final version of the patch, do you have any recommendations

[jira] [Commented] (TIKA-2460) Possibility to add custom-mimetypes.xml (and/or also other configuration files) from location outside classpath

2017-09-07 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157364#comment-16157364 ] Luis Filipe Nassif commented on TIKA-2460: -- The patch is missing a null check. Will add together

[jira] [Updated] (TIKA-2460) Possibility to add custom-mimetypes.xml (and/or also other configuration files) from location outside classpath

2017-09-07 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2460: - Attachment: TIKA-2460.patch Draft of the patch. Will write unit test after review >

[jira] [Commented] (TIKA-2460) Possibility to add custom-mimetypes.xml (and/or also other configuration files) from location outside classpath

2017-09-07 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157137#comment-16157137 ] Luis Filipe Nassif commented on TIKA-2460: -- Hi [~gagravarr], I also had this requirement in the

[jira] [Resolved] (TIKA-2456) Emails extracted from MBOX not detected as rfc822

2017-08-31 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif resolved TIKA-2456. -- Resolution: Fixed Fixed in r560e91a > Emails extracted from MBOX not detected as rfc822

[jira] [Issue Comment Deleted] (TIKA-2456) Emails extracted from MBOX not detected as rfc822

2017-08-31 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2456: - Comment: was deleted (was: Fixed in r560e91a) > Emails extracted from MBOX not detected

[jira] [Commented] (TIKA-2456) Emails extracted from MBOX not detected as rfc822

2017-08-31 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149223#comment-16149223 ] Luis Filipe Nassif commented on TIKA-2456: -- Fixed in r560e91a > Emails extracted from MBOX not

[jira] [Updated] (TIKA-2456) Emails extracted from MBOX not detected as rfc822

2017-08-31 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2456: - Attachment: single_mail.mbox File to unit test > Emails extracted from MBOX not detected

[jira] [Created] (TIKA-2456) Emails extracted from MBOX not detected as rfc822

2017-08-31 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2456: Summary: Emails extracted from MBOX not detected as rfc822 Key: TIKA-2456 URL: https://issues.apache.org/jira/browse/TIKA-2456 Project: Tika Issue

[jira] [Commented] (TIKA-2454) Emails extracted from PSTs detected as unexpected file types

2017-08-30 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148251#comment-16148251 ] Luis Filipe Nassif commented on TIKA-2454: -- Those look like 4 variants of the (container) mbox

[jira] [Commented] (TIKA-2454) Emails extracted from PSTs detected as unexpected file types

2017-08-30 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148079#comment-16148079 ] Luis Filipe Nassif commented on TIKA-2454: -- [~talli...@apache.org], are you the flash? I was

[jira] [Commented] (TIKA-2450) OfficeParser.parse called for zero-byte file with .doc extension

2017-08-30 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147874#comment-16147874 ] Luis Filipe Nassif commented on TIKA-2450: -- Late to the party... In the forensic field, it is very

[jira] [Comment Edited] (TIKA-2443) Plain text file identified as rfc822 and which can cause StackOverflowError

2017-08-23 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138378#comment-16138378 ] Luis Filipe Nassif edited comment on TIKA-2443 at 8/23/17 2:00 PM: ---

[jira] [Commented] (TIKA-2443) Plain text file identified as rfc822 and which can cause StackOverflowError

2017-08-23 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138378#comment-16138378 ] Luis Filipe Nassif commented on TIKA-2443: -- Currently no. How are you using Tika? If you are using

[jira] [Commented] (TIKA-2430) Add at least dev test capability to run Tika against corrupted files in our test suite

2017-07-14 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087451#comment-16087451 ] Luis Filipe Nassif commented on TIKA-2430: -- Awesome [~talli...@apache.org], you rock! For sure

[jira] [Commented] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-13 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086318#comment-16086318 ] Luis Filipe Nassif commented on TIKA-2428: -- That would be very nice! > EMFParser loops forever

[jira] [Commented] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-13 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085965#comment-16085965 ] Luis Filipe Nassif commented on TIKA-2428: -- bq. If bytes skipped is more than requested, we've hit

[jira] [Commented] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085883#comment-16085883 ] Luis Filipe Nassif commented on TIKA-2042: -- See Tika-879. Looks like widening the magic search

[jira] [Commented] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085880#comment-16085880 ] Luis Filipe Nassif commented on TIKA-2042: -- This problem is very very recurrent. I think we should

[jira] [Commented] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-13 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085852#comment-16085852 ] Luis Filipe Nassif commented on TIKA-2428: -- Strange, I don't think the javadocs allow that. Maybe

[jira] [Commented] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-13 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085665#comment-16085665 ] Luis Filipe Nassif commented on TIKA-2428: -- I just put the stacktrace, you found the cause. But

[jira] [Commented] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-12 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085128#comment-16085128 ] Luis Filipe Nassif commented on TIKA-2428: -- Seems like the issue is at POI level. Threads are

[jira] [Updated] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-12 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2428: - Affects Version/s: 1.16 > EMFParser loops forever with corrupted files >

[jira] [Updated] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-12 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2428: - Description: EMFParser hangs with the attached corrupted EMF files. Sorry

[jira] [Updated] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-12 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2428: - Affects Version/s: (was: 1.16) 1.15 > EMFParser loops forever

[jira] [Updated] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-12 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2428: - Attachment: Carved-912866.emf Carved-1285676.emf

[jira] [Created] (TIKA-2428) EMFParser loops forever with corrupted files

2017-07-12 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2428: Summary: EMFParser loops forever with corrupted files Key: TIKA-2428 URL: https://issues.apache.org/jira/browse/TIKA-2428 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2338) Change Scope of Jai-ImageIO-Core dependency

2017-07-07 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078311#comment-16078311 ] Luis Filipe Nassif commented on TIKA-2338: -- As a side note, seems like Oracle will integrate jai,

[jira] [Comment Edited] (TIKA-2419) Try HTML mime magic on broken XML files

2017-07-05 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075006#comment-16075006 ] Luis Filipe Nassif edited comment on TIKA-2419 at 7/5/17 4:03 PM: -- Hi

[jira] [Commented] (TIKA-2419) Try HTML mime magic on broken XML files

2017-07-05 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075006#comment-16075006 ] Luis Filipe Nassif commented on TIKA-2419: -- Hi Nick, The original issue of eml(x) being detected

[jira] [Commented] (TIKA-2415) Upgrade libpst to 0.9.3

2017-07-03 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072842#comment-16072842 ] Luis Filipe Nassif commented on TIKA-2415: -- Hi Tim, Not sure if we should update for now.

[jira] [Commented] (TIKA-2402) Support all image formats in Object Recognition REST Parser

2017-06-30 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070154#comment-16070154 ] Luis Filipe Nassif commented on TIKA-2402: -- duh... sorry, I did not see the "REST" in title, the

[jira] [Commented] (TIKA-2402) Support all image formats in Object Recognition REST Parser

2017-06-29 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068731#comment-16068731 ] Luis Filipe Nassif commented on TIKA-2402: -- Hi [~ThejanWijesinghe], looks like DataVec

[jira] [Commented] (TIKA-2394) Unknown message type: IPM.Note.Rules.OofTemplate.Microsoft

2017-06-15 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050773#comment-16050773 ] Luis Filipe Nassif commented on TIKA-2394: -- Seems like the current version of java-libpst still

[jira] [Commented] (TIKA-2394) Unknown message type: IPM.Note.Rules.OofTemplate.Microsoft

2017-06-15 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050723#comment-16050723 ] Luis Filipe Nassif commented on TIKA-2394: -- I tested java-libpst 0.9.4 some weeks ago, because in

[jira] [Commented] (TIKA-2394) Unknown message type: IPM.Note.Rules.OofTemplate.Microsoft

2017-06-15 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050682#comment-16050682 ] Luis Filipe Nassif commented on TIKA-2394: -- I think you can declare directly the new java-libpst

[jira] [Created] (TIKA-2390) Extract images embedded in Html

2017-06-09 Thread Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2390: Summary: Extract images embedded in Html Key: TIKA-2390 URL: https://issues.apache.org/jira/browse/TIKA-2390 Project: Tika Issue Type: Improvement
