[jira] [Updated] (TIKA-1896) Invalid closing script tag not handled gracefully by HtmlParser

2016-03-08 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-1896: -- Attachment: test.html > Invalid closing script tag not handled gracefully by

[jira] [Comment Edited] (TIKA-1896) Invalid closing script tag not handled gracefully by HtmlParser

2016-03-08 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184969#comment-15184969 ] Matthew Caruana Galizia edited comment on TIKA-1896 at 3/8/16 2:33 PM:

[jira] [Created] (TIKA-2274) and metadata collision

2017-02-23 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2274: - Summary: and metadata collision Key: TIKA-2274 URL: https://issues.apache.org/jira/browse/TIKA-2274 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-2245) Standardise on java.util.Logging

2017-01-19 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2245: - Summary: Standardise on java.util.Logging Key: TIKA-2245 URL: https://issues.apache.org/jira/browse/TIKA-2245 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2245) Standardise logging

2017-01-19 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829856#comment-15829856 ] Matthew Caruana Galizia commented on TIKA-2245: --- So should we agree that parsers should use

[jira] [Updated] (TIKA-2245) Standardise logging

2017-01-19 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2245: -- Description: Tika parsers sometimes use Log4j's Logger, sometimes the JUL

[jira] [Commented] (TIKA-2232) Add JBIG2 image parsing support

2017-01-16 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823691#comment-15823691 ] Matthew Caruana Galizia commented on TIKA-2232: --- Could we at least log a warning once when

[jira] [Commented] (TIKA-2274) and metadata collision

2017-02-28 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887559#comment-15887559 ] Matthew Caruana Galizia commented on TIKA-2274: --- Thanks for checking up on this. Try

[jira] [Updated] (TIKA-2280) message_from not extracted from Outlook emails

2017-02-28 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2280: -- Description: While the MESSAGE_FROM metadata field is extracted for both RFC and

[jira] [Created] (TIKA-2280) message_from not extracted from Outlook emails

2017-02-28 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2280: - Summary: message_from not extracted from Outlook emails Key: TIKA-2280 URL: https://issues.apache.org/jira/browse/TIKA-2280 Project: Tika

[jira] [Commented] (TIKA-2280) message_from not extracted from Outlook emails

2017-02-28 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889192#comment-15889192 ] Matthew Caruana Galizia commented on TIKA-2280: --- OK, so this is a duplicate then. Your

[jira] [Commented] (TIKA-2235) Use Tesseract's recommended DPI for PDF images

2017-03-01 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890106#comment-15890106 ] Matthew Caruana Galizia commented on TIKA-2235: --- In the majority of cases, JPEG, JBIG2

[jira] [Commented] (TIKA-2235) Use Tesseract's recommended DPI for PDF images

2017-03-01 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890324#comment-15890324 ] Matthew Caruana Galizia commented on TIKA-2235: --- Ah, good catch. OCR'ing inline. > Use

[jira] [Commented] (TIKA-2167) Image processing causes OCR to fail

2016-11-07 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644455#comment-15644455 ] Matthew Caruana Galizia commented on TIKA-2167: --- [~talli...@mitre.org] to replicate the

[jira] [Created] (TIKA-2167) Image processing causes OCR to fail

2016-11-06 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2167: - Summary: Image processing causes OCR to fail Key: TIKA-2167 URL: https://issues.apache.org/jira/browse/TIKA-2167 Project: Tika Issue Type:

[jira] [Updated] (TIKA-2167) Image processing causes OCR to fail

2016-11-06 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2167: -- Attachment: simple.tiff > Image processing causes OCR to fail >

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650892#comment-15650892 ] Matthew Caruana Galizia commented on TIKA-2174: --- Both on inline and independent files. I've

[jira] [Created] (TIKA-2174) JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser

2016-11-09 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2174: - Summary: JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser Key: TIKA-2174 URL: https://issues.apache.org/jira/browse/TIKA-2174

[jira] [Updated] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2174: -- Description: A complete install of Leptonica with Tesseract will add support for

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651347#comment-15651347 ] Matthew Caruana Galizia commented on TIKA-2174: --- That issue went away once I added 'jp2' and

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-10 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653430#comment-15653430 ] Matthew Caruana Galizia commented on TIKA-2174: --- Thank you! I've also confirmed that

[jira] [Commented] (TIKA-2175) Enable extraction of inlined jp2/jpx from PDF

2016-11-10 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653441#comment-15653441 ] Matthew Caruana Galizia commented on TIKA-2175: --- I've filed [an

[jira] [Commented] (TIKA-1896) Invalid closing script tag not handled gracefully by HtmlParser

2016-11-05 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638978#comment-15638978 ] Matthew Caruana Galizia commented on TIKA-1896: --- [~talli...@mitre.org] did you ever run your

[jira] [Commented] (TIKA-1896) Invalid closing script tag not handled gracefully by HtmlParser

2016-11-14 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663830#comment-15663830 ] Matthew Caruana Galizia commented on TIKA-1896: --- Perhaps we should push ahead with Jsoup

[jira] [Commented] (TIKA-2175) Enable extraction of inlined jp2/jpx from PDF

2016-11-25 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696377#comment-15696377 ] Matthew Caruana Galizia commented on TIKA-2175: --- Still no joy, both with my bridge classes

[jira] [Commented] (TIKA-2175) Enable extraction of inlined jp2/jpx from PDF

2016-11-28 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701919#comment-15701919 ] Matthew Caruana Galizia commented on TIKA-2175: --- The problem was OpenCL support in Tesseract.

[jira] [Created] (TIKA-2235) Use Tesseract's recommended DPI for PDF images

2017-01-11 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2235: - Summary: Use Tesseract's recommended DPI for PDF images Key: TIKA-2235 URL: https://issues.apache.org/jira/browse/TIKA-2235 Project: Tika

[jira] [Commented] (TIKA-2235) Use Tesseract's recommended DPI for PDF images

2017-01-11 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818176#comment-15818176 ] Matthew Caruana Galizia commented on TIKA-2235: --- Yes, I am already! Thanks for linking me to

[jira] [Created] (TIKA-2221) poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

2016-12-20 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2221: - Summary: poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException Key: TIKA-2221 URL:

[jira] [Commented] (TIKA-1195) XLSB support

2017-03-16 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929105#comment-15929105 ] Matthew Caruana Galizia commented on TIKA-1195: --- [~talli...@mitre.org] d'you reckon that will

[jira] [Closed] (TIKA-2280) message_from not extracted from Outlook emails

2017-03-01 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia closed TIKA-2280. - Resolution: Duplicate > message_from not extracted from Outlook emails >

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2017-03-01 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890800#comment-15890800 ] Matthew Caruana Galizia commented on TIKA-1865: --- Thank you, this is a big improvement. >

[jira] [Commented] (TIKA-2436) Support for GZIP-compressed EMF files

2017-07-29 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106136#comment-16106136 ] Matthew Caruana Galizia commented on TIKA-2436: --- To give you an example of why this is a

[jira] [Commented] (TIKA-2436) Support for GZIP-compressed EMF files

2017-07-29 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106135#comment-16106135 ] Matthew Caruana Galizia commented on TIKA-2436: --- The difference is that the file is treated

[jira] [Updated] (TIKA-2436) Support for GZIP-compressed EMF files

2017-07-28 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2436: -- Attachment: image004.emz Example EMZ file attached. Common Compress will yield
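
For readers unfamiliar with EMZ: it is an EMF file wrapped in a GZIP stream. A minimal sketch of unwrapping one, assuming the standard GZIP magic bytes and the usual EMF header layout (record type 1 at offset 0, the " EMF" signature at offset 40), might look like the following. The function name `unwrap_emz` is hypothetical and this is not Tika's or Commons Compress's implementation.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any GZIP member
EMF_SIGNATURE = b" EMF"    # signature bytes at offset 40 of the EMF header


def unwrap_emz(data: bytes) -> bytes:
    """Hypothetical helper: decompress an EMZ blob and check it holds EMF data."""
    if not data.startswith(GZIP_MAGIC):
        raise ValueError("not a GZIP stream")
    payload = gzip.decompress(data)
    if len(payload) < 44:
        raise ValueError("decompressed data too short to be EMF")
    # The first record of an EMF file is EMR_HEADER (type 1, little-endian).
    (record_type,) = struct.unpack_from("<I", payload, 0)
    if record_type != 1 or payload[40:44] != EMF_SIGNATURE:
        raise ValueError("decompressed data is not EMF")
    return payload
```

A caller would read the attachment's bytes and pass them to `unwrap_emz`; a `ValueError` signals that the file is either not GZIP or does not contain EMF data.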

[jira] [Commented] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085709#comment-16085709 ] Matthew Caruana Galizia commented on TIKA-2042: --- I'd like to ask for this issue to be

[jira] [Commented] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085842#comment-16085842 ] Matthew Caruana Galizia commented on TIKA-2042: --- [~gagravarr] thank you - that fixes the

[jira] [Updated] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2042: -- Attachment: mbox_email_section.txt Sample of one of the message sections from

[jira] [Updated] (TIKA-879) Detection problem: message/rfc822 file is detected as text/plain.

2017-07-13 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-879: - Attachment: mbox_email_section.txt As described in TIKA-2042, the attached file

[jira] [Comment Edited] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085847#comment-16085847 ] Matthew Caruana Galizia edited comment on TIKA-2042 at 7/13/17 3:13 PM:

[jira] [Updated] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2042: -- Attachment: mbox_header.txt Header attached with identifying information

[jira] [Comment Edited] (TIKA-2042) MBOX file detected wrongly as text/html

2017-07-13 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085709#comment-16085709 ] Matthew Caruana Galizia edited comment on TIKA-2042 at 7/13/17 2:22 PM:

[jira] [Commented] (TIKA-2399) Version conflict with non-ASL jai-imageio-jpeg2000 and edu.ucar jj2000

2017-07-07 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078012#comment-16078012 ] Matthew Caruana Galizia commented on TIKA-2399: --- OK. I can't think of any other option for

[jira] [Commented] (TIKA-2399) Version conflict with non-ASL jai-imageio-jpeg2000 and edu.ucar jj2000

2017-07-04 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073660#comment-16073660 ] Matthew Caruana Galizia commented on TIKA-2399: --- Wouldn't it be better to warn? (Option 2 in

[jira] [Commented] (TIKA-2399) Version conflict with non-ASL jai-imageio-jpeg2000 and edu.ucar jj2000

2017-07-06 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076092#comment-16076092 ] Matthew Caruana Galizia commented on TIKA-2399: --- Their response: bq. I wouldn't mind if you

[jira] [Commented] (TIKA-2399) Version conflict with non-ASL jai-imageio-jpeg2000 and edu.ucar jj2000

2017-07-05 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074918#comment-16074918 ] Matthew Caruana Galizia commented on TIKA-2399: --- I've emailed Unidata to ask about publishing

[jira] [Commented] (TIKA-2399) Version conflict with non-ASL jai-imageio-jpeg2000 and edu.ucar jj2000

2017-07-05 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074881#comment-16074881 ] Matthew Caruana Galizia commented on TIKA-2399: --- Tim, see

[jira] [Created] (TIKA-2436) Support for GZIP-compressed EMF files

2017-07-28 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2436: - Summary: Support for GZIP-compressed EMF files Key: TIKA-2436 URL: https://issues.apache.org/jira/browse/TIKA-2436 Project: Tika Issue

[jira] [Created] (TIKA-2444) JP2 codestream files not parsed

2017-08-22 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2444: - Summary: JP2 codestream files not parsed Key: TIKA-2444 URL: https://issues.apache.org/jira/browse/TIKA-2444 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-2444) JP2 codestream files not parsed

2017-08-22 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2444: -- Attachment: balloon.j2c Example JP2K codestream file attached. > JP2 codestream

[jira] [Commented] (TIKA-2399) Version conflict with non-ASL jai-imageio-jpeg2000 and edu.ucar jj2000

2017-06-20 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055541#comment-16055541 ] Matthew Caruana Galizia commented on TIKA-2399: --- I had emailed Unidata in February about

[jira] [Created] (TIKA-2394) "Unknown message type"

2017-06-15 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2394: - Summary: "Unknown message type" Key: TIKA-2394 URL: https://issues.apache.org/jira/browse/TIKA-2394 Project: Tika Issue Type: Bug

[jira] [Comment Edited] (TIKA-2394) Unknown message type: IPM.Note.Rules.OofTemplate.Microsoft

2017-06-15 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050226#comment-16050226 ] Matthew Caruana Galizia edited comment on TIKA-2394 at 6/15/17 9:28 AM:

[jira] [Commented] (TIKA-2394) Unknown message type: IPM.Note.Rules.OofTemplate.Microsoft

2017-06-15 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050226#comment-16050226 ] Matthew Caruana Galizia commented on TIKA-2394: --- I remember seeing how to override a provided

[jira] [Updated] (TIKA-2394) Unknown message type: IPM.Note.Rules.OofTemplate.Microsoft

2017-06-15 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2394: -- Affects Version/s: 1.15 Labels: container email pst (was: )

[jira] [Commented] (TIKA-2389) Warn log level is pretty strong for missing JBIG2ImageReader

2017-06-09 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044297#comment-16044297 ] Matthew Caruana Galizia commented on TIKA-2389: --- Please don't move this to info. Before

[jira] [Created] (TIKA-2473) PCX and DCX image support

2017-10-06 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2473: - Summary: PCX and DCX image support Key: TIKA-2473 URL: https://issues.apache.org/jira/browse/TIKA-2473 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2473) PCX and DCX image support

2017-10-06 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194435#comment-16194435 ] Matthew Caruana Galizia commented on TIKA-2473: --- Magic: byte 0: x0A byte 1: either x00,
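
The magic check described in this comment can be sketched roughly as below. Only the byte 0 value (0x0A) comes from the comment; the list of accepted byte 1 values is cut off in this listing, so the sketch takes it as a caller-supplied parameter rather than guessing. The name `looks_like_pcx` and its `allowed_versions` parameter are hypothetical.

```python
PCX_FIRST_BYTE = 0x0A  # byte 0, as stated in the comment


def looks_like_pcx(data: bytes, allowed_versions=None) -> bool:
    """Hypothetical check: byte 0 must be 0x0A; byte 1 is checked only if
    the caller supplies the set of accepted values."""
    if len(data) < 2 or data[0] != PCX_FIRST_BYTE:
        return False
    if allowed_versions is None:
        # The accepted byte 1 values are truncated in the source, so skip them.
        return True
    return data[1] in allowed_versions
```

A single leading byte is a weak signature on its own, which is presumably why the comment goes on to constrain byte 1 as well.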

[jira] [Comment Edited] (TIKA-2473) PCX and DCX image support

2017-10-06 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194435#comment-16194435 ] Matthew Caruana Galizia edited comment on TIKA-2473 at 10/6/17 10:42 AM:

[jira] [Commented] (TIKA-2444) JP2 codestream files not parsed

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147657#comment-16147657 ] Matthew Caruana Galizia commented on TIKA-2444: --- I have no idea. I'm trying to solve a

[jira] [Commented] (TIKA-2454) Emails extracted from PSTs detected as unexpected file types

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147907#comment-16147907 ] Matthew Caruana Galizia commented on TIKA-2454: --- I agree with you. The fact that you can't

[jira] [Commented] (TIKA-2454) Emails extracted from PSTs detected as unexpected file types

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148117#comment-16148117 ] Matthew Caruana Galizia commented on TIKA-2454: --- I don't know if the same thing can be done

[jira] [Commented] (TIKA-2450) OfficeParser.parse called for zero-byte file with .doc extension

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147632#comment-16147632 ] Matthew Caruana Galizia commented on TIKA-2450: --- Thank you, that looks like a good solution!

[jira] [Updated] (TIKA-2471) Tab-prefixed message body lines in Mbox interpreted as headers

2017-09-29 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2471: -- Attachment: mbox Reduced test case attached. The result of parsing this file

[jira] [Created] (TIKA-2471) Tab-prefixed message body lines in Mbox interpreted as headers

2017-09-29 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2471: - Summary: Tab-prefixed message body lines in Mbox interpreted as headers Key: TIKA-2471 URL: https://issues.apache.org/jira/browse/TIKA-2471

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2017-09-01 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150718#comment-16150718 ] Matthew Caruana Galizia commented on TIKA-2219: --- Thanks for getting back. Shouldn't the

[jira] [Commented] (TIKA-2450) OfficeParser.parse called for zero-byte file with .doc extension

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147303#comment-16147303 ] Matthew Caruana Galizia commented on TIKA-2450: --- I would argue that the raison d'etre of

[jira] [Created] (TIKA-2450) OfficeParser.parse called for zero-byte file with .doc extension

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2450: - Summary: OfficeParser.parse called for zero-byte file with .doc extension Key: TIKA-2450 URL: https://issues.apache.org/jira/browse/TIKA-2450

[jira] [Commented] (TIKA-2450) OfficeParser.parse called for zero-byte file with .doc extension

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147347#comment-16147347 ] Matthew Caruana Galizia commented on TIKA-2450: --- When you put it that way, then I'll say yes.

[jira] [Commented] (TIKA-2450) OfficeParser.parse called for zero-byte file with .doc extension

2017-08-30 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147331#comment-16147331 ] Matthew Caruana Galizia commented on TIKA-2450: --- OK, with that in mind then I will agree with you.

[jira] [Created] (TIKA-2455) Flag in metadata for alternative email bodies

2017-08-31 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2455: - Summary: Flag in metadata for alternative email bodies Key: TIKA-2455 URL: https://issues.apache.org/jira/browse/TIKA-2455 Project: Tika

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2017-08-31 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149673#comment-16149673 ] Matthew Caruana Galizia commented on TIKA-2219: --- [~talli...@mitre.org] I think this issue has

[jira] [Updated] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2017-08-31 Thread Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2219: -- Attachment: test.txt This file contains x92 characters which should force
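
As background on why the 0x92 byte is a useful signal: in windows-1252 it maps to the right single quotation mark (U+2019), whereas in ISO-8859-1 it falls in the C1 control range, which rarely appears in real text. The short sketch below illustrates that difference with Python's built-in codecs. It is not how Tika's CharsetDetector works, and the helper `has_c1_controls` is a hypothetical name.

```python
def has_c1_controls(text: str) -> bool:
    """Return True if the text contains any C1 control character (U+0080..U+009F)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


sample = b"It\x92s"

as_cp1252 = sample.decode("cp1252")    # 0x92 becomes U+2019, a printable quote
as_latin1 = sample.decode("latin-1")   # 0x92 becomes U+0092, a C1 control

# A decoding that produces C1 controls is a hint that the charset guess is wrong.
cp1252_suspicious = has_c1_controls(as_cp1252)
latin1_suspicious = has_c1_controls(as_latin1)
```

Here the latin-1 decoding is flagged as suspicious and the cp1252 decoding is not, which is the kind of evidence a detector could use to prefer windows-1252 for such a file.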