[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-1896:
--
Attachment: test.html
> Invalid closing script tag not handled gracefully by
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184969#comment-15184969
]
Matthew Caruana Galizia edited comment on TIKA-1896 at 3/8/16 2:33 PM:
Matthew Caruana Galizia created TIKA-2274:
-
Summary: and metadata collision
Key: TIKA-2274
URL: https://issues.apache.org/jira/browse/TIKA-2274
Project: Tika
Issue Type: Bug
Matthew Caruana Galizia created TIKA-2245:
-
Summary: Standardise on java.util.Logging
Key: TIKA-2245
URL: https://issues.apache.org/jira/browse/TIKA-2245
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829856#comment-15829856
]
Matthew Caruana Galizia commented on TIKA-2245:
---
So should we agree that parsers should use
[
https://issues.apache.org/jira/browse/TIKA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2245:
--
Description:
Tika parsers sometimes use Log4j's Logger, sometimes the JUL
[
https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823691#comment-15823691
]
Matthew Caruana Galizia commented on TIKA-2232:
---
Could we at least log a warning once when
[
https://issues.apache.org/jira/browse/TIKA-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887559#comment-15887559
]
Matthew Caruana Galizia commented on TIKA-2274:
---
Thanks for checking up on this. Try
[
https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2280:
--
Description:
While the MESSAGE_FROM metadata field is extracted for both RFC and
Matthew Caruana Galizia created TIKA-2280:
-
Summary: message_from not extracted from Outlook emails
Key: TIKA-2280
URL: https://issues.apache.org/jira/browse/TIKA-2280
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889192#comment-15889192
]
Matthew Caruana Galizia commented on TIKA-2280:
---
OK, so this is a duplicate then. Your
[
https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890106#comment-15890106
]
Matthew Caruana Galizia commented on TIKA-2235:
---
In the majority of cases, JPEG, JBIG2
[
https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890324#comment-15890324
]
Matthew Caruana Galizia commented on TIKA-2235:
---
Ah, good catch. OCR'ing inline.
> Use
[
https://issues.apache.org/jira/browse/TIKA-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644455#comment-15644455
]
Matthew Caruana Galizia commented on TIKA-2167:
---
[~talli...@mitre.org] to replicate the
Matthew Caruana Galizia created TIKA-2167:
-
Summary: Image processing causes OCR to fail
Key: TIKA-2167
URL: https://issues.apache.org/jira/browse/TIKA-2167
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2167:
--
Attachment: simple.tiff
> Image processing causes OCR to fail
>
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650892#comment-15650892
]
Matthew Caruana Galizia commented on TIKA-2174:
---
Both on inline and independent files. I've
Matthew Caruana Galizia created TIKA-2174:
-
Summary: JP2 and JPX (JPEG 2000) support not declared by
TesseractOCRParser
Key: TIKA-2174
URL: https://issues.apache.org/jira/browse/TIKA-2174
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2174:
--
Description:
A complete install of Leptonica with Tesseract will add support for
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651347#comment-15651347
]
Matthew Caruana Galizia commented on TIKA-2174:
---
That issue went away once I added 'jp2' and
[
https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653430#comment-15653430
]
Matthew Caruana Galizia commented on TIKA-2174:
---
Thank you! I've also confirmed that
[
https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653441#comment-15653441
]
Matthew Caruana Galizia commented on TIKA-2175:
---
I've filed [an
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638978#comment-15638978
]
Matthew Caruana Galizia commented on TIKA-1896:
---
[~talli...@mitre.org] did you ever run your
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15663830#comment-15663830
]
Matthew Caruana Galizia commented on TIKA-1896:
---
Perhaps we should push ahead with Jsoup
[
https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696377#comment-15696377
]
Matthew Caruana Galizia commented on TIKA-2175:
---
Still no joy, both with my bridge classes
[
https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701919#comment-15701919
]
Matthew Caruana Galizia commented on TIKA-2175:
---
The problem was OpenCL support in Tesseract.
Matthew Caruana Galizia created TIKA-2235:
-
Summary: Use Tesseract's recommended DPI for PDF images
Key: TIKA-2235
URL: https://issues.apache.org/jira/browse/TIKA-2235
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818176#comment-15818176
]
Matthew Caruana Galizia commented on TIKA-2235:
---
Yes, I am already! Thanks for linking me to
Matthew Caruana Galizia created TIKA-2221:
-
Summary: poi.EncryptedDocumentException not wrapped in
tika.exception.EncryptedDocumentException
Key: TIKA-2221
URL:
[
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929105#comment-15929105
]
Matthew Caruana Galizia commented on TIKA-1195:
---
[~talli...@mitre.org] d'you reckon that will
[
https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia closed TIKA-2280.
-
Resolution: Duplicate
> message_from not extracted from Outlook emails
>
[
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890800#comment-15890800
]
Matthew Caruana Galizia commented on TIKA-1865:
---
Thank you, this is a big improvement.
>
[
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106136#comment-16106136
]
Matthew Caruana Galizia commented on TIKA-2436:
---
To give you an example of why this is a
[
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106135#comment-16106135
]
Matthew Caruana Galizia commented on TIKA-2436:
---
The difference is that the file is treated
[
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2436:
--
Attachment: image004.emz
Example EMZ file attached. Commons Compress will yield
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085709#comment-16085709
]
Matthew Caruana Galizia commented on TIKA-2042:
---
I'd like to ask for this issue to be
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085842#comment-16085842
]
Matthew Caruana Galizia commented on TIKA-2042:
---
[~gagravarr] thank you - that fixes the
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2042:
--
Attachment: mbox_email_section.txt
Sample of one of the message sections from
[
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-879:
-
Attachment: mbox_email_section.txt
As described in TIKA-2042, the attached file
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085847#comment-16085847
]
Matthew Caruana Galizia edited comment on TIKA-2042 at 7/13/17 3:13 PM:
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2042:
--
Attachment: mbox_header.txt
Header attached with identifying information
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085709#comment-16085709
]
Matthew Caruana Galizia edited comment on TIKA-2042 at 7/13/17 2:22 PM:
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078012#comment-16078012
]
Matthew Caruana Galizia commented on TIKA-2399:
---
OK. I can't think of any other option for
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073660#comment-16073660
]
Matthew Caruana Galizia commented on TIKA-2399:
---
Wouldn't it be better to warn? (Option 2 in
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076092#comment-16076092
]
Matthew Caruana Galizia commented on TIKA-2399:
---
Their response:
bq. I wouldn't mind if you
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074918#comment-16074918
]
Matthew Caruana Galizia commented on TIKA-2399:
---
I've emailed Unidata to ask about publishing
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074881#comment-16074881
]
Matthew Caruana Galizia commented on TIKA-2399:
---
Tim, see
Matthew Caruana Galizia created TIKA-2436:
-
Summary: Support for GZIP-compressed EMF files
Key: TIKA-2436
URL: https://issues.apache.org/jira/browse/TIKA-2436
Project: Tika
Issue
Matthew Caruana Galizia created TIKA-2444:
-
Summary: JP2 codestream files not parsed
Key: TIKA-2444
URL: https://issues.apache.org/jira/browse/TIKA-2444
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2444:
--
Attachment: balloon.j2c
Example JP2K codestream file attached.
> JP2 codestream
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055541#comment-16055541
]
Matthew Caruana Galizia commented on TIKA-2399:
---
I had emailed Unidata in February about
Matthew Caruana Galizia created TIKA-2394:
-
Summary: "Unknown message type"
Key: TIKA-2394
URL: https://issues.apache.org/jira/browse/TIKA-2394
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050226#comment-16050226
]
Matthew Caruana Galizia edited comment on TIKA-2394 at 6/15/17 9:28 AM:
[
https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050226#comment-16050226
]
Matthew Caruana Galizia commented on TIKA-2394:
---
I remember seeing how to override a provided
[
https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2394:
--
Affects Version/s: 1.15
Labels: container email pst (was: )
[
https://issues.apache.org/jira/browse/TIKA-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044297#comment-16044297
]
Matthew Caruana Galizia commented on TIKA-2389:
---
Please don't move this to info.
Before
Matthew Caruana Galizia created TIKA-2473:
-
Summary: PCX and DCX image support
Key: TIKA-2473
URL: https://issues.apache.org/jira/browse/TIKA-2473
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194435#comment-16194435
]
Matthew Caruana Galizia commented on TIKA-2473:
---
Magic:
byte 0: x0A
byte 1: either x00,
[
https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194435#comment-16194435
]
Matthew Caruana Galizia edited comment on TIKA-2473 at 10/6/17 10:42 AM:
[
https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147657#comment-16147657
]
Matthew Caruana Galizia commented on TIKA-2444:
---
I have no idea. I'm trying to solve a
[
https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147907#comment-16147907
]
Matthew Caruana Galizia commented on TIKA-2454:
---
I agree with you. The fact that you can't
[
https://issues.apache.org/jira/browse/TIKA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148117#comment-16148117
]
Matthew Caruana Galizia commented on TIKA-2454:
---
I don't know if the same thing can be done
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147632#comment-16147632
]
Matthew Caruana Galizia commented on TIKA-2450:
---
Thank you, that looks like a good solution!
[
https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2471:
--
Attachment: mbox
Reduced test case attached. The result of parsing this file
Matthew Caruana Galizia created TIKA-2471:
-
Summary: Tab-prefixed message body lines in Mbox interpreted as
headers
Key: TIKA-2471
URL: https://issues.apache.org/jira/browse/TIKA-2471
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150718#comment-16150718
]
Matthew Caruana Galizia commented on TIKA-2219:
---
Thanks for getting back. Shouldn't the
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147303#comment-16147303
]
Matthew Caruana Galizia commented on TIKA-2450:
---
I would argue that the raison d'etre of
Matthew Caruana Galizia created TIKA-2450:
-
Summary: OfficeParser.parse called for zero-byte file with .doc
extension
Key: TIKA-2450
URL: https://issues.apache.org/jira/browse/TIKA-2450
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147347#comment-16147347
]
Matthew Caruana Galizia commented on TIKA-2450:
---
When you put it that way, then I'll say yes.
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147331#comment-16147331
]
Matthew Caruana Galizia commented on TIKA-2450:
---
OK, with that in mind then I will agree with you.
Matthew Caruana Galizia created TIKA-2455:
-
Summary: Flag in metadata for alternative email bodies
Key: TIKA-2455
URL: https://issues.apache.org/jira/browse/TIKA-2455
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149673#comment-16149673
]
Matthew Caruana Galizia commented on TIKA-2219:
---
[~talli...@mitre.org] I think this issue has
[
https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Caruana Galizia updated TIKA-2219:
--
Attachment: test.txt
This file contains x92 characters which should force
73 matches