[
https://issues.apache.org/jira/browse/TIKA-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783431#comment-17783431
]
Sam Stephens commented on TIKA-4167:
Thanks Tim. I don't have a use case where I need this behavior.
Sam Stephens created TIKA-4167:
--
Summary: CONTENT_TYPE_USER_OVERRIDE doesn't force content type for
application/illustrator files
Key: TIKA-4167
URL: https://issues.apache.org/jira/browse/TIKA-4167
[
https://issues.apache.org/jira/browse/TIKA-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551871#comment-17551871
]
Sam Stephens commented on TIKA-3768:
Ah, interesting, this is a case of me misunderstanding the
[
https://issues.apache.org/jira/browse/TIKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539138#comment-17539138
]
Sam Stephens commented on TIKA-3769:
Thanks for the prompt fix!
> md5 incorrectly detected as
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539054#comment-17539054
]
Sam Stephens commented on TIKA-3710:
{quote}The h1 isn't quite as unique as we might like, and maybe
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539051#comment-17539051
]
Sam Stephens commented on TIKA-3710:
Is it valid for a message/rfc822 message to have a bunch of
Sam Stephens created TIKA-3769:
--
Summary: md5 incorrectly detected as application/marc
Key: TIKA-3769
URL: https://issues.apache.org/jira/browse/TIKA-3769
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538528#comment-17538528
]
Sam Stephens commented on TIKA-3711:
Thanks [~tallison], confirmed this is working for me as
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538524#comment-17538524
]
Sam Stephens commented on TIKA-3710:
Note that I exclude org.apache.tika.parser.mail.RFC822Parser as a
[
https://issues.apache.org/jira/browse/TIKA-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Stephens updated TIKA-3768:
---
Description:
When running AutoDetectParser on message/rfc822 structured text documents, such
as the
Sam Stephens created TIKA-3768:
--
Summary: message/rfc822 does not include Headers in extracted text
Key: TIKA-3768
URL: https://issues.apache.org/jira/browse/TIKA-3768
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521349#comment-17521349
]
Sam Stephens commented on TIKA-3666:
It looks like [~4U6U57] and I have both provided POIFSViewer
[
https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Stephens updated TIKA-3666:
---
Attachment: sam-poifsviewer.txt
> Detect and indicate file encrypted with Rights Management Service
[
https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520887#comment-17520887
]
Sam Stephens commented on TIKA-3666:
[~4U6U57] did you have any luck sourcing sample files? I also
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517167#comment-17517167
]
Sam Stephens edited comment on TIKA-3711 at 4/5/22 10:00 PM:
-
Regarding
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517167#comment-17517167
]
Sam Stephens edited comment on TIKA-3711 at 4/5/22 12:25 AM:
-
Regarding
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517167#comment-17517167
]
Sam Stephens commented on TIKA-3711:
Regarding filenames, I don't think they will ever be semantically
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Stephens updated TIKA-3711:
---
Attachment: word-doc-with-image-from-word-365.docx
> Image file names included in parsed Word
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516142#comment-17516142
]
Sam Stephens commented on TIKA-3710:
The HTML document is exactly what you see there; these documents
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516141#comment-17516141
]
Sam Stephens commented on TIKA-3711:
I guess the question is what are the semantics of this operation?
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Stephens updated TIKA-3711:
---
Description:
The attached Word document includes nothing but a single image. Running it
through the
Sam Stephens created TIKA-3711:
--
Summary: Image file names included in parsed Word Document text
Key: TIKA-3711
URL: https://issues.apache.org/jira/browse/TIKA-3711
Project: Tika
Issue Type:
Sam Stephens created TIKA-3710:
--
Summary: HTML document detected incorrect as message/rfc822
Key: TIKA-3710
URL: https://issues.apache.org/jira/browse/TIKA-3710
Project: Tika
Issue Type: Bug
23 matches