[jira] [Commented] (TIKA-4167) CONTENT_TYPE_USER_OVERRIDE doesn't force content type for application/illustrator files

2023-11-06 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783431#comment-17783431 ] Sam Stephens commented on TIKA-4167: Thanks Tim. I don't have a use case where I need this behavior.

[jira] [Created] (TIKA-4167) CONTENT_TYPE_USER_OVERRIDE doesn't force content type for application/illustrator files

2023-11-06 Sam Stephens (Jira)
Sam Stephens created TIKA-4167:
Summary: CONTENT_TYPE_USER_OVERRIDE doesn't force content type for application/illustrator files
Key: TIKA-4167
URL: https://issues.apache.org/jira/browse/TIKA-4167

[jira] [Commented] (TIKA-3768) message/rfc822 does not include Headers in extracted text

2022-06-08 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551871#comment-17551871 ] Sam Stephens commented on TIKA-3768: Ah, interesting, this is a case of me misunderstanding the …

[jira] [Commented] (TIKA-3769) md5 incorrectly detected as application/marc

2022-05-18 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539138#comment-17539138 ] Sam Stephens commented on TIKA-3769: Thanks for the prompt fix!

[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822

2022-05-18 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539054#comment-17539054 ] Sam Stephens commented on TIKA-3710: {quote}The h1 isn't quite as unique as we might like, and maybe …

[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822

2022-05-18 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539051#comment-17539051 ] Sam Stephens commented on TIKA-3710: Is it valid for a message/rfc822 message to have a bunch of …

[jira] [Created] (TIKA-3769) md5 incorrectly detected as application/marc

2022-05-17 Sam Stephens (Jira)
Sam Stephens created TIKA-3769:
Summary: md5 incorrectly detected as application/marc
Key: TIKA-3769
URL: https://issues.apache.org/jira/browse/TIKA-3769
Project: Tika
Issue Type: Bug

[jira] [Commented] (TIKA-3711) Image file names included in parsed Word Document text

2022-05-17 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538528#comment-17538528 ] Sam Stephens commented on TIKA-3711: Thanks [~tallison], confirmed this is working for me as …

[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822

2022-05-17 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538524#comment-17538524 ] Sam Stephens commented on TIKA-3710: Note that I exclude org.apache.tika.parser.mail.RFC822Parser as a …

[jira] [Updated] (TIKA-3768) message/rfc822 does not include Headers in extracted text

2022-05-17 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Stephens updated TIKA-3768:
Description: When running AutoDetectParser on message/rfc822 structured text documents, such as the …

[jira] [Created] (TIKA-3768) message/rfc822 does not include Headers in extracted text

2022-05-17 Sam Stephens (Jira)
Sam Stephens created TIKA-3768:
Summary: message/rfc822 does not include Headers in extracted text
Key: TIKA-3768
URL: https://issues.apache.org/jira/browse/TIKA-3768
Project: Tika
Issue Type: …

[jira] [Commented] (TIKA-3666) Detect and indicate file encrypted with Rights Management Service RMS/IRM

2022-04-12 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521349#comment-17521349 ] Sam Stephens commented on TIKA-3666: It looks like [~4U6U57] and I have both provided POIFSViewer …

[jira] [Updated] (TIKA-3666) Detect and indicate file encrypted with Rights Management Service RMS/IRM

2022-04-12 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Stephens updated TIKA-3666:
Attachment: sam-poifsviewer.txt

[jira] [Commented] (TIKA-3666) Detect and indicate file encrypted with Rights Management Service RMS/IRM

2022-04-11 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520887#comment-17520887 ] Sam Stephens commented on TIKA-3666: [~4U6U57] did you have any luck sourcing sample files? I also …

[jira] [Comment Edited] (TIKA-3711) Image file names included in parsed Word Document text

2022-04-05 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517167#comment-17517167 ] Sam Stephens edited comment on TIKA-3711 at 4/5/22 10:00 PM: Regarding …

[jira] [Comment Edited] (TIKA-3711) Image file names included in parsed Word Document text

2022-04-04 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517167#comment-17517167 ] Sam Stephens edited comment on TIKA-3711 at 4/5/22 12:25 AM: Regarding …

[jira] [Commented] (TIKA-3711) Image file names included in parsed Word Document text

2022-04-04 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517167#comment-17517167 ] Sam Stephens commented on TIKA-3711: Regarding filenames, I don't think they will ever be semantically …

[jira] [Updated] (TIKA-3711) Image file names included in parsed Word Document text

2022-04-04 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Stephens updated TIKA-3711:
Attachment: word-doc-with-image-from-word-365.docx

[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822

2022-04-01 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516142#comment-17516142 ] Sam Stephens commented on TIKA-3710: The HTML document is exactly what you see there; these documents …

[jira] [Commented] (TIKA-3711) Image file names included in parsed Word Document text

2022-04-01 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516141#comment-17516141 ] Sam Stephens commented on TIKA-3711: I guess the question is what are the semantics of this operation?

[jira] [Updated] (TIKA-3711) Image file names included in parsed Word Document text

2022-03-31 Sam Stephens (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Stephens updated TIKA-3711:
Description: The attached Word document includes nothing but a single image. Running it through the …

[jira] [Created] (TIKA-3711) Image file names included in parsed Word Document text

2022-03-31 Sam Stephens (Jira)
Sam Stephens created TIKA-3711:
Summary: Image file names included in parsed Word Document text
Key: TIKA-3711
URL: https://issues.apache.org/jira/browse/TIKA-3711
Project: Tika
Issue Type: …

[jira] [Created] (TIKA-3710) HTML document detected incorrect as message/rfc822

2022-03-31 Sam Stephens (Jira)
Sam Stephens created TIKA-3710:
Summary: HTML document detected incorrect as message/rfc822
Key: TIKA-3710
URL: https://issues.apache.org/jira/browse/TIKA-3710
Project: Tika
Issue Type: Bug