[jira] [Commented] (TIKA-3466) Cannot detect mimetype of xhtml file when script is first node instead of html

2021-07-09 Packiaraj Sakkanan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378151#comment-17378151 ] Packiaraj Sakkanan commented on TIKA-3466: -- [~nick], the mentioned scenario is pretty much

2021-07-09 Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377953#comment-17377953 ] Nick Burch commented on TIKA-3466: -- [~psakkanan] You really need to be doing some xml parsing /

2021-07-08 Packiaraj Sakkanan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377563#comment-17377563 ] Packiaraj Sakkanan commented on TIKA-3466: -- Hi [~tallison], here is the stripped-down version of

2021-07-08 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377548#comment-17377548 ] Tim Allison commented on TIKA-3466: --- And, for the record, the {{file}} command (file-5.37) identifies

2021-07-08 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377542#comment-17377542 ] Tim Allison commented on TIKA-3466: --- We need to do as much as we can on Tika to get file detection

2021-07-08 Packiaraj Sakkanan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377508#comment-17377508 ] Packiaraj Sakkanan commented on TIKA-3466: -- Hi [~nick], we are having a problem with the allow-list. We

2021-07-08 Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377317#comment-17377317 ] Nick Burch commented on TIKA-3466: -- I'm happy to add the xmlns version as a match, that seems pretty
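The "xmlns version" of the match presumably means detecting XHTML from its namespace declaration instead of requiring `<html` to be the first element. The following is a minimal, stdlib-only Java sketch of that heuristic, not Tika's actual detector or its `tika-mimetypes.xml` magic rules. The class name `XhtmlNamespaceSniffer`, the method `looksLikeXhtml`, and the 8 KB look-ahead window are all hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not Tika's real detector): treat a document as XHTML
 * if the XHTML namespace declaration appears within the first few kilobytes,
 * regardless of which element (e.g. a <script>) comes first.
 */
public class XhtmlNamespaceSniffer {

    // Assumed look-ahead window; Tika's own magic rules use their own offsets.
    static final int WINDOW = 8192;

    // The XHTML namespace URI as it appears in an xmlns attribute.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    public static boolean looksLikeXhtml(byte[] data) {
        int len = Math.min(data.length, WINDOW);
        // ISO-8859-1 maps each byte to exactly one char, so ASCII markup is
        // preserved for UTF-8 and Latin-1 input (but not for UTF-16).
        String prefix = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        // Accept both double- and single-quoted default namespace declarations.
        return prefix.contains("xmlns=\"" + XHTML_NS + "\"")
            || prefix.contains("xmlns='" + XHTML_NS + "'");
    }

    public static void main(String[] args) {
        String scriptFirst = "<script type=\"text/javascript\">var x = 1;</script>\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>hi</body></html>";
        String plainXml = "<?xml version=\"1.0\"?><root/>";
        System.out.println(looksLikeXhtml(scriptFirst.getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(looksLikeXhtml(plainXml.getBytes(StandardCharsets.UTF_8)));    // false
    }
}
```

A plain substring check like this misses UTF-16 input and prefixed namespace declarations such as `xmlns:h="..."`. It can also fire on files that merely mention the URI in a comment. Those gaps are one reason the thread also raises doing real XML parsing.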

2021-07-07 Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376851#comment-17376851 ] Kenneth William Krugler commented on TIKA-3466: --- Hi [~psakkanan] - that namespace is inside

2021-07-07 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376828#comment-17376828 ] Tim Allison commented on TIKA-3466: --- At a high level, Tika does a pretty good job on files in the wild,

2021-07-07 Packiaraj Sakkanan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376798#comment-17376798 ] Packiaraj Sakkanan commented on TIKA-3466: -- Wouldn't it be sufficient to check the namespace alone

2021-07-07 Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376771#comment-17376771 ] Kenneth William Krugler commented on TIKA-3466: --- Browsers do all kinds of helicopter stunts

2021-07-07 Packiaraj Sakkanan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376760#comment-17376760 ] Packiaraj Sakkanan commented on TIKA-3466: -- The problem here is that this sample file is rendered

2021-07-07 Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376712#comment-17376712 ] Kenneth William Krugler commented on TIKA-3466: --- This looks like broken HTML, which we would

2021-07-07 Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376689#comment-17376689 ] Nick Burch commented on TIKA-3466: -- I've never seen a file like that before, but I'm sure Tim will pop