[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream
[ https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371945#comment-14371945 ]

Tyler Palsulich commented on TIKA-1114:
---------------------------------------

See http://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language. It seems like there isn't really a dedicated way to know whether a file is SGML or not...

> sgml mime type is not detected when passed in as byte stream
> ------------------------------------------------------------
>
>                 Key: TIKA-1114
>                 URL: https://issues.apache.org/jira/browse/TIKA-1114
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>            Reporter: Vikas Garg
>
> When passing sgml files as TikaInputStream (created from byte[]) to Detector.detect(), it returns text/plain as mediatype and not application/sgml or text/sgml. But when I provide the file name to metadata, then it gives me correct mime-type, i.e., text/sgml.
> Is it because Tika is missing any designated parser for sgml files OR am I missing something?
> I am on Tika-1.3.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream
[ https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364064#comment-14364064 ]

Nick Burch commented on TIKA-1114:
----------------------------------

The entries in the file(1) sgml magic file all seem to be for SGML-based formats such as SVG, XML sitemap, OSM, GnuCash, etc.

Is there really such a thing as a generic SGML file, though? Aren't most/all(?) SGML-based files actually instances of a specific SGML application, which is a subtype built on the SGML structure?
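
To make the "specific SGML application" point concrete, here is a minimal sketch in plain Java. It is not Tika code and not the file(1) rule set: the class name, the 4096-byte window, and the parsing rules are all assumptions chosen for illustration. It pulls out the document type name that follows a `<!DOCTYPE` declaration, which shows that a DOCTYPE prefix identifies a particular application (html, svg, book, ...) rather than "generic SGML".

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration only; nothing here is part of Tika.
final class DoctypeSniffSketch {

    private static final String MARKER = "<!doctype";
    // Assumed look-ahead window; chosen arbitrarily for this sketch.
    private static final int WINDOW = 4096;

    /** Returns the lower-cased name following the first DOCTYPE marker, if any. */
    static Optional<String> doctypeName(byte[] content) {
        int len = Math.min(content.length, WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so indexes stay aligned.
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);
        int at = head.toLowerCase(Locale.ROOT).indexOf(MARKER);
        if (at < 0) {
            return Optional.empty();
        }
        int afterMarker = at + MARKER.length();
        int start = afterMarker;
        while (start < head.length() && Character.isWhitespace(head.charAt(start))) {
            start++;
        }
        if (start == afterMarker) {
            return Optional.empty(); // marker must be followed by whitespace
        }
        int end = start;
        while (end < head.length()
                && !Character.isWhitespace(head.charAt(end))
                && head.charAt(end) != '>'
                && head.charAt(end) != '[') {
            end++;
        }
        if (end == start) {
            return Optional.empty(); // no name present
        }
        return Optional.of(head.substring(start, end).toLowerCase(Locale.ROOT));
    }
}
```

With this sketch, an HTML 4 page and a DocBook-style `.sgml` file both match the DOCTYPE marker and differ only in the extracted name, so any magic keyed on the marker alone would need a fallback type for names it does not recognise.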
[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream
[ https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362038#comment-14362038 ]

Tyler Palsulich commented on TIKA-1114:
---------------------------------------

We should be able to grab the magic from file(1). See TIKA-289.
[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream
[ https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642365#comment-13642365 ]

Nick Burch commented on TIKA-1114:
----------------------------------

Tika currently has no mime-magic for text/sgml, only filename globs. If you'd like Tika to be able to detect this format without needing the filename, please help us by suggesting suitable mime magic to detect with!
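
The behaviour described in this issue can be modelled with a small toy detector. This is a sketch in plain Java, not Tika's implementation: the class and method names are hypothetical, the globs are reduced to the suffixes `.sgml` and `.sgm` (assumed here for illustration), and the text heuristic is deliberately crude. It only shows why a type that has filename globs but no content magic falls back to text/plain when no name is supplied.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy model only; all names are hypothetical and nothing here is Tika's API.
final class GlobOnlyDetectorSketch {

    // Filename globs, simplified to plain suffixes for brevity (assumed values).
    private static final Map<String, String> SUFFIX_TO_TYPE = new LinkedHashMap<>();
    static {
        SUFFIX_TO_TYPE.put(".sgml", "text/sgml");
        SUFFIX_TO_TYPE.put(".sgm", "text/sgml");
    }

    /** Detects a type from content plus an optional file name (may be null). */
    static String detect(byte[] content, String fileName) {
        if (fileName != null) {
            String lower = fileName.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, String> e : SUFFIX_TO_TYPE.entrySet()) {
                if (lower.endsWith(e.getKey())) {
                    return e.getValue();
                }
            }
        }
        // No content rule exists for SGML in this model, so the only
        // question left is whether the bytes look like text at all.
        return looksLikeText(content) ? "text/plain" : "application/octet-stream";
    }

    /** Crude heuristic: non-empty and free of most ASCII control bytes. */
    static boolean looksLikeText(byte[] content) {
        if (content.length == 0) {
            return false;
        }
        for (byte b : content) {
            int c = b & 0xFF;
            if (c < 0x09 || (c > 0x0D && c < 0x20)) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, `detect(bytes, null)` on an SGML document yields text/plain, while `detect(bytes, "manual.sgml")` yields text/sgml, which mirrors the difference the reporter saw between detecting from a byte stream alone and detecting with the file name supplied in the metadata.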