[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream

2015-03-20 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371945#comment-14371945
 ] 

Tyler Palsulich commented on TIKA-1114:
---

See http://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language. It seems
that there isn't really a dedicated way to tell whether a file is SGML or
not...

 sgml mime type is not detected when passed in as byte stream
 

 Key: TIKA-1114
 URL: https://issues.apache.org/jira/browse/TIKA-1114
 Project: Tika
  Issue Type: Bug
  Components: mime
Reporter: Vikas Garg

 When passing SGML files as a TikaInputStream (created from a byte[]) to
 Detector.detect(), it returns text/plain as the media type, not
 application/sgml or text/sgml. But when I provide the file name in the metadata,
 it gives me the correct MIME type, i.e., text/sgml.
 Is this because Tika is missing a dedicated parser for SGML files, or am I
 missing something? I am on Tika 1.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream

2015-03-16 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364064#comment-14364064
 ] 

Nick Burch commented on TIKA-1114:
--

The file(1) sgml magic file seems to cover only SGML-based formats, such as SVG,
XML sitemaps, OSM, GnuCash, etc.

Is there really such a thing as a generic SGML file, though? Aren't most (all?)
SGML-based files actually instances of a specific SGML application, i.e. a
subtype built on the SGML structure?
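To make the point about SGML applications concrete: when a file carries a document type declaration, the name after `<!DOCTYPE` identifies the application (e.g. a DocBook `article`), so a content-based check would more naturally report that application than a generic SGML type. The sketch below is hypothetical and uses only the JDK; the class `DoctypeSniffer`, the method `doctypeName`, and the 4096-byte limit are illustrative, not part of Tika. It also illustrates Tyler's concern that there is no dedicated marker for SGML: an HTML file matches the same pattern, so a `<!DOCTYPE` prefix alone does not prove a file is "plain" SGML.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoctypeSniffer {
    // Hypothetical limit: only a bounded prefix is inspected, in the spirit of
    // magic-based detection, which looks at the first bytes of a stream.
    static final int PREFIX_LIMIT = 4096;

    // Matches "<!DOCTYPE name" case-insensitively at the very start of the input,
    // allowing only leading whitespace. An XML declaration or an SGML comment
    // before the DOCTYPE is NOT handled by this sketch.
    private static final Pattern DOCTYPE = Pattern.compile(
            "^\\s*<!DOCTYPE\\s+([A-Za-z][A-Za-z0-9._:-]*)", Pattern.CASE_INSENSITIVE);

    /** Returns the lower-cased document type name, or empty if none is found. */
    static Optional<String> doctypeName(byte[] data) {
        int len = Math.min(data.length, PREFIX_LIMIT);
        // ISO-8859-1 maps every byte to a char, so decoding never fails.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = DOCTYPE.matcher(head);
        return m.find()
                ? Optional.of(m.group(1).toLowerCase(Locale.ROOT))
                : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] docbook = "<!DOCTYPE article PUBLIC \"-//OASIS//DTD DocBook V4.1//EN\">\n<article/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] html = "<!DOCTYPE html>\n<html></html>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] plain = "just some text".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(doctypeName(docbook)); // Optional[article]
        System.out.println(doctypeName(html));    // Optional[html]
        System.out.println(doctypeName(plain));   // Optional.empty
    }
}
```

The returned name is what a detector could map to a specific subtype; a file with no DOCTYPE at all yields nothing, which is exactly the case where content alone gives little to go on.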



[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream

2015-03-14 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362038#comment-14362038
 ] 

Tyler Palsulich commented on TIKA-1114:
---

We should be able to grab the magic from file(1). See TIKA-289.



[jira] [Commented] (TIKA-1114) sgml mime type is not detected when passed in as byte stream

2013-04-25 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642365#comment-13642365
 ] 

Nick Burch commented on TIKA-1114:
--

Tika currently has no mime magic for text/sgml, only filename globs. If you'd
like Tika to be able to detect this format without needing the filename, please
help us by suggesting suitable mime magic to detect it with!
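Why that produces the reported behaviour can be shown with a deliberately simplified model of a detector that tries content magic first, then filename globs, then falls back to a text/binary guess. This is a JDK-only sketch, not Tika's actual detection code; the class `ToyMimeDetector`, its magic and glob tables, and the `looksLikeText` heuristic are all hypothetical. As in the situation described here, the magic table has no entry for text/sgml, so only a supplied name can lead to text/sgml.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ToyMimeDetector {
    // Hypothetical magic table (content prefix -> type). Deliberately no text/sgml entry.
    private static final Map<String, String> MAGIC = new LinkedHashMap<>();
    // Hypothetical glob table (filename suffix -> type).
    private static final Map<String, String> GLOBS = new LinkedHashMap<>();

    static {
        MAGIC.put("%PDF-", "application/pdf");
        MAGIC.put("<?xml", "application/xml");
        GLOBS.put(".sgml", "text/sgml");
        GLOBS.put(".sgm", "text/sgml");
        GLOBS.put(".pdf", "application/pdf");
    }

    /** resourceName may be null, as when only a byte stream is available. */
    static String detect(byte[] data, String resourceName) {
        String head = new String(data, StandardCharsets.ISO_8859_1);
        // 1. Magic: uses the content only. In this toy model a magic hit wins outright.
        for (Map.Entry<String, String> e : MAGIC.entrySet()) {
            if (head.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        // 2. Globs: only possible when the caller supplies a file name.
        if (resourceName != null) {
            String lower = resourceName.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, String> e : GLOBS.entrySet()) {
                if (lower.endsWith(e.getKey())) {
                    return e.getValue();
                }
            }
        }
        // 3. Fallback: content without control characters is reported as plain text.
        return looksLikeText(data) ? "text/plain" : "application/octet-stream";
    }

    static boolean looksLikeText(byte[] data) {
        for (byte b : data) {
            int c = b & 0xFF;
            // Control characters other than tab, LF and CR are treated as binary.
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sgml = "<!DOCTYPE doc SYSTEM \"doc.dtd\">\n<doc>hello</doc>\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(sgml, null));          // text/plain
        System.out.println(detect(sgml, "report.sgml")); // text/sgml
    }
}
```

Run as-is, it prints text/plain for the unnamed stream and text/sgml once the name report.sgml is supplied, which mirrors what the reporter saw with Detector.detect(). Adding a suitable magic entry to step 1 is what would remove the dependence on the file name.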


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira