[jira] [Created] (TIKA-1160) Add support for SolidWorks files

2013-08-14 Thread gunter rombauts (JIRA)
gunter rombauts created TIKA-1160:
-

 Summary: Add support for SolidWorks files
 Key: TIKA-1160
 URL: https://issues.apache.org/jira/browse/TIKA-1160
 Project: Tika
  Issue Type: Wish
  Components: mime
Affects Versions: 1.4
Reporter: gunter rombauts
 Fix For: 1.5


It would be useful if Tika could detect the MIME type of SolidWorks files.
File extensions include *.slddrw, *.sldasm and *.sldprt.
Standard properties are stored in an Office-like format.
Custom properties are not detected.
I will attach a custom-mimetypes.xml.
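For illustration, the extension (glob) plus magic-byte matching that a custom-mimetypes.xml entry would declare can be sketched in plain Java. This is not Tika's API. The class, the `detect` method and the MIME type names (`application/x-sldworks-part` and so on) are hypothetical placeholders. The OLE2 header check assumes, as the remark about Office-like properties suggests, that the files are OLE2 compound documents.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of extension-plus-magic detection for SolidWorks files. */
public class SolidWorksDetectSketch {

    // Hypothetical MIME type names keyed by lower-case file extension (not registered types).
    static final Map<String, String> GLOBS = Map.of(
            "sldprt", "application/x-sldworks-part",
            "sldasm", "application/x-sldworks-assembly",
            "slddrw", "application/x-sldworks-drawing");

    // Standard 8-byte signature at the start of an OLE2 compound document.
    static final byte[] OLE2_MAGIC = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    /**
     * Returns the placeholder MIME type if the extension matches and, when header
     * bytes are supplied, they start with the OLE2 signature. A null header skips the magic check.
     */
    static Optional<String> detect(String fileName, byte[] header) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = GLOBS.get(ext);
        if (type == null) {
            return Optional.empty();
        }
        // Assumption: SolidWorks files use an OLE2 container, so reject other headers.
        if (header != null && (header.length < OLE2_MAGIC.length
                || !Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC))) {
            return Optional.empty();
        }
        return Optional.of(type);
    }

    public static void main(String[] args) {
        byte[] ole2 = Arrays.copyOf(OLE2_MAGIC, 512); // fake 512-byte header for the demo
        System.out.println(detect("Bracket.SLDPRT", ole2));
        System.out.println(detect("notes.txt", ole2));
    }
}
```

Requiring both the glob and the magic bytes keeps a misnamed text file from being labeled as a SolidWorks part; passing `null` as the header falls back to name-only detection.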

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TIKA-1001) tika no longer seems to honor HTTP meta tag for arabic text in ISO-8859-6 charset

2013-08-14 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740580#comment-13740580 ]

Tim Allison commented on TIKA-1001:
---

Fixed as of r1514126. Thank you for submitting this issue with a test file!

 tika no longer seems to honor HTTP meta tag for arabic text in ISO-8859-6 
 charset
 -

 Key: TIKA-1001
 URL: https://issues.apache.org/jira/browse/TIKA-1001
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.2
Reporter: david lemon
 Attachments: badarabic.html, TIKA-1001v1.tar.gz


 The attached document extracts correctly in Tika 1.1 but incorrectly in Tika 1.2.
 The difference appears to be that Tika 1.1 honors the HTTP meta content-type
 tag, which specifies the charset as ISO-8859-6, and correctly converts the
 output to UTF-8. Tika 1.2 appears to ignore the charset specified in the meta tag.
 Some noodling seems to indicate that the charset is the problem.
 It doesn't matter which mode Tika is used in (server, app mode, etc.); even if
 the content-type is specified with a charset, the output is still garbage.
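A minimal sketch of the behavior the report expects from Tika 1.1, written in plain Java rather than against Tika's parser classes: scan the start of the document for a charset declared in a meta tag, decode the bytes with it, and fall back to a default only when none is found. The class and method names, the 8 KB scan limit and the regex are assumptions made for illustration. The demo also assumes the JRE ships the ISO-8859-6 charset and checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: honor a charset declared in an HTML meta tag. */
public class MetaCharsetSketch {

    // Matches charset=... inside a <meta ...> tag, covering both <meta charset="x">
    // and <meta http-equiv="Content-Type" content="text/html; charset=x">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Looks for a supported charset declared in a meta tag within the first 8 KB. */
    static Optional<Charset> metaCharset(byte[] html) {
        int len = Math.min(html.length, 8192);
        // ISO-8859-1 maps each byte to one char, so ASCII markup survives any real encoding.
        String head = new String(html, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return Optional.empty();
        }
        String name = m.group(1);
        try {
            return Charset.isSupported(name) ? Optional.of(Charset.forName(name)) : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            return Optional.empty();
        }
    }

    /** Decodes with the declared charset, using the fallback only if none is declared. */
    static String decode(byte[] html, Charset fallback) {
        return new String(html, metaCharset(html).orElse(fallback));
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("ISO-8859-6")) {
            System.out.println("ISO-8859-6 not available in this JRE");
            return;
        }
        // Arabic greeting (U+0645 U+0631 U+062D U+0628 U+0627) in an ISO-8859-6 page.
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-6\"></head>"
                + "<body>\u0645\u0631\u062d\u0628\u0627</body></html>";
        byte[] bytes = page.getBytes(Charset.forName("ISO-8859-6"));
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).equals(page));
    }
}
```

Honoring the declaration is what turns the ISO-8859-6 bytes back into Arabic text; decoding the same bytes with the ISO-8859-1 fallback alone would produce Latin-1 gibberish, much like the garbage output described in the report.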
