Hi,

I'm new to Tika and I have a question regarding content types.

I would like to check if the content type provided by a content repository 
actually matches the content. So I use Tika to detect the type from the content 
stream and compare it to the provided content type.
That works well if the content types are the same or one is a subtype of the 
other. But there are some cases that require a fuzzier comparison. 
If, for example, Tika detects "application/xhtml+xml" and the repository 
reports "text/html", then that would be a close enough match for my purpose.
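
To make clear what I mean by "close enough", here is a rough sketch in plain 
Java. It uses no Tika calls at all; the class name, the method names and the 
small equivalence table are made up for illustration only, and a real table 
would need many more entries.

```java
import java.util.Locale;
import java.util.Map;

class FuzzyTypeMatch {

    // Hypothetical, hand-maintained table: each key is treated as
    // equivalent to its value. These entries are illustrative assumptions.
    private static final Map<String, String> EQUIVALENTS = Map.of(
            "application/xhtml+xml", "text/html",
            "text/xml", "application/xml");

    // Strip parameters such as "; charset=UTF-8", lower-case the type,
    // then map it to its equivalent if the table has one.
    static String normalize(String type) {
        int semi = type.indexOf(';');
        String base = (semi >= 0 ? type.substring(0, semi) : type)
                .trim()
                .toLowerCase(Locale.ROOT);
        return EQUIVALENTS.getOrDefault(base, base);
    }

    // Two types are "close enough" if they normalize to the same string.
    static boolean closeEnough(String detected, String reported) {
        return normalize(detected).equals(normalize(reported));
    }

    public static void main(String[] args) {
        // prints "true"
        System.out.println(closeEnough("application/xhtml+xml", "text/html; charset=UTF-8"));
        // prints "false"
        System.out.println(closeEnough("application/pdf", "text/html"));
    }
}
```

Maintaining such a table by hand is what I would like to avoid, hence the 
question below.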

Is there a simple way in Tika to do such fuzzy comparisons?


Thanks,

Florian
