Of course, the return type is MediaType, i.e.

MediaType type = TikaConfig.getDefaultConfig().getDetector().detect(...);
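
For code that still receives the type as a String (for example from Tika.detect()), here is a minimal sketch of a comparison that tolerates an appended charset parameter. It uses only the JDK, not Tika. The names MimeStrings, baseType and isType are hypothetical, and it assumes that any parameters follow the first semicolon.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika: compares MIME type strings
// while ignoring parameters such as "; charset=UTF-8".
final class MimeStrings {
    private MimeStrings() {}

    /** Returns the type/subtype part, trimmed and lower-cased, without parameters. */
    static String baseType(String mime) {
        int semicolon = mime.indexOf(';');
        String base = (semicolon >= 0) ? mime.substring(0, semicolon) : mime;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** True if both strings name the same type/subtype, whatever their parameters. */
    static boolean isType(String detected, String expected) {
        return baseType(detected).equals(baseType(expected));
    }

    public static void main(String[] args) {
        String detected = "text/plain; charset=UTF-8";
        // A plain equality check breaks once the charset is appended.
        System.out.println(detected.equals("text/plain"));   // prints false
        // Comparing only the base type keeps older checks working.
        System.out.println(isType(detected, "text/plain"));  // prints true
    }
}
```

A contains() check would also match here, but it would accept unrelated strings that merely include "text/plain", so comparing the base type is the stricter option.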

On Thu, Jul 26, 2012 at 1:03 AM, Public Network Services <[email protected]> wrote:

> Actually, I am surprised that many people are not shouting about this
> already.
>
> All the static detect() methods of the Tika convenience class return the
> MIME type as a String and, if not the recommended approach, they are
> certainly very popular.
>
> I have always been puzzled as to why the return type of such methods
> should be String, as opposed to a MimeType object.
>
> Tika is an excellent work and all the contributors are to be
> congratulated, but, with all due respect, it seems that this modification of
> the return String for "text/plain" will cause numerous headaches.
>
> Perhaps you should issue a directive that people should use the MimeType
> class, even if by creating such objects by parsing the String that
> Tika.detect() returns. Or, do something like
>
> MimeType type = TikaConfig.getDefaultConfig().getDetector().detect(...);
>
> :-)
>
> On Wed, Jul 25, 2012 at 3:50 PM, Paulini, Matthew CTR USAF AFMC AFRL/RISA
> <[email protected]> wrote:
>
>> I can see how the encoding might be useful to some people. However, I
>> also agree that older code that checks the MIME type returned from Tika
>> for equality (e.g. .equals() or .compareTo() in Java) rather than for
>> containment (e.g. contains() in Java) could run into issues if the
>> dependent code doesn't do extra processing on the MIME type before its
>> check. Since the encoding was never present before, the chances that
>> older code would have done processing to grab just the MIME type portion
>> of the returned string are slim, I would assume.
>>
>> Wouldn't it be more backward compatible if you just added an "encoding"
>> field to the list of metadata attributes that are returned?
>>
>> ~Scout
>>
>> ________________________________
>>
>> From: Public Network Services [mailto:[email protected]]
>> Sent: Wed 7/25/2012 8:31 AM
>> To: [email protected]
>> Subject: Re: Charset detection
>>
>> If it does not add much to processing, then it could be run earlier, for
>> consistency purposes.
>>
>> Having said that, I am not sure about the usefulness of appending the
>> charset at the end of the detected MIME type string in the first place. It
>> is correct from a syntax point of view, but it adds one more level of
>> string processing to extract it (as opposed to just getting it from the
>> metadata). Are we sure, for instance, that older code (checking for
>> equality to "text/plain") will not be broken?
>>
>> Of course the decision has already been made and you guys know very well
>> what you are doing, but it still puzzles me. :-)
>>
>> On Wed, Jul 25, 2012 at 10:55 AM, Jukka Zitting <[email protected]>
>> wrote:
>>
>> Hi,
>>
>> On Wed, Jul 25, 2012 at 1:05 AM, Public Network Services
>> <[email protected]> wrote:
>> > Should that be the case?
>>
>> Yes. So far the extra charset detection code is only being run when
>> you actually parse a document, so the charset parameter gets added at
>> that point, not yet at type detection. Perhaps we should run charset
>> detection already earlier at that point?
>>
>> BR,
>>
>> Jukka Zitting
