Hmmm...
Taking these comments on board; after declaring
/* creates a Tika facade using the default configuration */
private Tika tika;
I now include a fuller snippet of the method that is giving me problems.
/**
 * A facade interface to trying all the possible mime type resolution
 * strategies available within Tika. First, the mime type provided in
 * <code>typeName</code> is cleaned, with {@link #cleanMimeType(String)}.
 * Then the cleaned mime type is looked up in the underlying Tika
 * {@link MimeTypes} registry, by its cleaned name. If the {@link MimeType} is
 * found, then that mime type is used, otherwise {@link URL} resolution is
 * used to try and determine the mime type. If that means is unsuccessful, and
 * if <code>mime.type.magic</code> is enabled in {@link NutchConfiguration},
 * then mime type magic resolution is used to try and obtain a
 * better-than-the-default approximation of the {@link MimeType}.
 *
 * @param typeName
 *          The original mime type, returned from a {@link ProtocolOutput}.
 * @param url
 *          The given {@link URL}, that Nutch was trying to crawl. The given
 *          name can also be a URL or a full file path. In such cases only the
 *          file name part of the string is used for type detection.
 * @param data
 *          The byte data, returned from the crawl, if any.
 * @return The correctly, automatically guessed {@link MimeType} name.
 */
public String autoResolveContentType(String typeName, String url, byte[] data) {
  MimeType type = null;
  String cleanedMimeType = null;
  ....
  // if returned null, or if it's the default type then try url resolution
  if (type == null
      || (type != null && type.getName().equals(MimeTypes.OCTET_STREAM))) {
    // If no mime-type header, or cannot find a corresponding registered
    // mime-type, then guess a mime-type from the url pattern
    String mt = tika.detect(url);
    type = mt != null ? mt : type;
  }
You will notice that the final two statements in the code block above are the
'new' code you suggested (thanks for this, btw).
We pass 'String url' as the method parameter because, as the Javadoc says, the
given name can also be a URL or a full file path, in which case only the
file name part of the string is used for type detection.
On compiling I get
[javac] MimeUtil.java:165: incompatible types
[javac] found   : java.lang.Object&java.io.Serializable&java.lang.Comparable<? extends java.lang.Object&java.io.Serializable&java.lang.Comparable<?>>
[javac] required: org.apache.tika.mime.MimeType
[javac] type = mt != null ? mt : type;
[javac] ^
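My tentative reading of the error: 'mt' is a String while 'type' is a
MimeType, so the conditional expression mixes two unrelated types and javac
infers their common supertype (the Object&Serializable&Comparable<...> in the
message), which cannot be assigned back to a MimeType. Below is a minimal,
self-contained sketch of one possible way around it, keeping the resolved
value as a String name on both sides of the conditional. Note that
StandInMimeType, detect() and resolveName() are hypothetical stand-ins I made
up so the sketch compiles without Tika; they are not the Tika or Nutch API.

```java
public class ConditionalTypeSketch {

    /** Hypothetical stand-in for org.apache.tika.mime.MimeType (not the real class). */
    public static final class StandInMimeType {
        private final String name;

        public StandInMimeType(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    public static final String OCTET_STREAM = "application/octet-stream";

    /**
     * Hypothetical stand-in for tika.detect(String): guesses a mime type name
     * from the file extension only. The real detector is far more thorough.
     */
    public static String detect(String url) {
        if (url == null) {
            return null;
        }
        String lower = url.toLowerCase();
        if (lower.endsWith(".html")) {
            return "text/html";
        }
        if (lower.endsWith(".pdf")) {
            return "application/pdf";
        }
        return OCTET_STREAM;
    }

    /**
     * Resolves a mime type name. Both branches of the conditional below are
     * Strings, so the result can be assigned to a String without any cast.
     */
    public static String resolveName(StandInMimeType type, String url) {
        String name = (type != null) ? type.getName() : null;

        // if null, or if it's the default type, then try url resolution
        if (name == null || name.equals(OCTET_STREAM)) {
            String mt = detect(url);
            name = (mt != null) ? mt : name; // String ? String : String
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(resolveName(null, "http://example.com/report.pdf"));
        System.out.println(resolveName(new StandInMimeType("text/plain"),
                "http://example.com/report.pdf"));
    }
}
```

Since the method returns a String anyway, carrying the name rather than the
MimeType object may be enough. If a MimeType object really is needed, I assume
the detected name would have to be looked up in the registry first (possibly
via MimeTypes.forName(String)) instead of being assigned directly.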
There is something I am not quite getting right here :0| Any
suggestions, please?
Thank you
Lewis
>
>
> On Mon, Feb 27, 2012 at 7:34 AM, Nick Burch <[email protected]> wrote:
>
>>
>> How about:
>> String mt = Tika.detect(URL);
>> type = mt != null ? mt : type;
>>
>> That uses the new style call, and avoids detecting twice which your old
>> code did
>>
>> Nick
>>
>
>
> **