On 09.01.2012, at 23:31, Srecko Joksimovic wrote:

> Hello Rupert,
>  
> Could you please give me an example of annotating various types of documents?
> As I understood from 
> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/metaxaengine.html
>  
> And
>  
> curl -i -X POST -H "Content-Type:text/html" -T testpage.html 
> http://localhost:8080/engines
>  
> The MIME type should match the document type. But (maybe this is going to be a 
> stupid question)… when I annotated text, I called a method like this:
> IOUtils.write(_string_to_annotate, out);          
> IOUtils.closeQuietly(out);
>  
> For a document of any type, I should probably convert the document content to a 
> byte array, and then call a similar method?
> I’m asking this because I didn’t see a way to provide a document URL 
> and get results. I suppose that this would be the only way?
>  

Generally, the MIME type of the content MUST match the value passed in the 
Content-Type header. The Metaxa engine may also have some way to detect the 
MIME type from the content, but I do not know whether this is the case.

It is also true that for binary documents you need to use the byte-oriented 
methods of IOUtils. However, I would also consider streaming the data directly 
from the file to the OutputStream of the POST request, to avoid loading the 
whole content into memory.
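
A rough sketch of what such streaming could look like, using only JDK classes. 
The class and method names (StreamingPost, stream, post) are made up for this 
example and are not part of Stanbol; the endpoint URL and the content type are 
whatever you pass in. I have not tested this against the Enhancer, so treat it 
as an illustration under those assumptions:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Stanbol or Commons IO.
final class StreamingPost {

    // Copies the file to the stream in chunks and returns the number of
    // bytes written; the file is never held in memory as a whole.
    static long stream(Path file, OutputStream out) throws IOException {
        return Files.copy(file, out);
    }

    // POSTs the file to the endpoint and returns the HTTP status code.
    // contentType must match the real type of the file, e.g. "application/pdf".
    static int post(URL endpoint, Path file, String contentType) throws IOException {
        HttpURLConnection con = (HttpURLConnection) endpoint.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", contentType);
        // Without this, HttpURLConnection buffers the whole request body;
        // 0 selects the default chunk size.
        con.setChunkedStreamingMode(0);
        try (OutputStream out = con.getOutputStream()) {
            stream(file, out);
        }
        return con.getResponseCode();
    }
}
```

With the endpoint from your curl example, a call would look something like 
StreamingPost.post(new URL("http://localhost:8080/engines"), 
Paths.get("testpage.html"), "text/html"). This assumes the server accepts 
chunked request bodies.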

Note that for textual content you should also set the charset correctly. If you 
use a charset other than "UTF-8", you need to set it as a parameter of the 
"Content-Type" header you pass, such as

    Content-Type: text/plain; charset=UTF-16
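
On the Java side, the important point is that the charset used to encode the 
text and the charset named in the header are the same. A minimal sketch, again 
with made-up names (CharsetAwareText, contentType, write) and plain JDK calls 
only:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Stanbol or Commons IO.
final class CharsetAwareText {

    // Builds the header value, e.g. "text/plain; charset=UTF-16".
    static String contentType(Charset charset) {
        return "text/plain; charset=" + charset.name();
    }

    // Encodes the text with the same charset that the header announces
    // and writes the resulting bytes to the request body.
    static void write(String text, Charset charset, OutputStream out) throws IOException {
        out.write(text.getBytes(charset));
    }
}
```

Passing the same Charset object to both methods keeps the header and the bytes 
on the wire from drifting apart.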

best
Rupert
