On 09.01.2012, at 23:31, Srecko Joksimovic wrote:
> Hello Rupert,
>
> Could you please give me an example of annotating various types of documents?
> As I understood from
> http://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/metaxaengine.html
>
> And
>
> curl -i -X POST -H "Content-Type:text/html" -T testpage.html
> http://localhost:8080/engines
>
> The MIME type should match the document type. But (maybe this is going to be
> a stupid question)… when I annotated text, I called a method like this:
> IOUtils.write(_string_to_annotate, out);
> IOUtils.closeQuietly(out);
>
> For a document of any type, I should probably convert the document content
> to a byte array, and then call a similar method?
> I’m asking this because I didn’t see a way to provide a document URL and
> get results. I suppose that this would be the only way?
>
Generally, the MIME type of the content MUST match the value passed in the
Content-Type header. The Metaxa engine may also have some way to detect the
MIME type from the content, but I do not know whether this is the case.
It is also true that for binary documents you need to use the byte-oriented
methods of IOUtils. However, I would also consider streaming the data directly
from the file to the OutputStream of the POST request, to avoid loading the
whole content into memory.
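A rough sketch of what I mean by streaming, using only the JDK. The class and
method names (StreamingPost, copy, post) are made up for illustration, and the
endpoint and MIME type are whatever you would otherwise pass to curl. If you
already use Commons IO, IOUtils.copy(InputStream, OutputStream) should do the
same job as the copy loop here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Stanbol or Commons IO.
class StreamingPost {

    // Copies the stream in fixed-size chunks, so only one small buffer
    // is held in memory regardless of the document size.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Posts the file to the given endpoint and returns the HTTP status code.
    // Assumption: the endpoint accepts a plain POST body, as in the curl call.
    static int post(String endpoint, Path file, String contentType) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", contentType);
        // Announce the length up front so the connection does not buffer the body.
        con.setFixedLengthStreamingMode(Files.size(file));
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = con.getOutputStream()) {
            copy(in, out);
        }
        return con.getResponseCode();
    }
}
```

You would call it roughly as
StreamingPost.post("http://localhost:8080/engines", Paths.get("testpage.html"), "text/html")
and then read the enhancement results from the response as you do now.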
Note that for textual content you should also set the charset correctly. If
you use a charset other than "UTF-8", you need to set it as a parameter of the
"Content-Type" header you pass, such as
Content-Type: text/plain; charset=UTF-16
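On the Java side, the important point is that the bytes you write and the
charset parameter of the Content-Type header agree. A minimal sketch, again
with made-up names (CharsetAwareBody, contentType, writeText); if I remember
correctly, Commons IO also has an IOUtils.write overload that takes the
encoding as a third argument, which would replace writeText here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Stanbol or Commons IO.
class CharsetAwareBody {

    // Builds the header value, e.g. "text/plain; charset=UTF-16".
    static String contentType(String mimeType, Charset charset) {
        return mimeType + "; charset=" + charset.name();
    }

    // Encodes the text with the same charset that is announced in the header.
    // The caller stays responsible for closing the stream.
    static void writeText(String text, OutputStream out, Charset charset) throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write(text);
        writer.flush();
    }
}
```

Use the same Charset instance for both calls, so the header and the body cannot
drift apart.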
best
Rupert