I fiddled around a bit more with this, trying various things that actually connected to the service.

I finally figured out that if you send the string "xxx & yyy" to the service, it actually processes the string:

<Document><Title>1212537108630-85FDAB4B-292518</Title><Date>2008-06-03</Date><Body>xxx &amp; yyy</Body></Document>

or something like that, and the returned offsets are relative to this string. Correcting the returned offsets so that they correspond to what you sent looks like it has 2 parts. The first part, the prefix "<Document ... <Body>", is pretty easily accounted for. The second part, the expansion of & to &amp;, requires more work. Other characters are also converted, some strangely. I've seen the usual:

<  converted to &lt;,   > converted to &gt;

The character " seemed to be converted to &amp;quot;
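Here is a rough Java sketch of the offset correction described above. It assumes the only conversions are the four I observed (&, <, >, and "), and that you already know the length of the "<Document ... <Body>" prefix. The class and method names are made up for illustration, and the escape table is a guess from observation, not anything documented by the service.

```java
public class OffsetCorrector {

    // Conversions the service appeared to apply (observed, not documented).
    public static String escape(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&amp;quot;"; // the strange double-escaped form seen in testing
            default:  return String.valueOf(c);
        }
    }

    /**
     * Maps an offset reported by the service back to an offset in the text
     * that was actually sent.
     *
     * @param sent         the original, unescaped text
     * @param prefixLength assumed length of the wrapper that precedes the body
     * @param reported     offset returned by the service
     * @return the corresponding offset in sent; offsets that fall inside the
     *         prefix map to 0, offsets past the end map to sent.length()
     */
    public static int toOriginalOffset(String sent, int prefixLength, int reported) {
        int escapedPos = prefixLength;
        for (int i = 0; i < sent.length(); i++) {
            int next = escapedPos + escape(sent.charAt(i)).length();
            if (reported < next) {
                return i; // reported offset falls within the expansion of character i
            }
            escapedPos = next;
        }
        return sent.length();
    }

    public static void main(String[] args) {
        String prefix = "<Document><Title>t</Title><Date>2008-06-03</Date><Body>";
        String sent = "xxx & yyy";
        // The first 'y' sits at index 10 of the escaped body "xxx &amp; yyy".
        int reported = prefix.length() + 10;
        System.out.println(toOriginalOffset(sent, prefix.length(), reported)); // prints 6
    }
}
```

Walking the sent text one character at a time keeps the mapping correct even if the escape table changes later; only escape() would need updating.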

All this is apparently a "bug" - their forum includes a post saying the problem with the "&" will be fixed in the next release.

I've posted a reply to their forum asking about other characters besides the "&".

One final note: their API says that content sent using the POST method needs to be escaped. I think that means the kind of escaping that is done for encoding strings in URLs; I used the Java library method URLEncoder.encode(string, "UTF-8") to do this, and it seemed to do the trick.
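For reference, a minimal sketch of that encoding step. The class and method names are made up for illustration; only URLEncoder.encode itself is the real library call, and whether this is exactly the escaping the service expects is my assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PostBodyEncoder {

    // URL-encodes the content before it is placed in the POST body.
    public static String encodeContent(String content) {
        try {
            return URLEncoder.encode(content, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeContent("xxx & yyy")); // prints xxx+%26+yyy
    }
}
```

Note that this encoding is applied on top of whatever the service does to the text internally, so the offsets that come back still need the correction described earlier.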

-Marshall
