It depends on what type of Serializer you use and what Serializer config you put in your sitemap.
By default, XMLSerializer/HTMLSerializer uses UTF-8 encoding. So instead of one character you got two: the emoji is a UTF-16 surrogate pair, and each half was written out separately. Of course there might also be an issue with the emoji charset, but I would first try changing the encoding in the Serializer config (to UTF-16).

Greetings,
-Greg

2017-06-07 10:43 GMT+02:00 Flynn, Peter <pfl...@ucc.ie>:
> I had a related problem with 3–4 CJK characters being converted to their
> &#hex; format. Very weird, but it turned out to be the old and buggy copy
> of jtidy, and I can't figure out how to replace it.
>
> I haven't had the problem you describe, though, and I have a user who has
> implemented emoji in Cocoon, see http://research.ucc.ie/emojis/
>
> P
>
> --
> Peter Flynn | Academic and Collaborative Technologies | IT Services |
> University College Cork | Ireland | pfl...@ucc.ie |
> http://research.ucc.ie/profiles/H505/pflynn
>
> On 2017-06-06 17:08:51+01:00 Christopher Schultz wrote:
>
> All,
>
> I've been testing my application for use with high Unicode code points
> such as emoji like 😍, which is this one:
> http://www.fileformat.info/info/unicode/char/1F60D/index.htm
>
> My application and database can handle this code point, but Cocoon
> butchers it in a way that I have seen before -- the way that
> commons-lang's StringEscapeUtils.escapeXml/escapeHtml seems to do.
>
> Instead of letting the character through as-is, it tries to convert it
> into these two numbered entities:
>
> &#55357;&#56845;
>
> Oddly enough, those are the two double-byte UTF-16 characters you'd
> get, but they shouldn't be split up like that, I don't think.
>
> I haven't found a version of commons-lang 2.x that doesn't break these
> kinds of characters. commons-lang3 does the right thing, but they are
> incompatible libraries.
>
> Does anyone know the code well enough to know how difficult it would
> be to change the way Cocoon 2.1 escapes its output?
> For example, by using commons-lang3?
>
> I haven't tried Cocoon 2.2 yet, and I can't tell what dependencies it
> has. I also can't tell exactly what to do now that I've downloaded the
> binary package. Can it just be used as a drop-in replacement for
> Cocoon 2.1.x? Cocoon 2.1.x could build a WAR file that I then
> customized for my own application, adding various libraries and
> configuration files to it. I think I'll follow up with a separate post
> about this.
>
> -chris
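A minimal Java sketch of the split described in this thread, assuming a simplified escaper that only turns non-ASCII input into decimal numeric references. The class and method names are hypothetical; this is not the commons-lang 2.x or commons-lang3 source, nor Cocoon's serializer code. In a Java String, U+1F60D is stored as two chars, the surrogate pair U+D83D U+DE0D. Escaping one char at a time therefore yields two references, &#55357;&#56845;, each naming a lone surrogate (which XML does not accept as a character), while escaping by code point yields the single reference &#128525;.

```java
// Sketch only: both escapers are hypothetical stand-ins, NOT the real
// commons-lang 2.x / commons-lang3 code. Markup characters such as
// '&' and '<' are deliberately not handled, to keep the example short.
public class EscapeDemo {

    // Escapes each non-ASCII UTF-16 code unit on its own. A supplementary
    // character like U+1F60D is two chars (a surrogate pair), so it comes
    // out as two references, each pointing at a lone surrogate.
    static String escapeCharByChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes by Unicode code point instead: String.codePoints() joins the
    // surrogate pair into one code point, so the emoji becomes one reference.
    static String escapeByCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F60D)); // U+1F60D
        System.out.println(emoji.length());                          // 2 (UTF-16 chars)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (code point)
        System.out.println(escapeCharByChar(emoji));  // &#55357;&#56845;
        System.out.println(escapeByCodePoint(emoji)); // &#128525;
    }
}
```

Escaping by code point is one way to avoid the split; leaving the character unescaped, when the output encoding can represent it, is another. Which of these Cocoon 2.1 can be configured to do is not something this sketch shows.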