On the second of May, Joshua Schachter wrote:
> Are you submitting non-utf8 stuff to begin with?
This is the standard web interface; no code I've written is involved. If Firefox were submitting non-UTF-8 (because it thought the form took Latin 1, for example), the [Φ] would have been encoded as [&#934;] [1]. Try pasting it (phi) into the form at http://www.parhasard.net/latin-1-form.php to see that behaviour.

The tags from the POST string below:

  tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D+from-peter-t-daniels+pasta.cantbedone.org

The non-alphanumeric-ASCII tags as hex:

  %5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D

Some Emacs Lisp:

  (decode-coding-string "\x5B\xCE\xA6\x5D+\x5Bh\x5D+\x5B\xC3\xA7\x5D" 'utf-8)
    => "[Φ]+[h]+[ç]"

I see the same behaviour with Firefox 2.0.0.3 on Mac OS X 10.4.9, and with Safari 2.0.4 on the same OS. If you can't reproduce this, try looking at the same entry from a page you haven't previously viewed today; as far as I can work out, the caching is pretty aggressive right now.

Again, I've made a UTF-8 encoded version of this mail available here:

  http://parhasard.net/[EMAIL PROTECTED]

Please look at that if you find anything unclear in this mail. I can guarantee that the file at that address is served as UTF-8, is encoded as UTF-8, and conveys what I intend, but guaranteeing that Yahoo Groups will not make my message contradict itself is beyond my powers.

Bye,
Aidan

[1] That is, in case Yahoo chooses to do its own thing with this mail: left square bracket, ampersand, hash mark (otherwise known as the North American pound sign), digit nine, digit three, digit four, semicolon, right square bracket.
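For anyone who wants to check the encoding argument above without Emacs, here is a minimal Python sketch using only the standard library. It decodes the tags value from the POST string as UTF-8, repeats the Emacs Lisp byte-level decode, and shows what a Latin-1 form would have sent for [Φ]. The last part assumes the browser falls back to a numeric character reference (&#934;) for characters the form's charset cannot represent, which is the behaviour the parhasard.net test form is meant to demonstrate; the variable names are illustrative only.

```python
from urllib.parse import parse_qs, quote_plus

# The tags value exactly as it appeared in the POST body.
body = ("tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D"
        "+from-peter-t-daniels+pasta.cantbedone.org")

# parse_qs maps '+' to a space and decodes %XX escapes as UTF-8 by default.
tags = parse_qs(body)["tags"][0].split()
print(tags)

# Same check as the Emacs Lisp snippet: raw bytes decoded as UTF-8.
raw = b"\x5B\xCE\xA6\x5D+\x5Bh\x5D+\x5B\xC3\xA7\x5D"
print(raw.decode("utf-8"))  # [Φ]+[h]+[ç]

# Assumption: a Latin-1 form cannot carry Φ, so the browser substitutes a
# numeric character reference, which is then percent-encoded.
latin1_bytes = "[Φ]".encode("latin-1", errors="xmlcharrefreplace")
print(latin1_bytes)              # b'[&#934;]'
print(quote_plus(latin1_bytes))  # %5B%26%23934%3B%5D
```

If the submitted body had contained %26%23934%3B rather than %CE%A6, the form would have been going out as Latin 1; it does not, so the client side looks clean.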
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[EMAIL PROTECTED] On Behalf Of aeohek
> > Sent: Wednesday, May 02, 2007 4:24 AM
> > To: [email protected]
> > Subject: [ydn-delicious] Latin 1, and only Latin 1, lost on tag edit
> >
> > Hi,
> >
> > When I edit the tags on:
> >
> > http://del.icio.us/url/4720413157054e2059cf355a64300a3c
> >
> > (my username is aidan; right now I'm the only person to have posted that
> > URI), any Latin-1 character (excluding ASCII) in a tag means that the tag
> > is truncated just before the first such character. This happens both with
> > an Ajax edit and with a full-screen edit.
> >
> > In detail, my tags are currently:
> >
> > japanese language [Φ] [h] [ç] from-peter-t-daniels pasta.cantbedone.org
> >
> > (you can't see them on the URL page right now because it appears to be
> > cached; try http://del.icio.us/tag/%5B%CE%A6%5D if the cache hasn't
> > expired by the time you read this). Φ is a Greek character, U+03A6; ç is
> > a Latin-1 character, U+00E7. The rest of the characters are US-ASCII. The
> > browser is Firefox 1.5.0.11 on Windows XP.
> >
> > I edit those tags: I click on edit, then full-screen edit, and add
> > hi-there as a tag, so that the displayed text is now:
> >
> > japanese language [Φ] [h] [ç] from-peter-t-daniels pasta.cantbedone.org hi-there
> >
> > I then click save, and it redirects away from that tag. But when I examine
> > the URL details again, it no longer has a [ç] tag; instead it has a new [
> > tag. It still has the [Φ] tag. The same happens when I edit other tags
> > containing Latin 1, or when I create new entries with tags containing
> > Latin 1.
> >
> > Using the Live HTTP Headers extension, I see that the POST request
> > submitted was:
> >
> > POST /aidan/%5B%C3%A7%5D?779822
> > url=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm&oldurl=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm&description=%5B%CE%A6%5D%2C+%5Bh%5D+in+Japanese&notes=Cute%3B+Japanese+went+through+a+historical+development+similar+to+the+%2Ff%2F&tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D+from-peter-t-daniels+pasta.cantbedone.org+&jump=no&date=2007-03-10T14%3A31%3A10Z&key=1540afd502e4ce4af3cb2bac8df225d1
> >
> > which, when URL-decoded and converted to UTF-8, gives this:
> >
> > POST /aidan?312757
> > url=http://pasta.cantbedone.org/pages/PXXU7p.htm&oldurl=http://pasta.cantbedone.org/pages/PXXU7p.htm&description=[Φ],+[h]+in+Japanese&notes=Cute;+Japanese+went+through+a+historical+development+similar+to+the+/f/&tags=japanese+language+[Φ]+[h]+[ç]+from-peter-t-daniels+pasta.cantbedone.org+&jump=no&date=2007-03-10T14:31:10Z&key=1540afd502e4ce4af3cb2bac8df225d
> >
> > Now, the tags CGI variable is correct there, so this seems to be a
> > server-side problem. I can work around it by renaming the tag [ to [ç],
> > or espa to español.
> >
> > I've made a UTF-8 encoded version of this email at
> > http://www.parhasard.net/del.icio.us-latin-1-problem.txt, since Yahoo
> > Groups appears to prefer to treat it as Latin 1.
> >
> > Best regards, and please tell me if I should report this somewhere else.
> >
> > Aidan

-- 
On the quay of the little Black Sea port, where the rescued pair came once more into contact with civilization, Dobrinton was bitten by a dog which was assumed to be mad, though it may only have been indiscriminating. (Saki)
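To verify the quoted report's claim that the tags field reaches the server intact, here is a minimal Python sketch (standard library only) that parses the submitted POST body and prints the code points and UTF-8 bytes of the two non-ASCII characters. The key field is left out as irrelevant. The function `observed_truncation` is hypothetical: it only mimics the symptom as described in the report (a tag cut just before its first character in U+0080..U+00FF) and is not del.icio.us's server code, which nobody outside Yahoo has seen.

```python
from urllib.parse import parse_qs

# The POST body from the report, minus the "key" field.
body = (
    "url=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm"
    "&oldurl=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm"
    "&description=%5B%CE%A6%5D%2C+%5Bh%5D+in+Japanese"
    "&notes=Cute%3B+Japanese+went+through+a+historical+development"
    "+similar+to+the+%2Ff%2F"
    "&tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D"
    "+from-peter-t-daniels+pasta.cantbedone.org+"
    "&jump=no&date=2007-03-10T14%3A31%3A10Z"
)

# parse_qs decodes %XX escapes as UTF-8 and maps '+' to a space.
form = parse_qs(body)
tags = form["tags"][0].split()
print(tags)

# Code points and UTF-8 bytes of the two non-ASCII characters involved.
for ch in "Φç":
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ").upper())
# U+03A6 CE A6
# U+00E7 C3 A7


def observed_truncation(tag: str) -> str:
    """Hypothetical model of the reported symptom, not server code:
    cut the tag just before its first character in U+0080..U+00FF."""
    for i, ch in enumerate(tag):
        if 0x80 <= ord(ch) <= 0xFF:
            return tag[:i]
    return tag


for tag in ["[Φ]", "[ç]", "español", "hi-there"]:
    print(tag, "->", observed_truncation(tag))
```

The model reproduces the report's examples ([ç] becomes [, español becomes espa, [Φ] survives), which is consistent with a server-side step that mishandles exactly the Latin-1 range while passing other non-ASCII text through.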

