On the second of May, Joshua Schachter wrote:

 > Are you submitting non-utf8 stuff to begin with?

This is the standard web interface. No code I’ve written is involved. 

If Firefox were submitting non-UTF-8 (because it thought the form took Latin
1, for example), the [Φ] would have been encoded as [&#934;] [1]. Try
pasting it (phi) into the form here:

  http://www.parhasard.net/latin-1-form.php

to see that behaviour.
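
For anyone without a browser to hand, the Latin-1 case can be approximated
in Python. This is only a sketch: it assumes the browser falls back to a
numeric character reference for anything outside the form's charset, as
described above, and latin1_form_value is a made-up name for illustration.

```python
from urllib.parse import quote_plus


def latin1_form_value(text: str) -> str:
    """Percent-encode text roughly as a Latin-1 form submission would (assumed)."""
    # Characters outside Latin-1 become numeric character references,
    # e.g. U+03A6 -> "&#934;"; Latin-1 characters become single bytes.
    raw = text.encode("latin-1", errors="xmlcharrefreplace")
    return quote_plus(raw)


print(latin1_form_value("[Φ]"))  # %5B%26%23934%3B%5D
print(latin1_form_value("[ç]"))  # %5B%E7%5D
```

Neither of those byte sequences appears in the POST string quoted below,
which is what one would expect if the form is really submitted as UTF-8.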

The tags from the POST string below:

  tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D+from-peter-t-daniels+pasta.cantbedone.org

The tags that are not plain alphanumeric ASCII, still percent-encoded:

  %5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D

Decoding those bytes as UTF-8 in Emacs Lisp:

  (decode-coding-string "\x5B\xCE\xA6\x5D+\x5Bh\x5D+\x5B\xC3\xA7\x5D" 'utf-8)
  => "[Φ]+[h]+[ç]"
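
The same check, sketched in Python for readers without Emacs. The TAGS
string is copied from the POST data above; the variable names are
illustrative only. Strict decoding is used so that invalid UTF-8 would raise
an error instead of being silently replaced.

```python
from urllib.parse import unquote_plus, unquote_to_bytes

TAGS = "%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D"  # copied from the POST string

# errors="strict" raises UnicodeDecodeError if the bytes are not valid UTF-8.
as_utf8 = unquote_plus(TAGS, encoding="utf-8", errors="strict")
print(as_utf8)  # [Φ] [h] [ç]

# The same bytes read as Latin-1, for comparison.
as_latin1 = unquote_to_bytes(TAGS.replace("+", " ")).decode("latin-1")
print(as_latin1)  # [Î¦] [h] [Ã§]
```

The bytes decode cleanly as UTF-8 and only produce nonsense when read as
Latin-1, which is consistent with the browser having sent UTF-8.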

I see the same behaviour with Firefox 2.0.0.3 on Mac OS X 10.4.9, and with
Safari 2.0.4 on the same OS.

If you can’t reproduce this, try looking at the same entry from a page you
haven’t previously viewed today--the caching is pretty aggressive right now,
as far as I can work out.

Again, I’ve made a UTF-8 encoded version of this mail available here:           

  http://parhasard.net/[EMAIL PROTECTED]

Please look at that if you find anything unclear in this mail. I can
guarantee that the file at that address is served as UTF-8, is encoded as
UTF-8, and conveys what I intend, but guaranteeing that Yahoo Groups will
not make my message contradict itself is beyond my powers.

Bye, 
        Aidan

[1] That is, in case Yahoo chooses to do its own thing with this mail, left
square bracket, ampersand, hash-mark-otherwise-known-as-North-American
pound-sign, digit nine, digit three, digit four, semicolon, right square
bracket. 



 > > -----Original Message-----
 > > From: [email protected] 
 > > [mailto:[EMAIL PROTECTED] On Behalf Of aeohek
 > > Sent: Wednesday, May 02, 2007 4:24 AM
 > > To: [email protected]
 > > Subject: [ydn-delicious] Latin 1, and only Latin 1, lost on tag edit
 > > 
 > > Hi, 
 > > 
 > > When I edit the tags on:
 > > 
 > > http://del.icio.us/url/4720413157054e2059cf355a64300a3c 
 > > 
 > > (my username is aidan; right now I'm the only person to have posted
 > > that URI), any non-ASCII Latin-1 character in a tag means that tag is
 > > truncated from the first such character onwards. This happens both
 > > with an Ajax edit and with a full-screen edit.
 > > 
 > > In detail; currently my tags are:
 > > 
 > > japanese language [Φ] [h] [ç] from-peter-t-daniels 
 > > pasta.cantbedone.org 
 > > 
 > > (you can't see them on the URL page right now, it appears to be
 > > cached; try http://del.icio.us/tag/%5B%CE%A6%D5 if the cache hasn't
 > > expired by the time you read this). Φ is a Greek character, U+03A6;
 > > ç is a Latin-1 character, U+00E7. The rest of the characters are
 > > US-ASCII. Browser is Firefox 1.5.0.11, platform Windows XP.
 > > 
 > > I edit those tags--I click on edit, then full-screen edit, and add
 > > hi-there as a tag, such that the displayed text is now:
 > > 
 > > japanese language [Φ] [h] [ç] from-peter-t-daniels
 > > pasta.cantbedone.org hi-there
 > > 
 > > I then click save, and it redirects away from that tag. But when I
 > > examine the URL details again, it no longer has a [ç] tag, but it has
 > > a new [ tag. It still has the [Φ] tag. The same happens when I edit
 > > other tags containing Latin 1, or when I create new entries with tags
 > > containing Latin 1.
 > > 
 > > Using the Live HTTP Headers extension, I see that the POST request
 > > submitted was:
 > > 
 > > POST /aidan/%5B%C3%A7%5D?779822
 > > url=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm&old
 > > url=http%3A%2F%2Fpasta.cantbedone.org%2Fpages%2FPXXU7p.htm&des
 > > cription=%5B%CE%A6%5D%2C+%5Bh%5D+in+Japanese&notes=Cute%3B+Jap
 > > anese+went+through+a+historical+development+similar+to+the+%2F
 > > f%2F&tags=japanese+language+%5B%CE%A6%5D+%5Bh%5D+%5B%C3%A7%5D+
 > > from-peter-t-daniels+pasta.cantb
 > > edone.org+&jump=no&date=2007-03-10T14%3A31%3A10Z&key=1540afd50
 > > 2e4ce4af3cb2bac8df225d1
 > > 
 > > which, when URL-decoded and interpreted as UTF-8, gives this:
 > > 
 > > POST /aidan?312757
 > > url=http://pasta.cantbedone.org/pages/PXXU7p.htm&oldurl=http:/
 > > /pasta.cantbedone.org/pages/PXXU7p.htm&description=[Φ],+[
 > > h]+in+Japanese&notes=Cute;+Japanese+went+through+a+historical+
 > > development+similar+to+the+/f/&tags=japanese+language+[Φ]
 > > +[h]+[ç]+from-peter-t-daniels+pasta.cantbedone.org+&jump=no&da
 > > te=2007-03-10T14:31:10Z&key=1540afd502e4ce4af3cb2bac8df225d
 > > 
 > > Now, the tags CGI variable is correct there, so this seems to be a
 > > server-side problem. I can work around it by renaming the tag [ to
 > > [ç], or espa to español.
 > > 
 > > I've made a UTF-8 encoded version of this email at
 > > http://www.parhasard.net/del.icio.us-latin-1-problem.txt , since Yahoo
 > > Groups appears to prefer to treat it as Latin 1. 
 > > 
 > > Best regards, and please tell me if I should report this somewhere
 > > else.
 > > 
 > >         Aidan
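
The symptom reported in the quoted message can be modelled in a few lines of
Python. This is a sketch of the observed behaviour only, not a claim about
how the del.icio.us server works; truncate_like_observed is a hypothetical
name, and the sample tags are the ones mentioned in the report.

```python
def truncate_like_observed(tag: str) -> str:
    """Cut a tag at its first non-ASCII Latin-1 character (U+0080..U+00FF)."""
    for i, ch in enumerate(tag):
        if 0x80 <= ord(ch) <= 0xFF:
            return tag[:i]
    # Pure ASCII tags, and tags whose non-ASCII characters all lie outside
    # Latin-1 (such as the Greek phi), come back unchanged.
    return tag


for tag in ["[Φ]", "[ç]", "español"]:
    print(repr(tag), "->", repr(truncate_like_observed(tag)))
```

That reproduces the three cases from the report: [Φ] survives, [ç] becomes
[, and español becomes espa.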

-- 
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)
