Re: [PHP] Unicode Problem
On Fri, 6 Oct 2006 10:44:55 -0500 (CDT), Richard Lynch wrote:

> I don't think MS Word quotes are Unicode, really... I think they're just made-up character sets that Microsoft felt like using to be incompatible with everybody else...
>
> Though the %u is almost-for-sure an ATTEMPT to apply Unicode conversion, that doesn't mean that the original content was really Unicode to start with. So after you undo the Unicode conversion, you've still potentially got data on your hands from a proprietary non-standards-based made-up software application.
>
> Apologies in advance if MS Word actually *is* using a standard Unicode charset... But I sure doubt it.

I think you're missing the point. MS Word DOES use proprietary encodings, but when text is copied from MS Word and pasted into the browser, a conversion takes place. E.g., the bullet (0x95 in cp1250) is converted to whatever encoding the web page is in (code point U+2022 in a Unicode encoding). Whether the conversion is performed by the browser, some OS glue or some other trickery, witchery or devilry, is at the moment beyond my scant knowledge.

How to solve the original poster's problem is also beyond me, as I haven't used AJAX. I tend to prefer the ol' form submission for my bits and bobs. That way I can use UTF-8 all the way around, and everything just magically works. It even works fine for JavaScript-challenged browsers, would you believe.

--nfe

-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
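To make the conversion described in this message concrete, here is a minimal JavaScript sketch of a byte-to-code-point lookup. It assumes the Windows-1252 code page and lists only the seven "smart punctuation" positions 0x91-0x97 (to my knowledge cp1250 assigns the same characters to those positions). The table and the helper name `decodeWordBytes` are made up for illustration; this is not what any browser or OS actually runs.

```javascript
// Partial, hand-written table: Windows-1252 bytes 0x91-0x97 and the Unicode
// code points they correspond to. Only the characters discussed in this
// thread are listed; a real converter would cover the whole code page.
const CP1252_TO_UNICODE = {
  0x91: 0x2018, // left single quotation mark
  0x92: 0x2019, // right single quotation mark
  0x93: 0x201c, // left double quotation mark
  0x94: 0x201d, // right double quotation mark
  0x95: 0x2022, // bullet
  0x96: 0x2013, // en dash
  0x97: 0x2014, // em dash
};

// Hypothetical helper: turn an array of single-byte values into a JS string.
// Bytes that are not in the table are passed through unchanged.
function decodeWordBytes(bytes) {
  return bytes
    .map((b) => String.fromCharCode(b in CP1252_TO_UNICODE ? CP1252_TO_UNICODE[b] : b))
    .join('');
}

// 0x95 0x20 0x93 h i 0x94  ->  bullet, space, then "hi" in curly quotes
const text = decodeWordBytes([0x95, 0x20, 0x93, 0x68, 0x69, 0x94]);
console.log(text.codePointAt(0).toString(16)); // 2022
```

Once the text has been converted like this, the bullet really is U+2022, which is consistent with the %u2022 the original poster sees.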
RE: [PHP] Unicode Problem
At 4:15 PM -0500 10/6/06, Richard Lynch wrote:

> Perhaps you would care to extend your browsercam test to some regression testing of more ancient browsers -- on Mac OS.

The following goes back to IE 5.2 for the Mac -- that's as far back as BrowserCam goes.

http://www.browsercam.com/public.aspx?proj_id=289683

> I think you meant x2022 a.k.a. (dec)8226 :-)

Ahh, a typo, thanks.

tedd

--
http://sperling.com http://ancientstones.com http://earthstones.com
RE: [PHP] Unicode Problem
At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:

> I know it's Unicode because the javascript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make that display as the bullet character?

Robbert:

To display it in a browser, convert the number to DEC (hex 2022 -> dec 8226) and use:

&#8226;

I thought there was a way to use HEX directly, but can't find the reference at the moment (if there is one).

hth's

tedd

--
http://sperling.com http://ancientstones.com http://earthstones.com
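The hex-to-decimal step suggested in this message can be sketched as follows. This is JavaScript rather than PHP, purely to illustrate the arithmetic; `percentUToNcr` is a hypothetical helper name, and the sketch assumes every escape has exactly four hex digits, as in %u2022.

```javascript
// Replace each %uXXXX escape with a decimal numeric character reference.
// Limitation: characters outside the Basic Multilingual Plane arrive as two
// %uXXXX surrogate escapes, which this simple version does not recombine.
function percentUToNcr(input) {
  return input.replace(/%u([0-9a-fA-F]{4})/g, (match, hex) => {
    const codePoint = parseInt(hex, 16); // e.g. "2022" -> 8226
    return '&#' + codePoint + ';';
  });
}

console.log(percentUToNcr('%u2022 first item')); // &#8226; first item
```

The output is plain ASCII, so it displays the same regardless of the page's declared charset, at the cost of storing markup instead of the character itself.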
Re: [PHP] Unicode Problem
On Thu, October 5, 2006 5:14 pm, [EMAIL PROTECTED] wrote:

> I have a webpage that allows users to post news stories for their department. The site uses AJAX to send the data to the webserver. The problem I'm having is when the user uses some unicode characters like bullets or MS Word quotes, the page comes out weird.

I don't think MS Word quotes are Unicode, really... I think they're just made-up character sets that Microsoft felt like using to be incompatible with everybody else...

There are about 5 different translation functions at http://php.net/str_replace, last time I checked -- but those assumed the user was just typing stuff in a FORM, and do not include your JS escaping... I *hope* it's the same thing, really, but can't promise.

You're going to have to investigate what the JS escape mechanism is doing -- it could be any of a variety of things.

Though the %u is almost-for-sure an ATTEMPT to apply Unicode conversion, that doesn't mean that the original content was really Unicode to start with. So after you undo the Unicode conversion, you've still potentially got data on your hands from a proprietary non-standards-based made-up software application.

Apologies in advance if MS Word actually *is* using a standard Unicode charset... But I sure doubt it.

> Here's the process.
> 1. The user enters the story and clicks save.
> 2. The javascript uses the escape function to turn the text into something that can be posted to the server. This function turns spaces into %20, but it turns unicode characters into a longer string like %u.
> 3. The javascript then sends the data to the processing page.
> 4. The PHP processing page receives the data and saves it to the MySQL database server.
>
> The problem I see is that any unicode character is saved in its escaped unicode sequence. For example a bullet is saved into the database as a literal %u2022.
>
> What I need to know is what function can I use so that it's either saved as the unicode bullet character or displayed back on the page as the bullet?

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
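As for what the JS escape mechanism is doing: the legacy escape() function writes any UTF-16 code unit above 0xFF as %uXXXX and other escaped code units as %XX, where XX is the raw code unit rather than a UTF-8 byte. A minimal sketch of the reverse step, written in JavaScript to show the logic (a PHP version would need the same two cases, since as far as I know PHP's standard URL decoding does not understand %uXXXX); `decodeEscaped` is a hypothetical helper name:

```javascript
// Hypothetical helper: undo the output of JavaScript's legacy escape().
// %uXXXX carries a 16-bit code unit; %XX carries a code unit below 0x100.
function decodeEscaped(input) {
  return input.replace(
    /%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/g,
    (match, wide, narrow) => String.fromCharCode(parseInt(wide || narrow, 16))
  );
}

// What the database would hold after step 4 of the process quoted above.
const stored = '%u2022%20Budget%20approved';
console.log(decodeEscaped(stored)); // • Budget approved
```

Decoding before the INSERT stores the real character; decoding on output leaves the escaped text in the database. Either way the page has to be served in an encoding that can represent U+2022, such as UTF-8.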
RE: [PHP] Unicode Problem
I can take ANY number I want, and put %u in front of it... That don't make it mean anything.

You also need to know the charset it came from to start with, which in the case of MS Word, is not even a standard charset, but some made-up proprietary random assemblage of numbers to characters they found convenient that day.

You also might want to consider using something like FCKEditor or that other one like it to let users compose HTML-formatted content.

On Thu, October 5, 2006 9:11 pm, Robbert van Andel wrote:

> I know it's Unicode because the javascript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make that display as the bullet character?

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
RE: [PHP] Unicode Problem
On Fri, October 6, 2006 8:37 am, tedd wrote:

> At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
>> I know it's Unicode because the javascript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make that display as the bullet character?
>
> Robbert: To display it in a browser, convert the number to DEC (hex 2022 -> dec 8226) and use: &#8226;
>
> I thought there was a way to use HEX directly, but can't find the reference at the moment (if there is one).

http://php.net/hexdec

But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.

Because *only* MS IE uses the double-secret Microsoft decoder ring for 8226 to be what MS Word thinks it is.

Everybody else is using a standards-based conversion...

So your page will look fine in IE, but everybody else will see all kinds of goofy characters.

Test it and see -- I could be wrong...

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
RE: [PHP] Unicode Problem
At 10:50 AM -0500 10/6/06, Richard Lynch wrote:

> On Fri, October 6, 2006 8:37 am, tedd wrote:
>> At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
>>> How do I take %u2022 and make that display as the bullet character?
>>
>> I thought there was a way to use HEX directly, but can't find the reference at the moment (if there is one).
>
> http://php.net/hexdec

Richard:

No, that's not what I meant. I know how to convert DEC <-> HEX. What I was talking about is called NCRs, or Numeric Character References.

One could use the Unicode DEC value directly, such as:

&#8226;

or the Unicode HEX value directly, such as:

&#x2002;

Note, either will produce a bullet in most browsers.

> But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.

Not true, for most (and almost all current) browsers do render that glyph correctly (other glyphs may vary), please review:

http://www.browsercam.com/public.aspx?proj_id=289683

The first bullet is &#149; (same as ALT 0149 on the windoze keyboard). The second is &#8226; and the third is &#x2002;

Note all three produce a bullet -- oh, and don't forget &bull;, which will produce the same result.

tedd

--
http://sperling.com http://ancientstones.com http://earthstones.com
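For anyone who wants to check the decimal and hex forms of an NCR against each other, here is a small JavaScript sketch that derives both from the character itself. `toNcrs` is a hypothetical helper name. Note that the hex form of the bullet comes out as &#x2022;, while &#x2002; would name U+2002 (EN SPACE).

```javascript
// Build the decimal and hexadecimal NCR for the first character of a string.
function toNcrs(ch) {
  const cp = ch.codePointAt(0);
  return {
    decimal: '&#' + cp + ';',
    hex: '&#x' + cp.toString(16).toUpperCase() + ';',
  };
}

console.log(toNcrs('\u2022')); // { decimal: '&#8226;', hex: '&#x2022;' }
console.log(toNcrs('\u2002')); // { decimal: '&#8194;', hex: '&#x2002;' }
```

Both forms refer to the same code point, so a browser that handles one should handle the other identically.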
RE: [PHP] Unicode Problem
On Fri, October 6, 2006 12:29 pm, tedd wrote:

> No, that's not what I meant. I know how to convert DEC <-> HEX. What I was talking about is called NCRs, or Numeric Character References.
>
> One could use the Unicode DEC value directly, such as: &#8226; or the Unicode HEX value directly, such as: &#x2002;

I think you meant x2022 a.k.a. (dec)8226 :-)

8226 or x2022 is the same number, so whatever it is, it should work the same.

And maybe those will work the same as &bull; on all modern browsers now.

But that was not my experience in the past.

Perhaps you would care to extend your browsercam test to some regression testing of more ancient browsers -- on Mac OS.

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
Re: [PHP] Unicode Problem
On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> I have a webpage that allows users to post news stories for their department. The site uses AJAX to send the data to the webserver. The problem I'm having is when the user uses some unicode characters like bullets or MS Word quotes, the page comes out weird.
>
> Here's the process.
> 1. The user enters the story and clicks save.
> 2. The javascript uses the escape function to turn the text into something that can be posted to the server. This function turns spaces into %20, but it turns unicode characters into a longer string like %u.
> 3. The javascript then sends the data to the processing page.
> 4. The PHP processing page receives the data and saves it to the MySQL database server.
>
> The problem I see is that any unicode character is saved in its escaped unicode sequence. For example a bullet is saved into the database as a literal %u2022.
>
> What I need to know is what function can I use so that it's either saved as the unicode bullet character or displayed back on the page as the bullet?
>
> Thank you

I doubt that MS Word quotes are unicode. And as long as the users are copying/pasting between MS products (Word -> IE) you're going to have a hard time deciphering those funny characters. Try to encourage them to use Firefox, and if possible to use a UTF-8 compliant word processor. Mine is KWord, but I don't think that's available for Windows.

Dotan Cohen
http://what-is-what.com 98
RE: [PHP] Unicode Problem
I know it's Unicode because the javascript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make that display as the bullet character?

-Original Message-
From: Dotan Cohen [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 05, 2006 3:44 PM
To: [EMAIL PROTECTED]
Cc: php-general@lists.php.net
Subject: Re: [PHP] Unicode Problem

> I doubt that MS Word quotes are unicode. And as long as the users are copying/pasting between MS products (Word -> IE) you're going to have a hard time deciphering those funny characters. Try to encourage them to use Firefox, and if possible to use a UTF-8 compliant word processor. Mine is KWord, but I don't think that's available for Windows.
Re: [PHP] Unicode Problem
On Thu, 05 Oct 2006 18:14:59 -0400, [EMAIL PROTECTED] wrote:

> I have a webpage that allows users to post news stories for their department. The site uses AJAX to send the data to the webserver. The problem I'm having is when the user uses some unicode characters like bullets or MS Word quotes, the page comes out weird.
>
> Here's the process.
> 1. The user enters the story and clicks save.
> 2. The javascript uses the escape function to turn the text into something that can be posted to the server. This function turns spaces into %20, but it turns unicode characters into a longer string like %u.
> 3. The javascript then sends the data to the processing page.
> 4. The PHP processing page receives the data and saves it to the MySQL database server.
>
> The problem I see is that any unicode character is saved in its escaped unicode sequence. For example a bullet is saved into the database as a literal %u2022.
>
> What I need to know is what function can I use so that it's either saved as the unicode bullet character or displayed back on the page as the bullet?
>
> Thank you

If you post that data via the POST method, use encodeURIComponent() to encode the string, and then the server-side script can access it directly.

--
Sorry for my poor English.
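A short sketch of what this suggestion changes on the client side. encodeURIComponent() percent-encodes the UTF-8 bytes of each character, which is the form that standard URL decoding on the server expects, whereas escape() produces the non-standard %uXXXX. The field name `story` and the sample text are made up for illustration.

```javascript
const story = '\u2022 caf\u00e9'; // a bullet, a space, then "café"

// encodeURIComponent percent-encodes the UTF-8 bytes of each character.
const encoded = encodeURIComponent(story);
console.log(encoded); // %E2%80%A2%20caf%C3%A9

// Body for an application/x-www-form-urlencoded POST request;
// "story" is a hypothetical field name.
const body = 'story=' + encoded;

// Round trip, standing in for the URL decoding done on the server side.
console.log(decodeURIComponent(encoded) === story); // true
```

With this approach the server receives UTF-8 bytes, so the page, the PHP script and the database connection all need to agree on UTF-8 for the bullet to come back out intact.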