Re: [PHP] Unicode problems
Thiago H. Pojda wrote:
> This is slightly OT but I honestly don't know what else I can do.
>
> I was asked to migrate a website between hosts. Okay, pretty easy, right? Well, as usual, it wasn't. The site's pages were served as ISO-8859-1, and the site was developed for a MySQL 5 database that used latin1 as its charset and InnoDB as its storage engine. Pretty normal, and it ran smoothly.
>
> The client's database is an old MySQL 4.0 that (I'm not sure if they're just disabled, but it) doesn't have InnoDB and latin1. So I'm stuck with MyISAM and UTF8. No, they can't change it - their hosting company wants them to migrate to MSSQL, and they can't switch hosts for whatever reason.

Tried either of these?

http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html
http://forums.mysql.com/read.php?10,52929,56552#msg-56552

--
Postgresql php tutorials
http://www.designmagick.com/

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
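If converting on the database side is not an option, the text can also be re-encoded in PHP before it is inserted into the UTF8 tables. This is only a minimal sketch: the helper name `latin1_to_utf8` is made up for illustration, and it assumes the mbstring extension is loaded and that the exported data really is ISO-8859-1.

```php
<?php
// Hypothetical helper (not from the thread): re-encode ISO-8859-1 bytes as UTF-8.
// Assumes the mbstring extension is available.
function latin1_to_utf8(string $latin1): string
{
    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";               // "cafe" with an e-acute stored as the single byte 0xE9
$utf8   = latin1_to_utf8($latin1); // the e-acute becomes the two bytes 0xC3 0xA9
echo bin2hex($utf8), "\n";         // 636166c3a9
```

Plain ASCII passes through unchanged, so running this over text that contains no accented characters is harmless. Running it twice over the same data is not: the second pass would double-encode.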
Re: [PHP] Unicode Problem
On Fri, 6 Oct 2006 10:44:55 -0500 (CDT), Richard Lynch wrote:
> I don't think MS Word quotes are Unicode, really... I think they're just made-up character sets that Microsoft felt like using to be incompatible with everybody else...
>
> Though the %u is almost-for-sure an ATTEMPT to apply Unicode conversion, that doesn't mean that the original content was really Unicode to start with. So after you undo the Unicode conversion, you've still potentially got data on your hands from a proprietary non-standards-based made-up software application.
>
> Apologies in advance if MS Word actually *is* using a standard Unicode charset... But I sure doubt it.

I think you're missing the point. MS Word DOES use proprietary encodings, but when text is copied from MS Word and pasted into the browser, it goes through a conversion process. E.g., the bullet (0x95 in cp1250) will be converted to whatever encoding the web page is in (U+2022 in a Unicode encoding). Whether the conversion is performed by the browser, some OS glue or some other trickery, witchery or devilry, is at the moment beyond my scant knowledge.

How to solve the original poster's problem is also beyond me, as I haven't used AJAX. I tend to prefer the ol' form submission for my bits and bobs. That way I can use UTF-8 all the way around, and everything just magically works. It even works fine for JavaScript-challenged browsers, would you believe.

--nfe
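The byte-to-code-point conversion described here can be reproduced in PHP. A small sketch, assuming the mbstring extension and assuming the source bytes are Windows-1252 (where 0x95 is also the bullet); the helper name `cp1252_to_utf8` is invented for the example.

```php
<?php
// Hypothetical helper (not from the thread): convert Windows-1252 bytes to UTF-8.
// Assumes the mbstring extension is available.
function cp1252_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

$pasted = "\x95 first item";                 // 0x95 is the bullet in Windows-1252
$utf8   = cp1252_to_utf8($pasted);
echo bin2hex(substr($utf8, 0, 3)), "\n";     // e280a2, the UTF-8 encoding of U+2022
```

The single byte 0x95 comes out as the three-byte UTF-8 sequence for U+2022, which is the same code point the browser reports as %u2022.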
RE: [PHP] Unicode Problem
At 4:15 PM -0500 10/6/06, Richard Lynch wrote:
> Perhaps you would care to extend your browsercam test to some regression testing of more ancient browsers -- on Mac OS.

The following goes back to IE 5.2 for the Mac -- that's as far back as BrowserCam goes.

http://www.browsercam.com/public.aspx?proj_id=289683

> I think you meant x2022 a.k.a. (dec)8226 :-)

Ahh, a typo, thanks.

tedd

--
http://sperling.com http://ancientstones.com http://earthstones.com
RE: [PHP] Unicode Problem
At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
> I know it's Unicode because the JavaScript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make it display as the bullet character?

Robbert:

To display it in a browser, convert the number to DEC (2022 -> 8226) and use:

&#8226;

I thought there was a way to use HEX directly, but can't find the reference at the moment (if there is one).

hth

tedd

--
http://sperling.com http://ancientstones.com http://earthstones.com
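The hex-to-decimal step can be done for every %uXXXX sequence in one pass. The following is a sketch only; the function name `percent_u_to_ncr` is invented here, and it assumes the stored text contains the four-hex-digit form that escape() produces.

```php
<?php
// Hypothetical helper (not from the thread): rewrite each %uXXXX sequence
// as a decimal numeric character reference for display in HTML.
function percent_u_to_ncr(string $text): string
{
    return preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})/',
        function (array $m): string {
            // hexdec('2022') is 8226
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

echo percent_u_to_ncr('%u2022 item'), "\n"; // &#8226; item
```

Because the output is plain ASCII, it displays the same way whatever charset the page declares, at the cost of storing markup rather than the character itself.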
Re: [PHP] Unicode Problem
On Thu, October 5, 2006 5:14 pm, [EMAIL PROTECTED] wrote:
> I have a webpage that allows users to post news stories for their department. The site uses AJAX to send the data to the webserver. The problem I'm having is when the user uses some unicode characters like bullets or MS Word quotes, the page comes out weird.

I don't think MS Word quotes are Unicode, really... I think they're just made-up character sets that Microsoft felt like using to be incompatible with everybody else...

There are about 5 different translation functions at http://php.net/str_replace, last time I checked -- but those assumed the user was just typing stuff in a FORM, and do not account for your JS escaping... I *hope* it's the same thing, really, but can't promise.

You're going to have to investigate what the JS escape mechanism is doing -- it could be any of a variety of things.

Though the %u is almost-for-sure an ATTEMPT to apply Unicode conversion, that doesn't mean that the original content was really Unicode to start with. So after you undo the Unicode conversion, you've still potentially got data on your hands from a proprietary non-standards-based made-up software application.

Apologies in advance if MS Word actually *is* using a standard Unicode charset... But I sure doubt it.

> Here's the process.
> 1. The user enters the story and clicks save.
> 2. The javascript uses the escape function to turn the text into something that can be posted to the server. This function turns spaces into %20, but it turns unicode characters into a longer string like %u.
> 3. The javascript then sends the data to the processing page.
> 4. The PHP processing page receives the data and saves it to the mySQL database server.
>
> The problem I see is that any unicode character is saved in its escaped unicode sequence. For example, a bullet is saved into the database as a literal %u2022.
>
> What I need to know is what function can I use so that it's either saved as the unicode bullet character or displayed back on the page as the bullet?

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
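Undoing step 2 on the PHP side, so that the character itself is saved, might look like the sketch below. The name `decode_percent_u` is invented for illustration. It assumes the mbstring extension, PHP 7.2 or later for `mb_chr()`, and a database connection and page that are both UTF-8. Characters outside the Basic Multilingual Plane, which escape() emits as two surrogate halves, are deliberately left untouched.

```php
<?php
// Hypothetical helper (not from the thread): decode the %uXXXX sequences that
// JavaScript's escape() emits into UTF-8 characters, leaving other text alone.
function decode_percent_u(string $text): string
{
    return preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})/',
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for lone surrogates; keep those as-is.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

$stored = '%u2022 First point';
echo decode_percent_u($stored), "\n";
```

The function would be applied to the posted value before the INSERT in step 4, so the database holds the bullet rather than the literal %u2022.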
RE: [PHP] Unicode Problem
I can take ANY number I want, and put %u in front of it... That don't make it mean anything.

You also need to know the charset it came from to start with, which in the case of MS Word, is not even a standard charset, but some made-up proprietary random assemblage of numbers to characters they found convenient that day.

You also might want to consider using something like FCKEditor or that other one like it to let users compose HTML-formatted content.

On Thu, October 5, 2006 9:11 pm, Robbert van Andel wrote:
> I know it's Unicode because the JavaScript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make it display as the bullet character?

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
RE: [PHP] Unicode Problem
On Fri, October 6, 2006 8:37 am, tedd wrote:
> At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
>> I know it's Unicode because the JavaScript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make it display as the bullet character?
>
> Robbert: To display it in a browser, convert the number to DEC (2022 -> 8226) and use: &#8226;
>
> I thought there was a way to use HEX directly, but can't find the reference at the moment (if there is one).

http://php.net/hexdec

But &#8226; is almost-for-sure *ONLY* going to look right on MS IE. Because *only* MS IE uses the double-secret Microsoft decoder ring for 8226 to be what MS Word thinks it is. Everybody else is using a standards-based conversion...

So your page will look fine in IE, but everybody else will see all kinds of goofy characters.

Test it and see -- I could be wrong...

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
RE: [PHP] Unicode Problem
At 10:50 AM -0500 10/6/06, Richard Lynch wrote:
> On Fri, October 6, 2006 8:37 am, tedd wrote:
>> At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
>>> How do I take %u2022 and make it display as the bullet character?
>>
>> I thought there was a way to use HEX directly, but can't find the reference at the moment (if there is one).
>
> http://php.net/hexdec

Richard:

No, that's not what I meant. I know how to convert DEC <-> HEX. What I was talking about is called NCRs, or Numeric Character References.

One could use the Unicode DEC value directly, such as: &#8226;

or the Unicode HEX value directly, such as: &#x2002;

Note, either will produce a bullet in most browsers.

> But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.

Not true, for most (and almost all current) browsers do render that glyph correctly (other glyphs may vary). Please review:

http://www.browsercam.com/public.aspx?proj_id=289683

The first bullet is &#149; (same as ALT 0149 on the windoze keyboard). The second is &#8226; and the third is &#x2002;

Note all three produce a bullet -- oh, and don't forget &bull;, which will produce the same result.

tedd

--
http://sperling.com http://ancientstones.com http://earthstones.com
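Which character each reference names can be checked from PHP without a browser. A small sketch, with an invented helper name `ncr_to_utf8`, assuming the mbstring extension and PHP 7.2 or later for `mb_ord()`. It resolves the decimal, hexadecimal, and named forms, plus the hex value as typed in the message above, and prints the resulting code point.

```php
<?php
// Hypothetical helper (not from the thread): resolve one character
// reference to the UTF-8 character it names.
function ncr_to_utf8(string $ref): string
{
    return html_entity_decode($ref, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

foreach (['&#8226;', '&#x2022;', '&bull;', '&#x2002;'] as $ref) {
    $cp = mb_ord(ncr_to_utf8($ref), 'UTF-8');
    printf("%-9s => U+%04X\n", $ref, $cp);
}
```

The first three all resolve to U+2022, the bullet. The last resolves to U+2002, which is an en space rather than a bullet. The &#149; form is left out on purpose: 149 is a C1 control code point, and it shows a bullet only because browsers remap that range through the Windows-1252 table.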
RE: [PHP] Unicode Problem
On Fri, October 6, 2006 12:29 pm, tedd wrote:
> No, that's not what I meant. I know how to convert DEC <-> HEX. What I was talking about is called NCRs, or Numeric Character References.
>
> One could use the Unicode DEC value directly, such as: &#8226;
>
> or the Unicode HEX value directly, such as: &#x2002;

I think you meant x2022 a.k.a. (dec)8226 :-)

8226 or x2022 is the same number, so whatever it is, it should work the same.

And maybe those will work the same as &bull; on all modern browsers now. But that was not my experience in the past.

Perhaps you would care to extend your browsercam test to some regression testing of more ancient browsers -- on Mac OS.

> Note, either will produce a bullet in most browsers.
>
>> But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.
>
> Not true, for most (and almost all current) browsers do render that glyph correctly (other glyphs may vary). Please review:
>
> http://www.browsercam.com/public.aspx?proj_id=289683
>
> The first bullet is &#149; (same as ALT 0149 on the windoze keyboard). The second is &#8226; and the third is &#x2002;
>
> Note all three produce a bullet -- oh, and don't forget &bull;, which will produce the same result.
>
> tedd

--
Some people have a gift link here. Know what I want? I want you to buy a CD from some starving artist. http://cdbaby.com/browse/from/lynch Yeah, I get a buck. So?
Re: [PHP] Unicode Problem
On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> I have a webpage that allows users to post news stories for their department. The site uses AJAX to send the data to the webserver. The problem I'm having is when the user uses some unicode characters like bullets or MS Word quotes, the page comes out weird.
>
> Here's the process.
> 1. The user enters the story and clicks save.
> 2. The javascript uses the escape function to turn the text into something that can be posted to the server. This function turns spaces into %20, but it turns unicode characters into a longer string like %u.
> 3. The javascript then sends the data to the processing page.
> 4. The PHP processing page receives the data and saves it to the mySQL database server.
>
> The problem I see is that any unicode character is saved in its escaped unicode sequence. For example, a bullet is saved into the database as a literal %u2022.
>
> What I need to know is what function can I use so that it's either saved as the unicode bullet character or displayed back on the page as the bullet?
>
> Thank you

I doubt that MS Word quotes are unicode. And as long as the users are copying/pasting between MS products (Word -> IE) you're going to have a hard time deciphering those funny characters. Try to encourage them to use Firefox, and if possible to use a UTF-8 compliant word processor. Mine is Kword, but I don't think that's available for Windows.

Dotan Cohen
http://what-is-what.com
98
RE: [PHP] Unicode Problem
I know it's Unicode because the JavaScript is encoding it as Unicode (and it's doing so correctly). I guess the gist of my question is how do I do the reverse. How do I take %u2022 and make it display as the bullet character?

-----Original Message-----
From: Dotan Cohen [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 05, 2006 3:44 PM
To: [EMAIL PROTECTED]
Cc: php-general@lists.php.net
Subject: Re: [PHP] Unicode Problem

I doubt that MS Word quotes are unicode. And as long as the users are copying/pasting between MS products (Word -> IE) you're going to have a hard time deciphering those funny characters. Try to encourage them to use Firefox, and if possible to use a UTF-8 compliant word processor. Mine is Kword, but I don't think that's available for Windows.

Dotan Cohen
http://what-is-what.com
98
Re: [PHP] Unicode Problem
On Thu, 05 Oct 2006 18:14:59 -0400, [EMAIL PROTECTED] wrote:
> I have a webpage that allows users to post news stories for their department. The site uses AJAX to send the data to the webserver. The problem I'm having is when the user uses some unicode characters like bullets or MS Word quotes, the page comes out weird.
>
> Here's the process.
> 1. The user enters the story and clicks save.
> 2. The javascript uses the escape function to turn the text into something that can be posted to the server. This function turns spaces into %20, but it turns unicode characters into a longer string like %u.
> 3. The javascript then sends the data to the processing page.
> 4. The PHP processing page receives the data and saves it to the mySQL database server.
>
> The problem I see is that any unicode character is saved in its escaped unicode sequence. For example, a bullet is saved into the database as a literal %u2022.
>
> What I need to know is what function can I use so that it's either saved as the unicode bullet character or displayed back on the page as the bullet?
>
> Thank you

If you post that data via the POST method, use encodeURIComponent() to encode the string, and then the server-side script can access it directly.

--
Sorry for my poor English.
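The reason this works is that encodeURIComponent() percent-encodes the UTF-8 bytes of each character, which is the form PHP's URL decoding understands. PHP does this decoding automatically when it fills $_POST; the sketch below just makes the step visible with `rawurldecode()`. The sample string is an assumption, chosen to match what encodeURIComponent() yields for a bullet followed by " item".

```php
<?php
// '%E2%80%A2%20item' is what encodeURIComponent() produces in the browser
// for a bullet (U+2022) followed by " item".
$posted  = '%E2%80%A2%20item';
$decoded = rawurldecode($posted);

// The first three bytes are the UTF-8 encoding of U+2022.
echo bin2hex(substr($decoded, 0, 3)), "\n"; // e280a2
echo substr($decoded, 3), "\n";             // " item" (with the leading space)
```

By contrast, the %u2022 form from escape() is not valid percent-encoding, so PHP passes it through as literal text, which matches what the original poster found in the database.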
Re: [PHP] Unicode
tedd wrote:
> At 7:08 PM -0700 6/4/06, Rasmus Lerdorf wrote:
>> Larry Garfield wrote:
>>> In C or C++, yes. In PHP, do not assume the same string-number mapping. Numeric definition is irrelevant.
>>
>> Right, and now bring Unicode into the picture and this becomes even more true.
>>
>> -Rasmus
>
> I know there's always RTFM, but if you would care to discuss it, I would like to know why. How does php handle Unicode code-points and char-sets?
>
> Thanks.
>
> tedd

From what little I understand (and I could be wrong - it's been a while) - it doesn't. PHP < 6 works in bytes unless you have the mb extension going, and then it fakes it. It doesn't look for code points or charsets.

--
life is a game... so have fun.
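The difference between counting bytes and counting characters is easy to see with the bullet from the other thread. A minimal sketch, assuming the mbstring extension is loaded:

```php
<?php
$bullet = "\xE2\x80\xA2";               // U+2022 encoded as UTF-8

echo strlen($bullet), "\n";             // 3: strlen() counts bytes
echo mb_strlen($bullet, 'UTF-8'), "\n"; // 1: mb_strlen() counts characters
```

The same split applies to substr() versus mb_substr(): the byte-based function can cut a multi-byte character in half, while the mb_ variant respects character boundaries for the encoding it is told to use.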
Re: [PHP] Unicode
Tedd,

Interesting that nobody knows the answer... I am struggling with this very issue for an international lily register...

http://www.lilyregister.com/

Gerry

On 6/5/06, tedd [EMAIL PROTECTED] wrote:
> At 7:08 PM -0700 6/4/06, Rasmus Lerdorf wrote:
>> Larry Garfield wrote:
>>> In C or C++, yes. In PHP, do not assume the same string-number mapping. Numeric definition is irrelevant.
>>
>> Right, and now bring Unicode into the picture and this becomes even more true.
>>
>> -Rasmus
>
> I know there's always RTFM, but if you would care to discuss it, I would like to know why. How does php handle Unicode code-points and char-sets?
Re: [PHP] unicode
go away and RTFM, STFW, anything but ask another question here until you can show even the slightest inclination to do your own research and that you'll bother to respond to people when they do actually give answers (like maybe a thank you if someone does actually help you, for instance).

this is the 26th lame ass question you have posted - not once was there any indication you had even bothered to open a browser to search for possible clues/answers and not once have you ever replied to all the people that tried to help you.
Re: [PHP] unicode
And they wonder why labor is so cheap in India and they keep sending jobs and opening call centers and such over there... They read scripts all day, you would think that they would know how to Google.
Re: [PHP] unicode
At 9:25 AM -0400 4/14/06, Wolf wrote:
> And they wonder why labor is so cheap in India and they keep sending jobs and opening call centers and such over there... They read scripts all day, you would think that they would know how to Google.

Maybe we could open a call center here for answers to questions they don't have scripts for.

tedd

--
http://sperling.com
Re: [PHP] unicode
And maybe get all that government subsidized money for bringing in jobs to a location that lost them due to a call center closing... yeah, that's the ticket!!

tedd wrote:
> At 9:25 AM -0400 4/14/06, Wolf wrote:
>> And they wonder why labor is so cheap in India and they keep sending jobs and opening call centers and such over there... They read scripts all day, you would think that they would know how to Google.
>
> Maybe we could open a call center here for answers to questions they don't have scripts for.
>
> tedd
Re: [PHP] Unicode TTF Font wingding's just don't cut it!
> Does anyone know where I can get a simple symbol font which is .ttf and unicode compatible. Seems that my php graphic program is very sensitive to ttf problems. It'll take one or two wingdings in imagettftext() before it up and dies. Your help will be greatly appreciated.

Dies? How so? Is it a segfault? If so, please get me a backtrace.

-Rasmus
Re: [PHP] Unicode TTF Font wingding's just don't cut it!
What symbols are you trying to use? Verdana is good enough for most unicode characters.

hugh danaher wrote:
> Help
>
> Does anyone know where I can get a simple symbol font which is .ttf and unicode compatible. Seems that my php graphic program is very sensitive to ttf problems. It'll take one or two wingdings in imagettftext() before it up and dies. Your help will be greatly appreciated.
>
> Hugh

--
Email: [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: [PHP] Unicode TTF Font wingding's just don't cut it!
Thanks Neil,

Verdana is nice, but what I'm looking for would have large circles (filled or open), squares, diamonds, and other basic geometric shapes. Nothing fancy, but it does need to be unicode as I'm using it with php's image functions.

Thanks again,
Hugh

----- Original Message -----
From: Neil Freeman [EMAIL PROTECTED]
To: hugh danaher [EMAIL PROTECTED]
Cc: php [EMAIL PROTECTED]
Sent: Tuesday, February 19, 2002 1:38 AM
Subject: Re: [PHP] Unicode TTF Font wingding's just don't cut it!

> What symbols are you trying to use? Verdana is good enough for most unicode characters.