Re: [PHP] Unicode problems

2008-09-25 Thread Chris

Thiago H. Pojda wrote:

This is slightly OT but I honestly don't know what else I can do.

I was asked to migrate a website between different hosts. Okay, pretty easy, right?
Well, as usual, it wasn't.

The site's pages used ISO-8859-1 as their content type, and the site was
developed for a MySQL 5 database that used latin1 as charset and InnoDB as
storage engine. Pretty normal and ran smoothly.

The client database is an old MySQL 4.0 that (I'm not sure if they're just
disabled but it) doesn't have InnoDB and latin1. So I'm stuck with MyISAM
and UTF8. No, they can't change it - their host wants them to migrate to
MSSQL and they can't switch hosts for whatever reason.


Tried either of these?

http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html
http://forums.mysql.com/read.php?10,52929,56552#msg-56552
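
For readers following along, here is a minimal sketch of what a latin1-to-UTF-8 conversion does at the byte level. It is written in JavaScript purely for illustration (the thread itself concerns PHP and MySQL), and the helper name latin1ToUtf8 is an assumption, not something taken from the linked pages.

```javascript
// Hypothetical helper: re-encode ISO-8859-1 (latin1) bytes as UTF-8 bytes.
// In ISO-8859-1 every byte value is also the Unicode code point.
function latin1ToUtf8(latin1Bytes) {
  let text = "";
  for (const byte of latin1Bytes) {
    text += String.fromCharCode(byte);
  }
  return new TextEncoder().encode(text); // TextEncoder always emits UTF-8
}

// "é" is the single byte 0xE9 in latin1 and the two bytes 0xC3 0xA9 in UTF-8.
const utf8 = latin1ToUtf8([0x63, 0x61, 0x66, 0xe9]); // "café" in latin1
console.log(Array.from(utf8, (b) => b.toString(16)).join(" ")); // 63 61 66 c3 a9
```

Data that is stored in one encoding but labelled as the other is the usual source of garbled accented characters after a migration like this one.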

--
Postgresql & php tutorials
http://www.designmagick.com/


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Unicode Problem

2006-10-08 Thread Nisse Engström
On Fri, 6 Oct 2006 10:44:55 -0500 (CDT), Richard Lynch wrote:

 I don't think MS Word quotes are Unicode, really...
 
 I think they're just made-up character sets that Microsoft felt like
 using to be incompatible with everybody else...
 
 Though the %u is almost-for-sure an ATTEMPT to apply Unicode
 conversion, that doesn't mean that the original content was really
 Unicode to start with.
 
 So after you undo the Unicode conversion, you've still potentially
 got data on your hands from a proprietary non-standards-based made-up
 software application.
 
 Apologies in advance if MS Word actually *is* using a standard Unicode
 charset... But I sure doubt it.

   I think you're missing the point. MS Word DOES use
proprietary encodings, but when text is copied from
MS Word and pasted into the browser, it involves a
conversion process. E.g., the bullet (0x95 in cp1250)
will be converted to whatever encoding the web page is
in (code point U+2022 in Unicode).
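
A small sketch of the conversion described above, assuming the pasted text arrives as Windows code-page bytes. The table is deliberately partial (only the "smart" punctuation bytes 0x91 to 0x95), and the names CP1252_PUNCTUATION and decodeWordBytes are made up for this illustration.

```javascript
// Partial Windows code-page table for the "smart" punctuation bytes only.
const CP1252_PUNCTUATION = {
  0x91: 0x2018, // left single quotation mark
  0x92: 0x2019, // right single quotation mark
  0x93: 0x201c, // left double quotation mark
  0x94: 0x201d, // right double quotation mark
  0x95: 0x2022, // bullet
};

// Hypothetical helper: bytes outside the table are treated as Latin-1.
function decodeWordBytes(bytes) {
  return bytes
    .map((b) => String.fromCodePoint(CP1252_PUNCTUATION[b] ?? b))
    .join("");
}

const decoded = decodeWordBytes([0x95, 0x20, 0x93, 0x68, 0x69, 0x94]);
console.log(decoded); // • “hi”
console.log(decoded.codePointAt(0).toString(16)); // 2022
```

Once the bytes have been mapped to code points like this, the characters are ordinary Unicode and can be written out in whatever encoding the page uses.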

   Whether the conversion is performed by the browser,
some OS glue or some other trickery, witchery or devilry,
is at the moment beyond my scant knowledge.

   How to solve the original poster's problem is also
beyond me, as I haven't used AJAX. I tend to prefer the
ol' form submission for my bits and bobs. That way I can
use UTF-8 all the way around, and everything just magically
works. It even works fine for JavaScript-challenged
browsers, would you believe.


  --nfe




RE: [PHP] Unicode Problem

2006-10-07 Thread tedd

At 4:15 PM -0500 10/6/06, Richard Lynch wrote:

Perhaps you would care to extend your browsercam test to some
regression testing of more ancient browsers -- on Mac OS.



The following goes back to IE 5.2 for the Mac -- that's as far back 
as BrowserCam goes.


http://www.browsercam.com/public.aspx?proj_id=289683


I think you meant x2022 a.k.a. (dec)8226 :-)


Ahh, a typo thanks.

tedd
--
---
http://sperling.com  http://ancientstones.com  http://earthstones.com




RE: [PHP] Unicode Problem

2006-10-06 Thread tedd

At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:

I know it's Unicode because the javascript is encoding it as Unicode (and
it's doing so correctly).  I guess the gist of my question is how do I do
the reverse.  How do I take %u2022 and make that display as the bullet
character?


Robbert:

To display it in a browser, convert the number to DEC (2022 -> 8226) and use:

&#8226;

I thought there was a way to use HEX directly, but can't find the 
reference at the moment (if there is one).
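
The hex-to-decimal step above can be sketched as follows. This is an illustration in JavaScript under the assumption that the stored text contains %uXXXX sequences with exactly four hex digits; the function name percentUToNcr is hypothetical.

```javascript
// Hypothetical helper: turn each %uXXXX sequence into a decimal
// numeric character reference that a browser can render.
function percentUToNcr(escaped) {
  return escaped.replace(/%u([0-9a-fA-F]{4})/g, (match, hex) => {
    return "&#" + parseInt(hex, 16) + ";";
  });
}

console.log(percentUToNcr("%u2022 first item")); // &#8226; first item
```

HTML also accepts the hexadecimal form, &#x2022;, so the conversion to decimal is optional.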


hth's

tedd
--
---
http://sperling.com  http://ancientstones.com  http://earthstones.com




Re: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
On Thu, October 5, 2006 5:14 pm, [EMAIL PROTECTED] wrote:
 I have a webpage that allows users to post news stories for their
 department.  The site uses AJAX to send the data to the webserver.
 The problem I'm having is when the user uses some unicode characters
 like bullets or MS Word quotes, the page comes out weird.

I don't think MS Word quotes are Unicode, really...

I think they're just made-up character sets that Microsoft felt like
using to be incompatible with everybody else...

There are about 5 different translations functions at
http://php.net/str_replace, last time I checked -- but those assumed
the user was just typing stuff in a FORM, and does not include your JS
escaping...

I *hope* it's the same thing, really, but can't promise.

You're going to have to investigate what the JS escape mechanism is
doing -- It could be any of a variety of things.

Though the %u is almost-for-sure an ATTEMPT to apply Unicode
conversion, that doesn't mean that the original content was really
Unicode to start with.

So after you undo the Unicode conversion, you've still potentially
got data on your hands from a proprietary non-standards-based made-up
software application.

Apologies in advance if MS Word actually *is* using a standard Unicode
charset... But I sure doubt it.

 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into
 something that can be posted to the server.  This function turns
 spaces into %20, but it turns unicode characters into a longer string
 like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the mySQL
 database server.

 The problem I see is that any unicode character is saved in its
 escaped unicode sequence.  For example a bullet is saved into the
 database as a literal %u2022.  What I need to know is what function
 can I use so that it's either saved as the unicode bullet character or
 displayed back on the page as the bullet?
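
To make the escape mechanism in the quoted steps concrete, here is a rough sketch of how its output can be decoded. It is JavaScript rather than the PHP function the poster asked for, it assumes the input really came from JavaScript's escape() (which emits %uXXXX for code units above 0xFF and %XX for the rest), and the name decodeEscaped is invented for this example.

```javascript
// Hypothetical helper: decode the output of JavaScript's escape().
// A single pass handles both forms so that decoded text is never re-scanned.
function decodeEscaped(escaped) {
  return escaped.replace(
    /%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/g,
    (match, wide, narrow) => String.fromCharCode(parseInt(wide ?? narrow, 16))
  );
}

const stored = "%u2022%20Budget%20approved";
console.log(decodeEscaped(stored)); // • Budget approved
```

The same regular-expression approach could presumably be ported to the PHP side, but whether the result displays correctly still depends on the page and database agreeing on one encoding.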


-- 
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?




RE: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
I can take ANY number I want, and put %u in front of it...

That don't make it mean anything.

You also need to know the charset it came from to start with, which in
the case of MS Word, is not even a standard charset, but some made-up
proprietary random assemblage of numbers to characters they found
convenient that day.

You also might want to consider using something like FCKEditor or that
other one like it to let users compose HTML-formatted content.

On Thu, October 5, 2006 9:11 pm, Robbert van Andel wrote:
 I know it's Unicode because the javascript is encoding it as Unicode
 (and it's doing so correctly).  I guess the gist of my question is how
 do I do the reverse.  How do I take %u2022 and make that display as
 the bullet character?

 -Original Message-
 From: Dotan Cohen [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 05, 2006 3:44 PM
 To: [EMAIL PROTECTED]
 Cc: php-general@lists.php.net
 Subject: Re: [PHP] Unicode Problem

 On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I have a webpage that allows users to post news stories for their
 department.  The site uses AJAX to send the data to the webserver.
 The
 problem I'm having is when the user uses some unicode characters like
 bullets or MS Word quotes, the page comes out weird.

 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into
 something
 that can be posted to the server.  This function turns spaces into
 %20, but
 it turns unicode characters into a longer string like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the
 mySQL
 database server.

 The problem I see is that any unicode character is saved in its
 escaped
 unicode sequence.  For example a bullet is saved into the database as
 a
 literal %u2022.  What I need to know is what function can I use so
 that it's
 either saved as the unicode bullet character or displayed back on the
 page
 as the bullet?

 Thank you


 I doubt that MS Word quotes are unicode. And as long as the users are
 copying/pasting between MS products (Word-IE) you're going to have a
 hard time deciphering those funny characters. Try to encourage them to
 use Firefox, and if possible to use a UTF-8 compliant word processor.
 Mine is Kword, but I don't think that's available for Windows.

 Dotan Cohen
 http://what-is-what.com
 98








RE: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
On Fri, October 6, 2006 8:37 am, tedd wrote:
 At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
 I know it's Unicode because the javascript is encoding it as Unicode
 (and it's doing so correctly).  I guess the gist of my question is how
 do I do the reverse.  How do I take %u2022 and make that display as
 the bullet character?

 Robbert:

 To display it in a browser, convert the number to DEC (2022 -> 8226) and
 use:

 &#8226;

 I thought there was a way to use HEX directly, but can't find the
 reference at the moment (if there is one).

http://php.net/hexdec

But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.

Because *only* MS IE uses the double-secret Microsoft decoder ring for
8226 to be what MS Word thinks it is.  Everybody else is using a
standards-based conversion...

So your page will look fine in IE, but everybody else will see all
kinds of goofy characters.

Test it and see -- I could be wrong...




RE: [PHP] Unicode Problem

2006-10-06 Thread tedd

At 10:50 AM -0500 10/6/06, Richard Lynch wrote:

On Fri, October 6, 2006 8:37 am, tedd wrote:
  At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
  How do I take %u2022 and make that display as the bullet
 character?
  I thought there was a way to use HEX directly, but can't find the

 reference at the moment (if there is one).


http://php.net/hexdec



Richard:

No, that's not what I meant. I know how to convert DEC - HEX.

What I was talking about are called NCRs, or Numeric Character References.

One could use the Unicode DEC value directly, such as:

&#8226;

or the Unicode HEX value directly, such as:

&#x2002;

Note, either will produce a bullet in most browsers.


But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.


Not true, for most (and all most current) browsers do render that 
glyph correctly (other glyphs may vary), please review:


http://www.browsercam.com/public.aspx?proj_id=289683

The first bullet is &#149; (same as ALT 0149 on the windoze keyboard).

The second is &#8226; and third is &#x2002;

Note all three produce a bullet -- oh and don't forget &bull;, which 
will produce the same result.


tedd

--
---
http://sperling.com  http://ancientstones.com  http://earthstones.com




RE: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
On Fri, October 6, 2006 12:29 pm, tedd wrote:
 No, that's not what I meant. I know how to convert DEC - HEX.

 What I was talking about are called NCRs, or Numeric Character
 References.

 One could use the Unicode DEC value directly, such as:

  &#8226;

 or the Unicode HEX value directly, such as:

  &#x2002;
I think you meant x2022 a.k.a. (dec)8226 :-)

8226 or x2022 is the same number, so whatever it is, it should work
the same.
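
The point that 8226 and x2022 are the same number can be checked with a short sketch. The parser below is a hypothetical helper (ncrCodePoint is not a standard function) that reads the code point out of a decimal or hexadecimal numeric character reference.

```javascript
// Hypothetical helper: extract the code point from an NCR such as
// "&#8226;" (decimal) or "&#x2022;" (hexadecimal).
function ncrCodePoint(ncr) {
  const m = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));$/.exec(ncr);
  if (!m) throw new Error("not a numeric character reference: " + ncr);
  return m[1] !== undefined ? parseInt(m[1], 16) : parseInt(m[2], 10);
}

console.log(ncrCodePoint("&#8226;") === ncrCodePoint("&#x2022;")); // true
console.log(ncrCodePoint("&#x2002;").toString(16)); // 2002
```

The mistyped x2002 is a different code point: U+2002 is EN SPACE, so that reference would render as a space rather than a bullet.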

And maybe those will work the same as &bull; on all modern browsers now.

But that was not my experience in the past.

Perhaps you would care to extend your browsercam test to some
regression testing of more ancient browsers -- on Mac OS.

 Note, either will produce a bullet in most browsers.

But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.

 Not true, for most (and all most current) browsers do render that
 glyph correctly (other glyphs may vary), please review:

 http://www.browsercam.com/public.aspx?proj_id=289683

 The first bullet is &#149; (same as ALT 0149 on the windoze keyboard).

 The second is &#8226; and third is &#x2002;

 Note all three produce a bullet -- oh and don't forget &bull;, which
 will produce the same result.

 tedd

 --
 ---
 http://sperling.com  http://ancientstones.com  http://earthstones.com








Re: [PHP] Unicode Problem

2006-10-05 Thread Dotan Cohen

On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I have a webpage that allows users to post news stories for their department.  
The site uses AJAX to send the data to the webserver.  The problem I'm having 
is when the user uses some unicode characters like bullets or MS Word quotes, 
the page comes out weird.

Here's the process.
1. The user enters the story and clicks save.
2. The javascript uses the escape function to turn the text into something that 
can be posted to the server.  This function turns spaces into %20, but it turns 
unicode characters into a longer string like %u.
3. The javascript then sends the data to the processing page.
4. The PHP processing page receives the data and saves it to the mySQL database 
server.

The problem I see is that any unicode character is saved in its escaped 
unicode sequence.  For example a bullet is saved into the database as a literal 
%u2022.  What I need to know is what function can I use so that it's either 
saved as the unicode bullet character or displayed back on the page as the 
bullet?

Thank you



I doubt that MS Word quotes are unicode. And as long as the users are
copying/pasting between MS products (Word-IE) you're going to have a
hard time deciphering those funny characters. Try to encourage them to
use Firefox, and if possible to use a UTF-8 compliant word processor.
Mine is Kword, but I don't think that's available for Windows.

Dotan Cohen
http://what-is-what.com
98




RE: [PHP] Unicode Problem

2006-10-05 Thread Robbert van Andel
I know it's Unicode because the javascript is encoding it as Unicode (and
it's doing so correctly).  I guess the gist of my question is how do I do
the reverse.  How do I take %u2022 and make that display as the bullet
character?

-Original Message-
From: Dotan Cohen [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 05, 2006 3:44 PM
To: [EMAIL PROTECTED]
Cc: php-general@lists.php.net
Subject: Re: [PHP] Unicode Problem

On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I have a webpage that allows users to post news stories for their
department.  The site uses AJAX to send the data to the webserver.  The
problem I'm having is when the user uses some unicode characters like
bullets or MS Word quotes, the page comes out weird.

 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into something
that can be posted to the server.  This function turns spaces into %20, but
it turns unicode characters into a longer string like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the mySQL
database server.

 The problem I see is that any unicode character is saved in its escaped
unicode sequence.  For example a bullet is saved into the database as a
literal %u2022.  What I need to know is what function can I use so that it's
either saved as the unicode bullet character or displayed back on the page
as the bullet?

 Thank you


I doubt that MS Word quotes are unicode. And as long as the users are
copying/pasting between MS products (Word-IE) you're going to have a
hard time deciphering those funny characters. Try to encourage them to
use Firefox, and if possible to use a UTF-8 compliant word processor.
Mine is Kword, but I don't think that's available for Windows.

Dotan Cohen
http://what-is-what.com
98




Re: [PHP] Unicode Problem

2006-10-05 Thread Penthexquadium
On Thu, 05 Oct 2006 18:14:59 -0400, [EMAIL PROTECTED] wrote:

 I have a webpage that allows users to post news stories for their department. 
  The site uses AJAX to send the data to the webserver.  The problem I'm 
 having is when the user uses some unicode characters like bullets or MS Word 
 quotes, the page comes out weird.  
 
 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into something 
 that can be posted to the server.  This function turns spaces into %20, but 
 it turns unicode characters into a longer string like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the mySQL 
 database server.
 
 The problem I see is that any unicode character is saved in its escaped 
 unicode sequence.  For example a bullet is saved into the database as a 
 literal %u2022.  What I need to know is what function can I use so that it's 
 either saved as the unicode bullet character or displayed back on the page as 
 the bullet?  
 
 Thank you
 

If you post that data via the POST method, use encodeURIComponent() to
encode the string, and then the server-side script can access it
directly.
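
A short sketch of the difference this makes on the client side. The helper buildFormBody and the field name story are assumptions made for the example; escape() and encodeURIComponent() are the standard JavaScript functions discussed in the thread.

```javascript
const bullet = "\u2022 item";

// escape() is a legacy function; it yields the non-standard %uXXXX form.
console.log(escape(bullet)); // %u2022%20item

// encodeURIComponent() percent-encodes the UTF-8 bytes instead.
console.log(encodeURIComponent(bullet)); // %E2%80%A2%20item

// Hypothetical helper: build a form-style POST body from a plain object.
function buildFormBody(fields) {
  return Object.entries(fields)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
}

console.log(buildFormBody({ story: bullet })); // story=%E2%80%A2%20item
```

Because %E2%80%A2 is ordinary percent-encoding, the server's normal URL decoding should yield the UTF-8 bytes of the bullet without any custom %u handling.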

--
Sorry for my poor English.




Re: [PHP] Unicode

2006-08-13 Thread Ligaya Turmelle

tedd wrote:

At 7:08 PM -0700 6/4/06, Rasmus Lerdorf wrote:


Larry Garfield wrote:


In C or C++, yes.  In PHP, do not assume the same string-number mapping.  
Numeric definition is irrelevant.


Right, and now bring Unicode into the picture and this becomes even more true.



-Rasmus

I know there's always RTFM, but if you would care to discuss it, I would like 
to know why. How does php handle Unicode code-points and char-sets?

Thanks.

tedd
From what little I understand (and I could be wrong - been a while) - it 
doesn't.  PHP < 6 works in bytes unless you have the mbstring extension going 
and then it fakes it.  It doesn't look for code points or charsets.
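
The bytes-versus-code-points distinction can be illustrated with a small sketch. It is in JavaScript rather than PHP, so it only mirrors the idea; the helper names utf8ByteLength and codePointLength are invented here.

```javascript
// Hypothetical helpers contrasting two ways of measuring a string.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length; // bytes in the UTF-8 encoding
}

function codePointLength(text) {
  return Array.from(text).length; // string iteration yields whole code points
}

const bulletChar = "\u2022";
console.log(utf8ByteLength(bulletChar)); // 3
console.log(codePointLength(bulletChar)); // 1
```

In PHP terms this corresponds roughly to strlen() counting bytes while mb_strlen() with a UTF-8 encoding argument counts characters.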


--

life is a game... so have fun.


Re: [PHP] Unicode

2006-08-12 Thread Gerry D

Tedd,

Interesting that nobody knows the answer... I am struggling with this
very issue for an international lily register...
http://www.lilyregister.com/

Gerry

On 6/5/06, tedd [EMAIL PROTECTED] wrote:

At 7:08 PM -0700 6/4/06, Rasmus Lerdorf wrote:
Larry Garfield wrote:
In C or C++, yes.  In PHP, do not assume the same string-number mapping.  
Numeric definition is irrelevant.

Right, and now bring Unicode into the picture and this becomes even more true.

-Rasmus

I know there's always RTFM, but if you would care to discuss it, I would like 
to know why. How does php handle Unicode code-points and char-sets?





Re: [PHP] unicode

2006-04-14 Thread Jochem Maas

go away and RTFM, STFW, anything but ask another question here until you can
show even the slightest inclination to do your own research and that you'll
bother to respond to people when they do actually give answers (like maybe a
thank you if someone does actually help you, for instance).

this is the 26th lame ass question you have posted - not once was there any
indication you had even bothered to open a browser to search for possible
clues/answers and not once have you ever replied to all the people that tried
to help you.




Re: [PHP] unicode

2006-04-14 Thread Wolf
And they wonder why labor is so cheap in India and they keep sending
jobs and opening call centers and such over there...  They read scripts
all day, you would think that they would know how to Google.




Re: [PHP] unicode

2006-04-14 Thread tedd

At 9:25 AM -0400 4/14/06, Wolf wrote:

And they wonder why labor is so cheap in India and they keep sending
jobs and opening call centers and such over there...  They read scripts
all day, you would think that they would know how to Google.



Maybe we could open a call center here for answers to questions they 
don't have scripts for.


tedd
--

http://sperling.com




Re: [PHP] unicode

2006-04-14 Thread Wolf
And Maybe get all that government subsidized money for bringing in jobs
to a location that lost them due to a call center closing...  yeah,
that's the ticket!!

tedd wrote:
 At 9:25 AM -0400 4/14/06, Wolf wrote:
 And they wonder why labor is so cheap in India and they keep sending
 jobs and opening call centers and such over there...  They read scripts
 all day, you would think that they would know how to Google.

 
 Maybe we could open a call center here for answers to questions they
 don't have scripts for.
 
 tedd




Re: [PHP] Unicode TTF Font wingding's just don't cut it!

2002-02-20 Thread Rasmus Lerdorf

 Does anyone know where I can get a simple symbol font which is .ttf and
 unicode compatible. Seems that my php graphic program is very
 sensitive to ttf problems. It'll take one or two wingdings in
 imagettftext() before it up and dies. Your help will be greatly
 appreciated.

Dies?  How so?  Is it a segfault?  If so, please get me a backtrace.

-Rasmus






Re: [PHP] Unicode TTF Font wingding's just don't cut it!

2002-02-19 Thread Neil Freeman

What symbols are you trying to use? Verdana is good enough for most unicode characters.

hugh danaher wrote:

 Help
 Does anyone know where I can get a simple symbol font which is .ttf and unicode 
compatible.  Seems that my php graphic program is very sensitive to ttf problems.  
It'll take one or two wingdings in imagettftext() before it up and dies.
 Your help will be greatly appreciated.
 Hugh


--

 Email:  [EMAIL PROTECTED]
 [EMAIL PROTECTED]








Re: [PHP] Unicode TTF Font wingding's just don't cut it!

2002-02-19 Thread hugh danaher

Thanks Neil,
Verdana is nice, but what I'm looking for would have large circles (filled
or open), squares, diamonds, and other basic geometric shapes.  Nothing
fancy but it does need to be unicode as I'm using it with php's image
functions.
Thanks again,
Hugh
- Original Message -
From: Neil Freeman [EMAIL PROTECTED]
To: hugh danaher [EMAIL PROTECTED]
Cc: php [EMAIL PROTECTED]
Sent: Tuesday, February 19, 2002 1:38 AM
Subject: Re: [PHP] Unicode TTF Font wingding's just don't cut it!


 What symbols are you trying to use? Verdana is good enough for most
unicode characters.

 hugh danaher wrote:

  Help
  Does anyone know where I can get a simple symbol font which is .ttf and
unicode compatible.  Seems that my php graphic program is very sensitive
to ttf problems.  It'll take one or two wingdings in imagettftext() before
it up and dies.
  Your help will be greatly appreciated.
  Hugh
 

 --
 
  Email:  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
 





