Re: [PHP] Unicode Problem

2006-10-08 Thread Nisse Engström
On Fri, 6 Oct 2006 10:44:55 -0500 (CDT), Richard Lynch wrote:

 I don't think MS Word quotes are Unicode, really...
 
 I think they're just made-up character sets that Microsoft felt like
 using to be incompatible with everybody else...
 
 Though the %u is almost-for-sure an ATTEMPT to apply Unicode
 conversion, that doesn't mean that the original content was really
 Unicode to start with.
 
 So after you undo the Unicode conversion, you've still potentially
 got data on your hands from a proprietary non-standards-based made-up
 software application.
 
 Apologies in advance if MS Word actually *is* using a standard Unicode
 charset... But I sure doubt it.

   I think you're missing the point. MS Word DOES use
proprietary encodings, but when text is copied from
MS Word and pasted into the browser, it involves a
conversion process. E.g., the bullet (0x95 in cp1250)
will be converted to whatever encoding the web page is
in (0x2022 in a Unicode encoding).
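
   As a rough illustration of the conversion described above, the
sketch below maps a few bytes from the Windows code page range to their
Unicode code points. The table is partial and hand-written (Windows-1252
values for the bullet and the "smart" punctuation that Word tends to
produce), and the name cpByteToUnicode is made up for this example. It
does not claim to reproduce whatever the browser or OS actually does on
paste.

```javascript
// Partial, hand-written table: Windows code page byte -> Unicode code point.
// Assumption: only the "smart" punctuation commonly pasted from Word matters.
const WINDOWS_TO_UNICODE = {
  0x85: 0x2026, // horizontal ellipsis
  0x91: 0x2018, // left single quotation mark
  0x92: 0x2019, // right single quotation mark
  0x93: 0x201c, // left double quotation mark
  0x94: 0x201d, // right double quotation mark
  0x95: 0x2022, // bullet
  0x96: 0x2013, // en dash
  0x97: 0x2014, // em dash
};

// Hypothetical helper: return the Unicode character for one code page byte.
// Bytes missing from the table are passed through unchanged, which is only
// correct for plain ASCII.
function cpByteToUnicode(byte) {
  const mapped = WINDOWS_TO_UNICODE[byte];
  return String.fromCodePoint(mapped !== undefined ? mapped : byte);
}

const bullet = cpByteToUnicode(0x95);
console.log(bullet.codePointAt(0).toString(16)); // 2022
```

   The point is only that the byte value and the Unicode code point are
different numbers for the same character, so something has to translate
between them.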

   Whether the conversion is performed by the browser,
some OS glue or some other trickery, witchery or devilry,
is at the moment beyond my scant knowledge.

   How to solve the original poster's problem is also
beyond me, as I haven't used AJAX. I tend to prefer the
ol' form submission for my bits and bobs. That way I can
use UTF-8 all the way around, and everything just magically
works. It even works fine for JavaScript-challenged
browsers, would you believe.


  --nfe

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Unicode Problem

2006-10-07 Thread tedd

At 4:15 PM -0500 10/6/06, Richard Lynch wrote:

Perhaps you would care to extend your browsercam test to some
regression testing of more ancient browsers -- on Mac OS.



The following goes back to IE 5.2 for the Mac -- that's as far back 
as BrowserCam goes.


http://www.browsercam.com/public.aspx?proj_id=289683


I think you meant x2022 a.k.a. (dec)8226 :-)


Ahh, a typo thanks.

tedd
--
---
http://sperling.com  http://ancientstones.com  http://earthstones.com




RE: [PHP] Unicode Problem

2006-10-06 Thread tedd

At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:

I know it's Unicode because the javascript is encoding it as Unicode (and
it's doing so correctly).  I guess the gist of my question is how do I do
the reverse.  How do I take %u2022 and make that display as the bullet
character?


Robbert:

To display it in a browser, convert the number to DEC (2022 -> 8226) and use:

&#8226;

I thought there was a way to use HEX directly, but can't find the 
reference at the moment (if there is one).
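
A minimal sketch of that conversion, assuming the stored text contains
escapes of the form %u followed by exactly four hex digits. The function
name percentUToNcr is hypothetical, and the sketch is written in
JavaScript for illustration rather than as the PHP the site would use.

```javascript
// Hypothetical helper: rewrite each %uXXXX escape (four hex digits) as a
// decimal numeric character reference such as &#8226;.
function percentUToNcr(text) {
  return text.replace(/%u([0-9a-fA-F]{4})/g, (match, hex) => {
    return "&#" + parseInt(hex, 16) + ";";
  });
}

console.log(percentUToNcr("%u2022 First item")); // &#8226; First item
```

Anything that is not a %uXXXX sequence is left untouched, so %20 and
friends would still need ordinary URL decoding.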


hth's

tedd



Re: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
On Thu, October 5, 2006 5:14 pm, [EMAIL PROTECTED] wrote:
 I have a webpage that allows users to post news stories for their
 department.  The site uses AJAX to send the data to the webserver.
 The problem I'm having is when the user uses some unicode characters
 like bullets or MS Word quotes, the page comes out weird.

I don't think MS Word quotes are Unicode, really...

I think they're just made-up character sets that Microsoft felt like
using to be incompatible with everybody else...

There are about 5 different translation functions at
http://php.net/str_replace, last time I checked -- but those assumed
the user was just typing stuff in a FORM, and do not include your JS
escaping...

I *hope* it's the same thing, really, but can't promise.

You're going to have to investigate what the JS escape mechanism is
doing -- It could be any of a variety of things.

Though the %u is almost-for-sure an ATTEMPT to apply Unicode
conversion, that doesn't mean that the original content was really
Unicode to start with.

So after you undo the Unicode conversion, you've still potentially
got data on your hands from a proprietary non-standards-based made-up
software application.

Apologies in advance if MS Word actually *is* using a standard Unicode
charset... But I sure doubt it.

 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into
 something that can be posted to the server.  This function turns
 spaces into %20, but it turns unicode characters into a longer string
 like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the mySQL
 database server.

 The problem I see is that any unicode character is saved in its
 escaped unicode sequence.  For example a bullet is saved into the
 database as a literal %u2022.  What I need to know is what function
 can I use so that it's either saved as the unicode bullet character or
 displayed back on the page as the bullet?
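
For reference, step 2 of the process above can be reproduced with a few
lines of JavaScript. This is only a sketch of what the legacy escape()
function emits; the sample text is invented.

```javascript
// escape() is a legacy global function. It is deprecated but still present
// in browsers and in Node.js.
const sampleStory = "\u2022 Budget meeting"; // begins with a bullet, U+2022
const escapedStory = escape(sampleStory);

console.log(escapedStory); // %u2022%20Budget%20meeting

// Code units above 0xFF become %uXXXX; those from 0x80 to 0xFF become %XX.
const escapedAccent = escape("\u00e9");
console.log(escapedAccent); // %E9
```

The %uXXXX form is specific to escape() and is not standard URL
percent-encoding, which would explain why it arrives at the server as
literal text.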


-- 
Some people have a gift link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?




RE: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
I can take ANY number I want, and put %u in front of it...

That don't make it mean anything.

You also need to know the charset it came from to start with, which in
the case of MS Word, is not even a standard charset, but some made-up
proprietary random assemblage of numbers to characters they found
convenient that day.

You also might want to consider using something like FCKEditor or that
other one like it to let users compose HTML-formatted content.

On Thu, October 5, 2006 9:11 pm, Robbert van Andel wrote:
 I know it's Unicode because the javascript is encoding it as Unicode
 (and it's doing so correctly).  I guess the gist of my question is how
 do I do the reverse.  How do I take %u2022 and make that display as the
 bullet character?

 -Original Message-
 From: Dotan Cohen [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 05, 2006 3:44 PM
 To: [EMAIL PROTECTED]
 Cc: php-general@lists.php.net
 Subject: Re: [PHP] Unicode Problem

 On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I have a webpage that allows users to post news stories for their
 department.  The site uses AJAX to send the data to the webserver.
 The problem I'm having is when the user uses some unicode characters
 like bullets or MS Word quotes, the page comes out weird.

 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into
 something that can be posted to the server.  This function turns
 spaces into %20, but it turns unicode characters into a longer string
 like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the mySQL
 database server.

 The problem I see is that any unicode character is saved in its
 escaped unicode sequence.  For example a bullet is saved into the
 database as a literal %u2022.  What I need to know is what function
 can I use so that it's either saved as the unicode bullet character or
 displayed back on the page as the bullet?

 Thank you


 I doubt that MS Word quotes are unicode. And as long as the users are
 copying/pasting between MS products (Word-IE) you're going to have a
 hard time deciphering those funny characters. Try to encourage them to
 use Firefox, and if possible to use a UTF-8 compliant word processor.
 Mine is Kword, but I don't think that's available for Windows.

 Dotan Cohen
 http://what-is-what.com








RE: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
On Fri, October 6, 2006 8:37 am, tedd wrote:
 At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
 I know it's Unicode because the javascript is encoding it as Unicode
 (and it's doing so correctly).  I guess the gist of my question is how
 do I do the reverse.  How do I take %u2022 and make that display as the
 bullet character?

 Robbert:

 To display it in a browser, convert the number to DEC (2022 -> 8226) and
 use:

 &#8226;

 I thought there was a way to use HEX directly, but can't find the
 reference at the moment (if there is one).

http://php.net/hexdec

But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.

Because *only* MS IE uses the double-secret Microsoft decoder ring for
8226 to be what MS Word thinks it is.  Everybody else is using a
standards-based conversion...

So your page will look fine in IE, but everybody else will see all
kinds of goofy characters.

Test it and see -- I could be wrong...




RE: [PHP] Unicode Problem

2006-10-06 Thread tedd

At 10:50 AM -0500 10/6/06, Richard Lynch wrote:

On Fri, October 6, 2006 8:37 am, tedd wrote:
  At 7:11 PM -0700 10/5/06, Robbert van Andel wrote:
   How do I take %u2022 and make that display as the bullet
   character?
  I thought there was a way to use HEX directly, but can't find the
  reference at the moment (if there is one).


http://php.net/hexdec



Richard:

No, that's not what I meant. I know how to convert DEC - HEX.

What I was talking about are called NCRs, or Numeric Character References.

One could use the Unicode DEC value directly, such as:

&#8226;

or the Unicode HEX value directly, such as:

&#x2002;

Note, either will produce a bullet in most browsers.


But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.


Not true; most browsers (and all of the most current ones) do render that
glyph correctly (other glyphs may vary). Please review:


http://www.browsercam.com/public.aspx?proj_id=289683

The first bullet is &#149; (same as ALT 0149 on the windoze keyboard).

The second is &#8226; and third is &#x2002;

Note all three produce a bullet -- oh and don't forget &bull;, which
will produce the same result.


tedd




RE: [PHP] Unicode Problem

2006-10-06 Thread Richard Lynch
On Fri, October 6, 2006 12:29 pm, tedd wrote:
 No, that's not what I meant. I know how to convert DEC - HEX.

 What I was talking about are called NCRs, or Numeric Character
 References.

 One could use the Unicode DEC value directly, such as:

  &#8226;

 or the Unicode HEX value directly, such as:

  &#x2002;
I think you meant x2022 a.k.a. (dec)8226 :-)

8226 or x2022 is the same number, so whatever it is, it should work
the same.
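
A quick check of that arithmetic, written as a throwaway JavaScript
snippet (the variable names are only for illustration):

```javascript
// 0x2022 and 8226 are the same number written in two bases.
const bulletDecimal = parseInt("2022", 16);
console.log(bulletDecimal);             // 8226
console.log((8226).toString(16));       // 2022

// Both spellings therefore name the same character, the bullet.
const sameCharacter =
  String.fromCodePoint(8226) === String.fromCodePoint(0x2022);
console.log(sameCharacter);             // true

// 0x2002 is a different code point (8194, EN SPACE), so the typo matters.
const typoDecimal = parseInt("2002", 16);
console.log(typoDecimal);               // 8194
```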

And maybe those will work the same as &bull; on all modern browsers now.

But that was not my experience in the past.

Perhaps you would care to extend your browsercam test to some
regression testing of more ancient browsers -- on Mac OS.

 Note, either will produce a bullet in most browsers.

But &#8226; is almost-for-sure *ONLY* going to look right on MS IE.

 Not true; most browsers (and all of the most current ones) do render that
 glyph correctly (other glyphs may vary). Please review:

 http://www.browsercam.com/public.aspx?proj_id=289683

 The first bullet is &#149; (same as ALT 0149 on the windoze keyboard).

 The second is &#8226; and third is &#x2002;

 Note all three produce a bullet -- oh and don't forget &bull;, which
 will produce the same result.

 tedd








Re: [PHP] Unicode Problem

2006-10-05 Thread Dotan Cohen

On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I have a webpage that allows users to post news stories for their department.  
The site uses AJAX to send the data to the webserver.  The problem I'm having 
is when the user uses some unicode characters like bullets or MS Word quotes, 
the page comes out weird.

Here's the process.
1. The user enters the story and clicks save.
2. The javascript uses the escape function to turn the text into something that 
can be posted to the server.  This function turns spaces into %20, but it turns 
unicode characters into a longer string like %u.
3. The javascript then sends the data to the processing page.
4. The PHP processing page receives the data and saves it to the mySQL database 
server.

The problem I see is that any unicode character is saved in its escaped
unicode sequence.  For example a bullet is saved into the database as a literal 
%u2022.  What I need to know is what function can I use so that it's either 
saved as the unicode bullet character or displayed back on the page as the 
bullet?

Thank you



I doubt that MS Word quotes are unicode. And as long as the users are
copying/pasting between MS products (Word-IE) you're going to have a
hard time deciphering those funny characters. Try to encourage them to
use Firefox, and if possible to use a UTF-8 compliant word processor.
Mine is Kword, but I don't think that's available for Windows.

Dotan Cohen
http://what-is-what.com




RE: [PHP] Unicode Problem

2006-10-05 Thread Robbert van Andel
I know it's Unicode because the javascript is encoding it as Unicode (and
it's doing so correctly).  I guess the gist of my question is how do I do
the reverse.  How do I take %u2022 and make that display as the bullet
character?
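
One way to do the reverse, sketched in JavaScript for illustration. It
assumes the stored text came from escape(), so it contains only %uXXXX
and %XX sequences. The name decodeEscaped is hypothetical; the same
regular expression idea could presumably be ported to the PHP side, but
that port is not shown here.

```javascript
// Hypothetical helper: turn "%u2022%20item" back into the original text.
// A single pass handles both forms, so a decoded "%" is never re-read as
// the start of another escape.
function decodeEscaped(text) {
  return text.replace(
    /%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/g,
    (match, wide, narrow) => String.fromCharCode(parseInt(wide || narrow, 16))
  );
}

const decodedStory = decodeEscaped("%u2022%20Budget%20meeting");
console.log(decodedStory === "\u2022 Budget meeting"); // true

// The legacy built-in unescape() gives the same result for this input.
console.log(decodedStory === unescape("%u2022%20Budget%20meeting")); // true
```

Each %uXXXX is one UTF-16 code unit, so characters outside the Basic
Multilingual Plane arrive as two consecutive escapes and are rebuilt as
a surrogate pair.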

-Original Message-
From: Dotan Cohen [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 05, 2006 3:44 PM
To: [EMAIL PROTECTED]
Cc: php-general@lists.php.net
Subject: Re: [PHP] Unicode Problem

On 06/10/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I have a webpage that allows users to post news stories for their
department.  The site uses AJAX to send the data to the webserver.  The
problem I'm having is when the user uses some unicode characters like
bullets or MS Word quotes, the page comes out weird.

 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into something
that can be posted to the server.  This function turns spaces into %20, but
it turns unicode characters into a longer string like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the mySQL
database server.

 The problem I see is that any unicode character is saved in its escaped
unicode sequence.  For example a bullet is saved into the database as a
literal %u2022.  What I need to know is what function can I use so that it's
either saved as the unicode bullet character or displayed back on the page
as the bullet?

 Thank you


I doubt that MS Word quotes are unicode. And as long as the users are
copying/pasting between MS products (Word-IE) you're going to have a
hard time deciphering those funny characters. Try to encourage them to
use Firefox, and if possible to use a UTF-8 compliant word processor.
Mine is Kword, but I don't think that's available for Windows.

Dotan Cohen
http://what-is-what.com




Re: [PHP] Unicode Problem

2006-10-05 Thread Penthexquadium
On Thu, 05 Oct 2006 18:14:59 -0400, [EMAIL PROTECTED] wrote:

 I have a webpage that allows users to post news stories for their department. 
  The site uses AJAX to send the data to the webserver.  The problem I'm 
 having is when the user uses some unicode characters like bullets or MS Word 
 quotes, the page comes out weird.  
 
 Here's the process.
 1. The user enters the story and clicks save.
 2. The javascript uses the escape function to turn the text into something 
 that can be posted to the server.  This function turns spaces into %20, but 
 it turns unicode characters into a longer string like %u.
 3. The javascript then sends the data to the processing page.
 4. The PHP processing page receives the data and saves it to the mySQL 
 database server.
 
 The problem I see is that any unicode character is saved in its escaped
 unicode sequence.  For example a bullet is saved into the database as a 
 literal %u2022.  What I need to know is what function can I use so that it's 
 either saved as the unicode bullet character or displayed back on the page as 
 the bullet?  
 
 Thank you
 

If you post that data via the POST method, use encodeURIComponent() to
encode the string, and then the server-side script can access it
directly.
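
A short sketch of that suggestion. The field name "story" and the sample
text are invented for the example, and it assumes the request body is
sent as application/x-www-form-urlencoded.

```javascript
// encodeURIComponent() emits percent-encoded UTF-8 bytes, which is the form
// that standard URL decoding on the server understands.
const newsText = "\u2022 Budget meeting";

// Hypothetical field name "story"; the thread does not say what the page uses.
const postBody = "story=" + encodeURIComponent(newsText);
console.log(postBody); // story=%E2%80%A2%20Budget%20meeting

// Round trip: the three bytes E2 80 A2 are the UTF-8 encoding of U+2022.
const roundTrip = decodeURIComponent("%E2%80%A2");
console.log(roundTrip === "\u2022"); // true
```

Unlike escape(), this never produces %uXXXX, so nothing non-standard is
left for the server to undo.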

--
Sorry for my poor English.
