On Wed, Oct 01, 2003 at 02:02:08PM -0400, Gerard Samuel wrote:
: CPT John W. Holmes wrote:
: >From: "Eugene Lee" <[EMAIL PROTECTED]>
: >>On Wed, Oct 01, 2003 at 01:12:16AM -0400, Gerard Samuel wrote:
: >>:
: >>: Got a problem with htmlspecialchars being too greedy, where
: >>: for example, it converts
: >>: &foo;
: >>: to
: >>: &amp;foo;
[...]
: >>: $foo = '&#20013;&#25991; & http://www.foo.com/index.php?foo=1&bar=2';
: >>
: >>The problem isn't with htmlspecialchars().  It doesn't know what parts
: >>of the string are HTML character references and which parts are not.
: >>But if you're willing to dig up the numeric character references for
: >>those specific Chinese characters, then split the string into the part
: >>that needs no translation and the part that needs it.  That is:
: >>
: >>$foo1encoded = '&#20013;&#25991;';
: >>$foo2raw = ' & http://www.foo.com/index.php?foo=1&bar=2';
: >>$foo = $foo1encoded . htmlspecialchars($foo2raw);
: >
: >Maybe you should run html_entity_decode() on the string first, then run
: >encode again. The decode will take &#20013; and turn it into its actual
: >character but not affect anything else. Then the recoding will turn it back
: >into &#20013; and also encode any other characters.
: 
: Eugene, your example leads me to believe that one knows beforehand
: which characters need special attention, in order to not run them
: through htmlspecialchars.  I would never know which characters need
: special attention.

But it seems that you do know what characters need to be converted,
because you included the exact Unicode character references for those
Chinese characters.  You have to know your data, or else write your
code around specific assumptions about the data.
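
As a rough sketch of the "specific assumptions" route: suppose you decide
that anything shaped like a numeric character reference (&#20013; or
&#x4E2D;) is intentional and must pass through untouched, while everything
else gets escaped.  The function name escape_keeping_numeric_refs() is
made up for this example, and the assumption itself may well be wrong for
your data.  It also needs a PHP version with type declarations.

```php
<?php
// Hypothetical helper, not part of PHP itself.
// ASSUMPTION: every "&#1234;" or "&#x4E2D;" in the input is an intended
// character reference and must not be escaped again.
function escape_keeping_numeric_refs(string $text): string
{
    // Split into alternating chunks: plain text, reference, plain text, ...
    // PREG_SPLIT_DELIM_CAPTURE keeps the matched references in the result.
    $parts = preg_split(
        '/(&#(?:\d+|[xX][0-9a-fA-F]+);)/',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($parts as $i => $part) {
        // Odd indexes hold the captured references; leave them alone.
        // Even indexes hold everything else; escape it.
        $out .= ($i % 2 === 1)
            ? $part
            : htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

$foo = '&#20013;&#25991; & http://www.foo.com/index.php?foo=1&bar=2';
echo escape_keeping_numeric_refs($foo), "\n";
// &#20013;&#25991; &amp; http://www.foo.com/index.php?foo=1&amp;bar=2
```

This is only the earlier "split the string" idea done with a pattern
instead of by hand.  If a user really did type the literal text "&#20013;"
and wanted to see it as typed, this code gets it wrong, which is exactly
why the assumption has to be yours.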

For example, let's say I have a string that I got from somewhere
(database, user form, text file, another web site, etc.):

        $foo = 'Dick &amp; Jane';

When you eventually display this to someone's web browser, what do you
want them to see?

        Dick & Jane
or
        Dick &amp; Jane

This really depends on the format of the data inside $foo.  Is '&amp;'
a character reference that you want to leave alone?  Or is it a literal
string that you want to convert to '&amp;amp;' for display?  The only
person who knows the format of the data is you.  Again, you have
to know your data.
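
To make the two readings concrete, here is a small sketch.  The variable
names $asMarkup and $asText are just labels I picked for the example:

```php
<?php
$foo = 'Dick &amp; Jane';

// Reading 1: $foo is already HTML, so send it as-is.
// The browser renders "&amp;" as a plain ampersand.
$asMarkup = $foo;

// Reading 2: $foo is literal text, so escape it.
// The browser then shows the characters "&amp;" themselves.
$asText = htmlspecialchars($foo, ENT_QUOTES, 'UTF-8');

echo $asMarkup, "\n";  // Dick &amp; Jane      (displayed as: Dick & Jane)
echo $asText, "\n";    // Dick &amp;amp; Jane  (displayed as: Dick &amp; Jane)
```

The code is the same one-liner either way; the only real decision is which
reading matches where $foo came from.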

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
