Re: [PHP] Re: htmlentities

2011-09-14 Thread Johan Lidström
On 13 September 2011 23:01, Shawn McKenzie nos...@mckenzies.net wrote:

 On 09/13/2011 01:38 PM, Ron Piggott wrote:
 
  Is there a way to only change accented characters and not HTML (Example:
 <p> </p> <a href=""> </a> )
 
  The syntax
 
  echo htmlentities(
 stripslashes(mysql_result($whats_new_result, 0, "message")) ) . "\r\n";
 
  is doing everything (as I expect).  I store breaking news within the
 database as HTML formatted text.  I am trying to see if a workaround is
 available.  Do I need to do a variety of search / replace to convert the
 noted characters above back after htmlentities ?
 
  (I am just starting to get used to accented letters.)
 
  Thanks a lot for your help.
 
  Ron
 
  The Verse of the Day
  “Encouragement from God’s Word”
  http://www.TheVerseOfTheDay.info
 

 If it is meant to be HTML then why run htmlentities(), especially before
 storing it in the DB?

 --
 Thanks!
 -Shawn
 http://www.spidean.com

 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php


Perhaps something like this might help you

$content =
htmlspecialchars_decode(htmlentities($content, ENT_NOQUOTES, 'ISO-8859-1'), ENT_NOQUOTES);

or perhaps

$table_all  =
get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'ISO-8859-1');
$table_html = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
$table_nonhtml = array_diff_key($table_all, $table_html);
$content1 = strtr($content1, $table_nonhtml);
$content2 = strtr($content2, $table_nonhtml);

if using it multiple times.
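
A minimal sketch of the first suggestion applied to a sample string. It assumes the stored text is UTF-8 (the snippet above uses ISO-8859-1; pass whichever charset the data really is in), and the sample markup is made up for illustration.

```php
<?php
// Hypothetical sample of stored HTML; the accented characters are UTF-8.
$content = "<p>Caf\u{E9} <a href=\"menu.html\">cr\u{E8}me</a></p>";

// Step 1: htmlentities() encodes everything, including < > and &.
// Step 2: htmlspecialchars_decode() turns &lt; &gt; &amp; back into markup,
//         so only the accented characters remain as entities.
$encoded = htmlspecialchars_decode(
    htmlentities($content, ENT_NOQUOTES, 'UTF-8'),
    ENT_NOQUOTES
);

echo $encoded, "\n";
// <p>Caf&eacute; <a href="menu.html">cr&egrave;me</a></p>
```

An `&amp;` that is already in the stored HTML should survive the round trip, since it is encoded once more by htmlentities() and decoded once by htmlspecialchars_decode().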

-- 
It is not possible to simultaneously understand and appreciate the Intel
architecture --Ben Scott


[PHP] Re: htmlentities

2011-09-13 Thread Shawn McKenzie
On 09/13/2011 01:38 PM, Ron Piggott wrote:
 
 Is there a way to only change accented characters and not HTML (Example:
 <p> </p> <a href=""> </a> )
 
 The syntax
 
 echo htmlentities( stripslashes(mysql_result($whats_new_result, 0, "message"))
 ) . "\r\n";
 
 is doing everything (as I expect).  I store breaking news within the database 
 as HTML formatted text.  I am trying to see if a workaround is available.  
 Do I need to do a variety of search / replace to convert the noted characters 
 above back after htmlentities ?
 
 (I am just starting to get used to accented letters.)
 
 Thanks a lot for your help.
 
 Ron
 
 The Verse of the Day
 “Encouragement from God’s Word”
 http://www.TheVerseOfTheDay.info  
 

If it is meant to be HTML then why run htmlentities(), especially before
storing it in the DB?

-- 
Thanks!
-Shawn
http://www.spidean.com




[PHP] Re: HTMLEntities as NUMERIC for XML

2008-11-25 Thread ceo

I already had a function to go from weird MS-Word characters to HTML Entities, 
which I was putting into the DB as such.



In retrospect, that function should have been called at output... Actually, I 
knew it should have, but convincing my co-workers was the proverbial brick 
wall, so I cheated and did it on data import and now I'm paying for it...



I ended up just copy-pasting the entity/number table from here:

http://www.w3schools.com/tags/ref_entities.asp

and go through a 2-step process:



RAW DATA (pasted from Word in unknown code-page/charset/encoding)

HTML Name Entities

HTML Numeric Entities



This seemed to make the W3.org RSS validator happy
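
A rough sketch of the second step (named entities to numeric entities) without a hand-pasted table. The function name is made up for illustration, and it assumes PHP 7.4 or later with the mbstring extension; it is one possible approach, not the code actually used here.

```php
<?php
// Hypothetical helper: rewrite named HTML entities as numeric ones for XML.
function namedToNumericEntities(string $text): string
{
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m): string {
            // The five entities XML itself defines can stay as they are.
            if (in_array($m[1], ['amp', 'lt', 'gt', 'quot', 'apos'], true)) {
                return $m[0];
            }
            // Let PHP's own entity table resolve the name to a character.
            $chars = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($chars === $m[0]) {
                return $m[0]; // unknown name: leave untouched
            }
            // A few HTML5 names expand to more than one code point.
            $out = '';
            foreach (mb_str_split($chars, 1, 'UTF-8') as $c) {
                $out .= '&#' . mb_ord($c, 'UTF-8') . ';';
            }
            return $out;
        },
        $text
    );
}

echo namedToNumericEntities('caf&eacute; &amp; &ldquo;tea&rdquo;'), "\n";
// caf&#233; &amp; &#8220;tea&#8221;
```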



It still looks goofy in the browser, but that's the problem of the users 
putting in this goofy stuff in the first place, so I'm shoving it back into 
their laps.



My RSS feed validates, and the content within it not being right is the 
problem of the content creators. :-)



[soapbox on]

I'm pretty tired of dealing with this charset/codepage stuff, personally, after 
years of frustrating experiences, none ending in a real solution



If anybody has a petition to abolish everything except for UTF-32, sign me up! 
:-v



UTF-32 is the biggest, right? The one that has ALL characters anybody needs?...



Hey, I don't care, UTF-64 or UTF-128 is fine by me too. Disk space is cheap.



Just stop the insanity of endless incompatible irreversible calculations to 
substitute a bunch of numeric codes for characters, and make it socially 
unacceptable to use anything other than the one true encoding.



I'm sure somebody somewhere actually enjoys dealing with this [bleep], but I'm 
betting the majority are quite tired of it.

[/soapbox]






Re: [PHP] Re: HTMLEntities as NUMERIC for XML

2008-11-25 Thread Ashley Sheridan
On Tue, 2008-11-25 at 17:09 +, [EMAIL PROTECTED] wrote:

 I already had a function to go from weird MS-Word characters to HTML 
 Entities, which I was putting into the DB as such.
 
 
 
 In retrospect, that function should have been called at output... Actually, I 
 knew it should have, but convincing my co-workers was the proverbial brick 
 wall, so I cheated and did it on data import and now I'm paying for it...
 
 
 
 I ended up just copy-pasting the entity/number table from here:
 
 http://www.w3schools.com/tags/ref_entities.asp
 
 and go through a 2-step process:
 
 
 
 RAW DATA (pasted from Word in unknown code-page/charset/encoding)
 
 HTML Name Entities
 
 HTML Numeric Entities
 
 
 
 This seemed to make the W3.org RSS validator happy
 
 
 
 It still looks goofy in the browser, but that's the problem of the users 
 putting in this goofy stuff in the first place, so I'm shoving it back into 
 their laps.
 
 
 
 My RSS feed validates, and the content within it not being right is the 
 problem of the content creators. :-)
 
 
 
 [soapbox on]
 
 I'm pretty tired of dealing with this charset/codepage stuff, personally, 
 after years of frustrating experiences, none ending in a real solution
 
 
 
 If anybody has a petition to abolish everything except for UTF-32, sign me 
 up! :-v
 
 
 
 UTF-32 is the biggest, right? The one that has ALL characters anybody 
 needs?...
 
 
 
 Hey, I don't care, UTF-64 or UTF-128 is fine by me too. Disk space is cheap.
 
 
 
 Just stop the insanity of endless incompatible irreversible calculations to 
 substitute a bunch of numeric codes for characters, and make it socially 
 unacceptable to use anything other than the one true encoding.
 
 
 
 I'm sure somebody somewhere actually enjoys dealing with this [bleep], but 
 I'm betting the majority are quite tired of it.
 
 [/soapbox]
 
 
 

I came across a similar problem using an AJAX thing, with MSWord
characters in the text. The way round the problem was to enclose
everything inside CDATA blocks, which made the browsers happy to receive
as the entities only had to be understood by the HTML browser now, not
the XML parser. As RSS is an XML format, maybe this would help you?
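
A small sketch of the CDATA idea for a feed item, assuming the HTML fragment is already in a string. The helper name and the sample markup are invented for illustration.

```php
<?php
// Hypothetical helper: wrap an HTML fragment so the XML parser treats it as opaque text.
function cdataWrap(string $html): string
{
    // A literal "]]>" would close the section early, so split it across two sections.
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $html) . ']]>';
}

$html = '<p>caf&eacute; &ldquo;special&rdquo;</p>';
echo '<description>', cdataWrap($html), '</description>', "\n";
// <description><![CDATA[<p>caf&eacute; &ldquo;special&rdquo;</p>]]></description>
```

Inside the CDATA section the named entities are plain characters as far as the XML parser is concerned; whatever renders the HTML afterwards still has to understand them.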


Ash
www.ashleysheridan.co.uk


[PHP] Re: HTMLEntities as NUMERIC for XML

2008-11-25 Thread Al



[EMAIL PROTECTED] wrote:

After reading this:
http://validator.w3.org/feed/docs/error/UndefinedNamedEntity.html
(all praise W3.org!)

I am searching for a PHP library function that will convert all my &abc; into 
&#123;

I have a zillion of these things from converting stupid MS Word characters into 
something that will, like, you know, actually WORK on the Internet, and do not 
really want to re-invent the wheel here.

Somebody has to have written this function...

I'm kind of surprised it's not http://php.net/xmlentities or somesuch...



Here's what I use:

// Translation table for dumb Windows chars when users paste from Word;
// the function below also strips all > 160
$win1252ToPlainTextArray = array(
    chr(130) => ',',
    chr(131) => '',
    chr(132) => ',,',
    chr(133) => '...',
    chr(134) => '+',
    chr(135) => '',
    chr(139) => '',
    chr(145) => '\'',
    chr(146) => '\'',
    chr(147) => '',
    chr(148) => '',
    chr(149) => '*',
    chr(150) => '-',
    chr(151) => '-',
    chr(155) => '',
    chr(160) => ' ',
);


function cleanWin1252Text($str)
{
    // Translation array for many dumb Windows special chars; used for paste in textareas
    global $win1252ToPlainTextArray;

    $str = strtr($str, $win1252ToPlainTextArray);
    $str = trim($str);
    // Whatever control and high bytes remain after the translation are removed
    $patterns = array('%[\x7F-\x81]%', '%[\x83]%', '%[\x87-\x8A]%',
        '%[\x8C-\x90]%', '%[\x98-\xff]%');
    return preg_replace($patterns, '', $str); // Strip
}
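
As an alternative to stripping, and not what the function above does, the pasted bytes can be converted to UTF-8 so the Word characters are kept. This sketch assumes the input really is Windows-1252 and that the mbstring extension is available; the sample string is made up.

```php
<?php
// Assumption: $pasted holds Windows-1252 bytes (curly quotes, an en dash, an apostrophe).
$pasted = "\x93Hello\x94 \x96 it\x92s";

// Convert instead of strip: every Word character maps to a real Unicode character.
$utf8 = mb_convert_encoding($pasted, 'UTF-8', 'Windows-1252');

// With the charset stated, htmlentities() produces the matching named entities.
echo htmlentities($utf8, ENT_NOQUOTES, 'UTF-8'), "\n";
// &ldquo;Hello&rdquo; &ndash; it&rsquo;s
```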




[PHP] Re: htmlentities()

2004-09-08 Thread Paul Birnstihl
Anthony Ritter wrote:

<?php
$str = "A 'quote' is <b>bold</b>";
echo htmlentities($str);
?>
..
// outputs: A 'quote' is <b>bold</b>
Not sure why I am still getting the tags and spaces after the call to
htmlentities().

Check out the source code of the output. Maybe you want the strip_tags() 
function?
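
A short sketch of the difference, using the same sample string. ENT_NOQUOTES is passed explicitly here (an addition for this example) so the single quotes are left alone regardless of PHP version.

```php
<?php
$str = "A 'quote' is <b>bold</b>";

// htmlentities() keeps the tags but escapes them; a browser then displays
// them as literal text, while "view source" shows the entities.
$escaped = htmlentities($str, ENT_NOQUOTES, 'UTF-8');
echo $escaped, "\n";
// A 'quote' is &lt;b&gt;bold&lt;/b&gt;

// strip_tags() removes the tags altogether.
$stripped = strip_tags($str);
echo $stripped, "\n";
// A 'quote' is bold
```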



[PHP] Re: htmlentities and foreign characters from MS Word

2004-09-06 Thread Monty
That did it! It seems that my version of MySQL doesn't support Unicode
encoding, only the various ISO encodings. So, I guess this translation is
necessary before storing all text in the DB so foreign characters aren't
broken when I retrieve them from the DB.

Thanks! 


I2eptilex wrote:
 
 Well it seems you have a UTF-8 encoded text after your function. Use
 iconv to change it. See http://de3.php.net/manual/en/ref.iconv.php .
 
 try doing this with your array before inserting it into the DB
 
 foreach($insert_array as $key => $var){
 $new_arr[$key] = iconv("UTF-8", "ISO-8859-1", $var);
 }
 
 It may be that your array has a different encoding than UTF-8; check the
 manual for the htmlentities function, but I'm pretty sure that should
 solve it.
 
 I2eptilex
 
 Monty wrote:
 
 I'm having a problem figuring out how to deal with foreign characters in
 text that was copied from an MS Word document and pasted into a form field.
 
 I'm not sure how this is getting stored in the MySQL database, but, when I
 run htmlentities() on this text, each foreign character is converted into 2
 other foreign characters that don't at all represent the original.
 
 For example, a lowercase u with an umlaut over it (ü) is somehow displayed as
 an uppercase A with an umlaut over it followed by the 1/4 symbol after being
 parsed by htmlentities(). A lowercase o with an umlaut displays as an uppercase A
 with an umlaut over it followed by the paragraph symbol. It seems that the
 uppercase A w/umlaut is a constant, and the next character changes.
 
 The ord() function returns the same number for all of these foreign
 characters: 195. So, I'm not sure what's happening with these foreign
 characters, and if there's any way to convert them to proper htmlentities
 before being displayed in a browser. I thought htmlentities would do this,
 actually.
 
 Thanks!
 
 Monty.




Re: [PHP] Re: htmlentities and foreign characters from MS Word

2004-09-06 Thread Octavian Rasnita
You could store those texts as binary in MySQL...

- Original Message - 
From: Monty [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, September 06, 2004 11:07 AM
Subject: [PHP] Re: htmlentities and foreign characters from MS Word


 That did it! It seems that my version of MySQL doesn't support Unicode
 encoding, only the various ISO encodings. So, I guess this translation is
 necessary before storing all text in the DB so foreign characters aren't
 broken when I retrieve them from the DB.
 
 Thanks! 
 
 
 I2eptilex wrote:




[PHP] Re: htmlentities and foreign characters from MS Word

2004-09-05 Thread I2eptileX
Well it seems you have a UTF-8 encoded text after your function. Use 
iconv to change it. See http://de3.php.net/manual/en/ref.iconv.php .

try doing this with your array before inserting it into the DB

foreach($insert_array as $key => $var){
$new_arr[$key] = iconv("UTF-8", "ISO-8859-1", $var);
}

It may be that your array has a different encoding than UTF-8; check the 
manual for the htmlentities function, but I'm pretty sure that should 
solve it.
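
A small sketch of why the symptoms Monty describes point to UTF-8, using "ü" as the sample character. In UTF-8 it is two bytes, and when those bytes are read as ISO-8859-1 they come out as "Ã" (strictly an A with a tilde) and "¼"; for "ö" the second byte is the paragraph sign instead.

```php
<?php
$u = "\xC3\xBC"; // "ü" in UTF-8 is two bytes: 0xC3 0xBC

echo strlen($u), "\n"; // 2
echo ord($u), "\n";    // 195 (0xC3): ord() only looks at the first byte

// Read as ISO-8859-1, the two bytes are two separate characters.
echo htmlentities($u, ENT_NOQUOTES, 'ISO-8859-1'), "\n"; // &Atilde;&frac14;

// Telling htmlentities() the real charset gives the expected entity.
echo htmlentities($u, ENT_NOQUOTES, 'UTF-8'), "\n"; // &uuml;

// The iconv() conversion leaves a single ISO-8859-1 byte instead.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $u);
echo bin2hex($latin1), "\n"; // fc
echo htmlentities($latin1, ENT_NOQUOTES, 'ISO-8859-1'), "\n"; // &uuml;
```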

I2eptilex
Monty wrote:
I'm having a problem figuring out how to deal with foreign characters in
text that was copied from an MS Word document and pasted into a form field.

I'm not sure how this is getting stored in the MySQL database, but, when I
run htmlentities() on this text, each foreign character is converted into 2
other foreign characters that don't at all represent the original.

For example, a lowercase u with an umlaut over it (ü) is somehow displayed as
an uppercase A with an umlaut over it followed by the 1/4 symbol after being
parsed by htmlentities(). A lowercase o with an umlaut displays as an uppercase A
with an umlaut over it followed by the paragraph symbol. It seems that the
uppercase A w/umlaut is a constant, and the next character changes.

The ord() function returns the same number for all of these foreign
characters: 195. So, I'm not sure what's happening with these foreign
characters, and if there's any way to convert them to proper htmlentities
before being displayed in a browser. I thought htmlentities would do this,
actually.

Thanks!

Monty.