Re: [PHP] Stripping illegal characters out of an XML document

2002-06-06 Thread Analysis Solutions

On Thu, Jun 06, 2002 at 12:47:57PM +0100, Daniel Pupius wrote:

> Hi there. I'm working with RDF/XML that is strict on what characters are
> allowed within the elements and attributes. I was wondering if anyone had a
> script that processed a string and replaced all illegal-characters with
> their HTML code, for example  is converted to  and  to . It should
> also work for characters like é.

Here's what I use.  I grab the file and stick it into the $Contents
string.  Then, I clean it up with the following regexes.  Finally, I
pass it to the parse function.

   #  Escape ampersands (an existing &amp; is left as a single &amp;).
   $Contents = preg_replace('/(&amp;|&)/i', '&amp;', $Contents);

   #  Replace all non-visible characters except SP, TAB, LF and CR
   #  with a newline.
   $Contents = preg_replace('/[^\x20-\x7E\x09\x0A\x0D]/', "\n", $Contents);

Of course, you can similarly tweak $Contents to drop or modify any other
characters you wish.

That snippet is from my PHP XML Parsing Basics tutorial at
http://www.analysisandsolutions.com/code/phpxml.htm
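
For anyone who wants the two steps in one place, here is a minimal sketch.
It is a variant, not the tutorial's code: the function name
clean_xml_input() is made up for illustration, and the ampersand pattern
uses a negative lookahead so that references already in the text (such as
a numeric reference or an lt entity) are not escaped a second time.

```php
<?php
// Hypothetical helper (name invented for this sketch): runs both clean-up
// steps on a string before it is handed to the XML parser.
function clean_xml_input(string $contents): string
{
    // Escape only those ampersands that do NOT already begin a named
    // entity, a decimal reference or a hexadecimal reference.
    $contents = preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $contents
    );

    // Replace every byte outside printable ASCII, TAB, LF and CR
    // with a newline.
    return preg_replace('/[^\x20-\x7E\x09\x0A\x0D]/', "\n", $contents);
}

echo clean_xml_input("AT&T &amp; &lt;b&gt;\x07");
```

Note that the second pattern works byte by byte, so a multibyte UTF-8
character such as é would be replaced by one newline per byte.  Convert
such characters to references first if they need to survive.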


> It would be possible to process the strings before they are inserted into
> the XML document - if that is easier.

While that's nice, it's not foolproof.  What if someone circumvents
your insertion process and gets a bad file into the mix?  You still need
to clean things as they come out just to be safe.

Enjoy,

--Dan

-- 
   PHP classes that make web design easier
SQL Solution  |   Layout Solution   |  Form Solution
sqlsolution.info  | layoutsolution.info |  formsolution.info
 T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
 4015 7 Av #4AJ, Brooklyn NY v: 718-854-0335 f: 718-854-0409

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




Re: [PHP] Stripping illegal characters out of an XML document

2002-06-06 Thread Daniel Pupius

Thanks, I've created a delimited file of all the HTML Character references.
I then loop through and do a replace as previously suggested.   However,
IE's XML Parser still doesn't like the &eacute; which represents é.

For all intents and purposes it's ok and works with the RDF processor.
However, I'd like IE to be able to view the XML file just for completeness.

Da






Re: [PHP] Stripping illegal characters out of an XML document

2002-06-06 Thread Analysis Solutions

Heya:

On Thu, Jun 06, 2002 at 04:54:15PM +0100, Daniel Pupius wrote:
> Thanks, I've created a delimited file of all the HTML Character references.
> I then loop through and do a replace as previously suggested.   However,
> IE's XML Parser still doesn't like the &eacute; which represents é
>
> For all intents and purposes it's ok and works with the RDF processor.
> However, I'd like IE to be able to view the XML file just for completeness.

Try &#233; and see if IE likes that.  If not, on the way out to the
browser, you can convert your escaping back to an é.
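
A likely reason the named form fails: XML itself predefines only five
entities (amp, lt, gt, quot, apos), so a name like eacute is undefined
unless a DTD declares it, while a numeric reference such as the one for
code point 233 needs no declaration.  Here is a rough sketch of doing that
conversion in bulk.  The function names are invented for illustration, and
it assumes the named entities are the HTML 4.01 set known to
html_entity_decode().

```php
<?php
// Hypothetical helper: code point of a single UTF-8 encoded character.
function utf8_codepoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch));
    $n = count($b);
    if ($n === 1) {
        return $b[0];
    }
    // Keep the payload bits of the lead byte, then 6 bits per trail byte.
    $cp = $b[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($b[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical helper: rewrite named HTML entities as numeric references.
function named_to_numeric_entities(string $text): string
{
    // The five entities XML predefines are left as they are.
    $xmlPredefined = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;'];

    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[0], $xmlPredefined, true)) {
                return $m[0];
            }
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($char === $m[0]) {
                // Unknown name: escape the ampersand so the XML stays well-formed.
                return '&amp;' . substr($m[0], 1);
            }
            return '&#' . utf8_codepoint($char) . ';';
        },
        $text
    );
}

echo named_to_numeric_entities('caf&eacute; &amp; cr&egrave;me');
```

Going the other way on output, html_entity_decode() with a UTF-8 charset
turns the references back into literal characters for the browser.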

Ciao!

--Dan
