Re: [PHP] Stripping illegal characters out of an XML document
On Thu, Jun 06, 2002 at 12:47:57PM +0100, Daniel Pupius wrote:
> Hi there. I'm working with RDF/XML that is strict on what characters are
> allowed within the elements and attributes. I was wondering if anyone had a
> script that processed a string and replaced all illegal characters with
> their HTML codes, for example & is converted to &amp;. It should also work
> for characters like é.

Here's what I use. I grab the file and stick it into the $Contents string.
Then I clean it up with the following regexes. Finally, I pass it to the
parse function.

    # Escape ampersands.
    $Contents = preg_replace('/(&amp;|&)/i', '&amp;', $Contents);

    # Remove all non-visible characters except SP, TAB, LF and CR.
    $Contents = preg_replace('/[^\x20-\x7E\x09\x0A\x0D]/', "\n", $Contents);

Of course, you can similarly tweak $Contents to drop or modify any other
characters you wish. That snippet is from my PHP XML Parsing Basics tutorial
at http://www.analysisandsolutions.com/code/phpxml.htm

> It would be possible to process the strings before they are inserted into
> the XML document - if that is easier.

While that's nice, it's not foolproof. What if someone circumvents your
insertion process and gets a bad file into the mix? You still need to clean
things as they come out just to be safe.

Enjoy,

--Dan
--
PHP classes that make web design easier
SQL Solution | Layout Solution | Form Solution
sqlsolution.info | layoutsolution.info | formsolution.info
T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
4015 7 Av #4AJ, Brooklyn NY     v: 718-854-0335     f: 718-854-0409

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
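A rough modern sketch of both approaches discussed above (escaping at insertion time, and cleaning on the way out). It is not code from Dan's tutorial: escape_for_xml() and clean_xml() are hypothetical helper names, and UTF-8 input is assumed. Two things motivate the changes: the "(&amp;|&)" regex also turns an existing reference such as &lt; into &amp;lt;, and the second regex replaces é (indeed every byte outside printable ASCII) with a newline, which works against the requirement that é survive. The sketch instead skips ampersands that already begin a reference and drops only characters outside the XML 1.0 Char production.

```php
<?php
// Sketch only: escape_for_xml() and clean_xml() are hypothetical helper
// names, not functions from Dan's tutorial. Assumes UTF-8 input.

// Before insertion: escape one text value for use in element content or a
// quoted attribute. ENT_XML1 | ENT_QUOTES turns & < > " ' into
// &amp; &lt; &gt; &quot; &apos;.
function escape_for_xml(string $text): string
{
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// On the way out: clean an already assembled document.
function clean_xml(string $contents): string
{
    // Escape only ampersands that do not already start a reference,
    // so "&amp;", "&lt;" and "&#233;" are left untouched.
    $contents = preg_replace(
        '/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $contents
    );

    // Drop anything outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    // The /u modifier makes PCRE match whole UTF-8 characters, so é survives.
    $cleaned = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $contents
    );
    if ($cleaned === null) {
        // preg_replace() returns null when a /u pattern meets invalid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $cleaned;
}
```

For example, clean_xml("AT&T &lt;") gives "AT&amp;T &lt;": the bare ampersand is escaped, while the existing reference is left alone.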
Re: [PHP] Stripping illegal characters out of an XML document
Thanks, I've created a delimited file of all the HTML character references.
I then loop through and do a replace as previously suggested. However, IE's
XML parser still doesn't like &eacute;, which represents é. For all intents
and purposes it's OK and works with the RDF processor. However, I'd like IE
to be able to view the XML file just for completeness.

Da
Re: [PHP] Stripping illegal characters out of an XML document
Heya:

On Thu, Jun 06, 2002 at 04:54:15PM +0100, Daniel Pupius wrote:
> Thanks, I've created a delimited file of all the HTML character references.
> I then loop through and do a replace as previously suggested. However, IE's
> XML parser still doesn't like &eacute;, which represents é. For all intents
> and purposes it's OK and works with the RDF processor. However, I'd like IE
> to be able to view the XML file just for completeness.

Try &#233; and see if IE likes that. If not, on the way out to the browser,
you can convert your escaping back to an é.

Ciao!

--Dan
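Some background on why &#233; is the safer choice: XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so &eacute; is an undeclared entity to a strict XML parser unless the document's DTD declares it. That is a likely reason IE rejects it. Numeric character references such as &#233; need no declaration. Below is a minimal sketch, assuming UTF-8 input, that rewrites every non-ASCII character as a decimal numeric reference and uses html_entity_decode() for the "convert back to é on the way out" step. to_numeric_refs() and from_refs() are hypothetical helper names. The UTF-8 decoding is done by hand so the sketch does not depend on the mbstring extension.

```php
<?php
// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference (é -> &#233;). Assumes UTF-8 input.
function to_numeric_refs(string $text): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u', // /u: match one whole UTF-8 character (2-4 bytes)
        function (array $m): string {
            // Decode the character's UTF-8 byte sequence to its code point.
            $bytes = array_values(unpack('C*', $m[0]));
            switch (count($bytes)) {
                case 2:
                    $cp = (($bytes[0] & 0x1F) << 6) | ($bytes[1] & 0x3F);
                    break;
                case 3:
                    $cp = (($bytes[0] & 0x0F) << 12) | (($bytes[1] & 0x3F) << 6)
                        | ($bytes[2] & 0x3F);
                    break;
                default: // 4-byte sequence
                    $cp = (($bytes[0] & 0x07) << 18) | (($bytes[1] & 0x3F) << 12)
                        | (($bytes[2] & 0x3F) << 6) | ($bytes[3] & 0x3F);
            }
            return '&#' . $cp . ';';
        },
        $text
    );
    if ($result === null) {
        // preg_replace_callback() returns null on invalid UTF-8 with /u.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// Hypothetical helper for the way out to the browser: turn both numeric
// references (&#233;) and HTML named references (&eacute;) back into
// UTF-8 characters.
function from_refs(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
```

With this, to_numeric_refs("café") yields "caf&#233;", which an XML parser accepts without any DTD, and from_refs() reverses it for display.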