On Thu, Jun 06, 2002 at 12:47:57PM +0100, Daniel Pupius wrote:
> Hi there. I'm working with RDF/XML that is strict on what characters are
> allowed within the elements and attributes. I was wondering if anyone had a
> script that processed a string and replaced all illegal-characters with
> their HTML code, for example "&" is converted to &amp; and " to &quot;. It should
> also work for characters like "�".
Here's what I use. I grab the file and stick it into the $Contents
string. Then, I clean it up with the following regexes. Finally, I
pass it to the parse function.
# Escape ampersands.
$Contents = preg_replace('/(&amp;|&)/i', '&amp;', $Contents);
# Replace all non-visible characters except SP, TAB, LF and CR with a newline.
$Contents = preg_replace('/[^\x20-\x7E\x09\x0A\x0D]/', "\n", $Contents);
Of course, you can similarly tweak $Contents to drop or modify any other
characters you wish.
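
As a rough sketch, the two cleanup steps can be wrapped in one function so
the same cleaning runs on every file before it is parsed. The function name
clean_for_xml() is hypothetical, and the type hints assume PHP 7 or later.
The ampersand pattern lists "&amp;" before "&" so that ampersands which are
already escaped are left as they are.

```php
<?php
// Hypothetical helper; the name clean_for_xml() is not from the tutorial.
function clean_for_xml(string $contents): string
{
    // Escape ampersands. Matching "&amp;" first keeps an already-escaped
    // ampersand from turning into "&amp;amp;".
    $contents = preg_replace('/(&amp;|&)/i', '&amp;', $contents);

    // Replace everything outside printable ASCII, TAB, LF and CR
    // with a newline.
    return preg_replace('/[^\x20-\x7E\x09\x0A\x0D]/', "\n", $contents);
}

// Example: one bare ampersand, one escaped ampersand, one NUL byte.
$Contents = clean_for_xml("Tom & Jerry &amp; friends\x00");
```

One thing to watch with this approach: only "&amp;" is recognized as already
escaped, so other entities in the input, such as "&lt;", come out as
"&amp;lt;".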
That snippet is from my PHP XML Parsing Basics tutorial at
http://www.analysisandsolutions.com/code/phpxml.htm
> It would be possible to process the strings before they are inserted into
> the XML document - if that is easier.
While that's nice, it's not foolproof. What if someone circumvents
your insertion process and gets a bad file into the mix? You still need
to clean things as they come out just to be safe.
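
For the insert-time processing the question mentions, a minimal sketch might
look like the following. It is an illustration only, not part of the
tutorial: the name escape_for_xml() is hypothetical, the input is assumed to
be valid UTF-8, and the ENT_XML1 flag assumes PHP 5.4 or later (the type
hints need PHP 7). It converts the XML special characters to entities and
then turns each non-ASCII character into a decimal character reference.

```php
<?php
// Hypothetical helper; escape_for_xml() is not from the original thread.
// Assumes $text is valid UTF-8.
function escape_for_xml(string $text): string
{
    // Turn &, <, > and both quote styles into XML entities.
    $text = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

    // Turn each non-ASCII character into a decimal character reference.
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            $bytes = array_values(unpack('C*', $m[0]));
            $count = count($bytes);
            // Keep the payload bits of the lead byte, then append the
            // low 6 bits of each continuation byte.
            $code = $bytes[0] & (0xFF >> ($count + 1));
            for ($i = 1; $i < $count; $i++) {
                $code = ($code << 6) | ($bytes[$i] & 0x3F);
            }
            return '&#' . $code . ';';
        },
        $text
    ) ?? '';
}

// Example: an ampersand, double quotes and an accented letter.
$Safe = escape_for_xml("caf\u{E9} & \"bar\"");
```

Escaping on the way in like this does not replace the cleanup on the way
out; the stored files still need checking when they are read back.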
Enjoy,
--Dan
--
PHP classes that make web design easier
SQL Solution | Layout Solution | Form Solution
sqlsolution.info | layoutsolution.info | formsolution.info
T H E A N A L Y S I S A N D S O L U T I O N S C O M P A N Y
4015 7 Av #4AJ, Brooklyn NY v: 718-854-0335 f: 718-854-0409