Re: [PHP] DOM - change a tag name ??

Michael A. Peters Wed, 11 Mar 2009 13:17:48 -0700

Andrew Ballard wrote:

On Wed, Mar 11, 2009 at 3:06 PM, Michael A. Peters <[email protected]> wrote:

Andrew Ballard wrote:

On Wed, Mar 11, 2009 at 11:52 AM, Michael A. Peters <[email protected]>
wrote:

If I'm manipulating a dom object, is there a way to change the tag name?
I know you manipulate just about everything else in a node - but is the
tagName really off limits?


from the documentation for DOMElement -

/* Properties */
readonly public bool $schemaTypeInfo ;
readonly public string $tagName ;

so if I really needed to change it, I'd have to create a virgin node with
the new name, identical attributes and children, and replace the existing
node with the new one?

Is there any other way to alter the tagName without doing all that?

If this is related to your earlier post about attributes, is XSLT not
an option? I hate to sound like a broken record, but PHP has support
for XSL transformations and it sounds like that is exactly what you
are trying to do.

Andrew

No.
XSLT is certainly one of the technologies I'm going to look into, but right
now I'm building a filter that (hopefully) will fully implement the Mozilla
developer Content Security Policy server side before the document gets sent
to the browser - by removing what would violate the specified CSP before it
is sent.

My primary interest in changing tag names is to ensure all tags are lower
case so I can then run the rest of the filter. They are all lower case if
you use loadHTML() but I don't want my class to assume it has a properly
created DOMDocument to start with, so I want to walk the DOM and change bad
tags/attribute names before I apply the CSP filtering.



How are you traversing the DOM if it is not already properly formed?
Every time I've ever tried to load a DOMDocument with xml that
wouldn't validate, it blew up and the DOMDocument was left empty. I
usually find loadHTML() to be more forgiving.

The problem isn't with xml that doesn't validate, the problem is that
HTML is not case sensitive. <script></script> and <scRIpT></scRIpT> are
both legal xml but are different tags to xml. In xhtml the first is a
script element, the second has no meaning and is discarded by the
browser. However when sending the document as html (necessary for IE
users, for example) - html is not case sensitive, so the second is a
valid script tag. So if the content security policy says no scripts are
allowed on the page, I need to catch both the second and the first.
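
A small sketch of that mismatch, assuming a DOMDocument built by hand
rather than through loadHTML(). The variable names are only for
illustration. It shows that a lookup by name misses the mixed-case
element, while comparing lower-cased names finds both:

```php
<?php
$doc  = new DOMDocument();
$root = $doc->appendChild($doc->createElement('html'));
$root->appendChild($doc->createElement('script'));
$root->appendChild($doc->createElement('scRIpT'));

// Lookups by name are case sensitive, so only the lower-case element matches.
$exact = $doc->getElementsByTagName('script')->length;

// Comparing lower-cased names by hand catches both spellings.
$loose = 0;
foreach ($doc->getElementsByTagName('*') as $el) {
    if (strtolower($el->tagName) === 'script') {
        $loose++;
    }
}
```

Here $exact counts one element and $loose counts two, which is the gap
a filter keyed on 'script' would fall into.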

I was actually doing it with regex by saving the document to a buffer
first but to avoid altering content, I had to make sure I was operating
inside tags etc. and then it dawned on me - it's structured data, use a
tool designed to work with structured data.

If the class was just for me it wouldn't matter, but if I put the class
out in the wild - $doc->createElement("SCriPt"); is legal and can even
be used to produce legal validating HTML 4.01 upon saveHTML() but it
would dodge how my class locates and checks for script elements. There
doesn't seem to be a case insensitive way to find tags/elements in the
php xml tools, so before my class does the filtering it needs to first
make sure the tags/attributes are all lower case.
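
Since tagName is read-only, one way to do that normalization is the
create-copy-replace approach from the start of the thread. This is only
a rough sketch: lowercaseTag() is a hypothetical helper, it ignores
namespaces, and it only lower-cases attribute names on elements whose
tag name also needed fixing.

```php
<?php
// Hypothetical helper: swap $old for a copy whose tag name is lower case.
function lowercaseTag(DOMElement $old): DOMElement
{
    $new = $old->ownerDocument->createElement(strtolower($old->tagName));

    // Copy the attributes, lower-casing their names as well.
    foreach ($old->attributes as $attr) {
        $new->setAttribute(strtolower($attr->nodeName), $attr->nodeValue);
    }

    // Move the children across; appendChild() detaches each one from $old.
    while ($old->firstChild !== null) {
        $new->appendChild($old->firstChild);
    }

    $old->parentNode->replaceChild($new, $old);
    return $new;
}

$doc  = new DOMDocument();
$root = $doc->appendChild($doc->createElement('div'));
$bad  = $root->appendChild($doc->createElement('SCriPt', 'alert(1)'));
$bad->setAttribute('TYPE', 'text/javascript');

// Snapshot the live node list first, because the loop modifies the tree.
foreach (iterator_to_array($doc->getElementsByTagName('*')) as $el) {
    if ($el->tagName !== strtolower($el->tagName)) {
        lowercaseTag($el);
    }
}

$scripts = $doc->getElementsByTagName('script');
```

After the loop the element is found under 'script' with its content and
its (now lower-case) type attribute intact, so the filter can look for
lower-case names only.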

How the potential users (if there ever are any) of my class get their
page into DOMDocument is up to them, not me. They can loadHTML() or
create it from scratch or import it from some other xml format.

If their source is an html file (or buffer), I will recommend they run
it through tidy first - tidy does wonders - but it's still up to them,
not me, so my class can't assume the tags are lower case.


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
