Andrew Ballard wrote:
On Wed, Mar 11, 2009 at 3:06 PM, Michael A. Peters <mpet...@mac.com> wrote:
Andrew Ballard wrote:
On Wed, Mar 11, 2009 at 11:52 AM, Michael A. Peters <mpet...@mac.com>
wrote:
If I'm manipulating a DOM object, is there a way to change the tag name?
I know you can manipulate just about everything else in a node - but is the
tagName really off limits?

from the documentation for DOMElement -

/* Properties */
readonly public bool $schemaTypeInfo ;
readonly public string $tagName ;

so if I really needed to change it, I'd have to create a virgin node with
the new name, identical attributes and children, and replace the existing
node with the new one?

Is there any other way to alter the tagName without doing all that?

If this is related to your earlier post about attributes, is XSLT not
an option? I hate to sound like a broken record, but PHP has support
for XSL transformations and it sounds like that is exactly what you
are trying to do.

Andrew

No.
XSLT is certainly one of the technologies I'm going to look into, but right
now I'm building a filter that (hopefully) will fully implement the Mozilla
developer Content Security Policy server side before the document gets sent
to the browser - by removing what would violate the specified CSP before it
is sent.

My primary interest in changing tag names is to ensure all tags are lower
case so I can then run the rest of the filter. They are all lower case if
you use loadHTML() but I don't want my class to assume it has a properly
created DOMDocument to start with, so I want to walk the DOM and change bad
tags/attribute names before I apply the CSP filtering.


How are you traversing the DOM if it is not already properly formed?
Every time I've ever tried to load a DOMDocument with xml that
wouldn't validate, it blew up and the DOMDocument was left empty. I
usually find loadHTML() to be more forgiving.

The problem isn't XML that fails to validate; the problem is that HTML is not case sensitive. <script></script> and <scRIpT></scRIpT> are both well-formed XML, but to XML they are different tags. In XHTML the first is a script element, while the second has no meaning and is discarded by the browser. However, when the document is sent as HTML (necessary for IE users, for example), tag names are not case sensitive, so the second is a valid script tag too. So if the content security policy says no scripts are allowed on the page, I need to catch both the first and the second.
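
To make the mismatch concrete, here is a minimal sketch, assuming a document built by hand with createElement() so the HTML parser never gets a chance to lower-case anything. The element names and the translate() expression are my own illustration, not part of the filter. getElementsByTagName() compares names exactly; the XPath 1.0 translate() call is one possible query-time workaround, and it only folds ASCII letters.

```php
<?php
// Build the tree by hand so nothing normalises the tag names for us.
$doc  = new DOMDocument();
$body = $doc->appendChild($doc->createElement('body'));
$body->appendChild($doc->createElement('script'));
$body->appendChild($doc->createElement('scRIpT'));

// Exact-match lookup: only the lower-case element is counted.
$exact = $doc->getElementsByTagName('script')->length;

// XPath 1.0 has no lower-case() function, but translate() can map
// the ASCII upper-case letters onto their lower-case counterparts.
$xpath  = new DOMXPath($doc);
$upper  = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower  = 'abcdefghijklmnopqrstuvwxyz';
$folded = $xpath->query(
    "//*[translate(local-name(), '$upper', '$lower') = 'script']"
)->length;

printf("exact: %d, case-folded: %d\n", $exact, $folded);
```

Every lookup would have to carry that translate() expression, which is why normalising the names once, up front, still looks simpler to me.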

I was actually doing it with regex, saving the document to a buffer first, but to avoid altering content I had to make sure I was only operating inside tags, etc. Then it dawned on me - it's structured data, so use a tool designed to work with structured data.

If the class was just for me it wouldn't matter, but if I put the class out in the wild, $doc->createElement("SCriPt"); is legal and can even be used to produce valid HTML 4.01 upon saveHTML(), yet it would dodge how my class locates and checks for script elements. There doesn't seem to be a built-in case-insensitive lookup for tags/elements in the PHP XML tools, so before my class does the filtering it first needs to make sure the tags/attributes are all lower case.
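
Since tagName is read-only, the normalising pass ends up being the create-copy-replace routine from the original question. A rough sketch follows, assuming no namespaces are involved. The function names lowercaseElement() and lowercaseTree() are hypothetical, made up for this example. Note that if an element carries two attributes whose names differ only in case, the later one wins.

```php
<?php
// Hypothetical helper: swap $old for a copy whose tag name is lower-case.
function lowercaseElement(DOMElement $old): DOMElement
{
    $new = $old->ownerDocument->createElement(strtolower($old->tagName));

    // Copy the attributes, lower-casing their names as well.
    foreach ($old->attributes as $attr) {
        $new->setAttribute(strtolower($attr->name), $attr->value);
    }

    // Move the children across; appendChild() detaches each one from $old.
    while ($old->firstChild !== null) {
        $new->appendChild($old->firstChild);
    }

    $old->parentNode->replaceChild($new, $old);
    return $new;
}

// Hypothetical walker: normalise $node (if it is an element) and everything below it.
function lowercaseTree(DOMNode $node): void
{
    if ($node instanceof DOMElement) {
        $node = lowercaseElement($node);
    }
    // Snapshot the children first, because childNodes is a live list
    // and the replacements above would disturb the iteration.
    foreach (iterator_to_array($node->childNodes) as $child) {
        lowercaseTree($child);
    }
}

$doc    = new DOMDocument();
$root   = $doc->appendChild($doc->createElement('BODY'));
$script = $root->appendChild($doc->createElement('SCriPt', 'alert(1)'));
$script->setAttribute('TYPE', 'text/javascript');

lowercaseTree($doc);
echo $doc->saveXML($doc->documentElement), "\n";
```

This replaces every element, even ones that were already lower case, so any element references the caller held before the pass will point at detached nodes afterwards.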

How the potential users (if there ever are any) of my class get their page into DOMDocument is up to them, not me. They can loadHTML() or create it from scratch or import it from some other xml format.

If their source is an html file (or buffer), I will recommend they run it through tidy first - tidy does wonders - but it's still up to them, not me, so my class can't assume the tags are lower case.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
