[PHP] Problems working with HTML using PHP's XML tools (placing mixed text/html into xpath-specified nodes...)

2009-05-21 Thread Weston C
Is there a straightforward way (or, heck, any way) of placing mixed
html/text content into xpath-specified nodes using any of PHP's XML
tools?

So far, I've tried SimpleXML and the DOM and things aren't coming out well.

SimpleXML:

/* $filename contains the path to a valid XML file, $xpathxpr contains
a valid XPath expression matching at least one document node, $fillval
contains a mixed well-formed text/xhtml string to be prepended within
each matching node */

$sx = simplexml_load_file($filename);
$nodes = $sx->xpath($xpathxpr);
foreach($nodes as $node) {
  $children = $node->children();
  $children[0] = $fillval . $children[0];
}

This only sort of works. I get $fillval prepended to the original
contents of each matching document node, but if I've put any markup
in, it's all there as literal text (i.e., <a
href="http://php.net">php.net</a> wouldn't show up as a link; you'd
see the actual markup when the document is rendered).
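
The escaping described above is what SimpleXML does whenever a string
is assigned to a node: the string is treated as text, so any markup in
it is escaped. A minimal sketch of one possible workaround, assuming
the fill string is well-formed XML: fetch the underlying DOM node with
dom_import_simplexml() and insert a parsed fragment in front of the
existing content. The sample document, the XPath expression and the
names $dom, $frag and $out are made up for illustration.

```php
<?php
// Assumed sample input; the original post loads a file instead.
$sx = simplexml_load_string('<doc><item>old text</item></doc>');
$fillval = '<a href="http://php.net">php.net</a> says: ';

foreach ($sx->xpath('//item') as $node) {
    // Same underlying node, seen through the DOM API.
    $dom = dom_import_simplexml($node);
    $frag = $dom->ownerDocument->createDocumentFragment();
    // Parse the mixed text/markup string as XML instead of as plain text.
    $frag->appendXML($fillval);
    // Put the fragment in front of whatever the node already contains.
    $dom->insertBefore($frag, $dom->firstChild);
}

// The SimpleXML object reflects the change made through the DOM.
$out = $sx->asXML();
echo $out;
```

Because both APIs share one tree, no conversion back to SimpleXML
should be needed afterwards.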

A variation on this that I tried is creating a new SimpleXMLElement
object, with the mixed text/markup string as an argument passed to the
constructor, since the docs seem to indicate this is blessed. Weirdly,
when I do this, it seems to actually be stripping out the markup and
just giving the text. For example:

$s = new SimpleXMLElement('<a href="#">Boo</a>');
echo $s;

yields "Boo" (and echo $s->a yields nothing). This would be such a
huge bug I have a hard time believing it, so I have to suspect there's
a dance I'm not doing to make this work correctly.
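
One likely explanation, offered as a sketch rather than a definitive
diagnosis: the string passed to the constructor becomes the root
element of the new object, so $s is itself the <a> element. Echoing it
prints its text content, and $s->a looks for an <a> child inside the
root, which does not exist. The wrapper element <p> and the variable
$w below are made up for illustration.

```php
<?php
$s = new SimpleXMLElement('<a href="#">Boo</a>');
echo $s->getName(), "\n";      // a    (the root element is the <a> itself)
echo (string) $s, "\n";        // Boo  (text content of the root)
echo $s['href'], "\n";         // #    (the attribute is still there)
echo $s->asXML();              // serializes the element with its markup

// With a wrapper element, the link is a child and ->a finds it.
$w = new SimpleXMLElement('<p><a href="#">Boo</a></p>');
echo (string) $w->a, "\n";     // Boo
echo $w->a->asXML(), "\n";     // <a href="#">Boo</a>
```

So the markup is probably not being stripped; echo just shows the text
content, while asXML() shows the markup.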

DOM XML:

/* again, $filename contains the path to a valid XML file, $xpathxpr
contains a valid XPath expression matching at least one document node,
$fillval contains a mixed well-formed text/xhtml string to be
prepended within each matching node */

$domDoc = new DOMDocument();
$domDoc->loadHTML(file_get_contents($filename));
$search = new DOMXPath($domDoc);
$nodes = $search->query($xpathxpr);
foreach($nodes as $emt) {
  $f = $domDoc->createDocumentFragment();
  $f->appendXML($fillval . $emt->nodeValue);
  $emt->nodeValue = '';
  $emt->appendChild($f);
}

This also gets mixed results. It gets cranky and issues warnings about
any HTML entities (even though the invocation of loadHTML should make
it clear that this is an HTML document), and while I'm seeing some
markup make it through, in other cases I'm not. I haven't quite
figured out the difference.
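
Two details of the loop may account for these symptoms, though this is
a guess from reading the code rather than a tested diagnosis. First,
appendXML() parses its argument as XML no matter how the document was
loaded, so only the five entities XML predefines are recognized.
Second, nodeValue returns text content only, so any markup already
inside the node is dropped, and a literal & or < in that text is fed
back to appendXML() unescaped. A minimal sketch that avoids both by
parsing only $fillval and inserting it before the first existing
child; the sample HTML, the XPath expression and $out are assumptions.

```php
<?php
// Assumed sample input; the original post reads $filename instead.
$html = '<html><body><p class="t">Old <b>bold</b> text</p></body></html>';
$fillval = 'See <a href="http://php.net">php.net</a>: ';
$xpathxpr = '//p[@class="t"]';

$domDoc = new DOMDocument();
$domDoc->loadHTML($html);
$search = new DOMXPath($domDoc);

foreach ($search->query($xpathxpr) as $emt) {
    $f = $domDoc->createDocumentFragment();
    // Only the new content is parsed; skip the node if it is not well-formed.
    if (!$f->appendXML($fillval)) {
        continue;
    }
    // Existing children (including markup such as <b>) stay untouched.
    $emt->insertBefore($f, $emt->firstChild);
}

$out = $domDoc->saveHTML();
echo $out;
```

If $fillval can contain HTML-only entities such as &nbsp;, they would
still need converting before appendXML() sees them.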

I can come up with some runnable tests if it will help, but I'm hoping
someone's already familiar enough with the general issues of using
PHP's XML tools to work with HTML to offer some good commentary on
the matter.

Thanks,

Weston

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Problems working with HTML using PHP's XML tools (placing mixed text/html into xpath-specified nodes...)

2009-05-21 Thread Michael A. Peters

Weston C wrote:

Is there a straightforward way (or, heck, any way) of placing mixed
html/text content into xpath-specified nodes using any of PHP's XML
tools?

So far, I've tried SimpleXML and the DOM and things aren't coming out well.


Not sure if it is of any use to you, since I don't use XPath at all,
but this PHP class modifies existing DOMDocument objects for filtering
purposes and may give you an idea of how it is done:


http://www.clfsrpm.net/xss/cspfilter_class.phps

For a clearer example of adding a node to an existing document, see:

http://www.clfsrpm.net/xss/dom_script_test.phps

That page is hard to read as it was butchered from another script, so 
here are the demonstrative parts:


$import_prepare = '<div>' . $somexmlcontentinastring . '</div>';
$newDom = new DOMDocument("1.0","UTF-8");
$newDom->loadXML($import_prepare);
$elements = $newDom->getElementsByTagName("div");
$imported_div = $elements->item(0);

Now $imported_div is a node object containing all the XML in
$import_prepare. (You don't have to put it in a div; I did here because
the content comes from a form, so I don't know the structure of the XML
input, and wrapping it in a div gave me a known element that would be
the parent of all the input.) But it is a node of the $newDom
DOMDocument, so I need to import it into the DOMDocument I am using.


$testDiv = $myxhtml->importNode($imported_div,true);

Now $testDiv is a DOM object associated with the DOMDocument I want to
import the node into.


$xmlBody->appendChild($testDiv);

Now the node is a child of the $xmlBody node, and has been incorporated 
into my DOMDocument.
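
Put together, the steps above might look like the following
self-contained sketch. The host document, the sample content string and
$result are invented for the example; $myxhtml and $xmlBody stand in
for objects that the original script creates elsewhere.

```php
<?php
// Assumed sample content; in the original it comes from a form.
$somexmlcontentinastring = 'Hello <em>world</em>';

// Assumed host document and target node.
$myxhtml = new DOMDocument('1.0', 'UTF-8');
$myxhtml->loadXML('<html><body><h1>Title</h1></body></html>');
$xmlBody = $myxhtml->getElementsByTagName('body')->item(0);

// Parse the new content in its own document, wrapped in a known element.
$import_prepare = '<div>' . $somexmlcontentinastring . '</div>';
$newDom = new DOMDocument('1.0', 'UTF-8');
$newDom->loadXML($import_prepare);
$imported_div = $newDom->getElementsByTagName('div')->item(0);

// Deep-copy the node into the host document, then attach the copy.
$testDiv = $myxhtml->importNode($imported_div, true);
$xmlBody->appendChild($testDiv);

$result = $myxhtml->saveXML($xmlBody);
echo $result, "\n";
```

The second argument to importNode() requests a deep copy, so the
children of the div come along with it.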


-=-=-=-

loadHTML() is problematic with some characters. Better to convert your
additions to well-formed XML and use loadXML().
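
One way the conversion to well-formed XML could be approached, as a
sketch only: replace named HTML entities with their literal UTF-8
characters while leaving the five entities that XML itself defines
alone. The helper name html_entities_to_xml() and the sample string are
hypothetical, and the sketch assumes the input is otherwise well-formed.

```php
<?php
// Hypothetical helper: turn named HTML entities into literal UTF-8
// characters, keeping the five entities XML predefines as they are.
function html_entities_to_xml(string $markup): string
{
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0];
            }
            // Names html_entity_decode() does not know come back unchanged.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $markup
    );
}

$addition = 'Caf&eacute;&nbsp;&amp; <em>bar</em>';
$clean = html_entities_to_xml($addition);

// The cleaned string can now be wrapped and parsed as XML.
$newDom = new DOMDocument('1.0', 'UTF-8');
$ok = $newDom->loadXML('<div>' . $clean . '</div>');
var_dump($ok);
```

Keeping &amp; and &lt; encoded matters here, since decoding them would
make the string ill-formed again.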


-=-=-=-

While I am currently using DOMDocument for everything (yes, everything),
I'm still learning it; do not consider me a guru on its use.

