Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Thanks for your feedback.

__
Raymond Irving

--- On Tue, 4/14/09, Michael A. Peters mpet...@mac.com wrote:

From: Michael A. Peters mpet...@mac.com
Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
To: Raymond Irving xwis...@yahoo.com
Cc: php-general@lists.php.net
Date: Tuesday, April 14, 2009, 8:09 PM

> Raymond Irving wrote:
> > Hi, I'm thinking about using the html5 doctype for all html documents since it's supported by all the popular browsers available today. Two quick questions...
> >
> > Why do we need to send XHTML code to a web browser when standard html code (with html 5 doctype) will do just fine?
>
> In most cases we don't. However if we want to include extensions (such as MathML etc.) then xhtml is the only way to do it. My own reason for sending xhtml is that I believe it to be a superior specification and would like to see html (where not all tags need to be closed) go away. Having valid x(ht)ml output also means that other software that uses your web page as a source for data can just parse it as xml to get the data it needs.
>
> Be careful with html 5 - use the fallbacks (i.e. embed or object for video as a fallback to the video tag), because not everyone uses the latest browsers.
>
> > Is there any advantage of using xhtml in the web browser over html for normal web application development?
>
> In most cases, not a display advantage. XHTML 1.1 supports the ruby tags/attribute, html 4 does not, but with html 5 / xhtml 5 - they are supposedly identical in spec with the only difference being the markup semantics of xhtml 5 conform to xml standards. I suspect html 5 elements/attributes are case insensitive (like they are for previous html) but I haven't checked - xhtml tags/attributes need to be lower case.
>
> But if your page can be properly displayed with valid html then the only technical advantage I can think of for using xhtml is for apps that use your page as a data source (so they don't have to convert it to xml).
>
> I personally will send xhtml most of the time when I can because I want HTML to go away, and as soon as 97% of browsers properly support xhtml, I may stop sending html altogether. Since IE 8 still does not (not with the correct mime type anyway) it will be years before that happens.
>
> Oh - another advantage to xhtml - it's easy to extend for your own use. For example, you can add a custom attribute for your own use (i.e. as hooks for other web apps on other sites to use when grabbing data from your site, or whatever) and it will validate as long as you properly declare it. With html, I believe adding an attribute is not allowed unless you create a whole new DTD.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Thanks for the feedback. I too like xhtml but I think I like the option of serving both. My only concern is that a proxy server might cache an xhtml page and then serve it to a non-xhtml browser. Do you think it's possible that a proxy might serve the xhtml source to the wrong browser?

__
Raymond Irving

--- On Tue, 4/14/09, Michael Shadle mike...@gmail.com wrote:

From: Michael Shadle mike...@gmail.com
Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
To: Raymond Irving xwis...@yahoo.com
Cc: php-general@lists.php.net
Date: Tuesday, April 14, 2009, 8:26 PM

> As michael said my main reason is strictness. It's much easier to parse a document when an XML parser can read it. I like the idea of closing tags etc.
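The proxy question above is usually handled with a `Vary: Accept` response header, which tells shared caches that the stored copy depends on the request's Accept header. The following is only a minimal sketch of that idea: the helper names `pickMarkupType()` and `markupHeaders()` are made up for illustration, and the q-value handling is deliberately simplified rather than a full Accept parser.

```php
<?php
// Hypothetical helper: choose the MIME type to serve from an Accept header.
function pickMarkupType(string $accept): string
{
    // Look for an explicit application/xhtml+xml entry and read its q-value, if any.
    if (preg_match('~application/xhtml\+xml(?:\s*;\s*q=([0-9.]+))?~i', $accept, $m)) {
        $q = (isset($m[1]) && $m[1] !== '') ? (float) $m[1] : 1.0;
        if ($q > 0) {
            return 'application/xhtml+xml';
        }
    }
    // Browsers that do not list XHTML explicitly get plain HTML.
    return 'text/html';
}

// Hypothetical helper: the headers to send for a given Accept header.
// "Vary: Accept" asks caches to key their stored copies on the Accept header.
function markupHeaders(string $accept): array
{
    return [
        'Content-Type: ' . pickMarkupType($accept) . '; charset=UTF-8',
        'Vary: Accept',
    ];
}

// In a page script this might be used as:
// foreach (markupHeaders($_SERVER['HTTP_ACCEPT'] ?? '') as $h) { header($h); }
```

Whether a particular proxy honours Vary is outside the script's control, so this reduces the risk rather than removing it.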
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
I use XHTML 1.0 transitional and I've yet to have anyone tell me my sites don't work. Mobile and desktop browsers too. So I'm not sure that's an issue at all (?)

On Apr 15, 2009, at 6:31 PM, Raymond Irving xwis...@yahoo.com wrote:

> Thanks for the feedback. I too like xhtml but I think I like the option of serving both. My only concern is that a proxy server might cache an xhtml page and then serve it to a non-xhtml browser. Do you think it's possible that a proxy might serve the xhtml source to the wrong browser?
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Michael Shadle wrote:
> On Mon, Apr 13, 2009 at 2:19 AM, Michael A. Peters mpet...@mac.com wrote:
> > The problem is that validating xhtml does not necessarily render properly in some browsers *cough*IE*cough*
>
> I've never had problems and my work is primarily around IE6 / our corporate standards. Hell, even without a script type it still works :)
>
> > Would this function work for sending html and solve the utf8 problem?
> >
> >     function makeHTML($document) {
> >         $buffer = $document->saveHTML();
> >         $output = html_entity_decode($buffer, ENT_QUOTES, 'UTF-8');
> >         return $output;
> >     }
> >
> > I'll try it and see what it does.
>
> this was the only workaround I received for the moment, and I was a bit afraid it would not process the full range of utf-8; it appeared on a quick check to work but I wanted to run it on our entire database and then ask the native geo folks to examine it for correctness.

I find that IE7 (at least) is pretty reliable as long as I use strict XHTML and send a DOCTYPE header to that effect at the top - that seems to trigger a standard-compliant mode in IE7. At least then I only have to worry about the JavaScript incompatibilities, and the table model, and the event model, and

--
Peter Ford                      phone: 01580 89
Developer                       fax:   01580 893399
Justcroft International Ltd., Staplehurst, Kent
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Hi,

I'm thinking about using the html5 doctype for all html documents since it's supported by all the popular browsers available today. Two quick questions...

Why do we need to send XHTML code to a web browser when standard html code (with html 5 doctype) will do just fine?

Is there any advantage of using xhtml in the web browser over html for normal web application development?

__
Raymond Irving

--- On Tue, 4/14/09, Peter Ford p...@justcroft.com wrote:

From: Peter Ford p...@justcroft.com
Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
To: php-general@lists.php.net
Date: Tuesday, April 14, 2009, 5:05 AM

> I find that IE7 (at least) is pretty reliable as long as I use strict XHTML and send a DOCTYPE header to that effect at the top - that seems to trigger a standard-compliant mode in IE7. At least then I only have to worry about the JavaScript incompatibilities, and the table model, and the event model, and
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Raymond Irving wrote:
> Hi, I'm thinking about using the html5 doctype for all html documents since it's supported by all the popular browsers available today. Two quick questions...
>
> Why do we need to send XHTML code to a web browser when standard html code (with html 5 doctype) will do just fine?

In most cases we don't. However if we want to include extensions (such as MathML etc.) then xhtml is the only way to do it. My own reason for sending xhtml is that I believe it to be a superior specification and would like to see html (where not all tags need to be closed) go away. Having valid x(ht)ml output also means that other software that uses your web page as a source for data can just parse it as xml to get the data it needs.

Be careful with html 5 - use the fallbacks (i.e. embed or object for video as a fallback to the video tag), because not everyone uses the latest browsers.

> Is there any advantage of using xhtml in the web browser over html for normal web application development?

In most cases, not a display advantage. XHTML 1.1 supports the ruby tags/attribute, html 4 does not, but with html 5 / xhtml 5 - they are supposedly identical in spec with the only difference being the markup semantics of xhtml 5 conform to xml standards. I suspect html 5 elements/attributes are case insensitive (like they are for previous html) but I haven't checked - xhtml tags/attributes need to be lower case.

But if your page can be properly displayed with valid html then the only technical advantage I can think of for using xhtml is for apps that use your page as a data source (so they don't have to convert it to xml).

I personally will send xhtml most of the time when I can because I want HTML to go away, and as soon as 97% of browsers properly support xhtml, I may stop sending html altogether. Since IE 8 still does not (not with the correct mime type anyway) it will be years before that happens.

Oh - another advantage to xhtml - it's easy to extend for your own use. For example, you can add a custom attribute for your own use (i.e. as hooks for other web apps on other sites to use when grabbing data from your site, or whatever) and it will validate as long as you properly declare it. With html, I believe adding an attribute is not allowed unless you create a whole new DTD.
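To illustrate the "page as a data source" point made above, here is a small sketch of a consumer reading a well-formed XHTML page with PHP's DOM and XPath. The sample document, the class names and the `x` prefix are all invented for the example; the only fixed detail is the XHTML namespace URI.

```php
<?php
// A small, well-formed XHTML document standing in for a fetched page (made up).
$xhtml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Price list</title></head>
  <body>
    <ul>
      <li class="item">Widget</li>
      <li class="item">Gadget</li>
      <li class="note">Prices exclude tax</li>
    </ul>
  </body>
</html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xhtml);   // strict XML parse: malformed markup is rejected, not repaired

$xpath = new DOMXPath($doc);
// XHTML elements live in a namespace, so XPath needs a prefix bound to it.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

// Collect the text of every list item marked as an "item".
$items = [];
foreach ($xpath->query('//x:li[@class="item"]') as $li) {
    $items[] = $li->textContent;
}
```

No tidy step or HTML-to-XML conversion is needed here, which is the advantage being described; the cost is that a single unclosed tag makes loadXML() fail.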
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
As michael said my main reason is strictness. It's much easier to parse a document when an XML parser can read it. I like the idea of closing tags etc.

On Apr 14, 2009, at 4:38 PM, Raymond Irving xwis...@yahoo.com wrote:

> Hi, I'm thinking about using the html5 doctype for all html documents since it's supported by all the popular browsers available today. Two quick questions...
>
> Why do we need to send XHTML code to a web browser when standard html code (with html 5 doctype) will do just fine?
>
> Is there any advantage of using xhtml in the web browser over html for normal web application development?
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Raymond Irving wrote:
> Hello,
>
> After talking with Michael about how to generate XHTML code using the DOM I came up with this little function that I'm thinking of using to generate XHTML code that's HTML compatible:
>
>     function saveXHTML($dom) {
>         $html = $dom->saveXML(null,LIBXML_NOEMPTYTAG);
>         $html = str_replace('&#13;','',$html);
>         $html = preg_replace('/<\?xml[^>]*>\n/','',$html,1);
>         $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);
>         $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/',' />',$html);
>         return $html;
>     }
>
> What do you think?
>
> __
> Raymond Irving

I found this in my old files - don't know if any of it is useful to you -

    function HTMLify($buffer) {
        $xhtml[] = '/<script([^>]*)\/>/';
        $html[]  = '<script\\1></script>';
        $xhtml[] = '/<div([^>]*)\/>/';
        $html[]  = '<div\\1></div>';
        $xhtml[] = '/<a([^>]*)\/>/';
        $html[]  = '<a\\1></a>';
        $xhtml[] = '/\/>/';
        $html[]  = '>';
        // DOMDocument never produces white space between / and > on self closing tags
        // $xhtml[] = '/\/\s+>/';
        // $html[]  = '>';
        return preg_replace($xhtml, $html, $buffer);
    }

I think I actually had extended the function beyond the replacements there, but I don't seem to have them anymore.

What really would be nice is to patch the libxml2 library to add an option to change its default behaviour of screwing up utf8 on html export and then be able to pass that option to saveHTML() so the utf8 issue would go away. That's way beyond my skill though.

Has anyone brought up the issue with the libxml developers?
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
On Sun, Apr 12, 2009 at 8:07 AM, Raymond Irving xwis...@yahoo.com wrote:
>     $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);

question - the output of this would be

    <script type="text/javascript"><![CDATA
    js code ...
    ]]></script>

right? is the cdata truly necessary? I typically use XHTML 1.0 transitional and I don't have problems validating.
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Michael Shadle wrote:
> On Sun, Apr 12, 2009 at 8:07 AM, Raymond Irving xwis...@yahoo.com wrote:
> >     $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);
>
> question - the output of this would be
>
>     <script type="text/javascript"><![CDATA
>     js code ...
>     ]]></script>
>
> right? is the cdata truly necessary? I typically use XHTML 1.0 transitional and I don't have problems validating.

The problem is that validating xhtml does not necessarily render properly in some browsers *cough*IE*cough*

That's why I prefer to send html 4.01 to those browsers.

Would this function work for sending html and solve the utf8 problem?

    function makeHTML($document) {
        $buffer = $document->saveHTML();
        $output = html_entity_decode($buffer, ENT_QUOTES, 'UTF-8');
        return $output;
    }

I'll try it and see what it does.
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Michael A. Peters wrote:
>     function makeHTML($document) {
>         $buffer = $document->saveHTML();
>         $output = html_entity_decode($buffer, ENT_QUOTES, 'UTF-8');
>         return $output;
>     }
>
> I'll try it and see what it does.

Huh - not tried above yet - but with

    $test = $myxhtml->createElement('p','שלום');
    $xmlBody->appendChild($test);

both saveXML() and saveHTML() do the right thing.

However if I have the string

    <p>שלום</p>

and load it into a DOM -

With loadHTML() the utf8 is lost regardless of whether I use saveXML() or saveHTML()
With loadXML() the utf8 is preserved regardless of whether or not I use saveXML() or saveHTML()

php 5.2.9
libxml2 2.6.26-2.1.2.7 (CentOS 5.3)

I wonder if the real utf8 problem people experience is really with loadHTML() and not with saveHTML() ??
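A commonly suggested workaround for the loadHTML() behaviour described above is to declare the encoding inside the markup itself, so the HTML parser does not have to guess. The sketch below assumes the input bytes really are UTF-8 and that prepending a meta element is acceptable for the snippet in question; it is not something proposed in this thread, and behaviour may differ between libxml2 versions.

```php
<?php
$snippet = '<p>שלום</p>';

// Without a charset hint the HTML parser may assume a single-byte encoding.
// Prepending a meta declaration states that the bytes are UTF-8.
$hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

$doc = new DOMDocument();
$doc->loadHTML($hint . $snippet);

// Read the paragraph back out of the parsed tree.
$p = $doc->getElementsByTagName('p')->item(0);
$text = $p->textContent;
```

If the hint works as expected, `$text` holds the original Hebrew string, and either saveXML() or saveHTML() can then be used for output as in the tests above.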
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Michael A. Peters wrote:
> I wonder if the real utf8 problem people experience is really with loadHTML() and not with saveHTML() ??

Go to http://www.clfsrpm.net/xss/dom_script_test.php

The page was meant to test something else but enter some UTF-8 into the textarea (well formed xhtml) -

With IE, html 4 is the only thing it will output, but with Firefox and Opera it outputs xhtml by default and you can check a box to force html.

When it outputs html it uses saveHTML()
When it outputs xhtml it uses saveXML()

The source is linked there so you can see.

Anyway - it looks like saveHTML() works fine with UTF8 as long as loadXML() was used to load the data. I've tried both Hebrew and Polytonic Greek. It outputs correctly regardless of html or xhtml - but only after I changed the code to load the textarea as xml as opposed to html.
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Michael,

You are absolutely right! It's loadHTML() that's causing the problems.

Best regards,
__
Raymond Irving

--- On Mon, 4/13/09, Michael A. Peters mpet...@mac.com wrote:

From: Michael A. Peters mpet...@mac.com
Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
To: Michael Shadle mike...@gmail.com
Cc: Raymond Irving xwis...@yahoo.com, php-general@lists.php.net
Date: Monday, April 13, 2009, 5:36 AM

> Huh - not tried above yet - but with
>
>     $test = $myxhtml->createElement('p','שלום');
>     $xmlBody->appendChild($test);
>
> both saveXML() and saveHTML() do the right thing.
>
> However if I have the string
>
>     <p>שלום</p>
>
> and load it into a DOM -
>
> With loadHTML() the utf8 is lost regardless of whether I use saveXML() or saveHTML()
> With loadXML() the utf8 is preserved regardless of whether or not I use saveXML() or saveHTML()
>
> php 5.2.9
> libxml2 2.6.26-2.1.2.7 (CentOS 5.3)
>
> I wonder if the real utf8 problem people experience is really with loadHTML() and not with saveHTML() ??
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Well this is an interesting turn of events :)

We should now run over to the libxml folks and see if there is anything that can be done. There *are* encoding options when you set up the domdocument so it seems like the options are there but not working properly for one reason or another.

On Apr 13, 2009, at 8:01 AM, Raymond Irving xwis...@yahoo.com wrote:

> Michael, You are absolutely right! It's loadHTML() that's causing the problems.
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
I will say though this negates the reason I chose to use domdocument to begin with. I am feeding it snippets of HTML that usually do not validate and I am not sure I want to run it through tidy first to convert from HTML to XHTML to run the domdocument and then convert it back...

I am essentially using this to traverse the DOM and process all <a href> and <img src> attributes for a link remapping job. (also realizing the power of php's DOM for other things; I used to try tidy and then use simplexml when doing HTML scraping ...) but php's dom allows me to give it absolutely crappy HTML and it still works.

However if someone has a nice regular expression or chunk of code that allows you to scan a doc for <a href> and then replaces them in the proper context (not just globally) that would work too. I can't just blindly find urls and then replace them (although the reason for this escapes me right now)

On Apr 13, 2009, at 8:01 AM, Raymond Irving xwis...@yahoo.com wrote:

> Michael, You are absolutely right! It's loadHTML() that's causing the problems.
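The link remapping job described above can be sketched with DOMDocument alone. Everything concrete here is assumed for illustration: the `$map` table, the sample markup and the choice to serialize only the wrapping `div` are not taken from the thread. The point of the sketch is that only `href` and `src` attribute values are touched, so a URL that merely appears in body text is left alone.

```php
<?php
// Hypothetical mapping from old URLs to new ones.
$map = [
    '/old/about.html' => '/about/',
    '/img/logo.gif'   => '/static/logo.png',
];

// Sample fragment (made up); note the old URL also appears as plain text.
$html = '<div><a href="/old/about.html">About</a><img src="/img/logo.gif" alt="logo">'
      . '<p>See /old/about.html for details</p></div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them for sloppy markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Rewrite only the attributes of interest.
foreach (['a' => 'href', 'img' => 'src'] as $tag => $attr) {
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $url = $node->getAttribute($attr);
        if (isset($map[$url])) {
            $node->setAttribute($attr, $map[$url]);
        }
    }
}

// Serialize just the original fragment rather than the html/body wrapper.
$div = $doc->getElementsByTagName('div')->item(0);
$out = $doc->saveHTML($div);
```

For non-ASCII snippets the loadHTML() encoding issue discussed in this thread still applies, so the input would need an encoding declaration or a similar workaround first.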
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
On Mon, Apr 13, 2009 at 2:19 AM, Michael A. Peters mpet...@mac.com wrote:
> The problem is that validating xhtml does not necessarily render properly in some browsers *cough*IE*cough*

I've never had problems and my work is primarily around IE6 / our corporate standards. Hell, even without a script type it still works :)

> Would this function work for sending html and solve the utf8 problem?
>
>     function makeHTML($document) {
>         $buffer = $document->saveHTML();
>         $output = html_entity_decode($buffer, ENT_QUOTES, 'UTF-8');
>         return $output;
>     }
>
> I'll try it and see what it does.

this was the only workaround I received for the moment, and I was a bit afraid it would not process the full range of utf-8; it appeared on a quick check to work but I wanted to run it on our entire database and then ask the native geo folks to examine it for correctness.
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Hi Michael,

You might want to check out the Raxan PDI (Programmable Document Interface) framework. It works like a charm with html snippets. Example:

    $page['body']->append('<p>שלום</p>'); // this will append the <p> to the html body

Here's the link: http://raxanpdi.com
For online examples check out: http://raxanpdi.com/examples.html

__
Raymond Irving

--- On Mon, 4/13/09, Michael Shadle mike...@gmail.com wrote:

From: Michael Shadle mike...@gmail.com
Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
To: Raymond Irving xwis...@yahoo.com
Cc: php-general@lists.php.net
Date: Monday, April 13, 2009, 11:34 AM

> I will say though this negates the reason I chose to use domdocument to begin with. I am feeding it snippets of HTML that usually do not validate and I am not sure I want to run it through tidy first to convert from HTML to XHTML to run the domdocument and then convert it back...
>
> I am essentially using this to traverse the DOM and process all <a href> and <img src> attributes for a link remapping job. (also realizing the power of php's DOM for other things; I used to try tidy and then use simplexml when doing HTML scraping ...) but php's dom allows me to give it absolutely crappy HTML and it still works.
>
> However if someone has a nice regular expression or chunk of code that allows you to scan a doc for <a href> and then replaces them in the proper context (not just globally) that would work too. I can't just blindly find urls and then replace them (although the reason for this escapes me right now)
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Hi Michael,

Your script works fine. The only problem I'm having is that it does not support HTML entities. The following code will cause the page to crash:

    <p>&copy;</p>

I think that's because you're using loadXML() and not loadHTML(). Has anyone from the dev team contacted the libxml guys about the UTF-8 issue with loadHTML()?

__
Raymond Irving

--- On Mon, 4/13/09, Michael A. Peters mpet...@mac.com wrote:

From: Michael A. Peters mpet...@mac.com
Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
To: Michael Shadle mike...@gmail.com
Cc: Raymond Irving xwis...@yahoo.com, php-general@lists.php.net php-general@lists.php.net
Date: Monday, April 13, 2009, 5:56 AM

Michael A. Peters wrote:

I wonder if the real UTF-8 problem people experience is really with loadHTML() and not with saveHTML() ??

Go to http://www.clfsrpm.net/xss/dom_script_test.php

The page was meant to test something else, but enter some UTF-8 into the textarea (well-formed XHTML). With IE, HTML 4 is the only thing it will output, but with Firefox and Opera it outputs XHTML by default, and you can check a box to force HTML.

When it outputs HTML it uses saveHTML().
When it outputs XHTML it uses saveXML().
The source is linked there so you can see.

Anyway - it looks like saveHTML() works fine with UTF-8 as long as loadXML() was used to load the data. I've tried both Hebrew and Polytonic Greek. It outputs correctly regardless of HTML or XHTML - but only after I changed the code to load the textarea as XML as opposed to HTML.
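On the entity problem reported at the top of this message: XML predefines only a handful of entities, so loadXML() rejects HTML names such as &copy;. One possible approach, sketched below, is to turn the HTML named entities into literal UTF-8 characters before parsing while leaving the markup escapes alone. The helper name decodeHtmlEntitiesForXml is hypothetical and the approach is an assumption, not something proposed in the thread.

```php
<?php
// Hypothetical helper (not from the thread): turn HTML named entities such as
// &copy; into literal UTF-8 characters, leaving XML's own markup escapes alone.
function decodeHtmlEntitiesForXml($markup)
{
    // Maps characters to entity names, e.g. "©" => "&copy;".
    $table = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'UTF-8');
    // Keep &amp;, &lt; and &gt; encoded so the markup stays well-formed.
    unset($table['&'], $table['<'], $table['>']);
    // Flip to entity => character and replace in one pass.
    return strtr($markup, array_flip($table));
}

// Sample input is made up for this example.
$doc = new DOMDocument();
$ok = $doc->loadXML(decodeHtmlEntitiesForXml('<p>&copy; A &amp; B</p>'));
$text = $doc->documentElement->textContent;
```

A plain html_entity_decode() call would also decode &amp; and &lt;, which could break well-formedness, hence the filtered table. This sketch covers only the HTML 4.01 entity names.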
[PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Hello,

After talking with Michael about how to generate XHTML code using the DOM, I came up with this little function that I'm thinking of using to generate XHTML code that's HTML compatible:

    function saveXHTML($dom) {
        $html = $dom->saveXML(null,LIBXML_NOEMPTYTAG);
        $html = str_replace('&#13;','',$html);
        $html = preg_replace('/<\?xml[^>]*>\n/','',$html,1);
        $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);
        $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/',' />',$html);
        return $html;
    }

What do you think?

__
Raymond Irving
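To make the effect of each step concrete, here is a small runnable sketch that applies the function to a sample document. The function body is a best-effort reading of the message above, with the characters the mail archive dropped ("<", ">", "&") restored. The sample document is hypothetical and kept ASCII-only so the result is easy to inspect.

```php
<?php
// saveXHTML() as posted, with the stripped "<", ">" and "&" restored
// (a reconstruction, so treat the exact patterns as an assumption).
function saveXHTML($dom)
{
    // Serialize as XML, expanding <x/> to <x></x>.
    $html = $dom->saveXML(null, LIBXML_NOEMPTYTAG);
    // Drop carriage-return character references.
    $html = str_replace('&#13;', '', $html);
    // Remove the leading XML declaration.
    $html = preg_replace('/<\?xml[^>]*>\n/', '', $html, 1);
    // Hide the CDATA markers from HTML parsers behind script comments.
    $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s', '//<![CDATA[\1//]]></script>', $html);
    // Collapse void elements back to the " />" form.
    $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/', ' />', $html);
    return $html;
}

// Hypothetical sample document.
$dom = new DOMDocument();
$dom->loadXML(
    "<html><head><script><![CDATA[\nvar ok = 1 < 2;\n]]></script></head>"
    . "<body><p>Hi<br/></p><div/></body></html>"
);
$out = saveXHTML($dom);
```

With this input the declaration is removed, <br/> becomes <br />, and the empty <div/> is written as <div></div>. Two things to keep in mind: the CDATA pattern is greedy, so on a page with several script blocks it would match from the first CDATA opening to the last closing </script> (a non-greedy (.*?) is one possible adjustment), and the script body needs to start on a new line, otherwise its first line ends up behind the // comment.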
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
On Sun, Apr 12, 2009 at 8:07 AM, Raymond Irving xwis...@yahoo.com wrote:

> Hello,
>
> After talking with Michael about how to generate XHTML code using the DOM, I came up with this little function that I'm thinking of using to generate XHTML code that's HTML compatible:
>
>     function saveXHTML($dom) {
>         $html = $dom->saveXML(null,LIBXML_NOEMPTYTAG);
>         $html = str_replace(' ','',$html);
>         $html = preg_replace('/<\?xml[^>]*>\n/','',$html,1);
>         $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);
>         $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/',' />',$html);
>         return $html;
>     }
>
> What do you think?

If this will maintain UTF-8 I might be able to use it :) According to the last thread, saveHTML() munges UTF-8 stuff due to libxml...

Hopefully this week I can give it a go.
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
Hi Michael,

--- On Sun, 4/12/09, Michael Shadle mike...@gmail.com wrote:

> If this will maintain utf-8 I might be able to use it :) which according to the last thread, saveHTML munges utf-8 stuff due to libxml... Hopefully this week I can give it a go.

I think it should work just fine, as saveXML() produces UTF-8 output.

PS. Feel free to drop me a line, as I would like to hear about your experience with UTF-8 web pages.

Best regards,
__
Raymond Irving
Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
It appears that the email system stripped out the &#13; from this line:

    $html = str_replace('&#13;','',$html);

Best regards,
__
Raymond Irving

--- On Sun, 4/12/09, Raymond Irving xwis...@yahoo.com wrote:

From: Raymond Irving xwis...@yahoo.com
Subject: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
To: php-general@lists.php.net php-general@lists.php.net
Date: Sunday, April 12, 2009, 11:07 AM

Hello,

After talking with Michael about how to generate XHTML code using the DOM, I came up with this little function that I'm thinking of using to generate XHTML code that's HTML compatible:

    function saveXHTML($dom) {
        $html = $dom->saveXML(null,LIBXML_NOEMPTYTAG);
        $html = str_replace(' ','',$html);
        $html = preg_replace('/<\?xml[^>]*>\n/','',$html,1);
        $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);
        $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/',' />',$html);
        return $html;
    }

What do you think?

__
Raymond Irving