Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-15 Thread Raymond Irving


Thanks for your feedback.

__
Raymond Irving


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-15 Thread Raymond Irving

Thanks for the feedback.

I too like xhtml but I think I like the option of serving both. My only concern 
is that a proxy server might cache an xhtml page and then serve it to a 
non-xhtml browser.

Do you think it's possible that a proxy might serve the xhtml source to the 
wrong browser?
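
For what it's worth, the usual HTTP mechanism for this is the Vary response header, which tells a cache that the stored copy depends on a request header. A minimal sketch, assuming the MIME type is chosen from the Accept header; negotiatedHeaders() is a made-up name, and the check is deliberately naive (it ignores q-values):

```php
<?php
// Hypothetical helper: pick a MIME type from the request's Accept header
// and return the response header lines to send. Naive on purpose: it only
// looks for the XHTML MIME type as a substring and ignores q-values.
function negotiatedHeaders(string $accept): array
{
    $type = (stripos($accept, 'application/xhtml+xml') !== false)
        ? 'application/xhtml+xml'
        : 'text/html';

    return [
        'Content-Type: ' . $type . '; charset=utf-8',
        // Tells caches that the stored response depends on the Accept header.
        'Vary: Accept',
    ];
}
```

In a page this would be used by passing $_SERVER['HTTP_ACCEPT'] (or an empty string if it is not set) and calling header() on each returned line. Whether a given proxy honours Vary is a separate question.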

__
Raymond Irving




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-15 Thread Michael Shadle
I use XHTML 1.0 transitional and I've yet to have anyone tell me my  
sites don't work. Mobile and desktop browsers too. So I'm not sure  
that's an issue at all (?)





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-14 Thread Peter Ford
Michael Shadle wrote:
 On Mon, Apr 13, 2009 at 2:19 AM, Michael A. Peters mpet...@mac.com wrote:
 
 The problem is that validating xhtml does not necessarily render properly in
 some browsers *cough*IE*cough*
 
 I've never had problems and my work is primarily around IE6 / our
 corporate standards. Hell, even without a script type it still works
 :)
 
 Would this function work for sending html and solve the utf8 problem?

 function makeHTML($document) {
   $buffer = $document->saveHTML();
   $output = html_entity_decode($buffer,ENT_QUOTES,"UTF-8");
   return $output;
   }

 I'll try it and see what it does.
 
 this was the only workaround I received for the moment, and I was a
 bit afraid it would not process the full range of utf-8; it appeared
 on a quick check to work but I wanted to run it on our entire database
 and then ask the native geo folks to examine it for correctness.

I find that IE7 (at least) is pretty reliable as long as I use strict XHTML and
send a DOCTYPE header to that effect at the top - that seems to trigger a
standard-compliant mode in IE7.
At least then I only have to worry about the JavaScript incompatibilities, and
the table model, and the event model, and 

-- 
Peter Ford  phone: 01580 89
Developer   fax:   01580 893399
Justcroft International Ltd., Staplehurst, Kent




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-14 Thread Raymond Irving

Hi,

I'm thinking about using the html5 doctype for all html documents since it's 
supported by all the popular browsers available today. 

Two Quick questions... 

Why do we need to send XHTML code to a web browser when standard html code 
(with html 5 doctype) will do just fine?

Is there any advantage of using xhtml in the web browser over html for normal 
web application development?


__
Raymond Irving 




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-14 Thread Michael A. Peters

Raymond Irving wrote:

Hi,

I'm thinking about using the html5 doctype for all html documents since it's supported by all the popular browsers available today. 

Two Quick questions... 


Why do we need to send XHTML code to a web browser when standard html code 
(with html 5 doctype) will do just fine?


In most cases we don't.
However if we want to include extensions (such as MathML etc.) then 
xhtml is the only way to do it.


My own reason for sending xhtml is because I believe it to be a superior 
specification and would like to see html (where not all tags need to be 
closed) go away.


Having valid x(ht)ml output also means that other software that uses 
your web page as a source for data can just parse it as xml to get the 
data it needs.
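
As a rough illustration of that point (a sketch only; the markup and the "price" class are made up for the example), a well-formed XHTML page can be fed straight to an XML parser and queried with XPath:

```php
<?php
// Hypothetical input: a well-formed XHTML document standing in for a fetched page.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       . '<p class="price">42</p><p>other</p></body></html>';

$doc = new DOMDocument();
$doc->loadXML($xhtml);   // plain XML parsing, no HTML clean-up step needed

// XHTML elements live in a namespace, so XPath needs a prefix bound to it.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

$prices = [];
foreach ($xpath->query('//x:p[@class="price"]') as $node) {
    $prices[] = $node->textContent;
}
```

The same query against tag-soup html would first need a tidy or loadHTML() pass.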


Be careful with html 5 - use the fallbacks (i.e. embed or object for video 
as a fallback to the video tag), because not everyone uses the latest 
browsers.




Is there any advantage of using xhtml in the web browser over html for normal 
web application development?



In most cases, not a display advantage.

XHTML 1.1 supports the ruby tags/attributes, html 4 does not, but with 
html 5 / xhtml 5 - they are supposedly identical in spec with the only 
difference being the markup semantics of xhtml 5 conform to xml 
standards. I suspect html 5 elements/attributes are case insensitive 
(like they are for previous html) but I haven't checked - xhtml 
tags/attributes need to be lower case.


But if your page can be properly displayed with valid html then the only 
technical advantage I can think of for using xhtml is for apps that use 
your page as a data source (so they don't have to convert it to xml).


I personally will send xhtml most of the time when I can because I want 
HTML to go away, and as soon as 97% of browsers properly support xhtml, 
I may stop sending html altogether. Since IE 8 still does not (not 
with the correct mime type anyway) it will be years before that happens.


Oh - another advantage to xhtml - it's easy to extend for your own use.
For example, you can add a custom attribute for your own use (i.e. as 
hooks for other web apps on other sites to use when grabbing data from 
your site, or whatever) and it will validate as long as you properly 
declare it. With html, I believe adding an attribute is not allowed 
unless you create a whole new DTD.





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-14 Thread Michael Shadle
As Michael said, my main reason is strictness. It's much easier to  
parse a document when an XML parser can read it. I like the idea of  
closing tags etc.





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael A. Peters

Raymond Irving wrote:

Hello,

After talking with Michael about how to generate XHTML code using the DOM I 
came up with this little function that I'm thinking of using to generate XHTML 
code that's HTML compatible:

function saveXHTML($dom) {
    $html = $dom->saveXML(null,LIBXML_NOEMPTYTAG);
    $html = str_replace('&#13;','',$html);
    $html = preg_replace('/<\?xml[^>]*>\n/','',$html,1);
    $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);
    $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/',' />',$html);
    return $html;
}

What do you think?


__
Raymond Irving



I found this in my old files - don't know if any of it is useful to you -

function HTMLify($buffer) {
   $xhtml[] = '/<script([^>]*)\/>/';
   $html[]  = '<script\\1></script>';

   $xhtml[] = '/<div([^>]*)\/>/';
   $html[]  = '<div\\1></div>';

   $xhtml[] = '/<a([^>]*)\/>/';
   $html[]  = '<a\\1></a>';

   $xhtml[] = '/\/>/';
   $html[]  = '>';

// DOMDocument never produces white space between / and > on self 
// closing tags

//   $xhtml[] = '/\/\s+>/';
//   $html[]  = '>';

   return preg_replace($xhtml, $html, $buffer);
   }

I think I actually had extended the function beyond the replacements 
there, but I don't seem to have them anymore.


What really would be nice is to patch the libxml2 library to add an 
option to change its default behaviour of screwing up utf8 on html 
export and then be able to pass that option to saveHTML() so the utf8 
issue would go away.


That's way beyond my skill though. Has anyone brought up the issue with 
the libxml developers?





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael Shadle
On Sun, Apr 12, 2009 at 8:07 AM, Raymond Irving xwis...@yahoo.com wrote:

    $html = 
 preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);

question -

the output of this would be

<script type="text/javascript"><![CDATA[js code ... ]]></script> right?

is the cdata truly necessary? I typically use XHTML 1.0 transitional
and I don't have problems validating.




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael A. Peters

Michael Shadle wrote:

On Sun, Apr 12, 2009 at 8:07 AM, Raymond Irving xwis...@yahoo.com wrote:


   $html = 
preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s','//<![CDATA[\1//]]></script>',$html);


question -

the output of this would be

<script type="text/javascript"><![CDATA[js code ... ]]></script> right?

is the cdata truly necessary? I typically use XHTML 1.0 transitional
and I don't have problems validating.



The problem is that validating xhtml does not necessarily render 
properly in some browsers *cough*IE*cough*


That's why I prefer to send html 4.01 to those browsers.

Would this function work for sending html and solve the utf8 problem?

function makeHTML($document) {
   $buffer = $document->saveHTML();
   $output = html_entity_decode($buffer,ENT_QUOTES,"UTF-8");
   return $output;
   }

I'll try it and see what it does.




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael A. Peters

Michael A. Peters wrote:



function makeHTML($document) {
   $buffer = $document->saveHTML();
   $output = html_entity_decode($buffer,ENT_QUOTES,"UTF-8");
   return $output;
   }

I'll try it and see what it does.



Huh - not tried above yet - but with

$test = $myxhtml->createElement('p','שלום');
$xmlBody->appendChild($test);

both saveXML() and saveHTML() do the right thing.

However if I have the string

<p>שלום</p>

and load it into a DOM -

With loadHTML() the utf8 is lost regardless of whether I use saveXML() 
or saveHTML()


With loadXML() the utf8 is preserved regardless of whether or not I use 
saveXML() or saveHTML()


php 5.2.9
libxml2 2.6.26-2.1.2.7 (CentOS 5.3)

I wonder if the real utf8 problem people experience is really with 
loadHTML() and not with saveHTML() ??
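
One workaround that is commonly suggested, though nothing in this thread confirms it: declare the charset inside the markup before calling loadHTML(), on the assumption that the HTML parser falls back to a single-byte encoding when no charset is declared. A minimal sketch; loadUtf8Html() is a made-up helper name, and the file is assumed to be saved as UTF-8:

```php
<?php
// Hypothetical wrapper: wrap a UTF-8 fragment in a document that declares
// its charset, then hand it to loadHTML().
function loadUtf8Html(string $fragment): DOMDocument
{
    $doc  = new DOMDocument();
    $html = '<html><head>'
          . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
          . '</head><body>' . $fragment . '</body></html>';

    // @ suppresses the warnings that sloppy markup would otherwise raise.
    @$doc->loadHTML($html);

    return $doc;
}

$doc  = loadUtf8Html('<p>שלום</p>');
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
```

If the assumption holds, $text comes back as the original Hebrew string rather than mangled bytes. Results may differ between libxml2 versions, so it is worth checking on the versions listed above.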





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael A. Peters

Michael A. Peters wrote:



I wonder if the real utf8 problem people experience is really with 
loadHTML() and not with saveHTML() ??




Go to http://www.clfsrpm.net/xss/dom_script_test.php

The page was meant to test something else but enter some UTF-8 into the 
textarea (well formed xhtml) -


With IE - html 4 is the only thing it will output but with FireFox and 
Opera, by default it outputs xhtml but you can check a box to force html.


When it outputs html it uses saveHTML()
When it outputs xhtml it uses saveXML()

The source is linked there so you can see.

Anyway - it looks like saveHTML() works fine with UTF8 as long as loadXML() 
was used to load the data.


I've tried both Hebrew and Polytonic Greek.
It outputs correctly regardless of html or xhtml - but only after I 
changed the code to load the textarea as xml as opposed to html.





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Raymond Irving


Michael,

You are absolutely right! It's loadHTML() that's causing the problems.


Best regards,
__
Raymond Irving





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael Shadle

Well this is an interesting turn of events :)

We should now run over to the libxml folks and see if there is  
anything that can be done.


There *are* encoding options when you set up the DOMDocument, so it 
seems like the options are there but not working properly for one 
reason or another.





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael Shadle
I will say though this negates the reason I chose to use DOMDocument
to begin with. I am feeding it snippets of HTML that usually do not
validate, and I am not sure I want to run them through tidy first to
convert from HTML to XHTML, run the DOMDocument, and then convert
back... I am essentially using this to traverse the DOM and process
all <a href> and <img src> attributes for a link remapping job. (I am
also realizing the power of PHP's DOM for other things: I used to try
tidy and then use simplexml when doing HTML scraping ...) but PHP's
DOM allows me to give it absolutely crappy HTML and it still works.


However if someone has a nice regular expression or chunk of code that
allows you to scan a doc for <a href> and then replace them in the
proper context (not just globally) that would work too. I can't just
blindly find urls and then replace them (although the reason for this
escapes me right now)
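
For reference, the DOM traversal described above might look roughly like this. It is a sketch only: remapLinks() and the /old/ to /new/ mapping are made up for illustration and are not the actual remapping job.

```php
<?php
// Hypothetical remapper: pass every <a href> and <img src> value through a callback.
function remapLinks(DOMDocument $doc, callable $map): void
{
    $targets = ['a' => 'href', 'img' => 'src'];
    foreach ($targets as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            if ($el->hasAttribute($attr)) {
                $el->setAttribute($attr, $map($el->getAttribute($attr)));
            }
        }
    }
}

$doc = new DOMDocument();
// Sloppy markup on purpose: the <p> is never closed.
@$doc->loadHTML('<p><a href="/old/page.html">link</a><img src="/old/pic.png">');

remapLinks($doc, function (string $url): string {
    return str_replace('/old/', '/new/', $url);
});

$href = $doc->getElementsByTagName('a')->item(0)->getAttribute('href');
$src  = $doc->getElementsByTagName('img')->item(0)->getAttribute('src');
```

Only attribute values on the matched elements are touched, so URLs that merely appear in text are left alone, which is the "proper context" part.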





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Michael Shadle
On Mon, Apr 13, 2009 at 2:19 AM, Michael A. Peters mpet...@mac.com wrote:

 The problem is that validating xhtml does not necessarily render properly in
 some browsers *cough*IE*cough*

I've never had problems and my work is primarily around IE6 / our
corporate standards. Hell, even without a script type it still works
:)

 Would this function work for sending html and solve the utf8 problem?

 function makeHTML($document) {
   $buffer = $document->saveHTML();
   $output = html_entity_decode($buffer,ENT_QUOTES,"UTF-8");
   return $output;
   }

 I'll try it and see what it does.

this was the only workaround I received for the moment, and I was a
bit afraid it would not process the full range of utf-8; it appeared
on a quick check to work but I wanted to run it on our entire database
and then ask the native geo folks to examine it for correctness.




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Raymond Irving


Hi Michael,

You might want to check out the Raxan PDI (Programmable Document Interface) 
framework. It works like a charm with html snippets:

example:

$page['body']->append('<p>שלום</p>');
// this will append the <p> to the html body

Here's the link: http://raxanpdi.com

For online examples checkout: http://raxanpdi.com/examples.html

__
Raymond Irving




--- On Mon, 4/13/09, Michael Shadle mike...@gmail.com wrote:

 From: Michael Shadle mike...@gmail.com
 Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
 To: Raymond Irving xwis...@yahoo.com
 Cc: php-general@lists.php.net php-general@lists.php.net
 Date: Monday, April 13, 2009, 11:34 AM
 I will say though this negates the
 reason I chose to use domdocument to begin with. I am
 feeding it snippets of HTML that usually do not validate and
 I am not sure I want to run it through tidy first to convert
 from HTML to XHTML to run the domdocument and then convert
 it back... I am essentially using this to traverse the DOM
 and process all a href and img src attributes for a link
 remapping job. (also realizing the power of php's DOM for
 other things I used to try tidy and then use simplexml when
 doing HTML scraping ...) but php's dom allows me to give it
 absolutely crappy HTML and it still works.
 
 However if someone has a nice regular expression or chunk
 of code that allows you to scan a doc for a href and then
 replaces them in the proper context (not just globally) that
 would work too. I can't just blindly find urls and then
 replace them (although the reason for this escapes me right
 now)
 
 On Apr 13, 2009, at 8:01 AM, Raymond Irving xwis...@yahoo.com
 wrote:
 
  
  
  Michael,
  
  You are absolutely right! It's loadHTML() that's
 causing the problems.
  
  
  Best regards,
  __
  Raymond Irving
  
  
  --- On Mon, 4/13/09, Michael A. Peters mpet...@mac.com wrote:
  
  From: Michael A. Peters mpet...@mac.com
  Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
  To: Michael Shadle mike...@gmail.com
  Cc: Raymond Irving xwis...@yahoo.com, php-general@lists.php.net php-general@lists.php.net
  Date: Monday, April 13, 2009, 5:36 AM
  Michael A. Peters wrote:
  
  function makeHTML($document) {
      $buffer = $document->saveHTML();
      $output = html_entity_decode($buffer, ENT_QUOTES, 'UTF-8');
      return $output;
  }
  
  I'll try it and see what it does.
  
  Huh - not tried above yet - but with
  
  $test = $myxhtml->createElement('p', 'שלום');
  $xmlBody->appendChild($test);
  
  both saveXML() and saveHTML() do the right thing.
  
  However if I have the string
  
  <p>שלום</p>
  
  and load it into a DOM -
  
  With loadHTML() the utf8 is lost regardless of whether I
  use saveXML() or saveHTML()
  
  With loadXML() the utf8 is preserved regardless of whether
  I use saveXML() or saveHTML()
  
  php 5.2.9
  libxml2 2.6.26-2.1.2.7 (CentOS 5.3)
  
  I wonder if the real utf8 problem people experience is
  really with loadHTML() and not with saveHTML() ??
  
  



Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-13 Thread Raymond Irving

Hi Michael,

Your script works fine. The only problem I'm having is that it does not support 
html entities. 

The following code will cause the page to crash:

<p>&copy;</p>

I think that's because you're using loadXML and not loadHTML.
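
That matches how XML works: without a DTD, only amp, lt, gt, quot and apos are predefined, so a named HTML entity such as copy is undefined to loadXML(). One possible way around it, sketched below under the assumption that the input is otherwise well-formed, is to turn the other named entities into UTF-8 characters before parsing. decodeHtmlEntitiesForXml() is a hypothetical helper name, not something from the script discussed here.

```php
<?php
// Hypothetical helper: decode named HTML entities to UTF-8 characters,
// but leave the five entities that XML itself predefines untouched.
function decodeHtmlEntitiesForXml($markup) {
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function ($m) {
            if (in_array($m[1], array('amp', 'lt', 'gt', 'quot', 'apos'), true)) {
                return $m[0]; // predefined in XML, keep as written
            }
            // Unknown names are returned unchanged by html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES, 'UTF-8');
        },
        $markup
    );
}

libxml_use_internal_errors(true); // collect parse errors instead of warnings

$snippet = '<p>&copy; 2009 &amp; שלום</p>';

$raw   = new DOMDocument();
$rawOk = $raw->loadXML($snippet); // "copy" is not defined in plain XML

$fixed   = new DOMDocument();
$fixedOk = $fixed->loadXML(decodeHtmlEntitiesForXml($snippet));

var_dump($rawOk, $fixedOk);
echo $fixed->documentElement->textContent, "\n";
```

Decoding everything with a plain html_entity_decode() call would also turn &amp;lt; and &amp;amp; into markup characters and break the XML, which is why the sketch skips those five names.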

Has anyone from the dev team contacted the libxml guys about the utf-8 issue 
with loadHTML()? 


__
Raymond Irving

--- On Mon, 4/13/09, Michael A. Peters mpet...@mac.com wrote:

 From: Michael A. Peters mpet...@mac.com
 Subject: Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
 To: Michael Shadle mike...@gmail.com
 Cc: Raymond Irving xwis...@yahoo.com, php-general@lists.php.net 
 php-general@lists.php.net
 Date: Monday, April 13, 2009, 5:56 AM
 Michael A. Peters wrote:
 
  
  I wonder if the real utf8 problem people experience is
 really with loadHTML() and not with saveHTML() ??
  
 
 Go to http://www.clfsrpm.net/xss/dom_script_test.php
 
 The page was meant to test something else but enter some
 UTF-8 into the textarea (well formed xhtml) -
 
 With IE - html 4 is the only thing it will output but with
 FireFox and Opera, by default it outputs xhtml but you can
 check a box to force html.
 
 When it outputs html it uses saveHTML()
 When it outputs xhtml it uses saveXML()
 
 The source is linked there so you can see.
 
 Anyway - it looks like saveHTML() works fine with UTF8 as long
 as loadXML() was used to load the data.
 
 I've tried both Hebrew and Polytonic Greek.
 It outputs correctly regardless of html or xhtml - but only
 after I changed the code to load the textarea as xml as opposed
 to html.
 



[PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-12 Thread Raymond Irving

Hello,

After talking with Michael about how to generate XHTML code using the DOM I 
came up with this little function that I'm thinking of using to generate XHTML 
code that's HTML compatible:

function saveXHTML($dom) {
    $html = $dom->saveXML(null, LIBXML_NOEMPTYTAG);
    $html = str_replace('&#13;', '', $html);
    $html = preg_replace('/<\?xml[^>]*>\n/', '', $html, 1);
    $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s', '//<![CDATA[\1//]]></script>', $html);
    $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/', ' />', $html);
    return $html;
}

What do you think?
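
To make the intent concrete, here is a small usage sketch. The function body is a reconstruction (the list archive dropped the angle brackets, the ampersand and the arrow operator), so the exact patterns are an assumption, and the sample document is made up for illustration.

```php
<?php
// Reconstructed from the post; the exact regex patterns are an assumption.
function saveXHTML($dom) {
    // LIBXML_NOEMPTYTAG writes <div></div> rather than <div/>
    $html = $dom->saveXML(null, LIBXML_NOEMPTYTAG);
    // drop carriage-return character references
    $html = str_replace('&#13;', '', $html);
    // drop the XML declaration (first match only)
    $html = preg_replace('/<\?xml[^>]*>\n/', '', $html, 1);
    // comment out CDATA markers inside script blocks
    $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s', '//<![CDATA[\1//]]></script>', $html);
    // collapse void elements back to the "<br />" form
    $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/', ' />', $html);
    return $html;
}

$dom = new DOMDocument();
$dom->loadXML('<html><body><p>one<br/>two</p><div></div></body></html>');
$out = saveXHTML($dom);
echo $out;
```

The idea is that empty non-void elements keep an explicit closing tag, which HTML parsers need, while void elements use the space-slash form that both HTML and XML parsers accept. One thing to watch: the CDATA pattern is greedy, so a page with several script blocks may need a non-greedy (.*?) instead.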


__
Raymond Irving




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-12 Thread Michael Shadle
On Sun, Apr 12, 2009 at 8:07 AM, Raymond Irving xwis...@yahoo.com wrote:

 Hello,

 After talking with Michael about how to generate XHTML code using the DOM I 
 came up with this little function that I'm thinking of using to generate 
 XHTML code that's HTML compatible:

 function saveXHTML($dom) {
     $html = $dom->saveXML(null, LIBXML_NOEMPTYTAG);
     $html = str_replace('&#13;', '', $html);
     $html = preg_replace('/<\?xml[^>]*>\n/', '', $html, 1);
     $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s', '//<![CDATA[\1//]]></script>', $html);
     $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/', ' />', $html);
     return $html;
 }

 What do you think?

If this will maintain utf-8 I might be able to use it :) which
according to the last thread, saveHTML munges utf-8 stuff due to
libxml...

Hopefully this week I can give it a go.




Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-12 Thread Raymond Irving

Hi Michael,

--- On Sun, 4/12/09, Michael Shadle mike...@gmail.com wrote:
 If this will maintain utf-8 I might be able to use it :) which
 according to the last thread, saveHTML munges utf-8 stuff due to
 libxml...
 
 Hopefully this week I can give it a go.

I think it should work just fine as saveXML produces utf-8 output. 
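
A quick way to check this is the sketch below. It assumes the document is created with its encoding set to UTF-8; as far as I know, when no encoding is set libxml tends to write non-ASCII text as numeric character references instead.

```php
<?php
// Encoding passed explicitly; this is an assumption about how the
// document is created, not something stated in the thread.
$dom = new DOMDocument('1.0', 'UTF-8');
$p = $dom->createElement('p', 'שלום');
$dom->appendChild($p);

$xml = $dom->saveXML();
echo $xml;
```

The serialized string should carry encoding="UTF-8" in its declaration and the Hebrew text as raw UTF-8 bytes.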

PS. Feel free to drop me a line as I would like to hear about your experience 
with utf-8 web pages.


Best regards,
__
Raymond Irving





Re: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument

2009-04-12 Thread Raymond Irving

It appears that the email system stripped out the &#13; from this line:

$html = str_replace('&#13;','',$html);


Best regards,
__
Raymond Irving


--- On Sun, 4/12/09, Raymond Irving xwis...@yahoo.com wrote:

 From: Raymond Irving xwis...@yahoo.com
 Subject: [PHP] Generate XHTML (HTML compatible) Code using DOMDocument
 To: php-general@lists.php.net php-general@lists.php.net
 Date: Sunday, April 12, 2009, 11:07 AM
 
 Hello,
 
 After talking with Michael about how to generate XHTML code
 using the DOM I came up with this little function that I'm
 thinking of using to generate XHTML code that's HTML
 compatible:
 
 function saveXHTML($dom) {
     $html = $dom->saveXML(null, LIBXML_NOEMPTYTAG);
     $html = str_replace('&#13;', '', $html);
     $html = preg_replace('/<\?xml[^>]*>\n/', '', $html, 1);
     $html = preg_replace('/<\!\[CDATA\[(.*)\]\]><\/script>/s', '//<![CDATA[\1//]]></script>', $html);
     $html = preg_replace('/><\/(meta|link|base|basefont|param|img|br|hr|area|input)>/', ' />', $html);
     return $html;
 }
 
 What do you think?
 
 
 __
 Raymond Irving
 