ID:               30257
 Comment by:       olivier at samalyse dot com
 Reported By:      christoffer at natlikan dot se
 Status:           Open
 Bug Type:         XML related
 Operating System: Windows XP
 PHP Version:      5CVS-2004-09-27 (dev)
 New Comment:

I'm seeing similar problems with xml_get_current_byte_index(), but I
don't agree with the expected result Christoffer proposes.

In PHP 4, xml_get_current_byte_index() behaves correctly as far as I
can tell. Running the test code below with PHP 4.3.4 produces:

ElementOpen - Row: 2 Col: 0 BIndex: 44
ElementOpen - Row: 4 Col: 1 BIndex: 61
ElementOpen - Row: 5 Col: 2 BIndex: 67
ElementClose - Row: 5 Col: 8 BIndex: 73
ElementClose - Row: 6 Col: 1 BIndex: 79
ElementClose - Row: 7 Col: 0 BIndex: 84

Byte index 44 points at the beginning of the <a> tag:
       <a b="x">
       ^

That is fine.
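
The PHP 4 figures can be cross-checked without the parser. What follows
is a minimal sketch, assuming the same $xml string as in the reproduce
code below; the names $tags and $offsets are mine, for illustration
only. It locates each tag with strpos() and prints its starting byte
offset:

```php
<?php
// Same document as in the reproduce code (plain ASCII, one byte per character).
$xml =
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n" .
    "<a b=\"x\">\n" .
    "\ttest\n" .
    "\t<c>\n" .
    "\t\t<d>foo</d>\n" .
    "\t</c>\n" .
    "</a>";

// Each tag occurs exactly once, so strpos() returns the offset of its first byte.
$tags = array('<a', '<c>', '<d>', '</d>', '</c>', '</a>');
$offsets = array();
foreach ($tags as $tag) {
    $offsets[$tag] = strpos($xml, $tag);
    echo $tag . ' starts at byte ' . $offsets[$tag] . "\n";
}
```

The offsets come out as 44, 61, 67, 73, 79 and 84, which match the
BIndex values PHP 4.3.4 reports above.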

Now, if you omit the XML declaration '<?xml version="1.0"
encoding="ISO-8859-1"?>' and run the same code with PHP 5, you get:

ElementOpen - Row: 1 Col: 5 BIndex: 8
ElementOpen - Row: 3 Col: 8 BIndex: 19
ElementOpen - Row: 4 Col: 11 BIndex: 25
ElementClose - Row: 4 Col: 15 BIndex: 33
ElementClose - Row: 5 Col: 18 BIndex: 39
ElementClose - Row: 6 Col: 21 BIndex: 44

Byte index 8 no longer points at the beginning of the tag, but at its
end:
       <a b="x">
               ^

In my particular case (XML indexing/marshalling) that is far less
usable. Some may not consider this a bug, but it breaks backward
compatibility with PHP 4.
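
For code that depends on the PHP 4 behaviour, one possible workaround
is to search backwards from the reported index for the '<' that opens
the tag. This is only a sketch: tagStartFromIndex() is a hypothetical
helper of mine, not part of the XML extension. It assumes the caller
has the raw document at hand, that the index lies inside or just after
the tag as in the PHP 5 output above, and that the XML declaration
problem described below is not in play.

```php
<?php
// Hypothetical helper: given the raw document and the byte index reported
// by PHP 5 (at or just after the closing '>' of the tag), return the offset
// of the '<' that opens that tag.
function tagStartFromIndex($xml, $index) {
    // Keep everything up to and including the reported byte, then take the
    // last '<'. A raw '<' cannot occur inside a tag in well-formed XML.
    return strrpos(substr($xml, 0, $index + 1), '<');
}

// The document without the XML declaration, and the PHP 5 indexes quoted above.
$xml = "<a b=\"x\">\n\ttest\n\t<c>\n\t\t<d>foo</d>\n\t</c>\n</a>";
foreach (array(8, 19, 25, 33, 39, 44) as $index) {
    echo $index . ' -> ' . tagStartFromIndex($xml, $index) . "\n";
}
```

For these inputs the helper returns 0, 17, 23, 29, 35 and 40, i.e. the
PHP 4 indexes minus the 44 bytes of the omitted declaration line.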

Now, if you leave the XML declaration in place, the function no longer
returns anything meaningful. As Christoffer reports, parsing this XML
document when it includes '<?xml version="1.0"
encoding="ISO-8859-1"?>' produces:

ElementOpen - Row: 2 Col:  5 BIndex: 11
ElementOpen - Row: 4 Col:  8 BIndex: 22
ElementOpen - Row: 5 Col: 11 BIndex: 28
ElementClose - Row: 5 Col: 15 BIndex: 36
ElementClose - Row: 6 Col: 18 BIndex: 42
ElementClose - Row: 7 Col: 21 BIndex: 47

In this latter case, what seems to happen is that the byte length of
the XML declaration is miscalculated. Add to this the fact that the
returned byte index points at the end of the tag (PHP 5) instead of the
beginning of the tag (PHP 4), and the output really starts to look
random...
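
The size of the error can be read off the figures above. A small
sketch, using only the BIndex values already quoted (the variable names
are mine): it subtracts the PHP 5 indexes obtained without the
declaration from those obtained with it, and compares the result with
the real length of the declaration line.

```php
<?php
// BIndex values reported by PHP 5, copied from the two runs above.
$withoutDecl = array(8, 19, 25, 33, 39, 44);
$withDecl    = array(11, 22, 28, 36, 42, 47);

// Difference per handler call.
$diffs = array();
foreach ($withDecl as $i => $value) {
    $diffs[] = $value - $withoutDecl[$i];
}
echo 'Shift per call: ' . implode(', ', $diffs) . "\n";

// Actual length of the declaration line, including its trailing newline.
$decl = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n";
echo 'Declaration length: ' . strlen($decl) . "\n";
```

Every index is shifted by exactly 3, while the declaration line is 44
bytes long, which suggests the declaration is being counted as 3 bytes.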


Previous Comments:
------------------------------------------------------------------------

[2004-09-27 20:36:13] christoffer at natlikan dot se

Description:
------------
xml_get_current_byte_index and xml_get_current_column_number return
unexpected values when the XML contains an XML declaration. Using
php5.0-win32-200409270830 and Apache/1.3.31.


Reproduce code:
---------------
<?php
        function elementOpen($parser, $elementName, $attributes) {
                echo("ElementOpen - Row: " . xml_get_current_line_number($parser) .
                        " Col: " . xml_get_current_column_number($parser) .
                        " BIndex: " . xml_get_current_byte_index($parser) . "<br />");
        }
        
        function elementClose($parser, $elementName) {
                echo("ElementClose - Row: " . xml_get_current_line_number($parser) .
                        " Col: " . xml_get_current_column_number($parser) .
                        " BIndex: " . xml_get_current_byte_index($parser) . "<br />");
        }
        
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, "elementOpen", "elementClose");

        $xml = 
                "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n" .
                "<a b=\"x\">\n" .
                        "\ttest\n" .
                        "\t<c>\n" .
                                "\t\t<d>foo</d>\n" .
                        "\t</c>\n" .
                "</a>";
        
        xml_parse($parser, $xml);
        xml_parser_free($parser);
?>

Expected result:
----------------
ElementOpen - Row: 2 Col: 10 BIndex: 52
ElementOpen - Row: 4 Col:  8 BIndex: 63
ElementOpen - Row: 5 Col: 11 BIndex: 69
ElementClose - Row: 5 Col:  9 BIndex: 73
ElementClose - Row: 6 Col:  2 BIndex: 79
ElementClose - Row: 7 Col:  1 BIndex: 85

Actual result:
--------------
ElementOpen - Row: 2 Col:  5 BIndex: 11
ElementOpen - Row: 4 Col:  8 BIndex: 22
ElementOpen - Row: 5 Col: 11 BIndex: 28
ElementClose - Row: 5 Col: 15 BIndex: 36
ElementClose - Row: 6 Col: 18 BIndex: 42
ElementClose - Row: 7 Col: 21 BIndex: 47


------------------------------------------------------------------------

