I have a short script to parse my XML file. The parsing produces no errors and
all the output looks good, EXCEPT that URLs are truncated IF they contain the '&'
character.

My XML file looks like this:
--- start of XML ---
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<title>Test News .Net - Newspapers on the Net</title>
<copyright>Small News Network.com</copyright>
<link>http://www.example.com/</link>
<description>Continuously updating Example News.</description>
<language>en-us</language>
<pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
<lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
<ttl>30</ttl>
<item>
<title>Group buys SunGard for US$10.4bil</title>
<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
<description>NEW YORK: A group of seven private equity investment firms agreed 
yesterday to buy financial technology company SunGard Data Systems Inc in a 
deal worth US$10.4bil plus debt, making it the biggest lev...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
<item>
<title>Strong quake hits Indonesia coast</title>
<link>http://feeds.example.com/news/world/quake.html</link>
<description>a &quot;widely destructive tsunami&quot; and the quake was felt as 
far away as Malaysia.</description>
<source url="http://biz.theexample.com.net/">The Paper</source>
</item>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
<description>We are going to expect something new this weekend ...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
</channel>
</rss>
--- end of XML ---

For the sake of testing, my script only prints out the URLs of the news items
above. I got these:
f=1
http://feeds.example.com/news/world/quake.html
cat=somecat

The output for line 1 is truncated to 'f=1' and the output for line 3 is
truncated to 'cat=somecat', i.e., the script only kept the last parameter of
each URL. The output for line 2 is correct since that URL has NO parameters.
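One way to see what is going on is a small diagnostic (a sketch, not part of the original script): register a character data handler that records every chunk it receives for a single <link> element. With PHP's XML parser, a text node containing an entity such as &amp; is usually delivered in several pieces, so a handler that assigns ($URL = $data) instead of appending keeps only the last piece. The $chunks array and the closure are illustrative names, not from the original.

```php
<?php
// Diagnostic sketch: record every chunk the character data handler
// receives for one <link> element ($chunks is an illustrative name).
$chunks = [];

$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data;  // keep each piece separately instead of overwriting
});

$xml = '<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>';
xml_parse($parser, $xml, true);

// Print each chunk on its own line, then the pieces joined back together.
foreach ($chunks as $i => $chunk) {
    printf("chunk %d: %s\n", $i, var_export($chunk, true));
}
echo "joined: ", implode('', $chunks), "\n";
```

If the chunks list shows more than one entry, the last one is exactly what a "$URL = $data" handler would end up with.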

I am not sure what I have done wrong in my script. Is it because the RSS spec
says that you cannot have parameters in a URL? Please advise.
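A quick check that the URL encoding in the feed itself is sound, assuming the SimpleXML extension is available (it is enabled in most builds): inside XML text a literal '&' has to be written as &amp;, and a conforming parser hands the decoded '&' back to the program. The snippet also shows the reverse step, escaping a URL with htmlspecialchars() before writing it into XML. Variable names here are illustrative.

```php
<?php
// Read a <link> whose URL contains escaped '&' characters.
// SimpleXML decodes &amp; back to '&' when the text is read.
$xml = simplexml_load_string(
    '<item><link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link></item>'
);
$url = (string) $xml->link;
echo $url, "\n";

// Reverse direction: escape a raw URL before putting it into XML text.
$raw = 'http://feeds.example.com/?id=abcdef&cat=somecat';
$escaped = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1);
echo $escaped, "\n";
```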

-- start of script --
<?php
$file = "test.xml";
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 0;
        case "LINK":
            if ($start == 1)
                #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
                print "$URL"."<BR>";
            break;
    }
    $currentTag = "";
}

function characterData($parser, $data) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 1;
        case "TITLE":
            $TITLE = $data;
            break;
        case "LINK":
            $URL = $data;
            break;
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free($xml_parser);

?>
-- end of script --
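For reference, a minimal sketch of the common workaround, assuming the same RSS layout as test.xml: collect text with .= into a buffer that is reset in startElement(), and act on it in endElement() using its $name argument rather than a "current tag" global. The feed is inlined as a string so the sketch runs on its own; $buffer, $inItem and $urls are illustrative names, not from the original script. The original fopen()/fread() loop could be kept, and appending also covers text that happens to be split across two 4096-byte reads.

```php
<?php
// Sketch, assuming the RSS layout of test.xml. Character data is
// appended to a buffer because the parser may call the handler several
// times for one text node (e.g. around each &amp; entity).

$buffer = "";     // text collected for the element being read
$inItem = false;  // true while inside an <item> element
$urls = [];       // item links, collected for printing

function startElement($parser, $name, $attrs) {
    global $buffer, $inItem;
    if ($name == "ITEM") {
        $inItem = true;
    }
    $buffer = "";  // start fresh for each element
}

function endElement($parser, $name) {
    global $buffer, $inItem, $urls;
    // $name identifies the element that is closing.
    if ($name == "LINK" && $inItem) {
        $urls[] = $buffer;
    } elseif ($name == "ITEM") {
        $inItem = false;
    }
    $buffer = "";
}

function characterData($parser, $data) {
    global $buffer;
    $buffer .= $data;  // append, never overwrite
}

$xml = <<<XML
<rss version="2.0"><channel>
<link>http://www.example.com/</link>
<item><link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link></item>
<item><link>http://feeds.example.com/news/world/quake.html</link></item>
<item><link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link></item>
</channel></rss>
XML;

$parser = xml_parser_create();  // case folding is on: names arrive uppercased
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");

if (!xml_parse($parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

foreach ($urls as $url) {
    print $url . "<BR>\n";
}
```

The channel-level <link> is skipped because $inItem is still false when it closes, which mirrors what the $start flag was meant to do in the original.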

TIA.
Roger


-- 
PHP General Mailing List (http://www.php.net/)