I have to write a script to parse XML files we receive daily. The XML files are all
individual stories but there is an index page that comes with each batch that contains
blocks of information for each story as follows (below). I need to run through this
index file and for each story I need to grab the NewsItemID, the Time, and then the
From there I then need to open up the individual stories and do some formatting, but
for now I need to get past this :) I was planning on going line by line through the file but
am not sure how I would go about grabbing the information I require. Sometimes there
is a SourceFilePath but sometimes it's missing.
Any help would be greatly appreciated.
<Comment NewsItemID="780023, " Time="28-05-02 13:43"/>
<CPIndexStoryHead>Chretien pushes Bush on softwood, agriculture, but gets no
<CPStoryPara Number="1" ParaSpace="FALSE">
(CP) - Prime Minister Jean Chretien said he pressed U.S. President George W. Bush on
Tuesday to address festering trade disputes between the two countries, but got no
assurances that disagreements over softwood lumber or agricultural subsidies would be
resolved. Chretien, who raised the matters after a NATO meeting in the Italian
capital, said he was "very forceful" with Bush. But he said the president blamed
Congress for the logjam.
<CPStoryPara Number="2" ParaSpace="FALSE">
"It's always like that when you deal with the president of the United States: 'Yes,
but the Congress and the Senate . . . ' In Canada you blame the prime minister or you
congratulate the prime minister because he cannot pass the buck to anyone else."
<CPLink Type="StoryFile" Number="1" SourceFilePath="./n052814A.xml"/>
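
One possible way to approach the line-by-line idea, as a minimal sketch in Python rather than a definitive solution. It assumes that every story block starts with a Comment element carrying NewsItemID and Time, that the optional CPLink element for that story comes after it, and that each element sits on a single line as in the sample. The function name parse_index, the two patterns, and the dictionary keys are made up for illustration.

```python
import re

# Patterns are assumptions based only on the sample block shown above.
COMMENT_RE = re.compile(r'<Comment\s+NewsItemID="([^"]*)"\s+Time="([^"]*)"')
LINK_RE = re.compile(r'<CPLink\b[^>]*\bSourceFilePath="([^"]*)"')


def parse_index(lines):
    """Return one dict per story found in an iterable of index lines."""
    stories = []
    current = None
    for line in lines:
        m = COMMENT_RE.search(line)
        if m:
            # A Comment element is assumed to open a new story block.
            current = {
                # The sample ID is "780023, ", so trim the comma and space.
                "id": m.group(1).strip(" ,"),
                "time": m.group(2),
                # Stays None when the block has no SourceFilePath.
                "path": None,
            }
            stories.append(current)
            continue
        m = LINK_RE.search(line)
        if m and current is not None:
            # If a block had several CPLink lines, the last one would win.
            current["path"] = m.group(1)
    return stories


sample = [
    '<Comment NewsItemID="780023, " Time="28-05-02 13:43"/>',
    '<CPStoryPara Number="1" ParaSpace="FALSE">',
    '<CPLink Type="StoryFile" Number="1" SourceFilePath="./n052814A.xml"/>',
]
stories = parse_index(sample)
```

Because parse_index accepts any iterable of lines, an open file handle for the index could be passed in directly. A story with no CPLink line simply keeps its path as None, which covers the case where the SourceFilePath is missing.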