From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Paul Rousseau
Sent: 04 February 2010 16:44
To: perl Win32-users
Subject: HTML Parsing Question
> Hello,
>
> I have an HTML file that contains the following snippet:
>
> <group name="Group1" visible="true" wallpaper="false" toolTipText="" exposeToVba="notExposed"
>     isReferenceObject="true" linkSize="true" linkConnections="true"
>     linkAnimations="linkWithoutExpression" linkBaseObject="System_Bar.Group1">
>   <button name="Button1" height="25" width="85" left="8" top="10" visible="true" toolTipText=""
>       exposeToVba="notExposed" isReferenceObject="true" linkSize="true" linkConnections="true"
>       linkAnimations="linkWithoutExpression" linkBaseObject="System_Bar.Button1" style="recessed"
>       captureCursor="false" highlightOnFocus="true" tabIndex="1">
>     <command pressAction="Display Overview" repeatAction="" releaseAction=""
>         repeatRate="0.25"/>
>     <up patternColor="black" patternStyle="none" backColor="#ECE9D8" backStyle="solid"
>         foreColor="black">
>       <caption fontFamily="Arial" fontSize="8" bold="true" italic="false" underline="false"
>           strikethrough="false" caption="Overviews"/>
>       <imageSettings imageReference="noImage"/>
>     </up>
>     <down downSameAsUp="true" patternColor="black" patternStyle="none" backColor="#ECE9D8"
>         backStyle="solid" foreColor="black">
>       <caption fontFamily="Arial" fontSize="10" bold="false" italic="false"
>           underline="false" strikethrough="false" caption=""/>
>       <imageSettings imageReference="noImage"/>
>     </down>
>   </button>
>   <button name="Button24" height="25" width="85" left="95" top="10" visible="true"
>       toolTipText="" exposeToVba="notExposed" isReferenceObject="true" linkSize="true"
>       linkConnections="true" linkAnimations="linkWithoutExpression" linkBaseObject="System_Bar.Button24"
>       style="recessed" captureCursor="false" highlightOnFocus="true" tabIndex="2">
>     <command pressAction="Display Oil_Treating_Overview" repeatAction="" releaseAction=""
>         repeatRate="0.25"/>
>     <up patternColor="black" patternStyle="none" backColor="#ECE9D8" backStyle="solid"
>         foreColor="black">
>       <caption fontFamily="Arial" fontSize="8" bold="true" italic="false" underline="false"
>           strikethrough="false" caption="Oil Treating"/>
>       <imageSettings imageReference="noImage"/>
>     </up>
>     <down downSameAsUp="true" patternColor="black" patternStyle="none" backColor="#ECE9D8"
>         backStyle="solid" foreColor="black">
>       <caption fontFamily="Arial" fontSize="10" bold="false" italic="false"
>           underline="false" strikethrough="false" caption=""/>
>       <imageSettings imageReference="noImage"/>
>     </down>
>   </button>
>   .
>   .
>   .
> </group>

I'm not an HTML expert, but is 'group' a valid tag? The parser seems to be silently ignoring it, which would explain why the buttons' parent comes out as 'body'. It works if you change group to form.

> I want to obtain the parent name for all "button" tags. I am having trouble extracting the parent
> name. For example, I can find "Button1" and "Button24" but I can not get to either item's parent name,
> "Group1".
> Here is my code so far. (I believe it's something trivial; I feel I am so close.)
>
> foreach $filename (@filenames)
> {
>     print "filename is $filename";
>     $tree = HTML::TreeBuilder->new();
>     $tree->parse_file("$sourcedir\\$filename");
>     @Ans = $tree->look_down('_tag' => 'button');
>     foreach $button (@Ans)
>     {
>         print "button is " . $button->attr('name');   # prints "button is Button1"
>         print "button tag is " . $button->tag;   # can also use ${$button}{_tag}
>                                                  # prints "button tag is button"
>         print "button parent hash pointer is " . $button->parent;   # can also use ${$button}{_parent}
>             # prints "button parent hash pointer is HTML::Element=HASH(0x1afa5c4)"
>         #
>         #print Dumper($button->parent);
>         #
>         foreach $key (keys %{$button->parent})
>         {
>             print "key is $key";   # prints four keys: _parent, _content, _tag, _implicit
>         }
>         print "button _parent value is " . ${$button}{_parent};
>             # prints "button _parent value is HTML::Element=HASH(0x1afa5c4)"
>         print "button _tag value is " . ${$button}{_tag};   # prints "button _tag value is button"
>
>         $parent = $button->parent;
>         print "parent tag is " . ${$parent}{_tag};   # prints "parent tag is body"
>         print "parent content is " . ${$parent}{_content};   # prints "parent content is ARRAY(0x1afa7b0)"
>         foreach $key (@{${$parent}{_content}})
>         {
>             # print "content item is $key";
>         }
>         #print "parent name is " . $parent->attr('name');
>         $groupid = $button->look_up('_tag', => 'group');
>         # print "group id is " . $groupid;
>     }
>     # $tree->dump;
>     # print $tree->as_HTML;
> }

Apart from your code being poorly laid out (which may well be the result of emailing), you appear either not to be using 'use strict;' or to be declaring variables in a larger scope than necessary, both of which are not recommended, if you will excuse the double negative.

Always code with 'use strict; use warnings;' at the top of your code, and declare variables in the smallest scope necessary.

HTH

-- 
Brian Raven

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please advise the sender immediately by reply e-mail and delete this message and any attachments without retaining a copy. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
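For what it's worth, the snippet looks more like XML than HTML (self-closing tags such as <command .../>, and element names like group and imageSettings that HTML does not define). An XML parser keeps every element it reads, so it should sidestep the "ignored group" problem entirely. This is a minimal sketch, not tested against the real file. It assumes that XML::LibXML is installed and that the file is well-formed XML. The helper name button_parents is made up for illustration, and the input is a cut-down copy of the posted snippet. It also follows the advice above: strict and warnings are on, and every variable is declared with my in the smallest scope that needs it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (name made up for this example): returns a list of
# [button name, enclosing group name] pairs for every <button> element.
sub button_parents {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my @pairs;
    for my $button ($doc->findnodes('//button')) {
        # ancestor::group[1] is the nearest enclosing <group>, however deep
        # the button sits; findvalue returns '' if there is none.
        my $group = $button->findvalue('ancestor::group[1]/@name');
        push @pairs, [ $button->getAttribute('name'), $group ];
    }
    return @pairs;
}

# Cut-down version of the posted snippet (most attributes trimmed).
my $xml = <<'END_XML';
<group name="Group1" visible="true">
  <button name="Button1" tabIndex="1">
    <command pressAction="Display Overview" repeatRate="0.25"/>
  </button>
  <button name="Button24" tabIndex="2">
    <command pressAction="Display Oil_Treating_Overview" repeatRate="0.25"/>
  </button>
</group>
END_XML

for my $pair (button_parents($xml)) {
    my ($button, $group) = @$pair;
    print "$button is in $group\n";
}
```

With the cut-down snippet it should print one line per button, each naming Group1. To run it against the real files, the parse step would become XML::LibXML->load_xml(location => "$sourcedir\\$filename") inside the foreach over @filenames. If only the immediate parent is wanted, $button->parentNode->getAttribute('name') does that. The ancestor::group[1] XPath also copes with buttons nested more deeply inside a group.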