Hi. Thanks. For example,

<a href="aaa0", alt="aaa1"><em>test1</em> <em>test2</em> I am a boy</a>

Here we have three nodes:

1. <a href="aaa0", alt="aaa1">I am a boy</a>
2. <em>test1</em>
3. <em>test2</em>

I want to analyze those nodes as follows. The tag of node 1 is "a". Its attributes are href and alt, with the values "aaa0" and "aaa1" respectively. It also has the anchor text "I am a boy". The other two nodes have the tag "em", with "test1" and "test2" as their text. This level of detail is enough for me. Can anybody help me?

In fact, I have written some sample code based on an XPath example. For the simple HTML input, my code produced an almost correct parsing result, but when I tried to parse HTML from a URL, which is of course more complex than the simple HTML, I got strange data. In the example above, "I am a boy" is obviously the anchor text of the "a" tag, and with this simple HTML that is what I get. However, when the same markup is part of a complex HTML page, "I am a boy" is interpreted as the text of "em".

Can I say that if HTML is not well-formed, the association between a tag and its text is sometimes not handled properly? In other words, is it possible that the parse tree is not perfectly correct if the HTML is not well-formed?

I would like to double-check whether my approach is right by seeing a general way of looking at the nodes of a parsed HTML tree that somebody may suggest.

Thanks

Date: Tue, 04 Aug 2009 08:51:42 +0200
From: Michael Ludwig <[email protected]>
Subject: Re: [xml] Approach for parsing HTML file or URL
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Brian Kim wrote:
> I would like to parse html and see the content of html attributes in
> each tag.
> Using htmlreadfile function is quite obvious, but I guess there is
> another way to see each node of parsed tree instead of using Xpath.

Could you define what you mean by "seeing each node of the parsed tree"?

Michael Ludwig

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
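
The question in this thread (walking every node of the parsed tree to read its tag, attributes, and text, and what happens to the tag/text association when the markup is not well-formed) can be illustrated with a small sketch. This is not libxml2 and not the poster's code: it is a stand-in written against Python's standard html.parser module, and the names Node, TreeBuilder, parse, walk, WELL_FORMED, and UNCLOSED_EM are hypothetical. The sample markup is the one from the message with the comma between the attributes left out, and the second sample assumes one particular kind of damage (a missing </em>). The recovery rule used here (an end tag implicitly closes everything opened inside it) is a simplification; a real HTML parser may recover differently.

```python
from html.parser import HTMLParser

# Sample from the message, without the comma between the attributes.
WELL_FORMED = '<a href="aaa0" alt="aaa1"><em>test1</em> <em>test2</em> I am a boy</a>'
# Assumed damage for illustration: the second <em> is never closed.
UNCLOSED_EM = '<a href="aaa0" alt="aaa1"><em>test1</em> <em>test2 I am a boy</a>'


class Node:
    """One element of the tree: tag, attributes, direct text, children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = []       # text chunks directly inside this element
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a naive tree; void elements such as <br> are not special-cased."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]   # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, implicitly closing
        # anything still open inside it. Stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to whichever element is innermost when it is seen.
        if data.strip():
            self.stack[-1].text.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    """Yield (tag, attrs, text) for every element, depth first."""
    for child in node.children:
        yield child.tag, child.attrs, " ".join(child.text)
        yield from walk(child)


if __name__ == "__main__":
    for label, html in (("well-formed", WELL_FORMED), ("unclosed em", UNCLOSED_EM)):
        print(label)
        for tag, attrs, text in walk(parse(html)):
            print("  ", tag, attrs, repr(text))
```

With the well-formed sample, "I am a boy" is attached to "a". With the unclosed <em>, the same text is attached to the second "em", because that element is still open when the text arrives. That matches the symptom described in the message, though whether it is the actual cause on the poster's page is only a guess. In libxml2 itself the comparable walk would go over the xmlNode fields name, properties, and children of the document returned by htmlReadFile, without XPath.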
