Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
Hello,

You might also want to try using the Raxan framework:

    require_once 'raxan/pdi/gateway.php';
    $page = new RichWebPage('page.html');
    echo $page['a']->text(); // this will get the text between the a tags

To get the image element use:

    $elm = $page['a img']->node(0);

You can download Raxan here: http://raxanpdi.com/downloads.html

__
Raymond Irving

--- On Sat, 8/22/09, Manuel Lemos mle...@acm.org wrote:

From: Manuel Lemos mle...@acm.org
Subject: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
To: chrysanhy phpli...@hyphusonline.com
Cc: php-general@lists.php.net
Date: Saturday, August 22, 2009, 1:07 AM

> You may want to try this HTML parser class that comes with a filter class and an example script named test_get_html_links.php that does exactly what you ask.
>
> http://www.phpclasses.org/secure-html-filter

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
[PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
Hello,

on 08/16/2009 04:33 AM chrysanhy said the following:
> I have the following code to extract the URLs from the anchor tags of an HTML page:
>
>     $htmlpage = new DOMDocument();
>     $htmlpage->loadHtmlFile($location);
>     $xpath = new DOMXPath($htmlpage);
>     $links = $xpath->query( '//a' );
>     foreach ($links as $link) {
>         $int_url_list[$i++] = $link->getAttribute( 'href' ) . "\n";
>     }
>
> If I have a link <a href="http://X.com">...</a>, how do I extract the corresponding content, which is displayed to the user as the text of the link (if it's an image tag, I would like a DOMElement for that)?
>
> Thanks

You may want to try this HTML parser class that comes with a filter class and an example script named test_get_html_links.php that does exactly what you ask.

http://www.phpclasses.org/secure-html-filter

-- 
Regards,
Manuel Lemos

Find and post PHP jobs
http://www.phpclasses.org/jobs/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
[PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
Try $link->nodeValue() or $link->getContent(). I'm not sure which one works on an image link, where the image is a child of the a element. You could also check whether the node has a child; if so, it's an image with, in good practice, an alt attribute to use.

I haven't tried it, but it should work. Let me know, please.

ralph_def...@yahoo.de

chrysanhy phpli...@hyphusonline.com wrote in message news:88827b190908160033n226b370bqe2ab70732811...@mail.gmail.com...
> If I have a link <a href="http://X.com">...</a>, how do I extract the corresponding content, which is displayed to the user as the text of the link (if it's an image tag, I would like a DOMElement for that)?
Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
It did not work. Both gave me a "Call to undefined method" fatal error.

On Sun, Aug 16, 2009 at 1:43 AM, Ralph Deffke ralph_def...@yahoo.de wrote:
> try $link->nodeValue() or $link->getContent()
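A note on the fatal error above: in PHP's DOM extension, nodeValue and textContent are properties of a node, not methods, so calling them with parentheses fails, and DOMElement has no getContent() method at all. The following is a minimal sketch of reading the link text as a property; the HTML string and the variable names are made up for illustration.

```php
<?php
// Minimal sketch: nodeValue and textContent are properties, so they are
// read without parentheses. The HTML string is a made-up example.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="http://X.com">Example text</a></body></html>');

// First <a> element in the document.
$link = $doc->getElementsByTagName('a')->item(0);

$href = $link->getAttribute('href'); // getAttribute() is a method: needs ()
$text = $link->nodeValue;            // nodeValue is a property: no ()
$same = $link->textContent;          // textContent is also a property

echo $href, "\n", $text, "\n";
```

For an element node both properties give the concatenated text of its descendants, so either should work for a plain text link.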
[PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
Did you try it something like this?

    foreach ($links as $link) {
        $int_url_list[$i]["href"] = $link->getAttribute( 'href' );
        $int_url_list[$i++]["linkText"] = $link->getContent( ); // nodeValue();
    }

That should work. Send your code then, please.

ralph_def...@yahoo.de
Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
While waiting for suggestions for extracting the link text from the DOM, I tried a brute force approach using the URLs I had found with getAttribute(), but found myself baffled by my results. I boiled down my issue with this approach to the following snippet.

    $htmldata = <<<EOB
    <a href="http://www.protools.com/users/user_story.cfm?story_id=1162&amp;lang=1">&quot;Creating Surround Mixes with Tim Weidner</a>&quot; <img height=11 src=new.gif width=28> - <i>Magnification</i> engineer talks about mixing the album at the <i>ProTools</i> site, by Jim Batchco
    <a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">&quot;Don't Go&quot; Video</a><a href="
    http://fi.soneraplaza.net/kaista/musiq/kaistatv/0,8883,201392,00.html"></a> <img height=11 src=new.gif width=28> - Presented by Beyond Music (<a href="http://www.apple.com/quicktime/download/">QuickTime</a> Required)
    EOB;
    $url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';
    $posn = strpos($url, $htmldata);
    echo "URL |$url| position is |$posn|";

Running this gives me:

    URL |http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html| position is ||

I've tried lots of functions, and even regular expressions, but I cannot get the code to find the URL in the HTML. While I still hope for a DOM solution to getting this link text, WHY can't the code find the URL in the HTML snippet?

On Sun, Aug 16, 2009 at 9:29 AM, chrysanhy phpli...@hyphusonline.com wrote:
> I pasted the code exactly as you have it, and I got the following:
>
> *Fatal error*: Call to undefined method DOMElement::getContent()
>
> I got the same thing with nodeValue().
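Nobody in the thread answers the strpos() question, so the following is only a likely explanation based on the snippet as posted. strpos() expects the string to search in (the haystack) first and the string to search for (the needle) second, while the snippet passes $url first and $htmldata second. A short sketch, with a shortened made-up haystack and hypothetical variable names:

```php
<?php
// Sketch of the strpos() argument order. $haystack is a shortened,
// made-up stand-in for the $htmldata heredoc in the message above.
$haystack = '<a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">Video</a>';
$needle   = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';

// Arguments reversed, as in the posted snippet: the long HTML string
// cannot occur inside the shorter URL, so the result is false.
$swapped = strpos($needle, $haystack);

// Haystack first, needle second.
$correct = strpos($haystack, $needle);

// false prints as an empty string, which would explain the "||" output.
// Compare with === because a match at offset 0 is also falsy.
if ($correct === false) {
    echo "not found\n";
} else {
    echo "found at offset $correct\n";
}
```

The strict comparison matters whenever the needle could sit at the very start of the haystack.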
Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
Well, the image goes inside the anchor (<a ...><img ...></a>). In your HTML the a node has no value; however, you should not get an error. This is a perfectly good HTML link:

    <a href="thema.htm"><img src="button4.jpg" width="160" height="34" border="0" alt="THEMA"></a>

ralph

chrysanhy phpli...@hyphusonline.com wrote in message news:88827b190908160943t2254137fve43771c7e4f8c...@mail.gmail.com...
> While I still hope for a DOM solution to getting this link text, WHY can't the code find the URL in the HTML snippet?
Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
This worked here:

    <?php
    $html = new DOMDocument();
    $html->loadHtmlFile("testHtml.html");
    $links = $html->getElementsByTagName('a');
    echo "<pre>";
    foreach ($links as $item) {
        echo $item->getAttribute( 'href' ) . "\n";
        echo "--- " . $item->nodeValue . "\n";
    }
    echo "</pre>";
    ?>

I'm sending you the two files directly in a minute. It turned out, as I thought earlier, that you have to check whether the a tags have children to extract image links.

ralph_def...@yahoo.de
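The check for child images mentioned above is not shown in the thread. One way it might look, as a sketch only, is a relative XPath query run against each anchor, which also yields the DOMElement for the image that the original question asked for. The sample HTML, the array keys 'linkText' and 'image', and the fallback to the alt attribute are assumptions made for illustration.

```php
<?php
// Sketch: for each <a>, collect the href, the visible text, and the first
// <img> inside it as a DOMElement (or null). Sample HTML is made up.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body>'
    . '<a href="page.htm">Plain text link</a>'
    . '<a href="thema.htm"><img src="button4.jpg" alt="THEMA"></a>'
    . '</body></html>'
);
$xpath = new DOMXPath($doc);

$int_url_list = [];
foreach ($xpath->query('//a') as $link) {
    // Relative query: only <img> elements inside this particular anchor.
    // item(0) returns null when the anchor contains no image.
    $img = $xpath->query('.//img', $link)->item(0);

    $text = trim($link->textContent);
    if ($text === '' && $img !== null) {
        // Image-only link: fall back to the alt attribute, if present.
        $text = $img->getAttribute('alt');
    }

    $int_url_list[] = [
        'href'     => $link->getAttribute('href'),
        'linkText' => $text,
        'image'    => $img, // DOMElement or null
    ];
}

foreach ($int_url_list as $entry) {
    echo $entry['href'], ' --- ', $entry['linkText'], "\n";
}
```

Passing the anchor as the second argument to query() keeps the search scoped to that link, so an image elsewhere on the page is not picked up by mistake.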
Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
The code snippet below worked! Thank you so much for your time helping me with this!

On Sun, Aug 16, 2009 at 11:26 AM, Ralph Deffke ralph_def...@yahoo.de wrote:
> this worked here:
>
>     <?php
>     $html = new DOMDocument();
>     $html->loadHtmlFile("testHtml.html");
>     $links = $html->getElementsByTagName('a');
>     echo "<pre>";
>     foreach ($links as $item) {
>         echo $item->getAttribute( 'href' ) . "\n";
>         echo "--- " . $item->nodeValue . "\n";
>     }
>     echo "</pre>";
>     ?>