Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-22 Thread Raymond Irving
Hello,

You might also want to try using the Raxan framework:

require_once 'raxan/pdi/gateway.php';

$page = new RichWebPage('page.html');
echo $page['a']->text(); // this will get the text between the <a> tags
 
To get the image element use:

$elm = $page['a img']->node(0);

You can download Raxan here:
http://raxanpdi.com/downloads.html

__
Raymond Irving

--- On Sat, 8/22/09, Manuel Lemos <mle...@acm.org> wrote:

From: Manuel Lemos <mle...@acm.org>
Subject: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute
To: chrysanhy <phpli...@hyphusonline.com>
Cc: php-general@lists.php.net
Date: Saturday, August 22, 2009, 1:07 AM

Hello,

on 08/16/2009 04:33 AM chrysanhy said the following:
> I have the following code to extract the URLs from the anchor tags of an
> HTML page:
>
> $htmlpage = new DOMDocument();
> $htmlpage->loadHtmlFile($location);
> $xpath = new DOMXPath($htmlpage);
> $links = $xpath->query( '//a' );
> foreach ($links as $link)
> { $int_url_list[$i++] = $link->getAttribute( 'href' ) . "\n"; }
>
> If I have a link <a href="http://X.com">...</a>, how do I extract the
> corresponding text that is displayed to the user as the text of the link
> (if it's an image tag, I would like a DOMElement for that)?
> Thanks

You may want to try this HTML parser class, which comes with a filter class
and an example script named test_get_html_links.php that does exactly
what you ask.

http://www.phpclasses.org/secure-html-filter

-- 

Regards,
Manuel Lemos

Find and post PHP jobs
http://www.phpclasses.org/jobs/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-21 Thread Manuel Lemos
Hello,

You may want to try this HTML parser class, which comes with a filter class
and an example script named test_get_html_links.php that does exactly
what you ask.

http://www.phpclasses.org/secure-html-filter

-- 

Regards,
Manuel Lemos




[PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-16 Thread Ralph Deffke

Try

$link->nodeValue()

or

$link->getContent()

I'm not sure which one works on an image link. The image is a child of the
a element, so you could also check whether the node has a child; if so, it's
an image, which in good practice has an alt attribute you can use.

I haven't tried it, but it should work. Let me know, please.

ralph_def...@yahoo.de





Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-16 Thread chrysanhy
It did not work. Both gave me a Call to undefined method fatal error.
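Both errors are expected: in PHP's DOM extension `nodeValue` and `textContent` are properties of `DOMNode`, not methods, and `DOMElement` has no `getContent()` method, so calling either with parentheses is fatal. A minimal sketch of the loop with property access instead, assuming PHP 7 or later; the `extractLinks()` helper and the sample HTML string are illustrative, not code from this thread.

```php
<?php
// Hypothetical helper: collect the href and the visible text of every <a>.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Keep libxml parse warnings out of the output; real-world HTML is often sloppy.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a') as $link) {
        $links[] = [
            'href'     => $link->getAttribute('href'),
            // textContent is a property (no parentheses): the joined text
            // of all descendant nodes of the anchor.
            'linkText' => trim($link->textContent),
        ];
    }
    return $links;
}

$sample = '<p><a href="http://X.com">Visit X</a> and <a href="/b">B</a></p>';
foreach (extractLinks($sample) as $link) {
    echo $link['href'], ' -> ', $link['linkText'], "\n";
}
```

Reading `$link->nodeValue` without parentheses (as in the snippet that later worked in this thread) gives the same text for element nodes.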

On Sun, Aug 16, 2009 at 1:43 AM, Ralph Deffke <ralph_def...@yahoo.de> wrote:

> try
>
> $link->nodeValue()
>
> or
>
> $link->getContent()





[PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-16 Thread Ralph Deffke
Did you try something like this?

foreach ($links as $link) {
    $int_url_list[$i]['href'] = $link->getAttribute( 'href' );
    $int_url_list[$i++]['linkText'] = $link->getContent(  ); // nodeValue();
}

That should work. Otherwise, please send your code.
ralph_def...@yahoo.de





Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-16 Thread chrysanhy
While waiting for suggestions for extracting the link text from the DOM, I
tried a brute-force approach using the URLs I had found with getAttribute(),
but found myself baffled by my results. I boiled down my issue with this
approach to the following snippet.

$htmldata = <<<EOB
<a href="http://www.protools.com/users/user_story.cfm?story_id=1162&amp;lang=1">&quot;Creating
Surround Mixes with Tim Weidner</a>&quot; <img height=11
src=new.gif width=28>
- <i>Magnification</i> engineer talks about mixing the album at
the
<i>ProTools</i> site, by Jim Batchco
<a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">&quot;Don't
Go&quot; Video</a><a href=
"http://fi.soneraplaza.net/kaista/musiq/kaistatv/0,8883,201392,00.html"></a>
<img height=11 src=new.gif width=28> - Presented by Beyond
Music
(<a href="http://www.apple.com/quicktime/download/">QuickTime</a>
Required)
EOB;
$url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';
$posn = strpos($url, $htmldata);
echo "URL |$url| position is |$posn|";

Running this gives me:

URL |http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html| position is ||

I've tried lots of functions, and even regular expressions, but I cannot get
the code to find the URL in the HTML. While I still hope for a DOM solution
to getting this link text, WHY can't the code find the URL in the HTML
snippet?
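A likely cause: `strpos()` takes the haystack first and the needle second, so `strpos($url, $htmldata)` looks for the whole HTML block inside the URL. That search fails and returns `false`, which `echo` prints as an empty string, hence the `||`. A minimal sketch with the arguments swapped; the short `$htmldata` string below is a stand-in for the heredoc above, not the original data.

```php
<?php
// Shortened stand-in for the $htmldata heredoc above.
$htmldata = '<a href="http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html">&quot;Don\'t Go&quot; Video</a>';
$url = 'http://www.beyondmusic.com/MediaPlayer/Yes/DontGo.html';

// Haystack first, needle second.
$posn = strpos($htmldata, $url);

// strpos() returns false when the needle is absent, and 0 is a valid
// match position, so compare strictly instead of testing truthiness.
if ($posn === false) {
    echo "URL |$url| not found\n";
} else {
    echo "URL |$url| position is |$posn|\n";
}
```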

On Sun, Aug 16, 2009 at 9:29 AM, chrysanhy <phpli...@hyphusonline.com> wrote:

> I pasted the code exactly as you have it, and I got the following:
>
> *Fatal error*: Call to undefined method DOMElement::getContent()
>
> I got the same thing with nodeValue().


> On Sun, Aug 16, 2009 at 7:35 AM, Ralph Deffke <ralph_def...@yahoo.de> wrote:
>
> > Did you try something like this?
> >
> > foreach ($links as $link) {
> >     $int_url_list[$i]['href'] = $link->getAttribute( 'href' );
> >     $int_url_list[$i++]['linkText'] = $link->getContent(  ); // nodeValue();
> > }
> > That should work.






Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-16 Thread Ralph Deffke
Well, the image goes inside the a element: <a ...><img ...></a>

In your HTML the a node has no value; however, you should not get an error.

This is a perfect HTML link:
<a href="thema.htm"><img src="button4.jpg" width="160" height="34"
border="0" alt="THEMA"></a>
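For image links like this one, a minimal sketch of falling back to the image's `alt` text when the anchor has no text of its own; `linkLabel()` is a hypothetical helper name, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: return an anchor's visible text, or the alt text
// of the first <img> inside it when the anchor contains no text.
function linkLabel(DOMElement $a): string
{
    $text = trim($a->textContent);
    if ($text !== '') {
        return $text;
    }
    // getElementsByTagName() on an element searches only its descendants.
    $img = $a->getElementsByTagName('img')->item(0);
    return $img instanceof DOMElement ? $img->getAttribute('alt') : '';
}

$doc = new DOMDocument();
$doc->loadHTML('<a href="thema.htm"><img src="button4.jpg" width="160" height="34" border="0" alt="THEMA"></a>');
foreach ($doc->getElementsByTagName('a') as $a) {
    echo $a->getAttribute('href'), ' -> ', linkLabel($a), "\n";
}
```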

ralph




Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-16 Thread Ralph Deffke
This worked here:
<?php

$html = new DOMDocument();
$html->loadHtmlFile("testHtml.html");
$links = $html->getElementsByTagName('a');
echo "<pre>";

foreach ($links as $item) {
  echo $item->getAttribute( 'href' ) . "\n";
  echo "---" . $item->nodeValue . "\n";
}

echo "</pre>";

?>

I'm sending you the two files directly in a minute. It turned out, as I thought
earlier, that you have to check whether the a tags have children to extract
image links.
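A minimal sketch of that child check, returning the `<img>` `DOMElement` the original question asked for. `imageInLink()` is a hypothetical helper, the sample markup is made up, PHP 7.1 or later is assumed for the nullable return type, and only direct children of the anchor are inspected (an image wrapped in, say, a `<span>` would be missed).

```php
<?php
// Hypothetical helper: return the first <img> element that is a direct
// child of the anchor, or null when the anchor holds no such element.
function imageInLink(DOMElement $a): ?DOMElement
{
    foreach ($a->childNodes as $child) {
        // Text nodes are DOMText, so only element children pass this test.
        if ($child instanceof DOMElement && strtolower($child->nodeName) === 'img') {
            return $child;
        }
    }
    return null;
}

$doc = new DOMDocument();
$doc->loadHTML('<a href="/text">Plain link</a><a href="/pic"><img src="new.gif" alt="New"></a>');
foreach ($doc->getElementsByTagName('a') as $a) {
    $img = imageInLink($a);
    echo $a->getAttribute('href'), ': ',
        $img ? 'image ' . $img->getAttribute('src') : 'text ' . trim($a->textContent), "\n";
}
```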

ralph_def...@yahoo.de





Re: [PHP] Re: How do I extract link text from anchor tag as well as the URL from the href attribute

2009-08-16 Thread chrysanhy
The code snippet below worked! Thank you so much for your time helping me
with this!

On Sun, Aug 16, 2009 at 11:26 AM, Ralph Deffke <ralph_def...@yahoo.de> wrote:

> this worked here:
> <?php
>
> $html = new DOMDocument();
> $html->loadHtmlFile("testHtml.html");
> $links = $html->getElementsByTagName('a');
> echo "<pre>";
>
> foreach ($links as $item) {
>   echo $item->getAttribute( 'href' ) . "\n";
>   echo "---" . $item->nodeValue . "\n";
> }
>
> echo "</pre>";
>
> ?>
