PHP.net has some good examples if you look under the regex functions.  Or
you might use something like the function below, which I wrote for a search
engine spider.  It returns a list of the local HTML links found on the given
page, or false if the page can't be opened.  In my spider I built a master
list of local links and tested it against a separate array of visited links.
Combined with a little JavaScript, you can index an entire website with
visual feedback.

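function extract_links($url)
{
 // Fetch the page once; @ hides the warning so failure is reported as false.
 $contents = @file_get_contents($url);
 if ($contents === false)
 {
  return false;
 }

 // Capture the value of every href attribute, quoted or unquoted.
 preg_match_all("|href=[\"']?([^\"' >]+)|i", $contents, $arrayoflinks);

 // Start with an empty array so a page with no links returns array().
 $links = array();
 foreach ($arrayoflinks[1] as $link)
 {
  // Skip absolute links (http:// or https://) and keep only .htm/.html
  // files.  preg_match replaces ereg, which was removed in PHP 7.
  if (!preg_match('|^https?://|i', $link) && preg_match('/\.html?/i', $link))
  {
   // Build array of local links on this page.
   $links[] = $link;
  }
 }

 // Drop duplicates and reindex from 0.
 return array_values(array_unique($links));
}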
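
The master-list / visited-list idea can be sketched roughly as below.  This
is only an illustration, not the code from my spider: crawl_local_links() and
the $extract callback are made-up names, and the callback is assumed to behave
like extract_links() (an array of links, or false on failure).  It also
assumes every link is already a path the callback can fetch directly, so
relative links would still need resolving against the page they came from.

```php
<?php
// Hypothetical helper: walks a site by following local links.
// $extract is any callable that takes a page and returns an array of
// links found on it, or false if the page could not be read.
function crawl_local_links($start, callable $extract)
{
    $pending = array($start); // master list of links still to fetch
    $visited = array();       // links already fetched, in visit order

    while (count($pending) > 0) {
        $page = array_shift($pending);

        // Skip anything we have already fetched.
        if (in_array($page, $visited, true)) {
            continue;
        }
        $visited[] = $page;

        $found = $extract($page);
        if ($found === false) {
            continue; // unreadable page: recorded as visited, nothing to add
        }

        // Queue only the links we have not seen yet.
        foreach ($found as $link) {
            if (!in_array($link, $visited, true)) {
                $pending[] = $link;
            }
        }
    }

    return $visited;
}
```

Calling crawl_local_links('index.html', 'extract_links') would hand each page
to the function above.  Passing the extractor in as a callback also makes it
easy to try the loop against a fake site held in an array before pointing it
at a real one.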

-Kevin


----- Original Message -----
From: "Nick Wilson" <[EMAIL PROTECTED]>
To: "php-general" <[EMAIL PROTECTED]>
Sent: Friday, June 21, 2002 3:15 PM
Subject: [PHP] getting anchor tags


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi
> In theory I can work out how to get <a href= tags from a page. Before I
> start messing with regexp though I thought I'd see if there were any
> pre-built functions or ways of doing this?
>
> I'm building a site search and have not found anything in the docs but
> am guessing there might be an easier way of proceeding?
>
> Many thanks...
> - --
> Nick Wilson     //  www.explodingnet.com
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.6 (GNU/Linux)
>
> iD8DBQE9E5dUHpvrrTa6L5oRAtrRAJ0YqRvKl8WAAG9xYiFHa6u0Nr7RYgCcDIii
> A/dUb7p9De0J1huL+e2QPFs=
> =03Ln
> -----END PGP SIGNATURE-----
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>


