Re: [PHP] Another parse problem

Robert Cummings Mon, 14 Jun 2010 06:59:46 -0700

tedd wrote:

At 2:18 PM +0100 6/14/10, Ashley Sheridan wrote:
On Mon, 2010-06-14 at 09:14 -0400, tedd wrote:
Hi gang:

Considering all the recent parsing, here's another problem to
consider -- given any text, parse the domain-names out of it.

You may limit the parsing to the most popular TLDs, such as .com,
.net, and .org, but the finished result should be an array containing
all the domain-names found in a text file.

Cheers,

tedd
--
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com
I'm assuming it won't be anything as simple as assuming all the domains begin with the http:// prefix? :p
Thanks,
Ash
Ash:
Nope, just a text file containing whatever and domain-names. The only domain-name indicator would be the period followed by an approved TLD, such as .com, .net, or .org.


<?php

function rip_domains( $text )
{
    $domains = false;

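    // Built piece by piece; [:alnum:] is the POSIX class for letters and digits.
    $pattern =
        '[^-[:alnum:]]*'                  // optional run of non-name characters before
       .'('                               // capture group 1: the domain name
       .  '[-[:alnum:]][-.[:alnum:]]*'    // labels made of letters, digits, '-' and '.'
       .  '\.(com|net|org)'               // a period followed by an approved TLD
       .')'
       .'[^-_[:alnum:]]*';                // optional run of non-name characters after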

    if( preg_match_all( "#$pattern#", $text, $matches ) )
    {
        // Use the names as array keys so duplicates collapse to one entry.
        $domains = array();
        foreach( $matches[1] as $domain )
        {
            $domains[$domain] = true;
        }
        $domains = array_keys( $domains );
    }

    return $domains;
}

?>

Naive implementation. I'm sure I've missed edge cases someplace.
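
Two of those edge cases, as a rough sketch rather than a definitive fix: because the leading and trailing character classes in the pattern above are both optional, a word like "example.community" yields "example.com", and upper-case input such as "Sperling.COM" is not matched at all. The variant below swaps those classes for lookarounds and adds the case-insensitive flag. The name rip_domains_strict and the $tlds parameter are made up for illustration, and unlike the original it returns an empty array (not false) when nothing is found. It still does not understand multi-part suffixes such as .com.au.

```php
<?php

// Hypothetical variant of rip_domains(); the function name and the
// $tlds parameter are illustrative assumptions, not from the post above.
function rip_domains_strict( $text, $tlds = array( 'com', 'net', 'org' ) )
{
    // Escape each TLD so it is matched literally inside the pattern.
    $alternatives = implode( '|', array_map(
        function( $tld ) { return preg_quote( $tld, '#' ); },
        $tlds
    ) );

    $pattern =
        '(?<![-.[:alnum:]])'              // must not start in the middle of a name
       .'('
       .  '[[:alnum:]][-.[:alnum:]]*'     // labels made of letters, digits, '-' and '.'
       .  '\.(?:' . $alternatives . ')'   // a period followed by an approved TLD
       .')'
       .'(?![-[:alnum:]])';               // the TLD must not run on into more letters

    $domains = array();

    // The "i" flag makes the match case-insensitive.
    if( preg_match_all( "#$pattern#i", $text, $matches ) )
    {
        foreach( $matches[1] as $domain )
        {
            // Lower-case keys so "Sperling.com" and "sperling.com" collapse.
            $domains[strtolower( $domain )] = true;
        }
    }

    return array_keys( $domains );
}

$sample = 'Visit http://Sperling.com or example.community, '
        . 'then www.php.net. Again sperling.com!';

print_r( rip_domains_strict( $sample ) );
// lists sperling.com and www.php.net; example.community is skipped
```

The lookarounds check the neighbouring characters without consuming them, so two domains separated by a single space or slash are both still found.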

Cheers,
Rob.