tedd wrote:
At 2:18 PM +0100 6/14/10, Ashley Sheridan wrote:
On Mon, 2010-06-14 at 09:14 -0400, tedd wrote:

Hi gang:

Considering all the recent parsing, here's another problem to
consider -- given any text, parse the domain-names out of it.

You may limit the parsing to the most popular TLDs, such as .com,
.net, and .org, but the finished result should be an array containing
all the domain-names found in a text file.


http://sperling.com  http://ancientstones.com  http://earthstones.com

I'm guessing it won't be as simple as assuming that all the domains begin with the http:// prefix? :p



Nope, just a text file containing whatever and domain-names. The only domain-name indicator would be the period followed by an approved TLD, such as .com, .net, or .org.


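function rip_domains( $text )
{
    // Returns an array of unique domain names, or false if none are found.
    $domains = false;

    // Group 1 captures the whole domain; group 2 captures only the TLD.
    $pattern =
          '('
       .    '[-[:alnum:]][-.[:alnum:]]*'
       .    '\.(com|net|org)'
       .  ')';

    if( preg_match_all( "#$pattern#", $text, $matches ) )
    {
        // Use the domains as array keys to drop duplicates.
        $domains = array();
        foreach( $matches[1] as $domain )
        {
            $domains[$domain] = true;
        }
        $domains = array_keys( $domains );
    }

    return $domains;
}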


Naive implementation. I'm sure I've missed edge cases someplace.
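
One edge case that seems likely to slip through: because nothing has to follow the `\.(com|net|org)` part, text such as "example.company" would yield "example.com", and "Sperling.com" and "sperling.com" would count as two different domains. Below is a minimal sketch of one way to tighten that, not a definitive fix. The function name rip_domains_strict is hypothetical, as are the trailing lookahead, the case-insensitive match, and the choice to return an empty array instead of false when nothing is found.

```php
<?php
// Hypothetical variant of rip_domains(); not part of the original post.
function rip_domains_strict( $text )
{
    $pattern =
          '('
        .   '[[:alnum:]][-.[:alnum:]]*'   // host part; assumed to start with a letter or digit
        .   '\.(?:com|net|org)'           // approved TLD
        . ')'
        . '(?![-[:alnum:]])';             // the TLD must not run on into a longer label

    $domains = array();
    if( preg_match_all( "#$pattern#i", $text, $matches ) )
    {
        // Lower-case, then de-duplicate while keeping first-seen order.
        $domains = array_values( array_unique( array_map( 'strtolower', $matches[1] ) ) );
    }

    return $domains;
}

$text = 'See Sperling.com, http://www.php.net/ and example.company; also sperling.com.';
print_r( rip_domains_strict( $text ) );
// The resulting array holds: sperling.com, www.php.net
```

The lookahead only rejects a TLD that is immediately followed by a letter, digit, or hyphen, so a sentence-ending period after the domain is still accepted. Other cases, such as country-code suffixes or e-mail addresses, are left untouched by this sketch.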

PHP General Mailing List (http://www.php.net/)