Hello,

I am trying to write a PHP script that indexes my site and inserts
into a MySQL DB, for each .htm file, its path, its title (from the
HTML tags <title></title>), its description (from the meta tag <meta
name="description" content="...">) and its keywords (from the meta
tag <meta name="keywords" content="...">).

I adapted this function to get the title, and it works great:

/*
 * Given a raw html document (as string), return its title.
 * This function may need to be modified if your web pages use
 * automatically generated titles.
 */

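function getTitle(&$doc)
{
        // Capture everything between <title> and </title>, case-insensitively.
        if (eregi("<title>(.*)</title>", $doc, $titlematch))
                // Collapse runs of whitespace into a single space.
                $title = trim(eregi_replace("[[:space:]]+", " ", $titlematch[1]));
        else
                $title = "";
        if ($title == "")
                $title = "Sem Título"; // Portuguese for "Untitled"
        return $title;
}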


I then tried to do something similar to get the Description:


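function getDescription(&$doc)
{
        // Capture the value of the content attribute of the description meta tag.
        if (eregi('<meta name="description" content="(.*)">', $doc, $descr))
                // Collapse runs of whitespace into a single space.
                $descricao = trim(eregi_replace("[[:space:]]+", " ", $descr[1]));
        else
                $descricao = "";
        if ($descricao == "")
                $descricao = "Sem Descrição"; // Portuguese for "No description"
        return $descricao;
}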

This doesn't work as intended. It returns the whole page starting
after content=" and doesn't stop at the end of the tag (">).

The funny thing is that if I add a space at the end of the pattern,
like this (" >), in both the PHP code and the HTML file (<meta
name="description" content="test with a space" >), the function
returns only the description string, as intended.


The same thing happens with the Keywords:

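function getKeywords(&$doc)
{
        // Capture the value of the content attribute of the keywords meta tag.
        if (eregi('<meta name="keywords" content="(.*)">', $doc, $mykeys))
                // Collapse runs of whitespace into a single space.
                $keywords = trim(eregi_replace("[[:space:]]+", " ", $mykeys[1]));
        else
                $keywords = "";
        if ($keywords == "")
                $keywords = "Sem Keywords"; // Portuguese for "No keywords"
        return $keywords;
}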

But this time I needed two (2) spaces to make the function work!
(<meta name="keywords" content="test with 2 spaces"  >). If I used
one space or none, it returned the whole page; with 2 spaces the
function works.

I concluded that the regex pattern (.*) doesn't stop matching at the
"> and needs a space between the two characters (" >). But why did it
need 2 spaces the second time!?

I don't want to have to change all the .htm files on my site to add
a space to the description meta tag and 2 spaces to the keywords
meta tag. Is there a way to tell the (.*) to end the search at the
first "> ?
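
For reference, one way this is often handled is to replace (.*) with a
negated character class, [^"]*, which cannot match a double quote and so
has to stop at the quote that closes the attribute value. The sketch below
is only an illustration under some assumptions: it uses preg_match (PCRE)
instead of eregi, it expects name to come before content with both values
in double quotes (as in the examples above), and the helper name
getMetaContent and its parameters are made up for this example.

```php
<?php
// Hypothetical helper (not part of the original script): returns the
// content attribute of <meta name="$name" content="...">, or $default
// when the tag is missing or its content is empty.
// Assumes name precedes content and both values are double-quoted.
function getMetaContent($doc, $name, $default)
{
    // [^"]* matches any run of characters except a double quote, so the
    // capture cannot run past the end of the attribute value.
    $pattern = '/<meta\s+name="' . preg_quote($name, '/')
             . '"\s+content="([^"]*)"\s*\/?>/i';

    if (preg_match($pattern, $doc, $match)) {
        // Collapse runs of whitespace, as the functions above do.
        $value = trim(preg_replace('/\s+/', ' ', $match[1]));
    } else {
        $value = '';
    }

    return ($value === '') ? $default : $value;
}
```

With this, getMetaContent($doc, 'description', 'Sem Descrição') and
getMetaContent($doc, 'keywords', 'Sem Keywords') would stand in for the two
functions above, and the optional \s* before > accepts the tag with or
without trailing spaces.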

Thanks for your attention

Marco Ascensao


