Hello,
I am trying to write a PHP script that indexes my site and inserts into a
MySQL DB, for each .htm file, its path, its title (from the HTML
<title></title> tags), its description (from the meta tag <meta
name="description" content="...">) and its keywords (from the meta
tag <meta name="keywords" content="...">).
I adapted this function to get the title, and it works great:
/*
 * Given a raw html document (as string), return its title.
 * This function may need to be modified if your web pages use
 * automatically generated titles.
 */
function getTitle(&$doc)
{
    if (eregi("<title>(.*)</title>", $doc, $titlematch))
        $title = trim(eregi_replace("[[:space:]]+", " ", $titlematch[1]));
    else
        $title = "";
    if ($title == "")
        $title = "Sem Título";
    return $title;
}
I then tried to do something similar to get the Description:
function getDescription(&$doc)
{
    if (eregi('<meta name="description" content="(.*)">', $doc, $descr))
        $descricao = trim(eregi_replace("[[:space:]]+", " ", $descr[1]));
    else
        $descricao = "";
    if ($descricao == "")
        $descricao = "Sem Descrição";
    return $descricao;
}
This doesn't work as intended. It returns the whole page starting
after content=" and doesn't stop at the end of the tag (">).
The funny thing is that if I add a space before the closing bracket
(" >) in both the PHP pattern and the HTML file (<meta
name="description" content="test with a space" >), the function returns
only the description string, as intended.
The same thing happens with the Keywords:
function getKeywords(&$doc)
{
    if (eregi('<meta name="keywords" content="(.*)">', $doc, $mykeys))
        $keywords = trim(eregi_replace("[[:space:]]+", " ", $mykeys[1]));
    else
        $keywords = "";
    if ($keywords == "")
        $keywords = "Sem Keywords";
    return $keywords;
}
But this time I needed two (2) spaces to make the function work
(<meta name="keywords" content="test with 2 spaces"  >). If I used
one space or none, it returned the whole page; with 2 spaces the
function works.
I concluded that the regex pattern (.*) doesn't stop at the
"> and needs a space between them (" >). But why did it need 2 spaces
the second time?
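A possible explanation for the space counts, offered as a sketch rather than a confirmed diagnosis: (.*) is greedy, so it runs to the LAST place in the document where the rest of the pattern still matches, not the first. With "> that is usually some later tag. With " > it is the last tag that happens to end in one space (here, the description tag), and two spaces work for the keywords only because no other tag ends that way. The example below reproduces this with preg_match() and the /s flag standing in for eregi (an assumption: in POSIX regexes "." also matches newlines, while PCRE needs /s for that). The sample page and the greedyCapture() helper are made up for illustration.

```php
<?php
// Hypothetical sample page: the keywords tag closes with two spaces,
// the description tag with one, and a later link closes with none.
$doc = "<meta name=\"keywords\" content=\"php, mysql\"  >\n"
     . "<meta name=\"description\" content=\"my site\" >\n"
     . "<a href=\"index.htm\">Home</a>\n";

// Hypothetical helper: returns what a greedy (.*) captures when the
// pattern has to end with $closing. The /s flag lets "." cross newlines.
function greedyCapture($name, $closing, $doc)
{
    $pattern = '/<meta name="' . preg_quote($name, '/') . '" content="(.*)'
             . preg_quote($closing, '/') . '/is';
    return preg_match($pattern, $doc, $m) ? $m[1] : null;
}

$noSpace  = greedyCapture('keywords', '">', $doc);   // runs on to the <a> tag
$oneSpace = greedyCapture('keywords', '" >', $doc);  // runs on to the description tag
$twoSpace = greedyCapture('keywords', '"  >', $doc); // stops inside the keywords tag

var_dump($noSpace, $oneSpace, $twoSpace);
```

Only the two-space variant captures just "php, mysql"; the other two swallow everything up to the last matching terminator.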
I don't want to have to change all the .htm files on my site to add
a space to the description meta tag and 2 spaces to the keywords
meta tag. Is there a way to tell (.*) to end the search at the ">?
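One way this could be done without touching the HTML files, sketched under some assumptions: replace (.*) with [^"]*, which cannot run past the closing quote of the content attribute, and allow optional whitespace before the ">". The sketch uses preg_match()/preg_replace() instead of eregi()/eregi_replace(), since the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0. getMetaContent() is a hypothetical helper name, and the pattern assumes name comes before content and that both use double quotes, as in the tags shown above.

```php
<?php
// Hypothetical helper that could stand in for getDescription() and getKeywords().
function getMetaContent($doc, $name, $default)
{
    // [^"]* matches anything except a double quote, so the capture ends
    // at the first closing quote. \s*\/?> tolerates spaces or a slash before ">".
    $pattern = '/<meta\s+name="' . preg_quote($name, '/')
             . '"\s+content="([^"]*)"\s*\/?>/i';
    if (preg_match($pattern, $doc, $match)) {
        // Collapse runs of whitespace (including newlines) to one space.
        $value = trim(preg_replace('/\s+/', ' ', $match[1]));
    } else {
        $value = '';
    }
    return $value === '' ? $default : $value;
}

// Made-up sample page for illustration.
$doc = "<meta name=\"description\" content=\"My   home\n page\">\n"
     . "<meta name=\"keywords\" content=\"php, mysql\">\n"
     . "<a href=\"index.htm\">Home</a>\n";

$description = getMetaContent($doc, 'description', 'Sem Descrição');
$keywords    = getMetaContent($doc, 'keywords', 'Sem Keywords');
$missing     = getMetaContent($doc, 'author', 'Sem Autor');

var_dump($description, $keywords, $missing);
```

A non-greedy (.*?) would also stop at the first "> but the negated character class states the intent more directly and does not depend on how "." treats newlines.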
Thanks for your attention
Marco Ascensao