Maybe you should change

     if (eregi('<meta name="description" content="(.*)">', $doc,

to:

     if (eregi("<meta name='description' content='(.*)'>", $doc,...)
         || eregi("<meta name=\"description\" content=\"(.*)\">", $doc,...))

so that it is quoted like the first one:

    if (eregi("<title>(.*)</title>", $doc, $titlematch))

but I don't know, just maybe :)
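
If it helps, here is a rough sketch of the same idea in one function. It is only a guess at what you need: getMetaContent() and its parameters are made-up names, and it uses preg_match() instead of eregi() (the ereg functions are gone in PHP 7 and later). The pattern accepts either quote style, and (.*?) is lazy, so it stops at the first closing quote followed by ">" instead of the last one in the page.

```php
<?php
// Hypothetical helper: the name and signature are assumptions,
// not part of the original script.
function getMetaContent($doc, $name, $default = '')
{
    // (["\']) captures the opening quote and \1 / \2 require the same
    // quote to close it. (.*?) is lazy, so it ends at the first closing
    // quote that is followed by optional spaces and '>'.
    // /i ignores case, /s lets '.' cross line breaks.
    $pattern = '/<meta\s+name=(["\'])' . preg_quote($name, '/')
             . '\1\s+content=(["\'])(.*?)\2\s*\/?>/is';

    if (preg_match($pattern, $doc, $m)) {
        // Collapse runs of whitespace, as the original functions do.
        $value = trim(preg_replace('/\s+/', ' ', $m[3]));
        if ($value !== '') {
            return $value;
        }
    }
    return $default;
}

// Made-up sample page, only for illustration.
$doc = "<meta name='description' content='test   page'>\n"
     . '<META NAME="keywords" CONTENT="php, regex" >' . "\n"
     . '<a href="index.htm">Home</a>';

echo getMetaContent($doc, 'description', 'Sem Descrição'), "\n"; // test page
echo getMetaContent($doc, 'keywords', 'Sem Keywords'), "\n";     // php, regex
echo getMetaContent($doc, 'author', 'n/a'), "\n";                // n/a
```

It will still miss tags whose attributes come in a different order (content before name), so treat it as a starting point only.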



----- Original Message -----
From: "DHEA" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, April 27, 2001 12:49 AM
Subject: [PHP-WIN] [Help:] Problem with regex patterns when getting Title,
Description and Keywords from HTML files...


> Hello,
>
> I am trying to make a PHP script to index my site and insert into a
> MySQL DB the .htm files path, its Title (from the HTML tags
> <Title></Title>), its Description (from the meta tag <meta
> name="description" content="..."> ) and its Keywords (from the meta
> tag <meta name="keywords" content="..."> ).
>
> Well, I adapted this function to get the Title and it works great!!:
>
> /*
>  * Given a raw html document (as string), return its title.
>  * This function may need to be modified if your web pages use
>  * automatically generated titles.
>  */
>
> function getTitle(&$doc)
> {
>     if (eregi("<title>(.*)</title>", $doc, $titlematch))
>         $title = trim(eregi_replace("[[:space:]]+", " ", $titlematch[1]));
>     else
>         $title = "";
>     if ($title == "")
>         $title = "Sem Título";
>     return $title;
> }
>
>
> I then tried to do something similar to get the Description:
>
>
> function getDescription(&$doc)
> {
>     if (eregi('<meta name="description" content="(.*)">', $doc, $descr))
>         $descricao = trim(eregi_replace("[[:space:]]+", " ", $descr[1]));
>     else
>         $descricao = "";
>     if ($descricao == "")
>         $descricao = "Sem Descrição";
>     return $descricao;
> }
>
> This doesn't work as intended... It returns the whole page starting
> after content=" and doesn't end at the end of the string (">).
>
> The funny thing is that if I add a space at the end of the string like
> this (" >) in both the PHP code and in the HTML file (<meta
> name="description" content="test with a space" >), the function returns
> only the string of the description as intended...
>
>
> The same thing happens with the Keywords:
>
> function getKeywords(&$doc)
> {
>     if (eregi('<meta name="keywords" content="(.*)">', $doc, $mykeys))
>         $keywords = trim(eregi_replace("[[:space:]]+", " ", $mykeys[1]));
>     else
>         $keywords = "";
>     if ($keywords == "")
>         $keywords = "Sem Keywords";
>     return $keywords;
> }
>
> But this time I needed two (2) spaces to make the function work!!!
> (<meta name="description" content="test with 2 spaces"  >). If I used
> one or no space it returned the whole page... with 2 spaces the
> function works...
>
> I concluded that the regex pattern (.*) doesn't stop at the
> "> and needs a space between them (" >). But why did it need
> 2 spaces the second time!?
>
> I don't want to have to change all the HTM files from my site and add
> a space to the Description Meta Tag and 2 spaces to the Keywords
> Meta... Is there a way to tell the (.*) to end the search at the ">
> ?
>
> Thanks for your attention
>
> Marco Ascensao
>
>
>
> --
> PHP Windows Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> To contact the list administrators, e-mail: [EMAIL PROTECTED]
>
>
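
On the question above of why the keywords tag needed two spaces: my guess is that (.*) is greedy, so it runs to the last place in the page where the rest of the pattern still matches, not the first. With one space, the keywords match can run on to the " > that closes a later tag such as the description; with two spaces the ending is unique in the page. A rough sketch of that on a made-up $doc, using preg_match() with the /s flag (so that . also crosses line breaks) in place of eregi(), which is not available in PHP 7 and later:

```php
<?php
// Made-up sample page (an assumption, not the poster's real file):
// both meta tags end with exactly one space before '>'.
$doc = '<meta name="keywords" content="php, regex" >' . "\n"
     . '<meta name="description" content="test with a space" >' . "\n"
     . '<a href="index.htm">Home</a>';

// Description: the last '" >' in the page is its own closing,
// so the greedy (.*) happens to give the right answer.
preg_match('/<meta name="description" content="(.*)" >/is', $doc, $descr);

// Keywords: the greedy (.*) runs on to the '" >' of the description tag.
preg_match('/<meta name="keywords" content="(.*)" >/is', $doc, $greedy);

// Lazy (.*?) stops at the first '" >' instead.
preg_match('/<meta name="keywords" content="(.*?)" >/is', $doc, $lazy);

echo $descr[1], "\n";  // test with a space
echo $lazy[1], "\n";   // php, regex
echo strlen($greedy[1]) > strlen($lazy[1]) ? "greedy overshoots\n" : "same\n";
```

If that is what happens in the real pages, making the quantifier lazy, or using [^"]* in place of .*, should remove the need to edit the HTML files at all.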
