Maybe you should try changing

if (eregi('<meta name="description" content="(.*)">', $doc,

to:

if (eregi("<meta name='description' content='(.*)'>", $doc,...)||eregi("<meta name=\"description\" content=\"(.*)\">", $doc,...))

like the first one:

if (eregi("<title>(.*)</title>", $doc, $titlematch))

but I don't know, just maybe :)

----- Original Message -----
From: "DHEA" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, April 27, 2001 12:49 AM
Subject: [PHP-WIN] [Help:] Problem with regex patterns when getting Title, Description and Keywords from HTML files...

> Hello,
>
> I am trying to make a PHP script to index my site and insert into a
> MySQL DB the .htm files path, its Title (from the HTML tags
> <Title></Title>), its Description (from the meta tag <meta
> name="description" content="..."> ) and its Keywords (from the meta
> tag <meta name="keywords" content="..."> ).
>
> Well, I adapted this function to get the Title and it works great!!:
>
> /*
>  * Given a raw html document (as string), return its title.
>  * This function may need to be modified if your web pages use automatically
>  * generated titles.
>  */
>
> function getTitle(&$doc)
> {
>     if (eregi("<title>(.*)</title>", $doc, $titlematch))
>         $title = trim(eregi_replace("[[:space:]]+", " " , $titlematch[1]));
>     else
>         $title = "";
>     if ($title == "")
>         $title = "Sem Título";
>     return $title;
> }
>
>
> I then tried to do something similar to get the Description:
>
>
> function getDescription(&$doc)
> {
>     if (eregi('<meta name="description" content="(.*)">', $doc, $descr))
>         $descricao = trim(eregi_replace("[[:space:]]+", " " , $descr[1]));
>     else
>         $descricao = "";
>     if ($descricao == "")
>         $descricao = "Sem Descrição";
>     return $descricao;
> }
>
> This doesn't work as intended... It returns the whole page starting
> after content=" and doesn't end at the end of the string (">).
>
> The funny thing is that if I add a space at the end of the string like
> this (" >) in both the PHP code and in the HTML file (<meta
> name="description" content="test with a space" >), the function returns
> only the string of the description as intended...
>
>
> The same thing happens with the Keywords:
>
> function getKeywords(&$doc)
> {
>     if (eregi('<meta name="keywords" content="(.*)">', $doc, $mykeys))
>         $keywords = trim(eregi_replace("[[:space:]]+", " " , $mykeys[1]));
>     else
>         $keywords = "";
>     if ($keywords == "")
>         $keywords = "Sem Keywords";
>     return $keywords;
> }
>
> But this time I needed two (2) spaces to make the function work!!!
> (<meta name="description" content="test with 2 spaces"  >). If I used
> one or no space it returned the whole page... with 2 spaces the
> function works...
>
> I concluded that the regex pattern (.*) doesn't stop looking at the
> "> and needs a space between them (" >). But why did it need 2 spaces
> the second time!?
>
> I don't want to have to change all the HTM files from my site and add
> a space to the Description Meta Tag and 2 spaces to the Keywords
> Meta... Is there a way to tell the (.*) to end the search at the "> ?
>
> Thanks for your attention
>
> Marco Ascensao
>
>
>
> --
> PHP Windows Mailing List (http://www.php.net/)
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> To contact the list administrators, e-mail: [EMAIL PROTECTED]
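
On the question of making (.*) stop at the ">: the pattern is greedy, so it runs to the LAST "> in the page rather than the first one after content=". That would also explain the spaces: " > with one space was presumably unique in the page until the description tag used it, after which the keywords tag needed something different again. One way around it, without touching the HTML files, is to replace (.*) with [^"]*, which cannot cross the closing quote. With the existing calls that would look like eregi('<meta name="description" content="([^"]*)">', $doc, $descr). The sketch below shows the same idea with preg_match, since the ereg functions were later deprecated and removed in PHP 7. The helper name getMetaContent() and its parameters are my own invention, not part of the original script, and it assumes double-quoted attributes with name before content.

```php
<?php
// Sketch only: getMetaContent() and its parameters are hypothetical names,
// not part of the original script. Assumes name="..." comes before
// content="..." and that both attributes use double quotes.
function getMetaContent($doc, $name, $default)
{
    // [^"]* matches everything up to the next double quote, so the capture
    // cannot run on to a later "> further down the page the way (.*) does.
    $pattern = '/<meta\s+name="' . preg_quote($name, '/')
             . '"\s+content="([^"]*)"/i';

    if (preg_match($pattern, $doc, $match)) {
        // Collapse runs of whitespace, as the original functions do.
        $value = trim(preg_replace('/\s+/', ' ', $match[1]));
    } else {
        $value = '';
    }

    return $value === '' ? $default : $value;
}

// Made-up test page, with a later "> in the body to show the match stops early.
$doc = '<html><head><title>Test</title>'
     . '<meta name="description" content="A  short description">'
     . '<meta name="keywords" content="php, regex">'
     . '</head><body><a href="x.htm">link</a></body></html>';

echo getMetaContent($doc, 'description', 'Sem Descrição'), "\n";
echo getMetaContent($doc, 'keywords', 'Sem Keywords'), "\n";
```

With the made-up page above this prints "A short description" and "php, regex", even though there are more "> sequences later in the document. Single-quoted attributes, as suggested at the top of this reply, would need a second pattern or an alternation.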