Here is an idea:

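<?php
// make array out of words in search string, skipping empty tokens
$search_array = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
if (count($search_array) < 1)
    die("empty search");

// escape regexp metacharacters so input like "c++" is matched literally
$words = array_map('preg_quote', $search_array);

// make regexp pattern '(this|or|that)'; REGEXP already matches anywhere
// in the column, so no leading or trailing '.*' is needed
$string = "(" . implode('|', $words) . ")";

// escape for SQL; mysql_* was removed in PHP 7, use mysqli_* there
$query = "SELECT * FROM my_table WHERE body REGEXP '"
       . mysql_real_escape_string($string, $connection) . "'";
$result = mysql_query($query, $connection);
if (!$result)
    die("query failed: " . mysql_error($connection));
$res = mysql_num_rows($result);
if ($res < 1)
    die("no match for " . htmlspecialchars($search));
?>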

Using this method, "car" would match "car", "carwash", "scar", "scarred",
etc. Note that it returns rows containing any one of the words, not
necessarily all of them.
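
To see the difference between this kind of substring matching and
whole-word matching on the PHP side, here is a minimal sketch.
build_pattern() is a hypothetical helper, not part of the code in this
message, and it assumes the words are plain text to be matched literally.

```php
<?php
// build_pattern() is a hypothetical helper, for illustration only.
// It turns a list of words into a case-insensitive PCRE alternation,
// optionally anchored on word boundaries.
function build_pattern($words, $whole_words = false)
{
    $quoted = array();
    foreach ($words as $word) {
        $quoted[] = preg_quote($word, '/'); // match the word literally
    }
    $alternation = '(' . implode('|', $quoted) . ')';
    if ($whole_words) {
        $alternation = '\b' . $alternation . '\b';
    }
    return '/' . $alternation . '/i';
}

$loose  = build_pattern(array('car'));        // substring match
$strict = build_pattern(array('car'), true);  // whole-word match

foreach (array('car', 'carwash', 'scar', 'scarred') as $text) {
    printf("%-8s loose=%d strict=%d\n", $text,
        preg_match($loose, $text), preg_match($strict, $text));
}
```

All four strings pass the loose pattern; only "car" passes the strict one.
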
Since the result contains the entire body of text, you could do some more
matching or scoring for relevancy:

<?php
$score = array();
while ( $row = mysql_fetch_assoc($result) )
{
    // 'page_title_or_something' is a placeholder for your title column
    $title = $row['page_title_or_something'];
    $body = $row['body'];
    $body_size = strlen($body);
    if (!isset($score[$title]))
        $score[$title] = 0;

    foreach ($search_array as $word)
    {
        if ($word === '')
            continue;
        // quote the word so it is matched literally inside the patterns
        $w = preg_quote($word, '/');
        if ( preg_match("/\b$w/i", $body) )
        {
            // it was found so score 25 to start
            $score[$title] += 25;

            // this is the first case-insensitive occurrence of the word;
            // the earlier it appears, the higher the score (up to 50)
            $pos = stripos($body, $word);
            $percent = $pos / $body_size * 100;
            $score[$title] += (100 - round($percent)) / 2;

            // this is the first occurrence of the word by itself
            // (as a whole word); score higher again (up to 100)
            if ( preg_match("/\b$w\b/i", $body, $matches, PREG_OFFSET_CAPTURE) )
            {
                $pos_clean = $matches[0][1];
                $percent = $pos_clean / $body_size * 100;
                $score[$title] += 100 - round($percent);
            }

            // this is how many times it occurred in total
            $reps = substr_count(strtolower($body), strtolower($word));
            // score higher
            $score[$title] += $reps * 5;

            // this is how many times it occurred by itself
            $reps_clean = preg_match_all("/\b$w\b/i", $body, $all);
            // score higher
            $score[$title] += $reps_clean * 10;
        }
    }
}
?>
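
The position part of the scoring, and what to do with $score afterwards,
can be sketched on its own without a database. This is only an
illustration: position_bonus() is a hypothetical helper, the rows are
made-up sample data, and the 50-point ceiling is simply the value used
for the first-occurrence bonus above.

```php
<?php
// position_bonus() is a hypothetical helper, for illustration only.
// It returns up to $max points, more the earlier $word first appears
// in $body (case-insensitive), and 0 if the word does not appear.
function position_bonus($body, $word, $max = 50)
{
    $pos = stripos($body, $word);
    if ($pos === false || $body === '') {
        return 0;
    }
    return (int) round($max * (1 - $pos / strlen($body)));
}

// Made-up rows standing in for the MySQL result set.
$rows = array(
    'Page B' => 'Parts catalogue: Ford Mustang',
    'Page A' => 'Mustang parts for the 1972 Ford',
);

$score = array();
foreach ($rows as $title => $body) {
    $score[$title] = position_bonus($body, 'mustang');
}

// Highest score first, keeping the titles as keys.
arsort($score);
foreach ($score as $title => $points) {
    echo "$title: $points\n";
}
```

"Page A" is listed first because the word is at the very start of its
body. The same arsort() call works on the $score array built in the loop
above.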

I had that code from a previous working project. I copied it and changed
some var names to make it more clear. I did not test it in this format but
it is a good example and you could certainly improve it or build on it.
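
One way to build on it: the REGEXP query above returns rows that contain
any of the words, while the original question asked for rows that contain
all of them, in any order. A minimal sketch of that, assuming one LIKE
condition per word joined with AND, is below. build_and_search() is a
hypothetical helper, and the table and column names are the same
placeholders used above.

```php
<?php
// build_and_search() is a hypothetical helper, for illustration only.
// It returns a WHERE fragment with one LIKE placeholder per word, plus
// the matching list of parameters. $column must be a trusted column
// name from your own code, never user input.
function build_and_search($search, $column = 'body')
{
    $tokens = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    $clauses = array();
    $params = array();
    foreach ($tokens as $token) {
        $clauses[] = "$column LIKE ?";
        // escape LIKE wildcards so % and _ in the input are literal
        $params[] = '%' . addcslashes($token, '%_\\') . '%';
    }
    return array(implode(' AND ', $clauses), $params);
}

list($where, $params) = build_and_search('1972 Ford Mustang');
echo "SELECT * FROM my_table WHERE $where\n";
print_r($params);
```

The number of words does not need to be known in advance, since one
clause is generated per word. The placeholders are meant to be bound
through a prepared statement (mysqli or PDO), and an empty search string
gives an empty fragment, so check for that before running the query.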

Jim Grill
Support
Web-1 Hosting
http://www.web-1hosting.net
----- Original Message -----
From: "Paul Maine" <[EMAIL PROTECTED]>
To: "PHP PHP" <[EMAIL PROTECTED]>
Sent: Saturday, July 27, 2002 9:31 PM
Subject: [PHP] PHP/MySQL Search Engine Query Question


> I am currently working on a website that is implemented using PHP and
> MySQL.
>
> The site currently has a simple search engine that allows a shopper to
> type in a search string that is stored in $search. For example, if a
> shopper types in 1972 Ford Mustang
> $string ="1972 Ford Mustang"
>
> Using the following SQL statement:
> SELECT * FROM whatevertable WHERE whatevercolumn LIKE '%$search%
>
> Records are returned that have this exact string and in this exact order
> (I'm aware a wild card character is included on the front and back of the
> string).
>
> My desire is to be able to logically AND each token of the search together
> independent of the order of the tokens.
> I want to return all records that have Mustang AND 1972 AND Ford.
>
> Since a shopper inputs the search string in advance I don't know how many
> tokens will be used.
>
> I would appreciate any suggestions.
>
> Regards,
> Paul
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php