Re: [PHP] Fast search
On Wed, May 17, 2006 11:37 am, René Fournier wrote: Looking for suggestions on the most compute-efficient way to search variable-length strings (~200 characters) for the occurrence of one of about 100 possible needles. In other words: $needles = array ( 1 = Hello Jim , 2 = Yellow Banana , 3 = Red Car, ... etc (100 elements) $haystack = Once upon a time there was a programming language that everyone loved. Its name was PHP, and the people that worked with it also liked Yellow Bananas. One day...; Now perhaps I'm blind, but I can't see an obvious string or array function that simply allows you to use an array as a needle while searching a string haystack. What I'm really looking for is something like the reverse of in_array. The obvious thing would be to loop through the needles and run substr_count on the haystack... But is there a faster way? //Kind of a hack, I guess, but might be decently fast: $pattern = implode('|', $needles); if (preg_match(/($pattern)/, $haystack, $matches)){ var_dump($matches); } -- Like Music? http://l-i-e.com/artists.htm -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Fast search
On 17/05/06, René Fournier [EMAIL PROTECTED] wrote: Looking for suggestions on the most compute-efficient way to search variable-length strings (~200 characters) for the occurrence of one of about 100 possible needles. In other words: $needles = array ( 1 = Hello Jim , 2 = Yellow Banana , 3 = Red Car, ... etc (100 elements) $haystack = Once upon a time there was a programming language that everyone loved. Its name was PHP, and the people that worked with it also liked Yellow Bananas. One day...; What version of php? If you're using php5 then you could do: ?php $count = 0; str_replace($needles, '', $haystack, $count); if ($count) { /* there was a match */ } ? -robin -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Fast search
You can make a pretty effective time/memory tradeoff -- particularly if your array of patterns is relatively fixed -- but unless you re- implement this in C, it will probably be slower than a brute force approach with the str* functions unless you are searching for a fairly huge number of needles. Below, I build a tree of pattern matches, which you navigate branch- by-branch by using successive letters of the pattern. So, after the tree is built, the index $tree[ 'h' ][ 'a' ][ 't' ] exists and has the value `true'. To search the string, you push cursors onto a stack, advance them as you advance through the string, and destroy them when they hit a dead end. This has a worst case of O( N^2 ) (on the length of the string), which happens when you search for patterns like a and b in the string h. The average case for random or typical strings should be very close to O ( N ). Critically, search time is a function only of the length of the string, not of the number of patterns. Memory usage is also only a function of the size of the character set and the length of the longest pattern, not of the total number of patterns. This particular implementation has some nice side-effects; uncomment the var_dump() line and note how 'pirate' and 'firm' have been optimized out of the tree in favor of 'pir' and 'fir'. Evan ?php $pattern = array( 'hat' ,'hut' ,'blue dog' ,'blue cat' ,'pirate' ,'pir' ,'fir' ,'firm' ); $tree = array( ); foreach( $pattern as $str ) { $len = strlen( $str ); $cursor = $tree; for( $ii = 0; $ii $len; $ii++ ) { if( !isset( $cursor[ $str[ $ii ] ] ) ) { $cursor[ $str[ $ii ] ] = array( ); } $cursor = $cursor[ $str[ $ii ] ]; if( $cursor === true ) break; } $cursor = true; } /* var_dump ( $tree ); */ $corpus = 'lorem ipsum dolor sur amet pi blue do he hi h fipir slog'; $clen = strlen( $corpus ); $stack = array( ); for( $ii = 0; $ii $clen; $ii++ ) { $c = $corpus[ $ii ]; foreach( $stack as $k = $_ ) { if( isset( $stack[ $k ][ $c ] ) ) { if( $stack[ $k ][ $c ] === true ) { die( 'Matched at position '.$ii ); } $stack[ $k ] = $stack[ $k ][ $c ]; } else { unset( $stack[ $k ] ); } } if( isset( $tree[ $c ] ) ) { $stack[] = $tree[ $c ]; } } die( 'Did not find a match.' ); ? On May 17, 2006, at 12:37 PM, René Fournier wrote: Looking for suggestions on the most compute-efficient way to search variable-length strings (~200 characters) for the occurrence of one of about 100 possible needles. In other words: $needles = array ( 1 = Hello Jim , 2 = Yellow Banana , 3 = Red Car, ... etc (100 elements) $haystack = Once upon a time there was a programming language that everyone loved. Its name was PHP, and the people that worked with it also liked Yellow Bananas. One day...; Now perhaps I'm blind, but I can't see an obvious string or array function that simply allows you to use an array as a needle while searching a string haystack. What I'm really looking for is something like the reverse of in_array. The obvious thing would be to loop through the needles and run substr_count on the haystack... But is there a faster way? ...Rene -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Fast search
On Wed, 2006-05-17 at 16:38, Robin Vickery wrote: On 17/05/06, René Fournier [EMAIL PROTECTED] wrote: Looking for suggestions on the most compute-efficient way to search variable-length strings (~200 characters) for the occurrence of one of about 100 possible needles. In other words: $needles = array ( 1 = Hello Jim , 2 = Yellow Banana , 3 = Red Car, ... etc (100 elements) $haystack = Once upon a time there was a programming language that everyone loved. Its name was PHP, and the people that worked with it also liked Yellow Bananas. One day...; What version of php? If you're using php5 then you could do: ?php $count = 0; str_replace($needles, '', $haystack, $count); if ($count) { /* there was a match */ } ? Surely you meant strpos( $haystack, $needle ) ??? Incidentally do you need to match on words beginnings or endings? For instance should 'put' match all of the following: I put the apple core in the compost. Someone always puts the fire out. The car sputtered and died. Cheers, Rob. -- .. | InterJinn Application Framework - http://www.interjinn.com | :: | An application and templating framework for PHP. Boasting | | a powerful, scalable system for accessing system services | | such as forms, properties, sessions, and caches. InterJinn | | also provides an extremely flexible architecture for | | creating re-usable components quickly and easily. | `' -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php