Re: [PHP] Re: Fuzzy Array Search

2011-06-07 Thread Shawn McKenzie


On 06/07/2011 04:35 PM, Alex Nikitin wrote:
> Shawn, == is not good for string comparison, its a bad habit that one
> should get out of, use ===, its much safer .
Yes, except that I was comparing a string to an array of integers :)
>
> Also try the same algorithm on 10 arrays of some number of values
> 10-1000 perhaps, that would give you better performance statistics :)
>
LOOP WITH PREG MATCH:  650.627260 seconds
PREG_GREP:  123.110386 seconds

This was with 100,000 arrays each with 1,000 values.  Of course the loop
iterates through the entire array, which is needed to find all matches.
If only one match is needed then you can just break out of the loop to
save time.  If we assume that half of the matches will be in the lower
500 and half in the upper then we can halve the time for the loop, but it
is still 325 seconds.
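
For illustration, the difference between collecting every match and stopping
at the first one might look like the sketch below. The function names
find_all_matches() and find_first_match() and the sample data are assumptions
made up for this example; they are not the code that produced the timings
above.

```php
<?php
// Hypothetical helpers; names and sample data are illustrative assumptions.

// Collect every matching element, keyed by its original index.
// This has to visit the whole array.
function find_all_matches($pattern, array $haystack)
{
    $matches = array();
    foreach ($haystack as $key => $value) {
        if (preg_match($pattern, $value)) {
            $matches[$key] = $value;
        }
    }
    return $matches;
}

// Return the key of the first matching element, or null if none matches.
// Returning early is the equivalent of breaking out of the loop.
function find_first_match($pattern, array $haystack)
{
    foreach ($haystack as $key => $value) {
        if (preg_match($pattern, $value)) {
            return $key;
        }
    }
    return null;
}

$haystack = array('alpha', 'beta', 'alphabet', 'gamma');
$all   = find_all_matches('/alpha/', $haystack);
$first = find_first_match('/alpha/', $haystack);
```

How much the early exit saves depends entirely on where the first match sits,
so the "half the time" figure is only an average under the stated assumption.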

Thanks!
-Shawn

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: Fuzzy Array Search

2011-06-07 Thread Shawn McKenzie


On 06/07/2011 04:28 PM, Floyd Resler wrote:
>
> Shawn,
>   I'm terrible with regular expressions.  Could you give me an example?
>
> Thanks!
> Floyd
>
>

Depends.  Could be as simple as this to return an array of all
occurrences of $needle in any of the $haystack_array values:

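// Return every element of $haystack_array that contains $needle.
$haystack_array = array('something is here', 'nothing matches here',
    'there will be a somethingsomething match here');
$needle = 'something';
// preg_quote() escapes any regex metacharacters in $needle.
$matches_array = preg_grep('/' . preg_quote($needle, '/') . '/', $haystack_array);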

Depends on where you are getting the search criteria and how you want it
to match in the array.  BTW...  If this is originally in a database then
you can do it there with the LIKE operator and % wildcard.
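
Since the location of each match was asked about elsewhere in this thread, it
may help to note that preg_grep() keeps the original array keys. A minimal
sketch, reusing the sample data from the example above (the preg_quote() call
and the $positions variable are additions for illustration):

```php
<?php
// Sample data borrowed from the example above.
$haystack_array = array('something is here', 'nothing matches here',
    'there will be a somethingsomething match here');
$needle = 'something';

// preg_grep() returns the matching elements under their original keys.
$matches_array = preg_grep('/' . preg_quote($needle, '/') . '/', $haystack_array);

// Print each match together with its position in the original array.
foreach ($matches_array as $index => $value) {
    echo $index, ': ', $value, "\n";
}

// The matching positions on their own.
$positions = array_keys($matches_array);
```

Here the first and third elements match, so the keys are 0 and 2; the result
is not renumbered from zero.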




Re: [PHP] Re: Fuzzy Array Search

2011-06-07 Thread Alex Nikitin
It runs fast on my 2.33 GHz Core 2, and about as fast, on this small data
set, on the dual 6-core box with 96GB of RAM or the 8-core 9GB box. It
depends on the size of your data set, memory speed and latency, and a
minuscule amount of processing power (once again assuming a small data set).

That said, you could probably do some clever stuff to minimize the range you
are searching. For example, you could implode the array, search the resulting
string, and capture the offset of the match; combined with the average record
size, that lets you rule out a lot of records that, within a certain
probability, cannot contain the result, so in most cases the search never
looks at the majority of the data. This would be interesting to test out,
actually. You could also sort the array to narrow the search further by some
criterion, what have you. All of this only applies if you are searching very
large data sets; I am talking about multiple billion data points. And all
that said, arrays are not really a good data structure for searching anyway,
which is why they are rarely used in file systems or as in-memory data
structures ;)
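
As one possible reading of "sort the array to narrow down the search", the
sketch below sorts the values once and then uses a binary search to jump to
the first candidate. It only handles prefix matches, which is a much narrower
notion of "fuzzy" than a regular expression, and the helper names
lower_bound() and prefix_search() are made up for this example.

```php
<?php
// Hypothetical helper: index of the first element >= $prefix in a
// list-style array sorted in strcmp() order, found by binary search.
function lower_bound(array $sorted, $prefix)
{
    $lo = 0;
    $hi = count($sorted);
    while ($lo < $hi) {
        $mid = (int) (($lo + $hi) / 2);
        if (strcmp($sorted[$mid], $prefix) < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $lo;
}

// Return all values starting with $prefix. Only the contiguous run that
// begins at the lower bound is scanned, not the whole array.
function prefix_search(array $sorted, $prefix)
{
    $found = array();
    $len = strlen($prefix);
    $count = count($sorted);
    for ($i = lower_bound($sorted, $prefix); $i < $count; $i++) {
        if (strncmp($sorted[$i], $prefix, $len) !== 0) {
            break;
        }
        $found[] = $sorted[$i];
    }
    return $found;
}

$words = array('pear', 'apple', 'apricot', 'banana', 'application');
sort($words, SORT_STRING); // one-time cost; reindexes the array from 0
$hits = prefix_search($words, 'app');
```

The sort pays off only when the same data is searched many times, which fits
the very large data sets described above better than a one-off lookup.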

Shawn, == is not good for string comparison; it's a bad habit that one should
get out of. Use ===, it's much safer.
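
A small sketch of the kind of surprise == can cause with strings. The values
are chosen for illustration: two different strings that both look like
numbers are compared numerically by ==, and in_array() compares loosely too
unless its third argument is true.

```php
<?php
// Both operands are numeric strings, so == compares them as numbers.
$loose  = ('1e3' == '1000');   // true: 1000 equals 1000
$strict = ('1e3' === '1000');  // false: the strings differ

// in_array() uses loose comparison unless the third argument is true.
$ids = array('1000', '2000');
$loose_hit  = in_array('1e3', $ids);        // true
$strict_hit = in_array('1e3', $ids, true);  // false

var_dump($loose, $strict, $loose_hit, $strict_hit);
```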

Also try the same algorithm on 10 arrays of some number of values
10-1000 perhaps, that would give you better performance statistics :)



-- Alex

--
The trouble with programmers is that you can never tell what a programmer is
doing until it’s too late.  ~Seymour Cray



On Tue, Jun 7, 2011 at 5:25 PM, Shawn McKenzie  wrote:

> On 06/07/2011 03:57 PM, Floyd Resler wrote:
> >
> > On Jun 7, 2011, at 4:42 PM, Alex Nikitin wrote:
> >
> >> If you don't need the location, you can implode the array and use preg
> >> match, quickly testing it, that gives you about 4.5 times performance
> >> increase, but it wont give you the location, only if a certain value exists
> >> within the array... You can kind of do some really clever math to get your
> >> search parameters from there, which would be feasible on really large data
> >> sets, but if you want location, you will have to iterate at some point...
> >>
> >> (sorry i keep on hitting reply instead of reply to all)
> >>
> >> --
> >> The trouble with programmers is that you can never tell what a
> programmer is
> >> doing until it’s too late.  ~Seymour Cray
> >>
> >>
> >>
> >> On Tue, Jun 7, 2011 at 2:57 PM, Shawn McKenzie 
> wrote:
> >>
> >>> On 06/07/2011 12:45 PM, Floyd Resler wrote:
> >>>> What would be the easiest way to do a fuzzy array search?  Can I do
> >>>> this without having to step through the array?
> >>>>
> >>>> Thanks!
> >>>> Floyd
> >>>>
> >>>
> >>> I use preg_grep()
> >>>
> >>> --
> >>> Thanks!
> >>> -Shawn
> >>> http://www.spidean.com
> >>>
> >
> > I actually do need the location since I need to get the resulting match.
> > I went ahead and tried to iterate the array and it was MUCH faster than I
> > expected it to be!  Of course, considering the machine I'm running this on
> > is a monster (2.66 GHz 8 cores, 24GB of RAM) it shouldn't have surprised me!
> >
> > Thanks!
> > Floyd
> >
>
> If you are using a straight equality comparison then the loop would be
> faster (but then array search would probably be better), however if you
> need to use a preg_match() in the loop ("fuzzy search"), then
> preg_grep() will be much faster than the loop.
>
> LOOP WITH PREG_MATCH: 10
>  0.435957 seconds
> PREG_GREP: 10
>  0.085604 seconds
>
> LOOP WITH IF ==: 10
>  0.044594 seconds
> PREG_GREP: 10
>  0.091519 seconds
>
> --
> Thanks!
> -Shawn
> http://www.spidean.com
>


Re: [PHP] Re: Fuzzy Array Search

2011-06-07 Thread Shawn McKenzie
On 06/07/2011 03:57 PM, Floyd Resler wrote:
> 
> On Jun 7, 2011, at 4:42 PM, Alex Nikitin wrote:
> 
>> If you don't need the location, you can implode the array and use preg
>> match, quickly testing it, that gives you about 4.5 times performance
>> increase, but it wont give you the location, only if a certain value exists
>> within the array... You can kind of do some really clever math to get your
>> search parameters from there, which would be feasible on really large data
>> sets, but if you want location, you will have to iterate at some point...
>>
>> (sorry i keep on hitting reply instead of reply to all)
>>
>> --
>> The trouble with programmers is that you can never tell what a programmer is
>> doing until it’s too late.  ~Seymour Cray
>>
>>
>>
>> On Tue, Jun 7, 2011 at 2:57 PM, Shawn McKenzie  wrote:
>>
>>> On 06/07/2011 12:45 PM, Floyd Resler wrote:
>>>> What would be the easiest way to do a fuzzy array search?  Can I do this
>>>> without having to step through the array?
>>>>
>>>> Thanks!
>>>> Floyd
>>>>
>>>
>>> I use preg_grep()
>>>
>>> --
>>> Thanks!
>>> -Shawn
>>> http://www.spidean.com
>>>
> 
> I actually do need the location since I need to get the resulting match.  I 
> went ahead and tried to iterate the array and it was MUCH faster than I 
> expected it to be!  Of course, considering the machine I'm running this on is 
> a monster (2.66 GHz 8 cores, 24GB of RAM) it shouldn't have surprised me!
> 
> Thanks!
> Floyd
> 

If you are using a straight equality comparison then the loop would be
faster (but then array_search() would probably be better).  However, if you
need to use a preg_match() in the loop ("fuzzy search"), then
preg_grep() will be much faster than the loop.

LOOP WITH PREG_MATCH: 10
 0.435957 seconds
PREG_GREP: 10
 0.085604 seconds

LOOP WITH IF ==: 10
 0.044594 seconds
PREG_GREP: 10
 0.091519 seconds
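
The script behind these numbers was not posted. A comparison of this kind
could be set up roughly as in the sketch below; the data size, the pattern,
the helper name time_it(), and the reading of the number after each label as
a match count are all assumptions.

```php
<?php
// Illustrative benchmark; data size, pattern, and helper name are assumptions.

// Run a callback and return array(elapsed seconds, callback result).
function time_it($callback)
{
    $start = microtime(true);
    $result = $callback();
    return array(microtime(true) - $start, $result);
}

// Build 100,000 strings; every 10,000th one contains the needle.
$haystack = array();
for ($i = 0; $i < 100000; $i++) {
    $haystack[] = ($i % 10000 === 0) ? "needle $i" : "value $i";
}
$pattern = '/needle/';

// Variant 1: explicit loop calling preg_match() per element.
list($loop_time, $loop_hits) = time_it(function () use ($haystack, $pattern) {
    $hits = array();
    foreach ($haystack as $key => $value) {
        if (preg_match($pattern, $value)) {
            $hits[$key] = $value;
        }
    }
    return $hits;
});

// Variant 2: a single preg_grep() call over the whole array.
list($grep_time, $grep_hits) = time_it(function () use ($haystack, $pattern) {
    return preg_grep($pattern, $haystack);
});

printf("LOOP WITH PREG_MATCH: %d\n %f seconds\n", count($loop_hits), $loop_time);
printf("PREG_GREP: %d\n %f seconds\n", count($grep_hits), $grep_time);
```

Both variants return the same key => value pairs, so only the timings differ;
the absolute numbers will vary with the machine and PHP version.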

-- 
Thanks!
-Shawn
http://www.spidean.com




Re: [PHP] Re: Fuzzy Array Search

2011-06-07 Thread Floyd Resler

On Jun 7, 2011, at 4:42 PM, Alex Nikitin wrote:

> If you don't need the location, you can implode the array and use preg
> match, quickly testing it, that gives you about 4.5 times performance
> increase, but it wont give you the location, only if a certain value exists
> within the array... You can kind of do some really clever math to get your
> search parameters from there, which would be feasible on really large data
> sets, but if you want location, you will have to iterate at some point...
> 
> (sorry i keep on hitting reply instead of reply to all)
> 
> --
> The trouble with programmers is that you can never tell what a programmer is
> doing until it’s too late.  ~Seymour Cray
> 
> 
> 
> On Tue, Jun 7, 2011 at 2:57 PM, Shawn McKenzie  wrote:
> 
>> On 06/07/2011 12:45 PM, Floyd Resler wrote:
>>> What would be the easiest way to do a fuzzy array search?  Can I do this
>> without having to step through the array?
>>> 
>>> Thanks!
>>> Floyd
>>> 
>> 
>> I use preg_grep()
>> 
>> --
>> Thanks!
>> -Shawn
>> http://www.spidean.com
>> 

I actually do need the location since I need to get the resulting match.  I 
went ahead and tried to iterate the array and it was MUCH faster than I 
expected it to be!  Of course, considering the machine I'm running this on is a 
monster (2.66 GHz 8 cores, 24GB of RAM) it shouldn't have surprised me!

Thanks!
Floyd





Re: [PHP] Re: Fuzzy Array Search

2011-06-07 Thread Alex Nikitin
If you don't need the location, you can implode the array and use
preg_match(). Quickly testing it, that gives about a 4.5x performance
increase, but it won't give you the location, only whether a certain value
exists within the array... You can kind of do some really clever math to get
your search parameters from there, which would be feasible on really large
data sets, but if you want the location, you will have to iterate at some
point...
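
A minimal sketch of the implode-and-match idea, assuming a plain substring
needle. The function name haystack_contains() and the newline separator are
choices made for this example; the separator keeps a match from spanning two
neighbouring elements, on the assumption that neither the elements nor the
needle contain newlines.

```php
<?php
// Hypothetical helper: report whether any element contains $needle,
// without saying which one.
function haystack_contains(array $haystack, $needle)
{
    // Join with a newline so a match cannot straddle two elements.
    $joined = implode("\n", $haystack);
    return preg_match('/' . preg_quote($needle, '/') . '/', $joined) === 1;
}

$haystack = array('something is here', 'nothing matches here');
$found   = haystack_contains($haystack, 'something');
// "here no" would match across elements if they were joined with a space.
$missing = haystack_contains($haystack, 'here no');
```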

(sorry i keep on hitting reply instead of reply to all)

--
The trouble with programmers is that you can never tell what a programmer is
doing until it’s too late.  ~Seymour Cray



On Tue, Jun 7, 2011 at 2:57 PM, Shawn McKenzie  wrote:

> On 06/07/2011 12:45 PM, Floyd Resler wrote:
> > What would be the easiest way to do a fuzzy array search?  Can I do this
> without having to step through the array?
> >
> > Thanks!
> > Floyd
> >
>
> I use preg_grep()
>
> --
> Thanks!
> -Shawn
> http://www.spidean.com
>
>
>


[PHP] Re: Fuzzy Array Search

2011-06-07 Thread Shawn McKenzie
On 06/07/2011 12:45 PM, Floyd Resler wrote:
> What would be the easiest way to do a fuzzy array search?  Can I do this 
> without having to step through the array?
> 
> Thanks!
> Floyd
> 

I use preg_grep()

-- 
Thanks!
-Shawn
http://www.spidean.com
