[PHP] String searching

2008-05-17 Thread Chris W

I need to find the position of the first character in the string
(searching from the end) that is not one of the characters in a set. In
this case the set is [0-9a-zA-Z_-].

I guess, to be even more specific, I want to split a string into two parts:
the first part can contain anything, and the second part must contain only
characters in the set described above.

What is the easiest way to do this?

--
Chris W
KE5GIX


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] String searching

2008-05-17 Thread Daniel Brown
On Sat, May 17, 2008 at 2:17 AM, Chris W wrote:
> I need to find the position of the first character in the string
> (searching from the end) that is not one of the characters in a set. In
> this case the set is [0-9a-zA-Z_-].

To find the position of a specific character, RTFM on strpos().
For those not existing in your condition, I'd recommend
everythingbut(), but it's yet to be included in the core.  ;-P

> I guess, to be even more specific, I want to split a string into two parts:
> the first part can contain anything, and the second part must contain only
> characters in the set described above.

You can read a single character of a string by its offset, as
simply as this:

<?php
$str = 'abcdefghijklmnopqrstuvwxyz';
$d = $str[5]; // $d === 'f', the sixth character, because offsets begin at 0
?>

So to walk backward through the string, while it's not very clean,
you could do:

<?php
$str = 'ABCDEF01234567789';

// Start at the last valid offset (strlen - 1) and stop after offset 0.
for ($i = strlen($str) - 1; $i >= 0; $i--) {
    if (preg_match('/[g-z]/i', $str[$i])) {
        // Handle your "this is a bad character" condition(s).
        // break; /* Or, optionally, continue. */
    }
}
?>

Not pretty, but if my mind is still working at 2:30a (EDT), it
should help you out.
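
A loop-free variant of the same backward walk can be sketched with strrev() and strspn(): the string is reversed, and strspn() counts how many leading characters of the reversed copy belong to the allowed set, which is the length of the trailing run. This is only a sketch. The function name split_trailing() and the sample string are made up for illustration, and the set is assumed to be 0-9, a-z, A-Z, '-' and '_'.

```php
<?php
// Hypothetical helper: splits $str into array(head, tail), where tail is the
// longest run of allowed characters at the end of the string.
function split_trailing($str)
{
    // Assumed character set: digits, ASCII letters, '-' and '_'.
    $allowed = '0123456789'
             . 'abcdefghijklmnopqrstuvwxyz'
             . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
             . '-_';

    // Length of the trailing run = length of the leading run of the reversal.
    $tailLen = strspn(strrev($str), $allowed);
    $cut     = strlen($str) - $tailLen;   // offset where the tail starts

    // The casts cover older PHP versions, where substr() can return false.
    return array((string) substr($str, 0, $cut), (string) substr($str, $cut));
}

list($head, $tail) = split_trailing('some text / file_name-01');
// $head is 'some text / ' and $tail is 'file_name-01'
```

The position of the first character that is not in the set, counted from the end, is then strlen($head) - 1.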

-- 
/Daniel P. Brown




Re: [PHP] String searching

2008-05-17 Thread Richard Heyes

Chris W wrote:

> I need to find the position of the first character in the string
> (searching from the end) that is not one of the characters in a set. In
> this case the set is [0-9a-zA-Z_-].
>
> I guess, to be even more specific, I want to split a string into two parts:
> the first part can contain anything, and the second part must contain only
> characters in the set described above.
>
> What is the easiest way to do this?



There's something here, imaginatively called blah(), which does what you 
require:


http://www.phpguru.org/preg/example.phps
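
The linked example is not reproduced here. As a rough sketch of one regular-expression approach (not necessarily what blah() does), preg_match() with PREG_OFFSET_CAPTURE can report where the trailing run of allowed characters begins. The function name trailing_offset() and the sample string are hypothetical, and the set is assumed to be 0-9, a-z, A-Z, '-' and '_'.

```php
<?php
// Hypothetical helper: returns the offset at which the trailing run of
// allowed characters begins (strlen($str) if that run is empty).
function trailing_offset($str)
{
    // '*' lets the match be empty, and '$' with the D modifier pins it to the
    // very end, so the leftmost match is the longest allowed suffix.
    preg_match('/[0-9A-Za-z_-]*$/D', $str, $m, PREG_OFFSET_CAPTURE);
    return $m[0][1];
}

$str  = 'path/to/some-file_2';
$cut  = trailing_offset($str);      // offset of the first allowed character
$head = substr($str, 0, $cut);      // may contain anything
$tail = substr($str, $cut);         // only characters from the set
```

The last character that is not in the set sits at offset $cut - 1, which is -1 when the whole string is made of allowed characters.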

--
Richard Heyes





Re: [PHP] String searching performance

2003-02-26 Thread {R}ichard Ashton
On Mon, 24 Feb 2003 22:35:35 +0100, Ernest E Vogelsinger wrote:

> At 21:22 24.02.2003, {R}ichard Ashton spoke out and said:
> [snip]
> > while ( $flag == true )
> > if (strpos($body, $word[]) > 0) {$flag=false}
> >
> > What I really need to know is which is the fastest loop?
> > Which is the fastest match, strpos?
> > Which is the fastest comparison?
> > What other optimisations are possible to maximise search speed?
> >
> > I would expect to keep a frequency hit count and sort the $words[] so
> > that the most frequent hits are found first, remembering that only
> > $body with NO $words in them are automatically posted, so every word
> > must be tested.
> >
> > I don't know sufficient internals to pick the fastest method. I will be
> > running with 4.3.1 on FreeBSD 4.6
> [snip]
>
> I'd suggest something like this:
>
> $buzzwords = array('idiot', 'fool', 'shit', 'FOAD');
>
> $re = '/(' . implode('|', $buzzwords) . ')/is';
> if (preg_match($re, $posting)) {
>     // bad word found
> } else {
>     // cleared
> }

Thank you very much, that is brilliant. I would never have thought of
that.

Do you think that

if (preg_match($re, $posting, $hits))

would slow it down at all? The $buzzwords will be kept in a file to be
loaded before each run, every 5 minutes. I could therefore keep a count
of which words hit most frequently and move them to the top of the list.

{R}





Re: [PHP] String searching performance

2003-02-26 Thread Ernest E Vogelsinger
At 09:49 26.02.2003, {R}ichard Ashton said:
[snip]
> Do you think that
>
> if (preg_match($re, $posting, $hits))
>
> would slow it down at all? The $buzzwords will be kept in a file to be
> loaded before each run, every 5 minutes. I could therefore keep a count
> of which words hit most frequently and move them to the top of the list.
[snip]

If you have a lot of buzzwords, I believe this could have quite a
performance impact.
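
To make the trade-off concrete, here is a minimal sketch of counting hits through the third argument of preg_match(). The names record_hit() and $counts and the sample postings are invented for illustration. Whether the extra capture costs anything measurable on a given setup is something only a benchmark can tell.

```php
<?php
// Hypothetical helper: checks $posting and bumps the counter of the word
// that matched. Returns true when a buzzword was found.
function record_hit($re, $posting, array &$counts)
{
    if (!preg_match($re, $posting, $hits)) {
        return false;                       // cleared
    }
    $word = strtolower($hits[1]);           // the pattern is case-insensitive
    $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
    return true;                            // bad word found
}

$buzzwords = array('idiot', 'fool');
$re        = '/(' . implode('|', $buzzwords) . ')/is';
$counts    = array();

record_hit($re, 'You FOOL!', $counts);
record_hit($re, 'Nice post, thanks.', $counts);
arsort($counts);                            // most frequent words first
```

Note that preg_match() stops at the first match, so each posting contributes at most one count.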


-- 
   O Ernest E. Vogelsinger
   (\)ICQ #13394035
^ http://www.vogelsinger.at/






Re: [PHP] String searching performance

2003-02-26 Thread Jason Wong
On Wednesday 26 February 2003 16:49, {R}ichard Ashton wrote:

> Do you think that
>
> if (preg_match($re, $posting, $hits))
>
> would slow it down at all? The $buzzwords will be kept in a file to be
> loaded before each run, every 5 minutes. I could therefore keep a count
> of which words hit most frequently and move them to the top of the list.

No idea whether this would be faster (it's certainly easier to code):

  explode() text into an array
  place your banned words into an array
  array_intersect() to find words common in both
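
The three steps above might look like this, as a sketch. It assumes words are separated by anything that is not an ASCII letter or digit (preg_split() stands in for explode() so that punctuation does not stick to the words) and that matching should ignore case. The name contains_banned() is made up. Unlike a substring search, this only catches whole words.

```php
<?php
// Hypothetical helper: returns the banned words that occur in $text as
// whole words, compared case-insensitively.
function contains_banned($text, array $banned)
{
    // Split on runs of non-alphanumeric characters, dropping empty pieces.
    $words  = preg_split('/[^a-z0-9]+/i', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $banned = array_map('strtolower', $banned);

    // array_values() renumbers the keys that array_intersect() preserves.
    return array_values(array_intersect($banned, $words));
}

$found = contains_banned('You are a fool. FOAD!', array('idiot', 'fool', 'FOAD'));
// $found is array('fool', 'foad')
```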

Do your own benchmarking!

-- 
Jason Wong - Gremlins Associates - www.gremlins.biz





Re: [PHP] String searching performance

2003-02-26 Thread {R}ichard Ashton
On Wed, 26 Feb 2003 17:47:41 +0800, Jason Wong wrote:
> On Wednesday 26 February 2003 16:49, {R}ichard Ashton wrote:
>
> > Do you think that
> >
> > if (preg_match($re, $posting, $hits))
> >
> > would slow it down at all? The $buzzwords will be kept in a file to be
> > loaded before each run, every 5 minutes. I could therefore keep a count
> > of which words hit most frequently and move them to the top of the list.
>
> No idea whether this would be faster (it's certainly easier to code):
>
>   explode() text into an array
>   place your banned words into an array
>   array_intersect() to find words common in both

Thanks for another method.

> Do your own benchmarking!

That is the easy bit.
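
For anyone following along, a crude timing harness can be sketched with microtime(true). The function name time_it(), the iteration count and the sample data are all assumptions, and the closures need PHP 5.3 or later. Results depend on PHP version, data and hardware, so no numbers are shown.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds.
function time_it($fn, $iterations = 10000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($fn);
    }
    return microtime(true) - $start;
}

$body  = str_repeat('lorem ipsum dolor sit amet ', 400);   // roughly 10 KB
$words = array('idiot', 'fool', 'foad');
$re    = '/(' . implode('|', $words) . ')/i';

// Candidate 1: one combined regular expression.
$regexSecs = time_it(function () use ($re, $body) {
    return preg_match($re, $body);
});

// Candidate 2: one substring search per word.
$loopSecs = time_it(function () use ($words, $body) {
    foreach ($words as $w) {
        if (stripos($body, $w) !== false) {
            return true;
        }
    }
    return false;
});

printf("regex: %.4fs  stripos loop: %.4fs\n", $regexSecs, $loopSecs);
```

The sample body contains none of the words, which is the worst case for both candidates because every word has to be tested.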

{R}





[PHP] String searching performance

2003-02-24 Thread {R}ichard Ashton

I am looking for the most efficient way to search for "trigger" words
in a big string.

I have a string, $body, which is the whole body of a particular
Usenet post, so it can be as short as "Me too" or as long as some, as yet
undecided, limit of around 10 KB.

I have a list of words in an array, maybe 50 words of up to 8
characters each.

I need to search $body for the presence of each $word, and if any $word
in the array is found in $body, trigger an action.

This is a moderation bot for a beginners' group: no flaming, no swearing,
no abuse. You can imagine the list of words: idiot, fool, shit, FOAD, and
so on. When *any* one is found, the post is diverted for manual
intervention and searching stops.

Just guessing, I would imagine something like

while ( $flag == true )
if (strpos($body, $word[]) > 0) {$flag=false}

What I really need to know is which is the fastest loop?
Which is the fastest match, strpos?
Which is the fastest comparison?
What other optimisations are possible to maximise search speed?

I would expect to keep a frequency hit count and sort the $words[] so
that the most frequent hits are found first, remembering that only
$body with NO $words in them are automatically posted, so every word
must be tested.

I don't know enough about the internals to pick the fastest method. I will
be running PHP 4.3.1 on FreeBSD 4.6.
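
As a sketch of the strpos() loop guessed at above, with two adjustments: the array is walked with foreach (reading $word[] is not valid PHP), and the result is compared against false, because strpos() returns 0, not false, when the match is at the very start of the string. stripos() is used here for case-insensitive matching. It needs PHP 5 or later, so on 4.3.1 one would lowercase both sides and call strpos() instead. The name has_trigger_word() and the sample data are made up.

```php
<?php
// Hypothetical helper: true as soon as any trigger word occurs in $body.
function has_trigger_word($body, array $words)
{
    foreach ($words as $word) {
        // !== false matters: a hit at offset 0 would fail a "> 0" test.
        if (stripos($body, $word) !== false) {
            return true;    // stop at the first hit
        }
    }
    return false;           // every word was tested, none found
}

$words = array('idiot', 'fool', 'FOAD');
var_dump(has_trigger_word('Fool me once...', $words));   // bool(true)
var_dump(has_trigger_word('Me too', $words));            // bool(false)
```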

{R}






Re: [PHP] String searching performance

2003-02-24 Thread Ernest E Vogelsinger
At 21:22 24.02.2003, {R}ichard Ashton spoke out and said:
[snip]
> while ( $flag == true )
> if (strpos($body, $word[]) > 0) {$flag=false}
>
> What I really need to know is which is the fastest loop?
> Which is the fastest match, strpos?
> Which is the fastest comparison?
> What other optimisations are possible to maximise search speed?
>
> I would expect to keep a frequency hit count and sort the $words[] so
> that the most frequent hits are found first, remembering that only
> $body with NO $words in them are automatically posted, so every word
> must be tested.
>
> I don't know sufficient internals to pick the fastest method. I will be
> running with 4.3.1 on FreeBSD 4.6
[snip]

I'd suggest something like this:

$buzzwords = array('idiot', 'fool', 'shit', 'FOAD');

// Join the words into one case-insensitive alternation: /(idiot|fool|...)/is
$re = '/(' . implode('|', $buzzwords) . ')/is';
if (preg_match($re, $posting)) {
    // bad word found
} else {
    // cleared
}

You only need to make sure that your buzzwords don't contain a '/'. If
they do, you could change the regex delimiter or simply escape it in the
array.
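
One way to handle that escaping, as a sketch: preg_quote() with the delimiter as its second argument escapes '/' along with the other regex metacharacters. The function name build_pattern() and the sample words are invented for illustration.

```php
<?php
// Hypothetical helper: builds a case-insensitive alternation from a word list.
function build_pattern(array $buzzwords)
{
    $escaped = array();
    foreach ($buzzwords as $word) {
        // The second argument tells preg_quote() to escape the delimiter too.
        $escaped[] = preg_quote($word, '/');
    }
    return '/(' . implode('|', $escaped) . ')/is';
}

$re = build_pattern(array('a/b', 'c.d'));
// $re is now '/(a\/b|c\.d)/is'
var_dump(preg_match($re, 'see A/B here'));   // int(1)
var_dump(preg_match($re, 'cxd'));            // int(0): the dot is literal
```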


-- 
   O Ernest E. Vogelsinger 
   (\) ICQ #13394035 
^ http://www.vogelsinger.at/

