Re: [PHP] Regex in PHP

2008-06-05 Thread Robert Cummings
On Thu, 2008-06-05 at 00:24 -0400, Nathan Nobbe wrote:

 you really know how to rub it in there rob.  but i was looking at the
 implementation in the php code, looks like somebody likes my idea (this code
 found in ext/standard/string.c).  on the second line the haystack is
 converted to lower case[1], then if it passes a couple of checks, the needle
 is converted to lower case[2], and lastly the comparison is performed[3].
 there is no logic to check both cases.
 (i have placed a star beside the statements ive referred to).
 ...
 haystack_dup = estrndup(haystack, haystack_len);
 *[1]php_strtolower(haystack_dup, haystack_len);

 if (Z_TYPE_P(needle) == IS_STRING) {
     if (Z_STRLEN_P(needle) == 0 || Z_STRLEN_P(needle) > haystack_len) {
         efree(haystack_dup);
         RETURN_FALSE;
     }

     needle_dup = estrndup(Z_STRVAL_P(needle), Z_STRLEN_P(needle));
     *[2]php_strtolower(needle_dup, Z_STRLEN_P(needle));
     *[3]found = php_memnstr(haystack_dup + offset, needle_dup,
                             Z_STRLEN_P(needle), haystack_dup + haystack_len);
 }

Funny, I guess they took the quick route. This code could obviously be
optimized :)

But let's go with something used more often... such as more traditional
string comparison where you're more likely to want to eke out
efficiency:

ZEND_API int zend_binary_strcasecmp(char *s1, uint len1, char *s2, uint len2)
{
    int len;
    int c1, c2;

    len = MIN(len1, len2);

    while (len--) {
        c1 = zend_tolower((int)*(unsigned char *)s1++);
        c2 = zend_tolower((int)*(unsigned char *)s2++);
        if (c1 != c2) {
            return c1 - c2;
        }
    }

    return len1 - len2;
}

Well, it looks like they do indeed do a conversion... but on a
char-by-char basis. Strange, that. It could more than likely be sped up
by doing an initial exact comparison and then falling back on the above.
Maybe I'll compile and test out the following later:

ZEND_API int zend_binary_strcasecmp(char *s1, uint len1, char *s2, uint len2)
{
    int len;
    int c1, c2;

    len = MIN(len1, len2);

    while (len--) {
        c1 = (int)*(unsigned char *)s1++;
        c2 = (int)*(unsigned char *)s2++;

        if( c1 != c2 ){
            c1 = zend_tolower( c1 );
            c2 = zend_tolower( c2 );

            if (c1 != c2) {
                return c1 - c2;
            }
        }
    }

    return len1 - len2;
}
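To make Rob's idea concrete, here is a minimal standalone C sketch of both strategies (not the Zend source itself). Assumptions: the standard `tolower()` stands in for `zend_tolower()`, `size_t` replaces `uint` and `MIN()`, and the length difference is returned as a sign (-1, 0 or 1) instead of `len1 - len2` so unsigned wrap-around cannot occur. The function names are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Strategy of the quoted zend_binary_strcasecmp(): lower-case every
 * byte pair before comparing. tolower() stands in for zend_tolower(). */
int strcasecmp_always_lower(const char *s1, size_t len1,
                            const char *s2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;

    while (len--) {
        int c1 = tolower(*(const unsigned char *)s1++);
        int c2 = tolower(*(const unsigned char *)s2++);
        if (c1 != c2) {
            return c1 - c2;
        }
    }
    /* Sign of the length difference, without unsigned wrap-around. */
    return (len1 > len2) - (len1 < len2);
}

/* Rob's proposed variant: compare the raw bytes first and only pay for
 * the case conversion when they differ. */
int strcasecmp_exact_first(const char *s1, size_t len1,
                           const char *s2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;

    while (len--) {
        int c1 = *(const unsigned char *)s1++;
        int c2 = *(const unsigned char *)s2++;

        if (c1 != c2) {
            c1 = tolower(c1);
            c2 = tolower(c2);
            if (c1 != c2) {
                return c1 - c2;
            }
        }
    }
    return (len1 > len2) - (len1 < len2);
}
```

For every input the two functions return the same value: when two bytes are already equal, their lower-case forms are equal too, so skipping the conversion cannot change the result. Whether the skipped `tolower()` calls make a measurable difference is a question for a proper benchmark, run many times as Rob suggests.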

Cheers,
Rob.
-- 
http://www.interjinn.com
Application and Templating Framework for PHP


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Regex in PHP

2008-06-05 Thread Richard Heyes

 sorry to bother you richard.

You didn't, I just wanted to make sure I wasn't losing it (more).

--
Richard Heyes

+----------------------------------------+
| Access SSH with a Windows mapped drive |
|    http://www.phpguru.org/sftpdrive    |
+----------------------------------------+




Re: [PHP] Regex in PHP

2008-06-04 Thread Richard Heyes

Hi,


and the case insensitive versions are a hair faster still ;)


Are they? I always thought that case-sensitive functions were faster 
because they have to test fewer comparisons. Eg To test if i == I in a 
case-insensitive fashion requires two comparisons (i == I and i == i) 
whereas a case-sensitive comparison requires only one (i == i).


Cheers.

--
Richard Heyes




Re: [PHP] Regex in PHP

2008-06-04 Thread Nathan Nobbe
On Wed, Jun 4, 2008 at 10:10 AM, Richard Heyes [EMAIL PROTECTED] wrote:

 Hi,

  and the case insensitive versions are a hair faster still ;)


 Are they? I always thought that case-sensitive functions were faster
 because they have to test fewer comparisons. Eg To test if i == I in a
 case-insensitive fashion requires two comparisons (i == I and i == i)
 whereas a case-sensitive comparison requires only one (i == i).


umm, isnt it like the other way around.  in the case of case-sensitive, you
have to be able to distinguish between i and I, whereas w/ the case
insensitive, you dont care so, basically, you strtolower() first thing, then
just compare to lower case characters.

-nathan


Re: [PHP] Regex in PHP

2008-06-04 Thread Robert Cummings
On Wed, 2008-06-04 at 10:18 -0600, Nathan Nobbe wrote:
 On Wed, Jun 4, 2008 at 10:10 AM, Richard Heyes [EMAIL PROTECTED] wrote:
 
  Hi,
 
   and the case insensitive versions are a hair faster still ;)
 
 
  Are they? I always thought that case-sensitive functions were faster
  because they have to test fewer comparisons. Eg To test if i == I in a
  case-insensitive fashion requires two comparisons (i == I and i == i)
  whereas a case-sensitive comparison requires only one (i == i).
 
 
 umm, isnt it like the other way around.  in the case of case-sensitive, you
 have to be able to distinguish between i and I, whereas w/ the case
 insensitive, you dont care so, basically, you strtolower() first thing, then
 just compare to lower case characters.

Nope, case insensitive is slower since you must make two tests for
characters having a lower and upper case version. With case sensitive
comparisons you only need to make a single comparison.

Cheers,
Rob.



Re: [PHP] Regex in PHP

2008-06-04 Thread Nathan Nobbe
On Wed, Jun 4, 2008 at 10:26 AM, Robert Cummings [EMAIL PROTECTED]
wrote:

 Nope, case insensitive is slower since you must make two tests for
 characters having a lower and upper case version. With case sensitive
 comparisons you only need to make a single comparison.


a quick test shows stripos beating strpos.

<?php

$str = 'asSAFAASFDADSfasfjhalskfjhlaseAERQWERQWER;.dafasjhflasfjd';
$search = 'fdasASDFAafdas';

$start = microtime();
strpos($str, $search);
$end = microtime();
$r1 = $end - $start;

$start = microtime();
stripos($str, $search);
$end2 = microtime();
$r2 = $end2 - $start;

echo "strpos: $r1\n";
echo "stripos: $r2\n";

if($r2 < $r1) {
    echo 'stripos is faster' . PHP_EOL;
}
?>

-nathan


Re: [PHP] Regex in PHP

2008-06-04 Thread Nitsan Bin-Nun
I can't find any good reason for regex in this case.
You can try to split it with explode() / stristr(), or write your own
function which goes over the string and checks when an @ is found,
something like:


function GetDomainName($a)
{
    $returnDomain = "";
    $beigale = false;
    for ($i = 0; $i < strlen($a) && !$beigale; $i++)
    {
        if ($a[$i] == '@')
        {
            for ($z = ($i + 1); $z < strlen($a); $z++)
                $returnDomain .= $a[$z];
            $beigale = true;
        }
    }
    return $returnDomain;
}



(There is probably a better way to do this; this is just what came to
mind right now.)

On 04/06/2008, VamVan [EMAIL PROTECTED] wrote:

 Hello All,

 For example, I have these email addresses:

 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]

 What would be the PHP function (regular expression) that can give me
 something like

 yahoo.com
 hotmail.com
 gmail.com

 Thanks



Re: [PHP] Regex in PHP

2008-06-04 Thread Robert Cummings
On Wed, 2008-06-04 at 10:56 -0600, Nathan Nobbe wrote:
 On Wed, Jun 4, 2008 at 10:26 AM, Robert Cummings [EMAIL PROTECTED]
 wrote:
 
  Nope, case insensitive is slower since you must make two tests for
  characters having a lower and upper case version. With case sensitive
  comparisons you only need to make a single comparison.
 
 
 a quick test shows stripos beating strpos.
 

Did you just try to use a test that used a single iteration to prove me
wrong? OMFG ponies!!! Loop each one of those 10 million times, use a
separate script for each, and use the system time program to
appropriately measure the time the system takes.

:)

Cheers,
Rob.



Re: [PHP] Regex in PHP

2008-06-04 Thread Nitsan Bin-Nun
At least he has some humor ;-)

On 04/06/2008, Robert Cummings [EMAIL PROTECTED] wrote:

 Did you just try to use a test that used a single iteration to prove me
 wrong? OMFG ponies!!! Loop each one of those 10 million times, use a
 separate script for each, and use the system time program to
 appropriately measure the time the system takes.

 :)

 Cheers,
 Rob.




Re: [PHP] Regex in PHP

2008-06-04 Thread Nathan Nobbe
On Wed, Jun 4, 2008 at 11:12 AM, Robert Cummings [EMAIL PROTECTED]
wrote:

 Did you just try to use a test that used a single iteration to prove me
 wrong? OMFG ponies!!! Loop each one of those 10 million times, use a
 separate script for each, and use the system time program to
 appropriately measure the time the system takes.


<?php

$str = 'asSAFAASFDADSfasfjhalskfjhlaseAERQWERQWER;.dafasjhflasfjd';
$search = 'fdasASDFAafdas';

$start = microtime();

for($i = 0; $i < 1000; $i++)
    strpos($str, $search);

$end = microtime();
$r1 = $end - $start;

$start = microtime();

for($i = 0; $i < 1000; $i++)
    stripos($str, $search);

$end2 = microtime();
$r2 = $end2 - $start;

echo "strpos: $r1\n";
echo "stripos: $r2\n";

if($r2 < $r1) {
    echo 'stripos is faster' . PHP_EOL;
}
--
strpos: 0.730519
stripos: -0.098887
stripos is faster

stripos still dominates ;)  what is this system time program you speak of ?
and, ill put them into separate programs when i get home this evening, and
have more time to screw around.

-nathan


Re: [PHP] Regex in PHP

2008-06-04 Thread Robert Cummings
On Wed, 2008-06-04 at 13:12 -0400, Robert Cummings wrote:
 Did you just try to use a test that used a single iteration to prove me
 wrong? OMFG ponies!!! Loop each one of those 10 million times, use a
 separate script for each, and use the system time program to
 appropriately measure the time the system takes.

Here are my results on my Athlon 2400: 10 million loops of each function,
using your script's values for $str and $search, with 3 runs of each:

strpos()
========
real    0m7.133s
user    0m6.480s
sys     0m0.020s

real    0m6.134s
user    0m6.068s
sys     0m0.016s

real    0m6.527s
user    0m6.476s
sys     0m0.012s

stripos()
=========
real    0m13.720s
user    0m13.517s
sys     0m0.072s

real    0m13.158s
user    0m13.009s
sys     0m0.016s

real    0m13.151s
user    0m13.013s
sys     0m0.012s

Now, that's how you test efficiency. A single run is very, very
susceptible to whatever else your processor might be doing, and as such
is usually garbage for any kind of analysis.
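The same approach (many iterations, processor time instead of a single wall-clock sample) can be sketched in C with the standard `clock()` function, which measures CPU time roughly comparable to the user + sys figures reported by `time`. This is an illustrative harness, not what Rob ran; `time_iterations()` and `sample_work()` are hypothetical names, and the workload reuses the `$str`/`$search` values from the thread.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* volatile pointers stop the compiler from constant-folding strstr(). */
static const char *volatile haystack =
    "asSAFAASFDADSfasfjhalskfjhlaseAERQWERQWER;.dafasjhflasfjd";
static const char *volatile needle = "fdasASDFAafdas";

/* volatile result so the timed work cannot be discarded. */
static volatile size_t sink;

/* Example workload: one case-sensitive substring search. */
void sample_work(void)
{
    const char *hit = strstr(haystack, needle);
    sink += (hit != NULL);
}

/* Run fn() `iterations` times and return the CPU seconds consumed,
 * or -1.0 if processor time is unavailable. */
double time_iterations(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    if (start == (clock_t)-1) {
        return -1.0;
    }
    for (long i = 0; i < iterations; i++) {
        fn();
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To mirror Rob's setup you would build one program per function being measured, call something like `time_iterations(sample_work, 10000000L)`, and compare several runs rather than trusting any single one.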

Cheers,
Rob.



Re: [PHP] Regex in PHP

2008-06-04 Thread Robert Cummings
On Wed, 2008-06-04 at 11:18 -0600, Nathan Nobbe wrote:
 On Wed, Jun 4, 2008 at 11:12 AM, Robert Cummings [EMAIL PROTECTED]
 wrote:
 
  Did you just try to use a test that used a single iteration to prove me
  wrong? OMFG ponies!!! Loop each one of those 10 million times, use a
  separate script for each, and use the system time program to
  appropriately measure the time the system takes.
 
 
 strpos: 0.730519
 stripos: -0.098887
 stripos is faster

Negative time, eh!? Your code must be buggy :| The time program works
like this under most *nix systems:

time php -q foo.php

And then it returns a report of how much time was taken for various
types of time. I've already sent an email with the appropriate timing of
both versions. BTW, as primitive as microtime() is for this kind of
measurement... you might want to read the manual to use it properly:

http://ca3.php.net/manual/en/function.microtime.php

You probably want:

microtime( true )
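The negative figure has a simple cause. Called without an argument, microtime() returns a string of the form "msec sec" (for example "0.65615000 1212598400"), and subtracting two such strings in PHP effectively uses only the leading fractional number, so an interval that crosses a whole-second boundary comes out negative. The C sketch below imitates both behaviors; the helper names and the sample timestamps are invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* What `$end - $start` effectively did with two microtime() strings:
 * only the leading fractional number of each string was used. */
double buggy_elapsed(const char *start, const char *end)
{
    return strtod(end, NULL) - strtod(start, NULL);
}

/* Combine both fields ("msec sec") into one value, which is what
 * microtime(true) hands back directly as a float.
 * Returns -1.0 if the string does not contain two numbers. */
double to_seconds(const char *stamp)
{
    double frac, sec;
    if (sscanf(stamp, "%lf %lf", &frac, &sec) != 2) {
        return -1.0;
    }
    return sec + frac;
}

double fixed_elapsed(const char *start, const char *end)
{
    return to_seconds(end) - to_seconds(start);
}
```

With start "0.90000000 1212600000" and end "0.10000000 1212600001", buggy_elapsed() gives about -0.8 while fixed_elapsed() gives about 0.2 seconds, which is why the single float from microtime(true) is the safer thing to subtract.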

 stripos still dominates ;)  what is this system time program you speak of ?
 and, ill put them into separate programs when i get home this evening, and
 have more time to screw around.

It's a simple thought process to understand that, unless someone coding
the PHP internals buggered their code, stripos() cannot possibly be
faster than strpos(). I really don't need benchmarks for something this
simple to know which SHOULD be faster.

Cheers,
Rob.



Re: [PHP] Regex in PHP

2008-06-04 Thread Nathan Nobbe
On Wed, Jun 4, 2008 at 2:06 PM, Robert Cummings [EMAIL PROTECTED]
wrote:

 It's a simple thought process to understand that unless someone coding
 the PHP internals buggered their code, that stripos() cannot possibly be
 faster than strpos(). I really don't need benchmarks for something this
 simple to know which SHOULD be faster.


i repeated your test using the time program and splitting the script into 2,
one for each strpos and stripos, to find similar results.  imo, there is no
need for 2 comparisons for case-insensitive searches, because both arguments
can be converted to a single case prior to the search.  obviously, there is
a small amount of overhead there the case-sensitive search is unencumbered
by.  i guess i never sat down and thought about how that algorithm would
work (case-sensitive) =/.

thanks for the tips rob.  sorry to bother you richard.

-nathan


Re: [PHP] Regex in PHP

2008-06-04 Thread Robert Cummings
On Wed, 2008-06-04 at 23:20 -0400, Nathan Nobbe wrote:

 i repeated your test using the time program and splitting the script into 2,
 one for each strpos and stripos, to find similar results.  imo, there is no
 need for 2 comparisons for case-insensitive searches, because both arguments
 can be converted to a single case prior to the search.  obviously, there is
 a small amount of overhead there the case-sensitive search is unencumbered
 by.  i guess i never sat down and thought about how that algorithm would
 work (case-sensitive) =/.
 
 thanks for the tips rob.  sorry to bother you richard.

You would do two comparisons... why incur the overhead of a conversion
if one is not necessary. First you do case sensitive match, if that
fails then you try the alternative version comparison. It is inefficient
to perform 2 conversions and a single comparison in contrast. Similarly,
it's very inefficient to convert two entire strings then perform a
comparison. If the first characters differ then conversion of the rest
of the strings was pointless. This is basic algorithms in computer
science.

Cheers,
Rob.



Re: [PHP] Regex in PHP

2008-06-04 Thread Nathan Nobbe
On Wed, Jun 4, 2008 at 11:43 PM, Robert Cummings [EMAIL PROTECTED]
wrote:

 On Wed, 2008-06-04 at 23:20 -0400, Nathan Nobbe wrote:
 
  i repeated your test using the time program and splitting the script into 2,
  one for each strpos and stripos, to find similar results.  imo, there is no
  need for 2 comparisons for case-insensitive searches, because both arguments
  can be converted to a single case prior to the search.  obviously, there is
  a small amount of overhead there the case-sensitive search is unencumbered
  by.  i guess i never sat down and thought about how that algorithm would
  work (case-sensitive) =/.
 
  thanks for the tips rob.  sorry to bother you richard.

 You would do two comparisons... why incur the overhead of a conversion
 if one is not necessary.


because it simplifies the algorithm, there is no need for conditional logic.


 First you do case sensitive match, if that
 fails then you try the alternative version comparison. It is inefficient
 to perform 2 conversions and a single comparison in contrast.


3 operations vs. 1 or potentially 2, sure.


 Similarly,
 it's very inefficient to convert two entire strings then perform a
 comparison.


then they could be converted one at a time as the strings were traversed to
increase efficiency.
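Nathan's convert-as-you-traverse idea, combined with Rob's exact-match-first test, could look roughly like the naive substring search below: no lower-cased copies are allocated, and conversion stops at the first mismatching byte. It is a sketch under stated assumptions (ASCII `tolower()`, a hypothetical `find_ci()` name), not how PHP actually implements stripos().

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive byte equality: try the cheap exact test first and
 * only convert when the bytes differ. */
static int same_ci(unsigned char a, unsigned char b)
{
    return a == b || tolower(a) == tolower(b);
}

/* Offset of the first case-insensitive match of needle in haystack,
 * or -1 if there is none. An empty needle also yields -1, like the
 * early RETURN_FALSE in the quoted stripos() code. */
long find_ci(const char *haystack, size_t hay_len,
             const char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len) {
        return -1;
    }
    for (size_t start = 0; start + needle_len <= hay_len; start++) {
        size_t i = 0;
        /* Bytes are converted only while they keep matching. */
        while (i < needle_len &&
               same_ci((unsigned char)haystack[start + i],
                       (unsigned char)needle[i])) {
            i++;
        }
        if (i == needle_len) {
            return (long)start;
        }
    }
    return -1;
}
```

For example, `find_ci("Hello World", 11, "WORLD", 5)` returns 6. The scan itself is the simple O(n*m) approach, chosen for clarity rather than speed.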


 If the first characters differ then conversion of the rest
 of the strings was pointless.


good point.


 This is basic algorithms in computer
 science.


you really know how to rub it in there rob.  but i was looking at the
implementation in the php code, looks like somebody likes my idea (this code
found in ext/standard/string.c).  on the second line the haystack is
converted to lower case[1], then if it passes a couple of checks, the needle
is converted to lower case[2], and lastly the comparison is performed[3].
there is no logic to check both cases.
(i have placed a star beside the statements ive referred to).
...
haystack_dup = estrndup(haystack, haystack_len);
*[1]php_strtolower(haystack_dup, haystack_len);

if (Z_TYPE_P(needle) == IS_STRING) {
    if (Z_STRLEN_P(needle) == 0 || Z_STRLEN_P(needle) > haystack_len) {
        efree(haystack_dup);
        RETURN_FALSE;
    }

    needle_dup = estrndup(Z_STRVAL_P(needle), Z_STRLEN_P(needle));
    *[2]php_strtolower(needle_dup, Z_STRLEN_P(needle));
    *[3]found = php_memnstr(haystack_dup + offset, needle_dup,
                            Z_STRLEN_P(needle), haystack_dup + haystack_len);
}
...
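For comparison, the strategy in the quoted excerpt (duplicate both strings, lower-case the copies, then run an ordinary search) can be sketched in standalone C as below. `malloc()`/`free()` stand in for `estrndup()`/`efree()`, a naive `memcmp()` loop stands in for `php_memnstr()`, and `find_ci_copy()` is a hypothetical name; the numbered comments correspond to the starred lines.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Lower-case a copy of len bytes; the caller frees the result. */
static char *lower_dup(const char *s, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        copy[i] = (char)tolower((unsigned char)s[i]);
    }
    copy[len] = '\0';
    return copy;
}

/* Offset of the first case-insensitive match, or -1 (also returned for
 * an empty or too-long needle, or if allocation fails). */
long find_ci_copy(const char *haystack, size_t hay_len,
                  const char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len) {
        return -1;
    }
    char *h = lower_dup(haystack, hay_len);   /* [1] whole haystack */
    char *n = lower_dup(needle, needle_len);  /* [2] whole needle   */
    long found = -1;

    if (h != NULL && n != NULL) {
        /* [3] plain case-sensitive search on the lowered copies */
        for (size_t start = 0; start + needle_len <= hay_len; start++) {
            if (memcmp(h + start, n, needle_len) == 0) {
                found = (long)start;
                break;
            }
        }
    }
    free(h);
    free(n);
    return found;
}
```

Unlike the excerpt, this sketch checks the needle length before copying anything. The cost Rob points out is still visible: every byte of both strings is converted and copied even when the very first comparison would already fail.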

-nathan


Re: [PHP] Regex in PHP

2008-06-03 Thread Liran Oz

You can use this:
$str = '[EMAIL PROTECTED]';

preg_match('/[EMAIL PROTECTED]@(.+)/', $str, $matches);
var_dump($matches);//will be in $matches[1]


Or without regex:
echo substr($str, strpos($str, '@')+1);

Liran

----- Original Message -----
From: VamVan [EMAIL PROTECTED]
To: php-general@lists.php.net
Sent: Wednesday, June 04, 2008 3:39 AM
Subject: [PHP] Regex in PHP



Hello All,

For example, I have these email addresses:

[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

What would be the PHP function (regular expression) that can give me
something like

yahoo.com
hotmail.com
gmail.com

Thanks







Re: [PHP] Regex in PHP

2008-06-03 Thread Nathan Nobbe
On Tue, Jun 3, 2008 at 8:39 PM, VamVan [EMAIL PROTECTED] wrote:

 Hello All,

 For example, I have these email addresses:

 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]

 What would be the PHP function (regular expression) that can give me
 something like

 yahoo.com
 hotmail.com
 gmail.com


if you know the values are valid email addresses, use a combination of
strripos() and substr().  it will be nice and fast that way.
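Nathan's suggestion (find the last '@', then take everything after it, as strrpos()/strripos() plus substr() would in PHP) maps directly onto C's `strrchr()`. A minimal sketch, assuming the input is already a valid address; `email_domain()` is a hypothetical helper and the sample addresses are made up.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the text after the last '@', or NULL if there is
 * no '@' or nothing follows it. No copy is made: the result points into
 * the caller's string, much like substr($email, strrpos($email, '@') + 1). */
const char *email_domain(const char *email)
{
    const char *at = strrchr(email, '@');
    if (at == NULL || at[1] == '\0') {
        return NULL;
    }
    return at + 1;
}
```

Taking the last '@' rather than the first only matters for unusual addresses whose quoted local part contains an '@'; a strpos()-based version takes the first one.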

as an aside, this is what the manual says on preg_match():
Do not use *preg_match()* if you only want to check if one string is
contained in another string. Use strpos()
<http://www.php.net/manual/en/function.strpos.php> or strstr()
<http://www.php.net/manual/en/function.strstr.php> instead as they
will be faster.

and the case insensitive versions are a hair faster still ;)

-nathan