Re: [PHP] filter_var using regex

2011-05-05 Thread Jason Gerfen
On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
 On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
 
 On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
 On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:

 I am running into a problem using the REGEXP option with filter_var().

 The string I am using: 09VolunteerApplication.doc
 The PCRE regex I am using:
 /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di

 The function in it's entirety:
 return (!filter_var('09VolunteerApplication.doc',
 FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'
 ? false : true;

 Anyone have any insight into this?



 You missed a + in your regex, at the moment you're only checking to see
 if a file starts with a single a-z or number and then is followed by the
 period. Then you're checking for oddly for one to four extensions in the
 list, are you sure you want to do that? And the square brackets are used
 to match characters, not strings, use the standard brackets to allow
 from a choice of strings

 Try this:

 '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'

 One other thing you should be aware of maybe, filenames won't always
 consist of just the letters a-z and numbers 0-9, they may contain
 accented or foreign letters, hyphens, spaces and a number of other
 characters depending on the client machines OS. Windows allows very few
 characters for example compared to the Unix-like OS's like MacOS and
 Linux.


 Both are valid PCRE regex's. However the rules regarding usage of
 parenthesis for an XOR string does not explain a similar regex being
 used with the filter_var() like so:

 return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di')))
 ? true : false;

 The above returns string(4) kc-1

 Another test using the following works similarly:

 return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
 true : false;

 The above returns string(8) u0368839

 And
 return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
 true : false;

 returns string(8) gp123456

 As you can see these three examples use the start [] as XOR conditionals
 for multiple strings as prefixes.



 
 
 Not quite, you think they match correctly because that's all you're
 testing for, and you're not looking for anything that might disprove
 that. Using your last example, it will also match these strings:
 
 gu0368839
 xx0368839
 p0368839
 
 
 I tested your first regex with '09VolunteerApplication.doc' and it
 doesn't work at all until you add in that plus after the basename match
 part of the regex:
 
 ^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$
 
 However, your regex (with the plus) also matches these strings:
 
 09VolunteerApplication.docp
 09VolunteerApplication.docj
 09VolunteerApplication.doc|-- note it's matching the literal bar
 character
 
 Making the changes I suggested (^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|
 docx|csv|xls)$) means the regex works as you expect. Square brackets in
 a regex match a range, not a literal string, and without any sort of
 modifier, match only a single instance of that range. So in your
 example, you're matching a 4 character extension containing any of the
 following characters '|cdfgjlnopstx', and a basename containing only 1
 character that is either an a-z or a number.
 

You are right; after a few more tests I stand corrected. My apologies.
However, according to the documentation for filter_var() and the PCRE
regexp option, a return value of false, which is what I am getting,
indicates an error with the regex.

Here are the changes I have made:
print_r(var_dump(filter_var('09VolunteerApplication.doc',
FILTER_VALIDATE_REGEXP,
array('options' => array('regexp' => '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di')))));

I appreciate your assistance and insights.

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] filter_var using regex

2011-05-05 Thread Jason Gerfen
On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
 On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
 
 On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
 On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:

 I am running into a problem using the REGEXP option with filter_var().

 The string I am using: 09VolunteerApplication.doc
 The PCRE regex I am using:
 /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di

 The function in it's entirety:
 return (!filter_var('09VolunteerApplication.doc',
 FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'
 ? false : true;

 Anyone have any insight into this?



 You missed a + in your regex, at the moment you're only checking to see
 if a file starts with a single a-z or number and then is followed by the
 period. Then you're checking for oddly for one to four extensions in the
 list, are you sure you want to do that? And the square brackets are used
 to match characters, not strings, use the standard brackets to allow
 from a choice of strings

 Try this:

 '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'

 One other thing you should be aware of maybe, filenames won't always
 consist of just the letters a-z and numbers 0-9, they may contain
 accented or foreign letters, hyphens, spaces and a number of other
 characters depending on the client machines OS. Windows allows very few
 characters for example compared to the Unix-like OS's like MacOS and
 Linux.


 Both are valid PCRE regex's. However the rules regarding usage of
 parenthesis for an XOR string does not explain a similar regex being
 used with the filter_var() like so:

 return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di')))
 ? true : false;

 The above returns string(4) kc-1

 Another test using the following works similarly:

 return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
 true : false;

 The above returns string(8) u0368839

 And
 return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
 true : false;

 returns string(8) gp123456

 As you can see these three examples use the start [] as XOR conditionals
 for multiple strings as prefixes.



 
 
 Not quite, you think they match correctly because that's all you're
 testing for, and you're not looking for anything that might disprove
 that. Using your last example, it will also match these strings:
 
 gu0368839
 xx0368839
 p0368839
 
 
 I tested your first regex with '09VolunteerApplication.doc' and it
 doesn't work at all until you add in that plus after the basename match
 part of the regex:
 
 ^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$
 
 However, your regex (with the plus) also matches these strings:
 
 09VolunteerApplication.docp
 09VolunteerApplication.docj
 09VolunteerApplication.doc|-- note it's matching the literal bar
 character
 
 Making the changes I suggested (^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|
 docx|csv|xls)$) means the regex works as you expect. Square brackets in
 a regex match a range, not a literal string, and without any sort of
 modifier, match only a single instance of that range. So in your
 example, you're matching a 4 character extension containing any of the
 following characters '|cdfgjlnopstx', and a basename containing only 1
 character that is either an a-z or a number.
 

You are right; after a few more tests I stand corrected. My apologies.
However, according to the documentation for filter_var() and the PCRE
regexp option, a return value of false, which is what I am getting,
indicates an error with the regex.

In addition, I would like to point out that the same regex works as it
should with the older preg_match() function, while the character class
followed by the quantifier (+) fails validation with filter_var().

print_r(var_dump(filter_var('09VolunteerApplication.doc',
FILTER_VALIDATE_REGEXP,
array('options' => array('regexp' => '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di')))));

returns false (invalid regex) when using the character class [a-z0-9]+
with the filter_var() function and the FILTER_VALIDATE_REGEXP option.

print_r(var_dump(preg_match('/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/i',
'09VolunteerApplication.doc')));

returns int(1), indicating a valid regex as well as a valid match.

I believe this should be reported as a bug, but I appreciate your
assistance and insights.
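
A side-by-side check along the following lines may help pin down whether the two functions really disagree. It is only a sketch: the variable names are made up for illustration, the nested array is written out in full with =>, and the same pattern (including the D modifier) is passed to both calls. On the PHP builds I know of, I would expect both calls to report a match, so a false from filter_var() may point at the surrounding code rather than at the regex engine.

```php
<?php
// Hypothetical variable names, chosen for this illustration only.
$pattern = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di';
$subject = '09VolunteerApplication.doc';

// filter_var() returns the subject string on a match and false otherwise.
$viaFilter = filter_var(
    $subject,
    FILTER_VALIDATE_REGEXP,
    array('options' => array('regexp' => $pattern))
);

// preg_match() returns 1 on a match, 0 on no match, false on error.
$viaPreg = preg_match($pattern, $subject);

var_dump($viaFilter);
var_dump($viaPreg);

// PREG_NO_ERROR here means the preg_match() call itself ran cleanly.
var_dump(preg_last_error() === PREG_NO_ERROR);
```

If the two results differ on a given PHP version, that output would be a useful thing to attach to a bug report.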





Re: [PHP] filter_var using regex

2011-05-05 Thread Ashley Sheridan
On Thu, 2011-05-05 at 13:39 -0600, Jason Gerfen wrote:

 On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
  On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
  
  On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
  On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
 
  I am running into a problem using the REGEXP option with filter_var().
 
  The string I am using: 09VolunteerApplication.doc
  The PCRE regex I am using:
  /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
 
  The function in it's entirety:
  return (!filter_var('09VolunteerApplication.doc',
  FILTER_VALIDATE_REGEXP,
  array('options'=array('regexp'='/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'
  ? false : true;
 
  Anyone have any insight into this?
 
 
 
  You missed a + in your regex, at the moment you're only checking to see
  if a file starts with a single a-z or number and then is followed by the
  period. Then you're checking for oddly for one to four extensions in the
  list, are you sure you want to do that? And the square brackets are used
  to match characters, not strings, use the standard brackets to allow
  from a choice of strings
 
  Try this:
 
  '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
 
  One other thing you should be aware of maybe, filenames won't always
  consist of just the letters a-z and numbers 0-9, they may contain
  accented or foreign letters, hyphens, spaces and a number of other
  characters depending on the client machines OS. Windows allows very few
  characters for example compared to the Unix-like OS's like MacOS and
  Linux.
 
 
  Both are valid PCRE regex's. However the rules regarding usage of
  parenthesis for an XOR string does not explain a similar regex being
  used with the filter_var() like so:
 
  return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
  array('options'=array('regexp'='/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di')))
  ? true : false;
 
  The above returns string(4) kc-1
 
  Another test using the following works similarly:
 
  return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
  array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
  true : false;
 
  The above returns string(8) u0368839
 
  And
  return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
  array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
  true : false;
 
  returns string(8) gp123456
 
  As you can see these three examples use the start [] as XOR conditionals
  for multiple strings as prefixes.
 
 
 
  
  
  Not quite, you think they match correctly because that's all you're
  testing for, and you're not looking for anything that might disprove
  that. Using your last example, it will also match these strings:
  
  gu0368839
  xx0368839
  p0368839
  
  
  I tested your first regex with '09VolunteerApplication.doc' and it
  doesn't work at all until you add in that plus after the basename match
  part of the regex:
  
  ^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$
  
  However, your regex (with the plus) also matches these strings:
  
  09VolunteerApplication.docp
  09VolunteerApplication.docj
  09VolunteerApplication.doc|-- note it's matching the literal bar
  character
  
  Making the changes I suggested (^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|
  docx|csv|xls)$) means the regex works as you expect. Square brackets in
  a regex match a range, not a literal string, and without any sort of
  modifier, match only a single instance of that range. So in your
  example, you're matching a 4 character extension containing any of the
  following characters '|cdfgjlnopstx', and a basename containing only 1
  character that is either an a-z or a number.
  
 
 You are right, after a few other tests I stand corrected. My apologies.
 However according to the documentation for filter_var() and the PCRE
 regexp option if it returns false, which it is, this is indicating an
 error with the regex.
 
 In addition to this I would like to point out that the same regex using
 the older preg_match() function works as it should while the character
 class following by the pattern (+) fails the validation portion of the
 regex.
 
 print_r(var_dump(filter_var('09VolunteerApplication.doc',
 FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di');
 
 returns false (invalid regex) when using the character matching class
 [a-z0-9]+ with the filter_var() function with the FILTER_VALIDATE_REGEXP
 option
 
 print_r(var_dump(preg_match('/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/i',
 '09VolunteerApplication.doc')));
 
 return int(1) indicating a valid regex as well as a valid match.
 
 I believe this should be reported as a bug but I appreciate your
 assistance and insights.
 
 


Remove the {1,4} bit, as it makes the regex look for one to four
extensions in a row. It's a valid regex, sure, but not the regex to
match what you're looking for.
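
To make the effect of the {1,4} quantifier on the group concrete, here is a small sketch. The variable names and the sample filename are invented for illustration; only the two patterns come from this thread.

```php
<?php
// Pattern with the quantifier on the group: one to four extensions in a row.
$repeated = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di';
// Pattern without it: exactly one extension from the list.
$single   = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

// Hypothetical filename with three extensions run together.
$name = 'report.docpdftxt';

var_dump(preg_match($repeated, $name)); // int(1): doc, pdf, txt are three repeats
var_dump(preg_match($single, $name));   // int(0): only one extension is allowed
```

Both patterns accept a plain 'report.doc', so dropping the quantifier only rejects the stacked-extension cases.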


Re: [PHP] filter_var using regex

2011-05-04 Thread Ashley Sheridan
On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:

 I am running into a problem using the REGEXP option with filter_var().
 
 The string I am using: 09VolunteerApplication.doc
 The PCRE regex I am using:
 /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
 
 The function in its entirety:
 return (!filter_var('09VolunteerApplication.doc',
 FILTER_VALIDATE_REGEXP,
 array('options' => array('regexp' => '/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'))))
 ? false : true;
 
 Anyone have any insight into this?
 


You missed a + in your regex. At the moment you're only checking whether
the filename starts with a single a-z letter or digit followed by the
period. Then, oddly, you're checking for one to four extensions from the
list; are you sure you want to do that? Square brackets match individual
characters, not strings; use parentheses to choose between alternative
strings.

Try this:

'/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'

One other thing you should be aware of: filenames won't always consist
of just the letters a-z and digits 0-9. They may contain accented or
foreign letters, hyphens, spaces and a number of other characters,
depending on the client machine's OS. Windows, for example, allows
fewer characters than Unix-like OSes such as Mac OS and Linux.
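
As a rough sketch of how the suggested pattern could be wired into filter_var(), something like the following should do. The function name is_allowed_filename() is hypothetical and not from the original code; the pattern is the one given above.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original post.
function is_allowed_filename($name)
{
    // One or more letters/digits, a literal dot, then exactly one listed extension.
    $pattern = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

    $result = filter_var(
        $name,
        FILTER_VALIDATE_REGEXP,
        array('options' => array('regexp' => $pattern))
    );

    // filter_var() returns the input string on a match and false otherwise.
    return $result !== false;
}

var_dump(is_allowed_filename('09VolunteerApplication.doc'));  // bool(true)
var_dump(is_allowed_filename('09VolunteerApplication.docp')); // bool(false)
var_dump(is_allowed_filename('my-file.doc'));                 // bool(false): hyphen is not in [a-z0-9]
```

The last call shows the filename caveat in practice: the basename class would need widening before hyphens or spaces are accepted.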

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] filter_var using regex

2011-05-04 Thread Jason Gerfen
On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
 On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
 
 I am running into a problem using the REGEXP option with filter_var().

 The string I am using: 09VolunteerApplication.doc
 The PCRE regex I am using:
 /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di

 The function in it's entirety:
 return (!filter_var('09VolunteerApplication.doc',
 FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'
 ? false : true;

 Anyone have any insight into this?

 
 
 You missed a + in your regex, at the moment you're only checking to see
 if a file starts with a single a-z or number and then is followed by the
 period. Then you're checking for oddly for one to four extensions in the
 list, are you sure you want to do that? And the square brackets are used
 to match characters, not strings, use the standard brackets to allow
 from a choice of strings
 
 Try this:
 
 '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
 
 One other thing you should be aware of maybe, filenames won't always
 consist of just the letters a-z and numbers 0-9, they may contain
 accented or foreign letters, hyphens, spaces and a number of other
 characters depending on the client machines OS. Windows allows very few
 characters for example compared to the Unix-like OS's like MacOS and
 Linux.
 

Both are valid PCRE regexes. However, the rule about using parentheses
for an XOR between strings does not explain why a similar regex works
with filter_var() like so:

return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
array('options' => array('regexp' => '/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di'))))
? true : false;

The filter_var() call above returns string(4) kc-1

Another test using the following works similarly:

return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
array('options' => array('regexp' => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ?
true : false;

The filter_var() call above returns string(8) u0368839

And
return (filter_var('gp123456', FILTER_VALIDATE_REGEXP,
array('options' => array('regexp' => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ?
true : false;

returns string(8) gp123456

As you can see, these three examples use the leading [] as XOR
conditionals between multiple prefix strings.






Re: [PHP] filter_var using regex

2011-05-04 Thread Ashley Sheridan
On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:

 On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
  On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
  
  I am running into a problem using the REGEXP option with filter_var().
 
  The string I am using: 09VolunteerApplication.doc
  The PCRE regex I am using:
  /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
 
  The function in it's entirety:
  return (!filter_var('09VolunteerApplication.doc',
  FILTER_VALIDATE_REGEXP,
  array('options'=array('regexp'='/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'
  ? false : true;
 
  Anyone have any insight into this?
 
  
  
  You missed a + in your regex, at the moment you're only checking to see
  if a file starts with a single a-z or number and then is followed by the
  period. Then you're checking for oddly for one to four extensions in the
  list, are you sure you want to do that? And the square brackets are used
  to match characters, not strings, use the standard brackets to allow
  from a choice of strings
  
  Try this:
  
  '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
  
  One other thing you should be aware of maybe, filenames won't always
  consist of just the letters a-z and numbers 0-9, they may contain
  accented or foreign letters, hyphens, spaces and a number of other
  characters depending on the client machines OS. Windows allows very few
  characters for example compared to the Unix-like OS's like MacOS and
  Linux.
  
 
 Both are valid PCRE regex's. However the rules regarding usage of
 parenthesis for an XOR string does not explain a similar regex being
 used with the filter_var() like so:
 
 return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di')))
 ? true : false;
 
 The above returns string(4) kc-1
 
 Another test using the following works similarly:
 
 return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
 true : false;
 
 The above returns string(8) u0368839
 
 And
 return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
 array('options'=array('regexp'='/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di'))) ?
 true : false;
 
 returns string(8) gp123456
 
 As you can see these three examples use the start [] as XOR conditionals
 for multiple strings as prefixes.
 
 
 


Not quite. You think they match correctly because that's all you're
testing for; you're not looking for anything that might disprove it.
Using your last example, it will also match these strings:

gu0368839
xx0368839
p0368839


I tested your first regex with '09VolunteerApplication.doc' and it
doesn't work at all until you add that plus after the basename part of
the regex:

^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$

However, your regex (with the plus) also matches these strings:

09VolunteerApplication.docp
09VolunteerApplication.docj
09VolunteerApplication.doc|   <-- note it's matching the literal bar
character

Making the changes I suggested
(^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$) means the regex
works as you expect. Square brackets in a regex define a character
class (a set of characters), not a literal string, and without a
quantifier they match only a single character from that set. So in your
example, you're matching an extension of one to four characters drawn
from '|cdefgjlnopstvx', and a basename containing only 1 character that
is either a letter a-z or a digit.
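
If you want to see the difference for yourself, a quick comparison along these lines should show it. The variable names and the extra '.exe' sample are my own additions for illustration; the two patterns are the ones discussed above.

```php
<?php
// Character-class version (with the plus added) versus the grouped alternation.
$charClass   = '/^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di';
$alternation = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

// Sample names: the first three come from this thread, the last is hypothetical.
$samples = array(
    '09VolunteerApplication.doc',
    '09VolunteerApplication.docp',
    '09VolunteerApplication.doc|',
    '09VolunteerApplication.exe',
);

foreach ($samples as $sample) {
    // preg_match() returns 1 for a match and 0 for no match.
    printf(
        "%-30s class=%d group=%d\n",
        $sample,
        preg_match($charClass, $sample),
        preg_match($alternation, $sample)
    );
}
```

Note that even '.exe' gets through the character-class version, because 'e' and 'x' both appear somewhere in the bracketed list; the grouped version accepts only the first sample.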

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk