Re: [PHP] filter_var using regex
On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
> On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
>> On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
>>> On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
>>>> I am running into a problem using the REGEXP option with filter_var().
>>>>
>>>> The string I am using: 09VolunteerApplication.doc
>>>>
>>>> The PCRE regex I am using:
>>>> /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
>>>>
>>>> The function in its entirety:
>>>> return (!filter_var('09VolunteerApplication.doc', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di')))) ? false : true;
>>>>
>>>> Anyone have any insight into this?
>>>
>>> You missed a + in your regex. At the moment you're only checking whether the file name starts with a single letter (a-z) or digit followed by the period. Then, oddly, you're checking for one to four extensions from the list; are you sure you want to do that? Also, square brackets match characters, not strings; use parentheses to allow a choice of strings. Try this:
>>>
>>> '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
>>>
>>> One other thing you should be aware of: filenames won't always consist of just the letters a-z and the digits 0-9. Depending on the client machine's OS, they may contain accented or foreign letters, hyphens, spaces and a number of other characters. Windows, for example, allows far fewer characters than Unix-like OSes such as MacOS and Linux.
>>
>> Both are valid PCRE regexes. However, the rule about using parentheses for an XOR of strings does not explain why a similar regex works with filter_var(), like so:
>>
>> return (filter_var('kc-1', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di')))) ? true : false;
>>
>> The above returns string(4) "kc-1". Another test using the following works similarly:
>>
>> return (filter_var('u0368839', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ? true : false;
>>
>> The above returns string(8) "u0368839". And
>>
>> return (filter_var('gp123456', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ? true : false;
>>
>> returns string(8) "gp123456". As you can see, these three examples use the leading [] as XOR conditionals for multiple strings as prefixes.
>
> Not quite. You think they match correctly because that's all you're testing for; you're not looking for anything that might disprove it. Using your last example, it will also match these strings:
>
> gu0368839
> xx0368839
> p0368839
>
> I tested your first regex with '09VolunteerApplication.doc' and it doesn't work at all until you add the plus after the basename part of the regex:
>
> ^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$
>
> However, your regex (with the plus) also matches these strings:
>
> 09VolunteerApplication.docp
> 09VolunteerApplication.docj
> 09VolunteerApplication.doc| <-- note it's matching the literal bar character
>
> Making the changes I suggested (^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$) means the regex works as you expect. Square brackets in a regex define a character class: they match one character from a set, not a literal string, and without a quantifier they match only a single character. So in your example you're matching a basename of exactly one character (a-z or a digit) and an extension of one to four characters drawn from '|cdefgjlnopstvx'.

You are right; after a few other tests I stand corrected. My apologies. However, according to the documentation for filter_var() and the PCRE regexp option, if it returns false, which it does, this indicates an error with the regex. Here are the changes I have made:

print_r(var_dump(filter_var('09VolunteerApplication.doc', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di')))));

I appreciate your assistance and insights.

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] filter_var using regex
On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
> [...]

You are right; after a few other tests I stand corrected. My apologies. However, according to the documentation for filter_var() and the PCRE regexp option, if it returns false, which it does, this indicates an error with the regex.
In addition, I would like to point out that the same regex works as it should with the older preg_match() function, while with filter_var() the character class followed by the quantifier (+) fails validation:

print_r(var_dump(filter_var('09VolunteerApplication.doc', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di')))));

returns false (invalid regex) when using the character class [a-z0-9]+ with filter_var() and the FILTER_VALIDATE_REGEXP option, whereas

print_r(var_dump(preg_match('/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/i', '09VolunteerApplication.doc')));

returns int(1), indicating a valid regex as well as a valid match. I believe this should be reported as a bug, but I appreciate your assistance and insights.
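The two calls in this message can be compared side by side. The sketch below is a minimal illustration, assuming a PHP 7 or 8 build (newer than this 2011 thread); on such a build both calls would be expected to accept the file name. It also shows a limitation worth knowing: filter_var() returns false both when the value does not match and when the pattern cannot be compiled, so a false result on its own does not prove the regex is invalid, whereas preg_match() returns 0 for "no match" and false for an error. The variable names and the deliberately broken pattern are illustrative only.

```php
<?php
// Sketch, assuming PHP 7.x/8.x: run the {1,4} pattern from this message
// through filter_var() and preg_match(), then contrast "no match" with
// "pattern failed to compile".

$file    = '09VolunteerApplication.doc';
$pattern = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di';

// filter_var(): returns the input string on a match, false otherwise.
$viaFilter = filter_var($file, FILTER_VALIDATE_REGEXP, ['options' => ['regexp' => $pattern]]);
// preg_match(): 1 on a match, 0 on no match, false on error.
$viaPreg = preg_match($pattern, $file);

var_dump($viaFilter); // string(26) "09VolunteerApplication.doc"
var_dump($viaPreg);   // int(1)

// Hypothetical broken pattern: no closing delimiter, so it cannot compile.
// The @ suppresses the warning PHP emits for it.
$broken = '/^[a-z0-9]+\.(doc|pdf';

$noMatch  = filter_var('report.exe', FILTER_VALIDATE_REGEXP, ['options' => ['regexp' => $pattern]]);
$badRegex = @filter_var($file, FILTER_VALIDATE_REGEXP, ['options' => ['regexp' => $broken]]);
var_dump($noMatch, $badRegex); // bool(false) twice: the two cases look identical

var_dump(preg_match($pattern, 'report.exe')); // int(0): valid pattern, no match
var_dump(@preg_match($broken, $file));        // bool(false): pattern error
```

If filter_var() and preg_match() ever disagree on the same pattern and input, checking preg_match() for `=== false` is one way to tell a compile error apart from a plain non-match.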
Re: [PHP] filter_var using regex
On Thu, 2011-05-05 at 13:39 -0600, Jason Gerfen wrote:
> [...]
> print_r(var_dump(filter_var('09VolunteerApplication.doc', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di')))));
>
> returns false (invalid regex) when using the character class [a-z0-9]+ with filter_var() [...]
> I believe this should be reported as a bug, but I appreciate your assistance and insights.

Remove the {1,4} bit: it lets the extension group repeat one to four times, so you're effectively looking for up to four extensions in a row (09VolunteerApplication.docpdf would pass too). It's a valid regex, sure, but not the regex that matches what you're looking for.
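A minimal sketch of the point about {1,4}, running the two extension patterns from this thread through preg_match(); the loop and the output format are illustrative only.

```php
<?php
// With {1,4} the whole (doc|pdf|...) group may repeat up to four times,
// so several extensions run together still pass. Without it, exactly one
// extension from the list must follow the period.
$withRepeat    = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di';
$withoutRepeat = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

foreach (['09VolunteerApplication.doc', '09VolunteerApplication.docpdf'] as $name) {
    // preg_match() returns 1 for a match and 0 for no match.
    printf(
        "%-30s {1,4}: %d  plain: %d\n",
        $name,
        preg_match($withRepeat, $name),
        preg_match($withoutRepeat, $name)
    );
}
```

Both patterns give 1 for the .doc file; only the {1,4} version also gives 1 for .docpdf.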
Re: [PHP] filter_var using regex
On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
> [...]
> The PCRE regex I am using:
> /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
> [...]

You missed a + in your regex. At the moment you're only checking whether the file name starts with a single letter (a-z) or digit followed by the period. Then, oddly, you're checking for one to four extensions from the list; are you sure you want to do that? Also, square brackets match characters, not strings; use parentheses to allow a choice of strings. Try this:

'/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'

One other thing you should be aware of: filenames won't always consist of just the letters a-z and the digits 0-9. Depending on the client machine's OS, they may contain accented or foreign letters, hyphens, spaces and a number of other characters. Windows, for example, allows far fewer characters than Unix-like OSes such as MacOS and Linux.

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk
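A minimal sketch comparing the original pattern with the suggested one through filter_var() itself. matchesPattern() is a hypothetical helper written for this illustration, not part of PHP; the expected results in the comments follow from the patterns as written.

```php
<?php
// Hypothetical helper: true when filter_var() accepts $value for $regex.
function matchesPattern(string $value, string $regex): bool
{
    // FILTER_VALIDATE_REGEXP returns the input string on a match, false otherwise.
    return filter_var(
        $value,
        FILTER_VALIDATE_REGEXP,
        ['options' => ['regexp' => $regex]]
    ) !== false;
}

// Original: exactly one basename character, then 1-4 characters from a class.
$original  = '/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di';
// Suggested: one or more basename characters, then exactly one extension
// chosen by alternation inside parentheses.
$corrected = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

$file = '09VolunteerApplication.doc';
var_dump(matchesPattern($file, $original));   // bool(false)
var_dump(matchesPattern($file, $corrected));  // bool(true)
```

The original pattern only accepts names like a.doc, where the basename is a single character.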
Re: [PHP] filter_var using regex
On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
> [...]
> Try this:
>
> '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
> [...]

Both are valid PCRE regexes. However, the rule about using parentheses for an XOR of strings does not explain why a similar regex works with filter_var(), like so:

return (filter_var('kc-1', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di')))) ? true : false;

The above returns string(4) "kc-1". Another test using the following works similarly:

return (filter_var('u0368839', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ? true : false;

The above returns string(8) "u0368839". And

return (filter_var('gp123456', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ? true : false;

returns string(8) "gp123456". As you can see, these three examples use the leading [] as XOR conditionals for multiple strings as prefixes.
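Applying the point that square brackets match characters, not strings, to the kc-1 pattern above: the sketch below compares it with an alternation-based version that is a hypothetical rewrite, not something posted in the thread. The class version accepts any one to eight characters drawn from its set (letters, digits, '-' and '|'), so scrambled codes pass as well.

```php
<?php
// Character class from the post: any 1-8 characters from the bracketed set.
$asClass = '/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di';
// Hypothetical alternation version: exactly one of the listed codes.
// Outside a character class the hyphen is literal and needs no escaping.
$asAlternation = '/^(kc-1|kc-color|gr-1|fa-1|un-1|un-color|ben-1|bencolor|sage-1|sr-1|st-1)$/Di';

foreach (['kc-1', 'un-color', '1-ck', 'k|c'] as $code) {
    printf(
        "%-9s class: %d  alternation: %d\n",
        $code,
        preg_match($asClass, $code),
        preg_match($asAlternation, $code)
    );
}
```

Both versions accept kc-1 and un-color; only the class version also accepts 1-ck and k|c.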
Re: [PHP] filter_var using regex
On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
> [...]
> return (filter_var('gp123456', FILTER_VALIDATE_REGEXP, array('options' => array('regexp' => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ? true : false;
>
> returns string(8) "gp123456". As you can see, these three examples use the leading [] as XOR conditionals for multiple strings as prefixes.

Not quite. You think they match correctly because that's all you're testing for; you're not looking for anything that might disprove it. Using your last example, it will also match these strings:

gu0368839
xx0368839
p0368839

I tested your first regex with '09VolunteerApplication.doc' and it doesn't work at all until you add the plus after the basename part of the regex:

^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$

However, your regex (with the plus) also matches these strings:

09VolunteerApplication.docp
09VolunteerApplication.docj
09VolunteerApplication.doc| <-- note it's matching the literal bar character

Making the changes I suggested (^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$) means the regex works as you expect. Square brackets in a regex define a character class: they match one character from a set, not a literal string, and without a quantifier they match only a single character. So in your example you're matching a basename of exactly one character (a-z or a digit) and an extension of one to four characters drawn from '|cdefgjlnopstvx'.

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk
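The counterexamples above can be re-run in a short sketch. The alternation-based ID pattern is an assumption for illustration, not from the thread; it also swaps Jason's [\d+] for \d, since [\d+] accepts '+' as well as digits. The alternation-based file-name pattern is the one suggested in the reply.

```php
<?php
$patterns = [
    'id (class)'         => '/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di',
    'id (alternation)'   => '/^(gp|u|gx)\d{6,15}$/Di',      // hypothetical rewrite
    'file (class)'       => '/^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di',
    'file (alternation)' => '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di',
];

// Intended values first, then the counterexamples from the reply.
$samples = [
    'id'   => ['u0368839', 'gp123456', 'gu0368839', 'xx0368839', 'p0368839'],
    'file' => [
        '09VolunteerApplication.doc',
        '09VolunteerApplication.docp',
        '09VolunteerApplication.docj',
        '09VolunteerApplication.doc|',
    ],
];

foreach ($patterns as $label => $regex) {
    $kind = explode(' ', $label)[0]; // 'id' or 'file'
    foreach ($samples[$kind] as $s) {
        // preg_match() returns 1 for a match and 0 for no match.
        printf("%-19s %-28s %d\n", $label, $s, preg_match($regex, $s));
    }
}
```

The class-based patterns print 1 for every sample, while the alternation-based ones print 1 only for the intended values (u0368839, gp123456 and the .doc file).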