On Feb 3, 2011, at 3:03 PM, Ken wrote:
> 
> You are right. Having (re)read the documentation for re, I find that
> it is working as advertised. My original regex was wrong. However, I
> would argue that if the match found by regex.match() is different from
> the input value, IS_MATCH should return an error. That is, in the
> IS_MATCH.__call__ definition, "if match:" should be "if match and
> (value == match.group()):". That change would raise an error that would
> force a user like me to correct a regex that was matching in an
> unexpected way. I would never want IS_MATCH to silently change data
> between a form and insertion into a database.
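
For reference, the check proposed above can be sketched outside web2py. This is only a minimal stand-in, not the actual IS_MATCH code: the name is_match_strict and the error message are made up for illustration, and it assumes the validator returns a (value, error) tuple as in the shell session quoted further down.

```python
import re

def is_match_strict(expression, error_message='invalid expression'):
    """Hypothetical stand-in for IS_MATCH that rejects partial matches."""
    regex = re.compile(expression)

    def validate(value):
        match = regex.match(value)
        # Accept only when the matched text is the entire input value.
        if match and value == match.group():
            return (value, None)
        return (value, error_message)

    return validate

vdtr = is_match_strict('^[A-J][1-9]|[A-J]10$')
print(vdtr('A1'))    # full match, accepted
print(vdtr('A10'))   # only 'A1' matches, so an error is returned
```

With this check the original regex reports an error for 'A10' instead of quietly returning 'A1'.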

IS_MATCH is already implicitly anchored at the beginning of the field, since it 
uses re.match. I think it'd make sense to implicitly anchor at the end as well. 

We could change this:

        self.regex = re.compile(expression)

to this:

        self.regex = re.compile('(%s)$' % expression)
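
A quick way to see the effect, using only the re module rather than web2py itself (the variable names below are just for illustration):

```python
import re

# Ken's original expression, compiled as-is and with the proposed wrapper.
original = '^[A-J][1-9]|[A-J]10$'
unanchored = re.compile(original)
anchored = re.compile('(%s)$' % original)

for value in ('A1', 'A10', 'A11'):
    m1 = unanchored.match(value)
    m2 = anchored.match(value)
    # Show what each pattern matches, or None when there is no match.
    print(value, m1 and m1.group(), m2 and m2.group())
```

Wrapping the expression in a group matters because | has the lowest precedence: without the parentheses, the added $ would apply only to the last alternative. One caveat: in Python, $ also matches just before a trailing newline, so \Z may be the stricter choice if that is a concern.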


> 
> Ken
> 
> On Feb 2, 9:13 pm, Massimo Di Pierro <[email protected]>
> wrote:
>> This is the correct behavior of regular expressions. Anyway, good that
>> you are pointing this out since others may find it counterintuitive.
>> 
>> Massimo
>> 
>> On Feb 2, 6:33 pm, Ken <[email protected]> wrote:
>>> I have been having trouble with truncation of data from one field of a
>>> form. The culprit turned out to be the IS_MATCH() validator, which was
>>> truncating a valid value to return a shorter valid value. I'm not sure
>>> whether to call this a bug or just unexpected behavior, but if I had
>>> trouble with it, someone else may.
>> 
>>> The data in question were spreadsheet-style coordinate values with
>>> letters for rows and numbers for columns, in the range A1 to J10.
>>> Initially, I used a validator like IS_MATCH('^[A-J][1-9]|[A-J]10$').
>>> This checks first for the two-character combinations A1 to J9, then
>>> checks for A10 to J10. If I test this in a web2py shell, it accepts
>>> and returns the two-character combinations, but it accepts and
>>> truncates any values ending in 10.
>> 
>>> In [1] : vdtr = IS_MATCH('^[A-J][1-9]|[A-J]10$')
>> 
>>> In [2] : vdtr('A1')
>>> ('A1', None)
>> 
>>> In [3] : vdtr('J1')
>>> ('J1', None)
>> 
>>> In [4] : vdtr('A10')
>>> ('A1', None)
>> 
>>> In [5] : vdtr('J10')
>>> ('J1', None)
>> 
>>> It seems to me that A1 and J1 are not proper matches because the '1'
>>> does not appear at the end of the validated string. In any case, I am
>>> surprised that IS_MATCH() would modify a value under any
>>> circumstances.
>> 
>>> If I turn the regex around, so that it tests for the three-character
>>> combinations first, like IS_MATCH('^[A-J]10|[A-J][1-9]$'), then things
>>> work better.
>> 
>>> In [6] : vdtr = IS_MATCH('^[A-J]10|[A-J][1-9]$')
>> 
>>> In [7] : vdtr('A1')
>>> ('A1', None)
>> 
>>> In [8] : vdtr('J1')
>>> ('J1', None)
>> 
>>> In [9] : vdtr('A10')
>>> ('A10', None)
>> 
>>> In [10] : vdtr('J10')
>>> ('J10', None)
>> 
>> 

