Re: Unclear behaviour of formencode.Schema with UnicodeString items

Ian Wilson Thu, 10 Feb 2011 18:10:31 -0800

After thinking about this more, it would probably be better if UnicodeString
just had a parameter that turned off coercion of everything except instances
of basestring and None.  Or maybe a separate validator could be created that
does that.  It makes more sense for the validator assigned to that key to
respond to both lists and non-lists.
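
To make the idea of such a switch concrete, here is a minimal, dependency-free
sketch.  It is not formencode code: `to_text` and its `strict` parameter are
hypothetical names, and Python 3's `str` stands in for Python 2's
basestring/unicode.

```python
def to_text(value, strict=False):
    """Convert value to text; hypothetical stand-in, not a formencode API.

    With strict=False every object is coerced with str(), which is how a
    list of values can end up rendered as its repr.  With strict=True only
    None and str instances are accepted.
    """
    if value is None:
        return ''
    if isinstance(value, str):
        return value
    if strict:
        raise TypeError('expected a string, got %s' % type(value).__name__)
    return str(value)


# Lenient mode: the list is silently turned into its string representation.
lenient = to_text(['aaa', 'bbb'])

# Strict mode: the same input is rejected instead of being coerced.
try:
    to_text(['aaa', 'bbb'], strict=True)
    rejected = False
except TypeError:
    rejected = True
```

In lenient mode the caller never learns that more than one value was
submitted; in strict mode the mismatch surfaces as an error.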


What outcome do you actually want?  Should only the first or last item in the
list get validated?  Or should there be an error if there is more than one?
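
The three options in that question can be sketched as follows.  This is
illustrative only: `collapse` and its `policy` argument are made-up names, and
nothing here is part of formencode or WebOb.

```python
def collapse(values, policy='error'):
    """Reduce the values submitted under one key to a single value.

    Hypothetical helper: policy is 'first', 'last' or 'error'.
    """
    if not isinstance(values, (list, tuple)):
        return values            # a single value passes through unchanged
    if not values:
        return None              # nothing was submitted under this key
    if policy == 'first':
        return values[0]
    if policy == 'last':
        return values[-1]
    if policy == 'error':
        if len(values) > 1:
            raise ValueError('expected one value, got %d' % len(values))
        return values[0]
    raise ValueError('unknown policy: %r' % (policy,))


submitted = ['John', 'Mike']     # e.g. ?username=John&username=Mike
first = collapse(submitted, 'first')
last = collapse(submitted, 'last')
```

Only the 'error' policy tells the client that the request was malformed; the
other two quietly discard data.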

Here is a validator I hacked together from the String and UnicodeString
validators that only accepts None and basestrings.  None and non-unicode
strings are both converted to unicode.

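from formencode.api import FancyValidator, Invalid

class StrictUnicodeString(FancyValidator):
    """Accepts only None and basestring instances; converts them to unicode."""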

    min = None
    max = None
    not_empty = None
    convert_none = True
    encoding = 'utf-8'

    messages = {
        'notString': "Please enter a string",
        'tooLong': "Enter a value less than %(max)i characters long",
        'tooShort': "Enter a value %(min)i characters long or more",
        'badEncoding' : "Invalid data or incorrect encoding",
    }

    def __initargs__(self, new_attrs):
        if self.not_empty is None and self.min:
            self.not_empty = True

    def __init__(self, input_encoding=None, output_encoding=None,
            convert_none=True, **kw):
        FancyValidator.__init__(self, **kw)
        self.input_encoding = input_encoding or self.encoding
        self.output_encoding = output_encoding or self.encoding
        self.convert_none = convert_none

    def _to_python(self, value, state):
        """ Converts to unicode. """
        if self.convert_none and value is None:
            value = u''
        if not isinstance(value, basestring):
            raise Invalid(self.message('notString', state), value, state)
        if not isinstance(value, unicode):
            try:
                value = unicode(value, self.input_encoding)
            except UnicodeDecodeError:
                raise Invalid(self.message('badEncoding', state),
                              value, state)
        return value

    def _from_python(self, value, state):
        """ Converts to a bytestring. """
        if not isinstance(value, unicode):
            if hasattr(value, '__unicode__'):
                value = unicode(value)
            else:
                value = str(value)
        if isinstance(value, unicode):
            value = value.encode(self.output_encoding)
        return value

    def validate_other(self, value, state):
        if (self.max is not None and value is not None
            and len(value) > self.max):
            raise Invalid(self.message('tooLong', state,
                                       max=self.max),
                          value, state)
        if (self.min is not None
            and (not value or len(value) < self.min)):
            raise Invalid(self.message('tooShort', state,
                                       min=self.min),
                          value, state)

    def empty_value(self, value):
        return u''

-Ian


On Thu, Feb 10, 2011 at 2:52 PM, Maxim Avanov <[email protected]> wrote:

> Hi, Ian. Thanks for the reply.
>
> > If you want to be EXTRA strict then you could try ConfirmType and
> > UnicodeString combined in an All validator to catch this error. Or
> > something to that effect.
>
> This solution will work, but I would rather not use it, for several
> reasons. First of all, we already have a huge code base that uses
> UnicodeString intensively.  Sure, we could define our own
> UnicodeString validator like the one below,
>
> UnicodeString = formencode.All(
>     formencode.validators.ConfirmType(subclass=unicode),
>     formencode.validators.UnicodeString())
>
> but we are trying to keep our design clean.  Moreover, "All" and
> "ConfirmType" require extra validation steps (and hence more function
> calls). I think it has to be solved in a more generic and concise way
> (i.e. at the level of formencode's internal API).
>
> > Also note that someone could actually send ?username=[u'John', u'Mike']
> > in the query which would exhibit similar behavior.
>
> Yes, and this is where the inconsistency in validator behaviour comes
> from. If we had unified behaviour for all single-value validators (Int,
> UnicodeString, Bool etc.) we could get the "[u'John', u'Mike']" result
> only for the "/?username=[u'John', u'Mike']" request. But now we can get
> the same result from two different requests: from
> "/?username=[u'John', u'Mike']" and from "/?username=John&username=Mike".
> This shouldn't be allowed. And it's not allowed for any single-value
> validator except UnicodeString.
>
> > If we don't use mixed then how do we get the multiple values when we want
> > them?
>
> We could state our expectations explicitly with the ForEach() and Set()
> validators.
> formencode's FancyValidator could internally test the currently running
> validator by calling something like
>
> isinstance(current_validator, ForEach)
>
> and then perform the appropriate actions for this case (i.e. call the
> single-value validator for each item found under the same key).
>
> P.S. I hope Ian Bicking will see this topic and give his opinion on all
> of this, as I might be missing something important here.
>
>
> On Feb 9, 6:38 am, Ian Wilson <[email protected]> wrote:
> > Hi,
> >
> > I think this behavior happens because mixed is used here
> > https://bitbucket.org/ianb/formencode/src/d95237b33f3c/formencode/api....
> > If you don't want that to happen ever then I think you need to cast params
> > to a regular dictionary, with something like dict(request.params.items()).
> > This will silently ignore one of the names though which might be worse.
> >
> > If you want to be EXTRA strict then you could try ConfirmType and
> > UnicodeString combined in an All validator to catch this error.  Or
> > something to that effect.
> >
> > Also note that someone could actually send ?username=[u'John', u'Mike']
> > in the query which would exhibit similar behavior.  So as far as I can
> > tell, if that is a problem you'd need to validate it either way.
> >
> > I agree that this might be misleading but it's a difficult problem to
> > solve.  If we don't use mixed then how do we get the multiple values when
> > we want them?  I think formencode might just need better internal
> > integration with multiple value dictionaries so that different types
> > don't show up depending on the input.  It tries to be input agnostic
> > though.
> >
> > -Ian
> >
> > On Tue, Feb 8, 2011 at 10:25 AM, Maxim Avanov <[email protected]> wrote:
> >
> > > Here's an example.
> >
> > > # =====================
> > > from formencode import Schema, Invalid
> > > from formencode.validators import UnicodeString, Int
> > > from webob import Request
> >
> > > class StrictSchema(Schema):
> > >    allow_extra_fields = False
> >
> > > class IntegerTestSchema(StrictSchema):
> > >    testfield = Int(not_empty=True)
> >
> > > class StringTestSchema(StrictSchema):
> > >    testfield = UnicodeString(not_empty=True)
> >
> > > # Testing.
> > > # =====================
> > > req = Request.blank('/?testfield=111')
> > > print IntegerTestSchema.to_python(req.params)
> >
> > > # This raises an exception
> > > req = Request.blank('/?testfield=111&testfield=222')
> > > try:
> > >    IntegerTestSchema.to_python(req.params)
> > > except Invalid as e:
> > >    print "Caught Exception: {0}".format(e)
> >
> > > req = Request.blank('/?testfield=aaa')
> > > print StringTestSchema.to_python(req.params)
> >
> > > # This will be passed successfully (!)
> > > # The output will be {'testfield': u"[u'aaa', u'bbb']"}
> > > req = Request.blank('/?testfield=aaa&testfield=bbb')
> > > print StringTestSchema.to_python(req.params)
> >
> > > # ========================
> >
> > > Please note we do not use formencode.ForEach() or formencode.Set()
> > > here. I think this is very unclear behaviour.
> > > Imagine a UsernameValidator (or something related to a
> > > "not-so-strict-string-validator"). Instead of indicating an input
> > > error, we show the service's implementation details to our users:
> > > "{'username': u"[u'John', u'Mike']"}", i.e. "OK, this is a Python list
> > > inside the dict".
> >
> > > According to the WebOb documentation
> > > (http://pythonpaste.org/webob/#multidict), we probably should use
> > > request.GET.getone() instead of request.GET.getall().
> >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "pylons-discuss" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to
> > > [email protected].
> > > For more options, visit this group at
> > > http://groups.google.com/group/pylons-discuss?hl=en.
>

