Re: [Wikidata-tech] Improving the ValueValidator interface

Daniel Kinzler Wed, 12 Jun 2013 01:07:57 -0700

Am 12.06.2013 05:25, schrieb Jeroen De Dauw:
> Hey,
> You raise some good questions.
> 
>     * why do we need the setOptions() method? Shouldn't the validator be fully
>     configured in the constructor?
> 
> 
> We certainly do not need it. The question is why the interface has it. Looking
> at current usage, I see no valid reason to have it, as the users are following
> poor design practices.


I suggest to deprecate it, then.

>     * why are errors collected in a member? I would expect a validator to be
>     stateless and reusable.
>     * why do we need the Result object? Can't we just throw an exception? The 
> "value
>     with warnings" use case for MediaWiki's status object doesn't seem to 
> apply
>     here.
> 
> 
> If there is a result object, having a field that contains state for that is
> perfectly reasonable, and in this case IMO preferred (note how this state is 
> not
> exposed outside of the class).

I strongly disagree. Making the validator stateful is asking for trouble. In
order to re-use it, you have to re-set it. And if you happen to re-use it when
recursively validating a data structure, your state gets messed up.

Collecting the state in a member means using a member to pass information
between function scopes. That is a very bad idea.

It would be different if the validator was it's own result object, that is, if
for every validation, you'd create a new validator instance, use it on the data,
and in the end as the validator itself if it's in the OK state. When conflating
the validator with the result that way, the validator would be stateful, and
should have such a member variable.

> I know I've been shouting "use Exceptions rather then returning null/false or
> potentially invalid objects" lately; this is a good example of why one should
> not apply such a "rule" blindly, as it does not apply everywhere. Exceptions
> should not be used to implement control flow for expected situations. If one 
> has
> a simple isValid method, returning a boolean makes perfect sense. Returning
> true, or throwing an exception would be wrong. In case of the ValueValidators
> just returning a boolean would not be expressive enough, as the information of
> which checks (note the plural here) failed is desired, so the user can be
> informed of why the value did not pass validation.

I have no strong feelings about this. I think an exception would be the easiest
and cleanest solution, but perhaps not the nicest and most powerful.

The advantage of a result object is that it can collect and report multiple 
issues.

The disadvantage is your validators can not "stack": e.g. with exceptions, the
first validator could check that $x is a string, and the second validator could
check that $x matches some regular expression. If you try to collect all
outcomes of all validators, the second one in this example with cause a warning
or runtime exception if $x is not a string.

Also, reporting multiple errors at once is often not easy to do in a meaningful
way, but that's just a UI issue.

Apropos... how can we internationalize the errors reported by validators? Should
errors use well defined string keys? Or do we inject some kind of error factory?
After all, validation failures often result from user interaction, and thus
should be reported in a human-friendly form.

>     * There is a lot of "convenient stuff" implemented in the base class 
> which i
>     expect many implementations are not going to need, e.g. the
>     allowdValues/forbiddenValues stuff.
> 
> 
> This is certainly poor design. Given the whole ValueValidators resulted from
> refactoring some awful code in which every class did a dozen different things,
> I'm mildly surprised there are not more such artefacts left. Ideally we could
> just kill this one, but like with most legacy code, we can't do this without
> breaking existing functionality.

I propose to not use the ValueValidatorObject base class for Wikibase. We could
make our own, but if we stick to "trivial" validators, we don't even need that.

>      * options are passed to sub-validators using an option map, changing the
>     configuration and thus behavior of the (sub-)validators on the fly. That 
> seems
>     like asking for trouble.
> 
> Agree.

So, let's at least not do this in Wikibase.

>     * the ValueValidator interface has two methods, validate() and 
> canValidate().
>     validate() can throw a validation exception, which contains one or more 
> errors.
> 
> 
> What would this canValidate method take as parameter? Can you give some 
> concrete
> examples of how this would be used in the Wikibase context?

For recursive validation of complex structures. This would follow the pattern
that Tobi S. suggested for serialization/unserialization.

If we wanted to validate an entity as a recursive structure, applying validators
to Claims, Snaks, DataValues, etc recursively, we could do this by having a
registry of validators, where each validator knows which kind of object it can
validate. For each object encountered in the structure, we'd ask each validator
if it can validate that kind of object, and apply it if so.

The issue here though is that we'd need to pass the top level validator to each
"inner" validator, either in the constructor or as an additional method
argument, to allow the top-level validator (which knows all the specialized
validators) to be applied to sub.structures recursively.

I think this could be useful, but we don't have an immediate need for this at
the moment. So I guess we can just leave it.

> As already mentioned, -2 on using exceptions here.
> 
>     * Validators should generally implement only a single constraint - so 
> instead of
>     a String validator that implements min/max length, min/max value, and a 
> regex
>     check, each of these checks should have their own validator; A
>     CompositeValidator could be used to bundle multiple validations, if 
> needed.
> 
> 
> I find that proposal somewhat surprising coming from you :) 

I suppose the "pragmatic" coding style I got into working on MediaWiki is not
representative of my design preferences :) Maybe I should dig up some of my old
Java libs full of nicely chopped and diced functionality to compose and 
re-use...

> The current
> ValueValidators are designed for a use case where there is a single such
> validator per "parameter type" (ie boolean, float, string, etc). 

Fair enough for a baseline, but the StringValidator should probably not
implement any and all validations that can be applied to a string.

> The
> ValueValidator interface itself does not specify anything about this though, 
> so
> one is entirely free to make more low level validators where this makes sense.

Of course - in the second section of my mail, I was talking about
ValueValidatorsObject, not the interface.

>     As far as I can see, the Validator interface isn't yet widely used, so it 
> would
>     be easy enough to change these things.
> 
> 
> It is used by the ParamProcessor extension, and by extension by everything 
> that
> uses that. 
[...]
> Perhaps a good mix of avoiding legacy crud and pragmatism is to create fresh
> implementations of the interface for our usage. Nothing forces us to make use 
> of
> the existing ones, for which we do not have all that much usage in the 
> Wikidata
> context to begin with. The only legacy thing we'd have is the setOptions 
> method,
> which we can easily implement by a method throwing an exception. (And no, this
> is in fact not a LSP violation.) Compared to the MediaWiki core legacy code we
> are dealing with, that really is nothing.


I agree: we should stick with the ValueValidator interface, but we should not
use ValueValidatorObject or StringValidator in Wikibase. Not sure about the
other default validators, will have to look at them in detail.

I also suggest to deprecate much of the functionality of ValueValidatorObject
along with the setOptions method, but that has little to do with Wikibase.

Implementing setOptions() to throw an exception may work, but I'm curious how
that is not an LSP violation. In languages that enforce declaration of
exceptions thrown, this kind of violation is even forbidden by the language.

I think in our case, setOptions() should just do nothing. After all, it's
defined to apply the options known/supported by the respective validator - which
in our case would ju8st always be none.

> The one concern I have with creating an alternative is that we will likely be
> making use of this existing code if we share result format stuff with SMW. I'd
> be thus unfortunate to duplicate this functionally for us (and it'll be
> unfortunate in general either way). However if we end up with something clean
> and reusable, and keep the existing usecase in mind, we ought to not have real
> troubles later on using it for these existing usecases. Given the reuse of the
> existing code for us right now is limited anyway, I thus certainly do not 
> object
> to a fresh start approach either.

Yea, I think that would work - the only trouble I see is naming conflicts. Do we
create a NewBooleanValidator? OR do we rely on namespaces?

-- daniel


_______________________________________________
Wikidata-tech mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech

Re: [Wikidata-tech] Improving the ValueValidator interface

Reply via email to