sorry, took a bit longer than I wanted, but here it is. thanks for looking into this! :) at first glance it seemed like it's more straight-forward than I expected - but as always with JSONSchema.pm, some rabbit-hole chasing later I am not quite so sure anymore. I hope my walls of text below are not too confusing ;)
one sentence summary: to make this workable with the current approach, we need to switch all calls of check_format and parse_property_string that use a registered format to use the format's ID instead of the format description hash. On April 16, 2020 4:39 pm, Stefan Reiter wrote: > Adds a third, optional parameter to register_format that allows specifying > a function that will be called after parsing and can validate the parsed > data. A validator should die on failed validation, and can also change the > parsed object by returning a modified version of it. > > This is useful so one can register a format with its hash, thus allowing > documentation to be generated automatically, while still enforcing certain > validation rules. > > And since I found it impossible to extend the current check_format code, > clean that function up as well (which pretty much amounts to a rewrite). > Also use get_format consistently to avoid future breakages. > > No existing functionality is intentionally changed. > > Signed-off-by: Stefan Reiter <s.rei...@proxmox.com> > --- > > @Fabian G.: Since we discussed this off-list, is this good? At least there > shouldn't be a possibility for a parser/hash format drift the way I've > implemented it. > > src/PVE/JSONSchema.pm | 60 +++++++++++++++++++++++++++---------------- > 1 file changed, 38 insertions(+), 22 deletions(-) > > diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm > index 01a3cce..106a812 100644 > --- a/src/PVE/JSONSchema.pm > +++ b/src/PVE/JSONSchema.pm > @@ -120,19 +120,26 @@ register_standard_option('pve-snapshot-name', { > }); > > my $format_list = {}; > +my $format_validators = {}; > > sub register_format { > - my ($format, $code) = @_; > + my ($name, $format, $validator) = @_; > > - die "JSON schema format '$format' already registered\n" > - if $format_list->{$format}; > + die "JSON schema format '$name' already registered\n" > + if $format_list->{$name}; > > - $format_list->{$format} = $code; > + if ($validator) { > + die "A \$validator function can only be specified for hash-based > formats\n" > + if ref($format) ne 'HASH'; > + $format_validators->{$name} = $validator; > + } > + > + $format_list->{$name} = $format; > } > > sub get_format { > - my ($format) = @_; > - return $format_list->{$format}; > + my ($name) = @_; > + return $format_list->{$name}; > } > > my $renderer_hash = {}; > @@ -646,13 +653,16 @@ sub pve_verify_tfa_secret { > sub check_format { > my ($format, $value, $path) = @_; > > - return parse_property_string($format, $value, $path) if ref($format) eq > 'HASH'; > return if $format eq 'regex'; > > - if ($format =~ m/^(.*)-a?list$/) { > + my $parsed; > > - my $code = $format_list->{$1}; > + if (ref($format) eq 'HASH') { > + # in case it's a hash we can't have a validator registered, so return it took me several tries to full get this comment and the implications. if we pass a format hash to check_format, then check_format can't know whether there is an associated validator since it doesn't know the name. so in effect this means we should avoid calling check_format with a hash directly, when that hash is also registered as format with an ID. this should probably be documented somewhere, as it also extends to parse_property_string (see comments there), and needs careful analysis of the code paths ending up in check_format so that no caller already resolves format ID to HASH prematurely. I think we could switch this around and start check_format with: if (ref($format) eq 'HASH') { # don't know whether a validator exists return parse_property_string($format, ...); } if (ref($format) eq 'CODE') { # we are the (sole, old-style) validator return $format->(...); } return if $format eq 'regex'; # AFAICT regex is not registered as actual format, so this should not be able to clash with the rest my ($name, $registered_format, $parsed); big existing if, but adapted to set those three variables as appropriate if (defined($name) && ref($registered_format) eq 'HASH') { my $validator = $format_validators->{$name}; $parsed = $validator->($parsed) if $validator; return $parsed; } if we want to support -opt and -a?list with hash based formats that's fine as well (and I guess it's not unlikely we'll run into that if we start converting some existing parse_/verify_foo into hash + validator). those branches then just need to check whether $registered_format is a CODE-ref (existing behaviour) or not (it's a HASH, so parse it accordingly). at that point we could probably combine the -opt and else branches, since they just differ in whether the empty value is allowed. for -a?list it is indeed tricky what the semantics of the validator should be.. does it get called on a list of verified values? what should the return value be? a list? or is the validator responsible for converting it back to the string representation before returning? note: I only found a single instance of check_format being called with the return value checked (in QemuServer.pm, in a format validator sub ;), so that can be safely dropped to just return the original verified value if check_format didn't die). if PMG does not use that functionality either, we could also just say we drop check_format returning anything, and it's just die on error (either parse_property_string, or direct validator sub, or post-parse_property_string extra validator, or undefined format) or return on success? some comments below become irrelevant if we go down that route.. that would make -a?list implementation trivial (validator gets the full verified-according-to-format-hash list, and can then do both checks on the list as a whole and each element depending on use case. > + return parse_property_string($format, $value, $path); > > + } elsif ($format =~ m/^(.*)-a?list$/) { > + my $code = get_format($1); > die "undefined format '$format'\n" if !$code; > > # Note: we allow empty lists > @@ -660,25 +670,31 @@ sub check_format { > &$code($v); nit: this should be switched to $code->($v) as well > } > > - } elsif ($format =~ m/^(.*)-opt$/) { > - > - my $code = $format_list->{$1}; > + # since the list might contain multiple values, we don't allow running > + # validator functions either > + return undef; but not because of multiple values, but because these lists here already only work with format=validator sub, and not format=hash ? see above as well.. > > + } elsif ($format =~ m/^(.*)-opt$/) { > + my $code = get_format($1); > die "undefined format '$format'\n" if !$code; > > - return if !$value; # allow empty string > - > - &$code($value); > + # empty string is allowed > + $parsed = $code->($value) if $value; so this is a bit tricky - the &$code($value) was previously the last statement of this branch, with no code after the conditional, meaning that its return value was returned by check_format. also, since we can't have a validator for a format that is a validator sub itself, shouldn't this be just return if !$value; return $code->{$value); for now? > > } else { > + my $registered_format = get_format($format); > + die "undefined format '$format'\n" if !$registered_format; > > - my $code = $format_list->{$format}; > - > - die "undefined format '$format'\n" if !$code; > - > - return parse_property_string($code, $value, $path) if ref($code) eq > 'HASH'; > - &$code($value); > + if (ref($registered_format) eq 'HASH') { > + $parsed = parse_property_string($registered_format, $value, $path); > + } else { > + $parsed = $registered_format->($value); same as above for -opt applies here as well (we can't have a validator, so we might as well make this explicit here and return early. > + } > } > + > + my $validator = $format_validators->{$format}; > + $parsed = $validator->($parsed) if $validator; > + return $parsed; > } > > sub parse_size { > @@ -735,7 +751,7 @@ sub parse_property_string { > > # Support named formats here, too: > if (!ref($format)) { > - if (my $desc = $format_list->{$format}) { > + if (my $desc = get_format($format)) { > $format = $desc; parse_property_string called directly with a format ID is rarely used (only found two hits in PVE code), but this was already missing a check for $desc being a hash ref, and is now missing support for calling the validator before returning at the end of the sub.. it would be quite unexpected that check_format('foo', 'bar', '/path') and parse_property_string('foo', 'bar', '/path') where 'foo' is a hash-ref format with new-fangled validator function behave differently. funnily enough, parse_property_string does then transitively call check_format for the individual properties, so I guess at least that stays consistent ;) note that a format that has a validator NEEDs to call parse_property_string with the format name, not the format hash, otherwise the validator does not take effect. not a very nice interface, especially given that that variant is basically unused :-/ maybe we do need to back to the drawing board? or do we take this one as a tree-wide, post 6.2 release overhaul of our JSONSchema declaration? go through all the registered formats, convert those that are currently using a CODE-ref definition into a HASH + validator, and convert all call sites of the corresponding parse_property_string to use the format ID instead of the format HASH. print_property_string also assumes that any named format is a hash-based format btw. > } else { > die "unknown format: $format\n"; > -- > 2.26.1 > > > _______________________________________________ > pve-devel mailing list > pve-devel@pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > _______________________________________________ pve-devel mailing list pve-devel@pve.proxmox.com https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel