In various places, PCRE2 runs the following check to determine whether a substring number is valid:

  if (stringnumber >= match_data->oveccount ||
      stringnumber > match_data->code->top_bracket ||
      match_data->ovector[stringnumber*2] == PCRE2_UNSET)

I wonder whether the comparison with top_bracket is really necessary. My reasoning is based on this comment in pcre2_match.c:

  If there is space in the offset vector, set any unused pairs at the
  end to PCRE2_UNSET for backwards compatibility.

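Provided that this holds, any group number above top_bracket but below oveccount would already have its first offset set to PCRE2_UNSET. Should testing for PCRE2_UNSET (plus the oveccount bound) therefore not be sufficient?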
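To make the argument concrete, here is a rough C model (not PCRE2's actual code). It assumes, as the quoted comment says, that after a match every pair beyond the last one used is set to PCRE2_UNSET. The struct and function names are hypothetical; only PCRE2_UNSET's value (all bits set in a size_t) mirrors pcre2.h. The check never looks at the compiled pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model only: in pcre2.h, PCRE2_SIZE is size_t and PCRE2_UNSET is ~0. */
typedef size_t PCRE2_SIZE;
#define PCRE2_UNSET (~(PCRE2_SIZE)0)

#define MODEL_MAX_PAIRS 8

/* Hypothetical, simplified stand-in for pcre2_match_data. */
typedef struct {
  uint32_t oveccount;                      /* number of offset pairs */
  PCRE2_SIZE ovector[2 * MODEL_MAX_PAIRS];
} model_match_data;

/* Mimics the quoted comment: pairs from `used` up to oveccount are unset. */
static void model_unset_tail(model_match_data *md, uint32_t used)
{
  assert(used <= md->oveccount && md->oveccount <= MODEL_MAX_PAIRS);
  for (uint32_t i = used; i < md->oveccount; i++)
    md->ovector[2 * i] = md->ovector[2 * i + 1] = PCRE2_UNSET;
}

/* Proposed check: bounds plus PCRE2_UNSET, no reference to pcre2_code. */
static int model_substring_is_set(const model_match_data *md, uint32_t n)
{
  return n < md->oveccount && md->ovector[2 * n] != PCRE2_UNSET;
}
```

With a pattern whose top_bracket is 1 and an ovector of 8 pairs, a request for group 5 is rejected by the PCRE2_UNSET test alone. A group that exists but did not participate is also reported as not set, which matches what the existing combined condition does.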

In addition, this check makes pcre2_match_data depend on pcre2_code. If the pcre2_code is freed before the pcre2_match_data, the outcome of the check is undefined. I have searched the documentation but found no mention of this dependency.


A related thought:

The same code appears verbatim several times in pcre2_substring.c. Would it make sense to factor the substring validity check out into a dedicated function?

IMO, this would also be a welcome addition to the public API.
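A rough sketch of what such a refactoring could look like, again against a hypothetical model of the match data rather than PCRE2's real internals. The status codes and all function names are invented for illustration and are not PCRE2's error numbers or API. One shared check distinguishes "beyond the ovector" from "unset", and two getters reuse it instead of repeating the condition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model only: in pcre2.h, PCRE2_SIZE is size_t and PCRE2_UNSET is ~0. */
typedef size_t PCRE2_SIZE;
#define PCRE2_UNSET (~(PCRE2_SIZE)0)

/* Hypothetical, simplified stand-in for pcre2_match_data. */
typedef struct {
  uint32_t oveccount;          /* number of offset pairs */
  PCRE2_SIZE ovector[2 * 8];
} model_match_data;

/* Hypothetical status codes (not PCRE2's error numbers). */
enum model_substring_status {
  MODEL_SUBSTRING_OK = 0,
  MODEL_SUBSTRING_UNAVAILABLE = -1, /* n is beyond the ovector */
  MODEL_SUBSTRING_UNSET = -2,       /* pair exists but is unset */
  MODEL_SUBSTRING_NOSPACE = -3      /* caller's buffer too small */
};

/* The single, shared validity check. */
static int model_check_substring(const model_match_data *md, uint32_t n)
{
  if (n >= md->oveccount) return MODEL_SUBSTRING_UNAVAILABLE;
  if (md->ovector[2 * n] == PCRE2_UNSET) return MODEL_SUBSTRING_UNSET;
  return MODEL_SUBSTRING_OK;
}

/* Getter 1: length of substring n (0 if end < start, as a guard). */
static int model_substring_length(const model_match_data *md, uint32_t n,
                                  PCRE2_SIZE *len)
{
  int rc = model_check_substring(md, n);
  if (rc != MODEL_SUBSTRING_OK) return rc;
  PCRE2_SIZE start = md->ovector[2 * n], end = md->ovector[2 * n + 1];
  *len = end > start ? end - start : 0;
  return MODEL_SUBSTRING_OK;
}

/* Getter 2: copy substring n into buf as a NUL-terminated string. */
static int model_substring_copy(const model_match_data *md,
                                const char *subject, uint32_t n,
                                char *buf, PCRE2_SIZE bufsize)
{
  PCRE2_SIZE len;
  int rc = model_substring_length(md, n, &len);
  if (rc != MODEL_SUBSTRING_OK) return rc;
  if (len + 1 > bufsize) return MODEL_SUBSTRING_NOSPACE;
  memcpy(buf, subject + md->ovector[2 * n], len);
  buf[len] = '\0';
  return MODEL_SUBSTRING_OK;
}
```

Exposing something like model_check_substring publicly would let callers test a group number without fetching the substring, and since it only reads the match data, it would not depend on the lifetime of the compiled pattern.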

Ralf

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
