https://bugs.exim.org/show_bug.cgi?id=2472
Bug ID: 2472
Summary: Feature Request: PCRE2_SUBSTITUTE_LITERAL option for pcre2_substitute without processing replacement strings
Product: PCRE
Version: N/A
Hardware: x86
OS: Windows
Status: NEW
Severity: wishlist
Priority: medium
Component: Code
Assignee: p...@hermes.cam.ac.uk
Reporter: ew3...@gmail.com
CC: pcre-dev@exim.org

Hi,

I am following the guidelines on https://pcre.org/ and filing this feature request as a bug ticket. I searched the open and closed bugs for "literal" and "pcre2_substitute" but did not find a similar request.

Description:
------------
I think an additional option, e.g. PCRE2_SUBSTITUTE_LITERAL, which specifies that the replacement string passed to pcre2_substitute is not processed at all, would be useful for many programs that use pcre2_substitute.

Rationale
---------
I believe a common use case is that arbitrary replacement strings are obtained from an external source, and that copying them for preprocessing or escaping should be avoided. One example that should be quite common is long strings containing monetary values, such as "....amounts to $10 in value...." (here the $ in the replacement string is the currency symbol, not the start of a group reference). Currently this has to be escaped as "....amounts to $$10 in value....", or written with extended syntax as "\Q....amounts to $10 in value....\E", according to https://pcre.org/current/doc/html/pcre2api.html#substitutions. My personal use case is obtaining the replacement strings inside a user-defined function of a database application.

Comparison to other PCRE2 options
---------------------------------
A similar option, PCRE2_LITERAL, is available for pcre2_compile, even though a regular expression engine is not the most efficient tool for literal matching. The proposed option would be the counterpart to PCRE2_SUBSTITUTE_EXTENDED: while PCRE2_SUBSTITUTE_EXTENDED increases the complexity of replacement string processing, PCRE2_SUBSTITUTE_LITERAL would decrease it.

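To make the cost of the escaping workaround from the Rationale concrete, here is a minimal sketch of what a caller has to do today before handing an external string to pcre2_substitute without PCRE2_SUBSTITUTE_EXTENDED. The function name escape_dollars is hypothetical and not part of PCRE2. The sketch assumes that "$" is the only character that needs doubling in that mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): returns a newly allocated,
 * NUL-terminated copy of src in which every '$' is doubled to "$$".
 * The escaped length (without the NUL) is stored in *out_len.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *escape_dollars(const char *src, size_t len, size_t *out_len)
{
    assert(src != NULL && out_len != NULL);

    /* First pass: count the '$' characters to size the copy exactly. */
    size_t extra = 0;
    for (size_t i = 0; i < len; i++)
        if (src[i] == '$')
            extra++;

    char *dst = malloc(len + extra + 1);
    if (dst == NULL)
        return NULL;

    /* Second pass: copy, emitting an additional '$' before each '$'. */
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '$')
            dst[j++] = '$';
        dst[j++] = src[i];
    }
    dst[j] = '\0';
    *out_len = j;
    return dst;
}
```

Both passes over the string and the extra allocation are the overhead that the proposed option is meant to avoid.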
Disadvantages of Alternatives
-----------------------------
Escape Replacement String

Replacement strings need to be copied to a new buffer and escaped. This requires extra memory and knowledge of which characters have to be escaped ($).

Extended syntax, e.g. \Q \E

Extended syntax also requires a new copy, adding \Q and \E around it, and escaping any \E that occurs in the replacement string.

Substitution callouts

A placeholder replacement string (e.g. an empty string) could be handed to pcre2_substitute and the literal replacement handled by a callout. This is not only cumbersome but also makes PCRE2_SUBSTITUTE_OVERFLOW_LENGTH hard to use, because callouts are not called for overflows.

Implementing a separate routine based on pcre2_substitute

Implementing a correct routine that behaves as pcre2_substitute does is not trivial, and some internal functionality that pcre2_substitute uses is not exported (e.g. UTF checks, or direct access to the callouts set in the match context, which would require a different parameter set in the separate implementation to handle callouts). (Functionality such as get_callout and get_substitute_callout in the public headers seems like something that could also be useful, but it is not part of this feature request.)

Implementation Thoughts
-----------------------
I hope some thoughts on untested code are appropriate here. I could not find a guideline on that, and I saw some code in other reports. My first impression is that, since the size of the replacement is known and constant, one could call the CHECKMEMCPY macro in pcre2_substitute.c before replacement processing and skip the replacement string processing loop entirely, with the callout section being the next relevant section. E.g.

    BOOL all_literal = ((options & PCRE2_SUBSTITUTE_LITERAL) != 0);
    ...
    if (all_literal)
      {
      CHECKMEMCPY(replacement, rlength);
      // skip replacement processing loop
      ...
      }
    else
      {
      // replacement processing loop
      ...
      }
    // callout section

The cost of such an implementation would then be one option bit in the match options and one additional if check within the global loop of pcre2_substitute.
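
To illustrate the idea behind the literal branch, here is a small standalone sketch of a bounds-checked copy that keeps counting the required length after an overflow, which is what would keep PCRE2_SUBSTITUTE_OVERFLOW_LENGTH usable. It is not the actual CHECKMEMCPY macro from pcre2_substitute.c. The type out_cursor and the function append_literal are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output cursor; not a PCRE2 type. */
typedef struct {
    char  *buf;     /* output buffer */
    size_t cap;     /* capacity of buf in bytes */
    size_t used;    /* bytes actually written to buf */
    size_t needed;  /* total bytes required, counted even after overflow */
} out_cursor;

/* Append len bytes of src verbatim, with no '$' or '\' processing.
 * Returns 1 if the bytes were copied, 0 if they did not fit. On overflow
 * nothing is written, but out->needed keeps growing so the caller can
 * report the buffer size that would have been sufficient. */
int append_literal(out_cursor *out, const char *src, size_t len)
{
    assert(out != NULL && src != NULL);

    out->needed += len;
    if (out->needed > out->cap)
        return 0;               /* overflow: only account for the length */

    memcpy(out->buf + out->used, src, len);
    out->used += len;
    return 1;
}
```

Because the length of the replacement is known in advance, the literal case is a single call of this kind per match, with no scan of the replacement string.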