I've been playing around with these changes today and the more I thnk about it the more it seems it should be in src/anyp/ using the Uri:: namespace - which will encompass both RFCs 1738 and 3896 CharacterSet definitions along with other src/URL.h src/url.cc code implementing those RFCs logics.
On 29/12/2015 10:53 a.m., Alex Rousskov wrote: > On 12/28/2015 12:01 PM, Kinkie wrote: >> On Mon, Dec 28, 2015 at 5:46 PM, Alex Rousskov wrote: >>> On 12/28/2015 07:01 AM, Kinkie wrote: >>>> +static int >>>> +fromhex(char ch) >>>> +{ >>>> + if (ch >= '0' && ch <= '9') >>>> + return ch - '0'; >>>> + if (ch >= 'a' && ch <= 'f') >>>> + return ch - 'a' + 10; >>>> + if (ch >= 'A' && ch <= 'F') >>>> + return ch - 'A' + 10; >>>> + return -1; >>>> +} >>> >>> If you are after performance, an unconditional lookup in a 256-member >>> int vector would be faster. >> >> Performance is good, but it's not the primary objective. For now I'd >> leave as-is. > > Consider adding a TODO comment then. Someday, somebody might actually > look through TODO comments and implement simple and useful optimizations > like this one [ instead of complaining that there are only hard problems > that they cannot solve well or, worse, attempting to solve those problems ]. > Today I re-implemented both tohex() and fromhex() as array lookups. It is a little time consuming to manually generate the fromhex array. But a one-off action and seems to work fine. > >>>> + for (auto in = s.begin(); in != e; ++in) { >>>> + if (*in != '%') { // normal case, copy and continue >>>> + rv.push_back(*in); >>>> + continue; >>>> + } >>> ... > > >>> I cannot validate that low-level C code. Can you use a Tokenizer >>> instead? Is it critical to provide copy-less performance for helpers >>> using std::string instead of SBuf? > > >> Performance for helpers is not really critical IMO. >> SBuf is a lot of code and it really depends on infrastructure provided >> by squid (e.g. mempools). >> Outside squid it really makes not much sense to bring in that much >> scaffolding - which would also have include stubs etc. >> It's much better to be API-compatible and use std::string there. > > I doubt that writing low-level code that is difficult to review and that > we often get wrong is worth std::string API compatibility. I have > already speculated about that in the previous squid-dev message so I > will not repeat those thoughts here. > > > > >>>> +const CharacterSet >>>> + RFC3986::Unsafe("rfc1738:unsafe", "<>\"# %{}|\\^~[]`'"), RFC 1738 does not specify the ' character in that unsafe set. >>>> + RFC3986::Ctrls("rfc1738:ctrls", {{0x00, 0x1f}, {0x7f,0xff}}), >>>> + RFC3986::Reserved1738("rfc1738:reserved", ";/?:@=&"), >>> ... >>>> + RFC3986::Unescaped = (UnsafeAndCtrls - CharacterSet(nullptr,"%") >>>> ).rename("rfc1738:unescaped"), >>> >>> >>> If these are RFC 1738 sequences, why place them in the RFC 3986 namespace? >> >> Providing callers backwards compatibility is the main reason for me. > > This answer does not compute for me: AFAIK, you are adding a _new_ class > or namespace called RFC3986. Thus, backwards compatibility is not an > issue here. > > Please note that I am not saying that "rfc1738:ctrls" and > "rfc1738:unsafe" sets should not be provided. I am saying that if they > are provided, they should be provided inside an Rfc1738 namespace rather > than inside RFC3986. The set called Ctrls is actually CharacterSet::CTL, which we already have defined in CharacterSet.h Amos _______________________________________________ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev