Hi there,
I'd like to contribute a CORS filter enhancement that makes it accept
both wildcard-based and regular-expression-based patterns in its
allowed origins list.
I know this feature from a Jetty-based project; Jetty supports at
least simple wildcard matching (*). Specifying multiple allowed origins
with one pattern comes in quite handy when numerous developer machines
all access a single server from their browsers.
The implementation shall support two flavors of expressions:
- Java Pattern class based expressions
Enclosed between slashes (/.../), these bring the full power of regular
expressions to the filter's allowed origins configuration. This flavor
will also support specifying pattern flags by appending single
characters after the terminating slash (as in /^foo.bar$/i for
case-insensitive matching); see the parsing sketch after this list.
- Wildcard-based simple expressions
With a much simpler syntax, these expressions are more compact and may
be more intuitive for people not familiar with regular expressions.
Although less powerful than *real* regular expressions, the special
characters supported should provide enough options to specify allowed
origins efficiently:
  ?     matches any single character except the domain separator (.)
  #     matches any single digit
  *     matches any number of any characters except the domain
        separator (.), including none
  **    matches any number of any characters, including none (this
        also matches the domain separator, so it can span several
        subdomains)
        (Technically, any run of more than one consecutive asterisk
        is treated as a single ** pattern.)
  [abc] not yet sure about character classes
  [a-z]
  [^abc]
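Here is a minimal sketch of how the first flavor (/.../flags) could be
parsed, using java.util.regex.Pattern. The method name and the flag
letters other than 'i' are just assumptions on my part:

    // Parses an expression like /^foo.bar$/i into a compiled Pattern.
    private static Pattern compileRegexExpression(final String expression) {
        // The terminating slash is the last slash in the expression:
        // escaped slashes (\/) always precede it, and flag letters
        // cannot contain '/'.
        final int end = expression.lastIndexOf('/');
        final String regex = expression.substring(1, end);
        int flags = 0;
        for (final char f : expression.substring(end + 1).toCharArray()) {
            switch (f) {
            case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
            case 's': flags |= Pattern.DOTALL; break;
            case 'x': flags |= Pattern.COMMENTS; break;
            default:
                throw new IllegalArgumentException("Unknown pattern flag: " + f);
            }
        }
        return Pattern.compile(regex, flags);
    }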
Wildcard-based expressions are implemented with regular expressions as
well. Such an expression is turned into an *anchored* regular expression
during initialization, e.g.

http://?*.devzone.intra  ==>  ^http://[^.][^.]*\.devzone\.intra$
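A minimal sketch of that translation (the method name is made up, and
character classes are left out since they are still open):

    // Turns a wildcard expression into an anchored regular expression.
    private static Pattern compileWildcardExpression(final String expression) {
        final StringBuilder regex = new StringBuilder("^");
        int i = 0;
        while (i < expression.length()) {
            final char c = expression.charAt(i);
            if (c == '?') {
                regex.append("[^.]");
                i++;
            } else if (c == '#') {
                regex.append("\\d");
                i++;
            } else if (c == '*') {
                int run = 0;
                while (i < expression.length() && expression.charAt(i) == '*') {
                    run++;
                    i++;
                }
                // a single '*' stops at the domain separator; a run of
                // two or more asterisks is treated as one '**'
                regex.append(run == 1 ? "[^.]*" : ".*");
            } else {
                // quote everything else literally (equivalent to the \.
                // escaping shown in the example above)
                regex.append(Pattern.quote(String.valueOf(c)));
                i++;
            }
        }
        return Pattern.compile(regex.append('$').toString());
    }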
Of course, it is still possible to specify literal origins, as well as
the sole '*' to allow access from any origin. So, the value of the
'cors.allowed.origins' initialization parameter may look like this:
https://www.apache.org,
http://?*.devzone.intra,
/:\/\/staginghost-\d{1,3}.mycompany.corp$/i
As you can see, *real* Java regular expressions are not anchored by
default. The one above is end-anchored only (making it match any
protocol). Obviously, forward slashes within the regular expression must
be escaped so that they do not end the expression prematurely.
The current CORS filter implementation uses a HashSet<String> to
determine whether a given origin is valid, which is quite fast. Since
evaluating a regular expression is much more expensive (~25 times
slower), some sort of caching mechanism is required.
The idea is to transparently add positive matches (allowed origins) to
the same HashSet<String> that already contains all literally specified
allowed origins. Since, in practice, these positives form a rather
small finite set, we can simply add them without worrying about cache
eviction. There is no difference in memory consumption compared to the
current implementation: all allowed origins must be stored in that hash
set anyway.
There *may* be more disallowed origins than allowed ones, so a second
cache for non-matching origins is certainly a good idea. However, this
adds extra memory consumption, and that cache must not grow without
limit. The idea is to use a LinkedHashMap in access-order mode
(accessOrder = true). This makes the map an LRU cache, removing the
least recently used entry when a defined maximum capacity is exceeded
while adding a new entry. Maybe that cache's capacity should be
configurable.
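A minimal sketch of such a cache, using java.util.LinkedHashMap (the
capacity value and the field name are placeholders):

    // A LinkedHashMap in access-order mode evicts the least recently
    // used entry via removeEldestEntry() once the capacity is exceeded.
    private static final int NOT_ALLOWED_CACHE_CAPACITY = 1024; // maybe configurable

    private final Map<String, Boolean> notAllowedOrigins =
            new LinkedHashMap<String, Boolean>(16, 0.75f, true /* accessOrder */) {
                @Override
                protected boolean removeEldestEntry(final Map.Entry<String, Boolean> eldest) {
                    return size() > NOT_ALLOWED_CACHE_CAPACITY;
                }
            };

Storing a non-null value (Boolean.TRUE below) lets a later 'get' call
distinguish a cached negative from a cache miss.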
Write access to these caches must be synchronized; currently I tend to
use a ReadWriteLock.
So, the new algorithm for determining whether an origin is allowed
looks like this:
    private boolean isOriginAllowed(final String origin) {
        if (anyOriginAllowed) {
            return true;
        }
        if (allowedOrigins.contains(origin)) {
            return true;
        }
        // note: in access-order mode, only get() counts as an access;
        // containsKey() would not refresh the entry's LRU position
        if (notAllowedOrigins.get(origin) != null) {
            return false;
        }
        // synchronized block starts here (using a ReadWriteLock)
        boolean result = false;
        for (Pattern pattern : allowedOriginPatterns) {
            if (pattern.matcher(origin).matches()) {
                allowedOrigins.add(origin);
                result = true;
                break;
            }
        }
        if (!result) {
            // origin is definitely not allowed; store a non-null marker
            // so that get() can distinguish it from a cache miss
            notAllowedOrigins.put(origin, Boolean.TRUE);
        }
        // release lock here
        return result;
    }
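For illustration, the section marked by the two lock comments could be
factored out and guarded like this (the field and method names are made
up):

    private final java.util.concurrent.locks.ReadWriteLock cacheLock =
            new java.util.concurrent.locks.ReentrantReadWriteLock();

    // Evaluates the configured patterns and updates one of the two
    // caches; this is the part between the lock comments above.
    private boolean matchAndCache(final String origin) {
        cacheLock.writeLock().lock();
        try {
            for (final Pattern pattern : allowedOriginPatterns) {
                if (pattern.matcher(origin).matches()) {
                    allowedOrigins.add(origin);
                    return true;
                }
            }
            // origin is definitely not allowed
            notAllowedOrigins.put(origin, Boolean.TRUE);
            return false;
        } finally {
            cacheLock.writeLock().unlock();
        }
    }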
As you can see, at most N extra regular expression evaluations are
required per allowed origin during the filter's lifetime (when that
origin is seen for the first time), N being the number of different
expressions configured.
The same is true for not allowed origins if the LRU cache is large
enough and requests come from only a few different not allowed origins
(that is, if the LRU cache is not smaller than the number of different
not allowed origins).
However, since the LinkedHashMap updates its linked list whenever an
entry is accessed (in access-order mode 'get' counts as an access,
which is why the lookup above uses 'get' rather than 'containsKey'),
that lookup is about 1.6 times slower than the corresponding call to
the HashSet's 'contains' method. But this only slows down requests from
not allowed origins. Also, absolute timings per lookup are ~5.0 ns to
~7.8 ns (on a fairly recent i7 @ 3.6 GHz), so that should not really be
an issue. With the above algorithm, these two times add up for a single
not allowed request that hits the LRU cache, but ~12.8 ns is still not
that dramatic, right?
In all other cases, up to N extra regular expression evaluations are
required when a not allowed origin is currently not in the LRU cache
(which, in the worst case, could happen on every request). How often
that occurs depends strongly on the order of requests and the
distribution of their origins.
It would be great to get your support for contributing the described
CORS filter enhancement.
Carsten