Hi there,

I'd like to contribute a CORS filter enhancement, making it accept both wildcard-based and 'regular expression'-based expressions for its allowed origins list.

I know this from a project based on Jetty, which has support for, at least, simple wildcard matching (*). Specifying multiple allowed origins with one pattern comes in quite handy if there are numerous developer machines that all access a single server from their browsers.

The implementation shall support two flavors of expressions:

- Java Pattern Class based expressions

Enclosed between slashes (/.../) these bring the full power of regular expressions to the filter's allowed origins list configuration. This will also support specifying pattern flags by appending single characters after the terminating slash (like in /^foo.bar$/i for case-insensitive matching).

- Wildcard-based simple expressions

With a much simpler syntax, these expressions are more compact and may be more intuitive for people not familiar with regular expressions. Although less powerful than *real* regular expressions, the special characters supported should provide enough options to specify allowed origins efficiently:

?       matches any single character except the domain separator (.)

#       matches any single digit

*       matches any number of any characters except the domain
        separator (.) including none

**      matches any number of any characters including none (this also
        matches the domain separator and so matches several
        subdomains)
        (Technically, any number > 1 of consecutive asterisks are
        treated as a single ** pattern.)

[abc]   not yet sure about character classes
[a-z]
[^abc]

Wildcard-based expressions are implemented by regular expressions as well. Such an expression is turned into an *anchored* regular expression during initialization e.g.

http://?*.devzone.intra  ==>  ^http://[^.][^.]*\.devzone\.intra$
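
To make the translation more concrete, here is a minimal sketch of how it could work. The class and method names (WildcardOrigins, toRegex) are made up for illustration and are not part of the existing filter. Since I am not yet sure about character classes, the sketch treats '[' and ']' as literals, and it assumes '#' maps to [0-9].

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the current CORS filter.
final class WildcardOrigins {

    // Regex metacharacters that must be escaped when they appear literally.
    private static final String META = "\\[]{}()+^$|";

    private WildcardOrigins() {
    }

    /** Translates a wildcard expression into an anchored regular expression. */
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder("^");
        int i = 0;
        while (i < wildcard.length()) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                int start = i;
                while (i < wildcard.length() && wildcard.charAt(i) == '*') {
                    i++;
                }
                // A single '*' stays within one domain label; two or more
                // consecutive asterisks may also cross the separator.
                regex.append(i - start > 1 ? ".*" : "[^.]*");
                continue;
            }
            if (c == '?') {
                regex.append("[^.]");
            } else if (c == '#') {
                regex.append("[0-9]");
            } else if (c == '.') {
                regex.append("\\.");
            } else if (META.indexOf(c) >= 0) {
                regex.append('\\').append(c);
            } else {
                regex.append(c);
            }
            i++;
        }
        return regex.append('$').toString();
    }

    /** Compiles the translated expression. */
    static Pattern toPattern(String wildcard) {
        return Pattern.compile(toRegex(wildcard));
    }
}
```

With this sketch, toRegex("http://?*.devzone.intra") yields the anchored expression shown above, and the compiled pattern accepts http://pc42.devzone.intra but rejects http://a.b.devzone.intra.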

Of course, it is still possible to specify literal origins, as well as the sole '*' to allow access from any origin. So, the value of the 'cors.allowed.origins' initialization parameter may look like this:


https://www.apache.org,
http://?*.devzone.intra,
/:\/\/staginghost-\d{1,3}.mycompany.corp$/i


As you can see, *real* Java regular expressions are not anchored by default. The above one is end anchored only (making it match any protocol).

Obviously, forward slashes within the regular expression must be escaped in order not to end the expression prematurely.
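
As a rough sketch of how a slash-delimited expression could be parsed: the class name SlashRegexParser is made up, and the flag letters other than 'i' are only my assumption, modelled on Java's embedded flags (?s) and (?x). The expression ends at the first unescaped slash; everything after it is read as flags.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the current CORS filter.
final class SlashRegexParser {

    private SlashRegexParser() {
    }

    /** Parses "/body/flags" into a compiled, unanchored Pattern. */
    static Pattern parse(String expression) {
        if (!expression.startsWith("/")) {
            throw new IllegalArgumentException("Missing leading slash: " + expression);
        }
        // Find the terminating slash, skipping escaped characters such as \/
        int end = -1;
        for (int i = 1; i < expression.length(); i++) {
            char c = expression.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '/') {
                end = i;
                break;
            }
        }
        if (end < 0) {
            throw new IllegalArgumentException("Missing terminating slash: " + expression);
        }
        int flags = 0;
        for (char flag : expression.substring(end + 1).toCharArray()) {
            switch (flag) {
                case 'i': // from the proposal: case-insensitive matching
                    flags |= Pattern.CASE_INSENSITIVE;
                    break;
                case 's': // assumption, mirrors (?s)
                    flags |= Pattern.DOTALL;
                    break;
                case 'x': // assumption, mirrors (?x)
                    flags |= Pattern.COMMENTS;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown flag '" + flag + "'");
            }
        }
        // The escaped slashes can stay in the body: Java accepts a backslash
        // before any non-alphabetic character.
        return Pattern.compile(expression.substring(1, end), flags);
    }
}
```

Because such a pattern is not anchored, it has to be evaluated with Matcher.find() rather than Matcher.matches(); the latter would require the expression to cover the whole origin.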


The current CORS filter implementation uses a HashSet<String> to determine whether a given origin is valid or not. That is quite a fast solution. Since evaluating a regular expression is much more expensive (~25 times slower), some sort of caching mechanism is required.

The idea is to transparently add positive matches (allowed origins) to the same HashSet<String> that already contains all literally specified allowed origins. Since these positives form a (rather small) countable set (in practice), we could simply add them without worrying about cache removal. There is no difference in memory consumption compared to the current implementation: all allowed origins must be stored in that hash set.

There *may* be more disallowed origins than allowed ones, so another cache for non-matching origins is certainly a good idea. However, this adds extra memory consumption, and that cache should not grow without limit. The idea is to use a LinkedHashMap in access-order mode (accessOrder = true) with an overridden 'removeEldestEntry' method. This makes the map an LRU cache that removes the least recently used entry when adding a new entry would exceed a defined maximum capacity. Maybe that cache's capacity should be configurable.
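
A minimal sketch of such a bounded cache, assuming a made-up class name (LruCache) and a capacity passed to the constructor:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the current CORS filter.
final class LruCache<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCache(int maxEntries) {
        // The third argument (accessOrder = true) orders entries by last access.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

One detail to keep in mind: according to the LinkedHashMap documentation, only methods such as get and put count as an access. A lookup via containsKey does not move the entry to the end of the list, so the cache should be queried with get if the order is supposed to reflect lookups.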

Write access to these caches must be synchronized. Currently, I tend toward using a ReadWriteLock.
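
Roughly, the locking could look like the following sketch. The class name GuardedOriginSet is made up, and ReentrantReadWriteLock is simply the standard ReadWriteLock implementation I would reach for first.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper, not part of the current CORS filter.
final class GuardedOriginSet {

    private final Set<String> origins = new HashSet<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Many threads may look up origins concurrently under the read lock. */
    boolean contains(String origin) {
        lock.readLock().lock();
        try {
            return origins.contains(origin);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Adding an origin requires exclusive access. */
    void add(String origin) {
        lock.writeLock().lock();
        try {
            origins.add(origin);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that lookups need at least the read lock as well, since a plain HashSet is not safe to read while another thread modifies it. For the LRU cache it is stricter still: in access-order mode a call to get modifies the linked list, so that lookup would have to run under the write lock.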

So, the new algorithm for determining whether an origin is allowed or not is like so:

private boolean isOriginAllowed(final String origin) {

    if (anyOriginAllowed) {
        return true;
    }

    if (allowedOrigins.contains(origin)) {
        return true;
    }

    if (notAllowedOrigins.containsKey(origin)) {
        return false;
    }

    // synchronized block starts here (using a ReadWriteLock)

    boolean result = false;
    for (Pattern pattern : allowedOriginPatterns) {
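        // find() rather than matches(): slash-delimited expressions are not
        // anchored, while wildcard expressions carry their own ^...$ anchors
        if (pattern.matcher(origin).find()) {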
            allowedOrigins.add(origin);
            result = true;
            break;
        }
    }

    if (!result) {
        // origin is definitely not allowed
        notAllowedOrigins.put(origin, null);
    }

    // release lock here

    return result;
}

As you can see, there are not more than N extra evaluations of a regular expression required for an allowed origin during the filter's lifetime (when the origin is used for the first time), N being the number of different expressions configured.

The same is true for not allowed origins if the LRU cache is large and if there are requests from only a few different not allowed origins (that is, if the LRU cache is not smaller than the number of different not allowed origins).

However, the lookup in the LinkedHashMap is about 1.6 times slower than the corresponding call to the 'contains' method of the HashSet. (Note that in access-order mode only methods such as 'get' and 'put' update the linked list; 'containsKey' does not count as an access, so the lookup would have to use 'get' for true LRU behavior.) But we are only slowing down requests for not allowed origins.

Also, absolute timings per request are ~5.0 ns to ~7.8 ns (on a quite recent i7 @ 3.600 GHz), so that should not really be an issue. With the above algorithm, these two times must be added for a single not allowed request that is in the LRU cache, but ~12.8 ns is still not that dramatic, right?

In all other cases, up to N extra evaluations of a regular expression are required if a not allowed origin is currently not in the LRU cache (which, in theory, may be the case for every request). However, that highly depends on the order of requests and the distribution of their origins.

It would be great to get your support for contributing the described CORS filter enhancement.

Carsten

