Hi there,
I'd like to contribute a CORS filter enhancement that makes it accept
both wildcard-based and regular-expression-based patterns in its
allowed origins list.
I know this feature from a Jetty-based project; Jetty supports at
least simple wildcard matching (*). Specifying multiple allowed origins
with one pattern comes in quite handy when numerous developer machines
all access a single server from their browsers.
The implementation shall support two flavors of expressions:
- Java Pattern class based expressions
Enclosed between slashes (/.../), these bring the full power of regular
expressions to the filter's allowed origins configuration. This flavor
will also support specifying pattern flags by appending single
characters after the terminating slash (as in /^foo.bar$/i for
case-insensitive matching); see the parsing sketch after this list.
- Wildcard-based simple expressions
With a much simpler syntax, these expressions are more compact and may
be more intuitive for people not familiar with regular expressions.
Although less powerful than *real* regular expressions, the special
characters supported should provide enough options to specify allowed
origins efficiently:
  ?     matches any single character except the domain separator (.)
  #     matches any single digit
  *     matches any number of any characters except the domain
        separator (.), including none
  **    matches any number of any characters, including none (this
        also matches the domain separator, so it can span several
        subdomains)
        (Technically, any run of more than one consecutive asterisk
        is treated as a single ** pattern.)
  [abc] not yet sure about character classes
  [a-z]
  [^abc]
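Here is a minimal sketch of how the first flavor (/.../flags) could be
parsed, using java.util.regex.Pattern. The method name and the flag
letters other than 'i' are just assumptions on my part:

    // Parses an expression like /^foo.bar$/i into a compiled Pattern.
    private static Pattern compileRegexExpression(final String expression) {
        // The terminating slash is the last slash in the expression:
        // escaped slashes (\/) always precede it, and flag letters
        // cannot contain '/'.
        final int end = expression.lastIndexOf('/');
        final String regex = expression.substring(1, end);
        int flags = 0;
        for (final char f : expression.substring(end + 1).toCharArray()) {
            switch (f) {
            case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
            case 's': flags |= Pattern.DOTALL; break;
            case 'x': flags |= Pattern.COMMENTS; break;
            default:
                throw new IllegalArgumentException("Unknown pattern flag: " + f);
            }
        }
        return Pattern.compile(regex, flags);
    }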
Wildcard-based expressions are implemented with regular expressions as
well. Such an expression is turned into an *anchored* regular expression
during initialization, e.g.

http://?*.devzone.intra  ==>  ^http://[^.][^.]*\.devzone\.intra$
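A minimal sketch of that translation (the method name is made up, and
character classes are left out since they are still open):

    // Turns a wildcard expression into an anchored regular expression.
    private static Pattern compileWildcardExpression(final String expression) {
        final StringBuilder regex = new StringBuilder("^");
        int i = 0;
        while (i < expression.length()) {
            final char c = expression.charAt(i);
            if (c == '?') {
                regex.append("[^.]");
                i++;
            } else if (c == '#') {
                regex.append("\\d");
                i++;
            } else if (c == '*') {
                int run = 0;
                while (i < expression.length() && expression.charAt(i) == '*') {
                    run++;
                    i++;
                }
                // a single '*' stops at the domain separator; a run of
                // two or more asterisks is treated as one '**'
                regex.append(run == 1 ? "[^.]*" : ".*");
            } else {
                // quote everything else literally (equivalent to the \.
                // escaping shown in the example above)
                regex.append(Pattern.quote(String.valueOf(c)));
                i++;
            }
        }
        return Pattern.compile(regex.append('$').toString());
    }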
Of course, it is still possible to specify literal origins, as well as
the sole '*' to allow access from any origin. So, the value of the
'cors.allowed.origins' initialization parameter may look like this:
https://www.apache.org,
http://?*.devzone.intra,
/:\/\/staginghost-\d{1,3}.mycompany.corp$/i
As you can see, *real* Java regular expressions are not anchored by
default. The one above is end-anchored only (making it match any
protocol). Obviously, forward slashes within the regular expression must
be escaped so that they do not end the expression prematurely.
The current CORS filter implementation uses a HashSet<String> to
determine whether a given origin is valid, which is quite fast. Since
evaluating a regular expression is much more expensive (~25 times
slower), some sort of caching mechanism is required.
The idea is to transparently add positive matches (allowed origins) to
the same HashSet<String> that already contains all literally specified
allowed origins. Since, in practice, these positives form a rather
small finite set, we can simply add them without worrying about cache
eviction. There is no difference in memory consumption compared to the
current implementation: all allowed origins must be stored in that hash
set anyway.
There *may* be more disallowed origins than allowed ones, so a second
cache for non-matching origins is certainly a good idea. However, this
adds extra memory consumption, and that cache must not grow without
limit. The idea is to use a LinkedHashMap in access-order mode
(accessOrder = true). This makes the map an LRU cache, removing the
least recently used entry when a defined maximum capacity is exceeded
while adding a new entry. Maybe that cache's capacity should be
configurable.
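A minimal sketch of such a cache, using java.util.LinkedHashMap (the
capacity value and the field name are placeholders):

    // A LinkedHashMap in access-order mode evicts the least recently
    // used entry via removeEldestEntry() once the capacity is exceeded.
    private static final int NOT_ALLOWED_CACHE_CAPACITY = 1024; // maybe configurable

    private final Map<String, Boolean> notAllowedOrigins =
            new LinkedHashMap<String, Boolean>(16, 0.75f, true /* accessOrder */) {
                @Override
                protected boolean removeEldestEntry(final Map.Entry<String, Boolean> eldest) {
                    return size() > NOT_ALLOWED_CACHE_CAPACITY;
                }
            };

Storing a non-null value (Boolean.TRUE below) lets a later 'get' call
distinguish a cached negative from a cache miss.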
Write access to these caches must be synchronized; currently I tend to
use a ReadWriteLock.
So, the new algorithm for determining whether an origin is allowed
looks like this:
    private boolean isOriginAllowed(final String origin) {
        if (anyOriginAllowed) {
            return true;
        }
        if (allowedOrigins.contains(origin)) {
            return true;
        }
        // note: in access-order mode, only get() counts as an access;
        // containsKey() would not refresh the entry's LRU position
        if (notAllowedOrigins.get(origin) != null) {
            return false;
        }
        // synchronized block starts here (using a ReadWriteLock)
        boolean result = false;
        for (Pattern pattern : allowedOriginPatterns) {
            if (pattern.matcher(origin).matches()) {
                allowedOrigins.add(origin);
                result = true;
                break;
            }
        }
        if (!result) {
            // origin is definitely not allowed; store a non-null marker
            // so that get() can distinguish it from a cache miss
            notAllowedOrigins.put(origin, Boolean.TRUE);
        }
        // release lock here
        return result;
    }
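For illustration, the section marked by the two lock comments could be
factored out and guarded like this (the field and method names are made
up):

    private final java.util.concurrent.locks.ReadWriteLock cacheLock =
            new java.util.concurrent.locks.ReentrantReadWriteLock();

    // Evaluates the configured patterns and updates one of the two
    // caches; this is the part between the lock comments above.
    private boolean matchAndCache(final String origin) {
        cacheLock.writeLock().lock();
        try {
            for (final Pattern pattern : allowedOriginPatterns) {
                if (pattern.matcher(origin).matches()) {
                    allowedOrigins.add(origin);
                    return true;
                }
            }
            // origin is definitely not allowed
            notAllowedOrigins.put(origin, Boolean.TRUE);
            return false;
        } finally {
            cacheLock.writeLock().unlock();
        }
    }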
As you can see, at most N extra regular expression evaluations are
required per allowed origin during the filter's lifetime (when that
origin is seen for the first time), N being the number of different
expressions configured.
The same is true for not allowed origins if the LRU cache is large
enough and requests come from only a few different not allowed origins
(that is, if the LRU cache is not smaller than the number of different
not allowed origins).
However, since the LinkedHashMap updates its linked list whenever an
entry is accessed (in access-order mode 'get' counts as an access,
which is why the lookup above uses 'get' rather than 'containsKey'),
that lookup is about 1.6 times slower than the corresponding call to
the HashSet's 'contains' method. But this only slows down requests from
not allowed origins. Also, absolute timings per lookup are ~5.0 ns to
~7.8 ns (on a fairly recent i7 @ 3.6 GHz), so that should not really be
an issue. With the above algorithm, these two times add up for a single
not allowed request that hits the LRU cache, but ~12.8 ns is still not
that dramatic, right?
In all other cases, up to N extra regular expression evaluations are
required when a not allowed origin is currently not in the LRU cache
(which, in the worst case, could happen on every request). How often
that occurs depends strongly on the order of requests and the
distribution of their origins.
It would be great to get your support for contributing the described
CORS filter enhancement.
Carsten