An issue discovered at the end of the Liberty cycle [1] meant that we had to 
disable searching of admin-only fields, although they are still present in 
results for admins (TL;DR: a non-admin can fish for results to discover values 
for admin-only fields by searching e.g. "hostId": "a*"). We'd like to fix this, 
though I've only come up with three ideas. Any feedback would be very welcome.

1. There is a plugin for Elasticsearch called Shield (unfortunately only 
available commercially) that provides field level access by allowing roles to 
specify a list of fields they can search and see in results. The list is 
inclusive and can use wildcards. Shield has access to the parsed query and so 
can exclude any terms that refer to blocked fields (it treats them as having no 
results). It also disables the _all field for roles where a field list is 
specified. Even were we able to use it, Shield is more restrictive than we 
would like.

2. Create multiple indices (searchlight-admin, searchlight-user) and index some 
fields only in the admin index. This has several things going for it:
 * it's free
 * it's reasonably easy to understand
 * the code isn't complicated
 * it's secure
 * it allows full text searching of _all

The strikes:
 * more indices (especially where someone configures indices specific to a 
plugin, though we could also allow plugins to not require an admin index)
 * double the data stored
 * greater risk of inconsistency (Elasticsearch doesn't have transactions)
 * complicates the effort for zero-downtime reindexing
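
To make option 2 concrete, here is a minimal sketch of how one source document 
might be split across the two indices. It is only an illustration: the field 
list and both function names are hypothetical, it only handles top-level 
fields, and it does not talk to Elasticsearch at all.

```python
# Hypothetical field list; in practice each plugin would declare its own.
ADMIN_ONLY_FIELDS = {"hostId"}

ADMIN_INDEX = "searchlight-admin"
USER_INDEX = "searchlight-user"


def split_for_indices(doc, admin_only=ADMIN_ONLY_FIELDS):
    """Return {index_name: document} for one source document.

    The admin index gets the full document; the user index gets a copy
    with the admin-only top-level fields removed.
    """
    user_doc = {k: v for k, v in doc.items() if k not in admin_only}
    return {ADMIN_INDEX: dict(doc), USER_INDEX: user_doc}


def index_for(is_admin):
    """Pick the index a search request should be routed to."""
    return ADMIN_INDEX if is_admin else USER_INDEX
```

Because the user index never contains the admin-only values, no query against 
it (including one against _all) can reveal them. The two writes per document 
are where the inconsistency risk listed above comes from.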

3. Implement something similar to Shield's field control ourselves. We'd need 
to exclude fields from _all (because there's no good way to intercept queries 
against it), and scrub incoming queries against the admin-only field list.

Naively, it's not too hard to conceive of doing this, but I envisage a trickle 
of edge cases that work around the protection. For instance, to protect 
'hostId' one might take the incoming dictionary and look for all instances of 
'hostId', returning a 403 if it's found. This will find false positives (e.g. 
another type has a non-admin field called hostId), and (worse) false negatives; 
a query such as {"query_string": {"query": "hostId:a*"}} would escape it. Even 
scrubbing the entire input string would have holes ({"multi_match": {"fields": 
["hostI*"], "query": "aaabbccc"}}). We would probably be able to identify many 
of the issues, but I'd always worry about finding more holes. Shield has the 
advantage of operating after the query has been parsed.
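
As a rough illustration of the problem, here is a sketch of the naive check 
described above: walk the incoming query dictionary and flag any key that 
matches an admin-only field. The function name and field list are made up for 
the example; it is the strawman, not a proposed implementation.

```python
ADMIN_ONLY_FIELDS = {"hostId"}  # hypothetical list for illustration


def mentions_admin_field(node, blocked=ADMIN_ONLY_FIELDS):
    """Naive check: True if any dict key in the query is a blocked field."""
    if isinstance(node, dict):
        return any(
            key in blocked or mentions_admin_field(value, blocked)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(mentions_admin_field(item, blocked) for item in node)
    return False


# Caught: the field name appears as a dictionary key.
caught = {"query": {"wildcard": {"hostId": "a*"}}}

# Missed: the field name is buried inside a query string value.
missed_query_string = {"query": {"query_string": {"query": "hostId:a*"}}}

# Missed: the field is selected by a wildcard, so the name never appears.
missed_multi_match = {
    "query": {"multi_match": {"fields": ["hostI*"], "query": "aaabbccc"}}
}

for q in (caught, missed_query_string, missed_multi_match):
    print(mentions_admin_field(q))
```

Only the first query is flagged. Catching the other two would mean 
reimplementing parts of Elasticsearch's own query parsing and field-name 
expansion, which is where I'd expect the trickle of edge cases to come from.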

4. ???

Conclusion:
My view is that a separate index is the only sensible way to do this, but I am 
willing to be swayed.

[1] https://bugs.launchpad.net/searchlight/+bug/1504399


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
