An issue discovered at the end of the Liberty cycle [1] meant that we had to
disable searching on admin-only fields, although they are still present in
results for admins (TL;DR: a non-admin can fish for results to discover values
of admin-only fields by searching for e.g. "hostId": "a*"). We'd like to fix
this, though I've only come up with three ideas. Any feedback would be very
welcome.
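To make the fishing concrete, here is a rough sketch in Python against an in-memory stand-in for the index. Everything in it is hypothetical (DOCS, non_admin_search, fish, and the use of fnmatch to imitate a wildcard query); it is not Searchlight or Elasticsearch code. It only shows that a hit/no-hit answer is enough to recover a hidden value one character at a time, even though the field never appears in the results.

```python
import fnmatch
import string

# Hypothetical in-memory "index"; hostId is admin-only and is never
# returned to a non-admin, but it can still be searched on.
DOCS = [{"name": "vm-1", "hostId": "ab3f"}]


def non_admin_search(field, pattern):
    """Match a wildcard pattern, then strip admin-only fields from the hits."""
    hits = [d for d in DOCS if fnmatch.fnmatchcase(d.get(field, ""), pattern)]
    return [{"name": d["name"]} for d in hits]


def fish(field, alphabet=string.ascii_lowercase + string.digits, max_len=16):
    """Recover a hidden value from hit/no-hit answers, one character at a time."""
    known = ""
    while len(known) < max_len:
        for ch in alphabet:
            if non_admin_search(field, known + ch + "*"):
                known += ch
                break
        else:
            break  # no character extends the prefix, so we have the value
    return known


print(fish("hostId"))
```

The cost to the attacker is at most len(alphabet) queries per character, which is why stripping the field from results alone isn't enough.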
1. There is a plugin for Elasticsearch called Shield (unfortunately only
available commercially) that provides field level access by allowing roles to
specify a list of fields they can search and see in results. The list is
inclusive and can use wildcards. Shield has access to the parsed query and so
can exclude any terms that refer to blocked fields (it treats them as having no
results). It also disables the _all field for roles where a field list is
specified. Even were we able to use it, Shield is more restrictive than we
would like.
2. Create multiple indices (searchlight-admin, searchlight-user) and index some
fields only in the admin index. This has several things going for it:
* it's free
* it's reasonably easy to understand
* the code isn't complicated
* it's secure
* allows full text searching of _all
The strikes:
* more indices (especially where someone configures indices specific to a
plugin, though we could also allow plugins to not require an admin index)
* double the data stored
* greater risk of inconsistency (Elasticsearch doesn't have transactions)
* complicates the effort for zero-downtime reindexing
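As a rough illustration of option 2, the indexing side could look something like the sketch below. The ADMIN_ONLY_FIELDS set and the helper names are made up for the example, and it only handles top-level fields; a real version would take the field list from each plugin and would have to deal with nested documents.

```python
# Assumed for illustration; the real list would come from each plugin.
ADMIN_ONLY_FIELDS = {"hostId"}


def split_for_indices(doc, admin_only=ADMIN_ONLY_FIELDS):
    """Return {index name: document body} for the two proposed indices."""
    user_doc = {k: v for k, v in doc.items() if k not in admin_only}
    return {"searchlight-admin": dict(doc), "searchlight-user": user_doc}


def index_for(is_admin):
    """Pick the index a request should search based on the caller's role."""
    return "searchlight-admin" if is_admin else "searchlight-user"


bodies = split_for_indices({"name": "vm-1", "hostId": "ab3f"})
print(bodies[index_for(False)])
```

Because the admin-only values are never written to searchlight-user, no query shape (including one against _all) can reach them from that index. The two writes per document are also where the inconsistency risk listed above comes from.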
3. Implement something similar to Shield's field control ourselves. We'd need
to exclude fields from _all (because there's no good way to intercept queries
against it), and scrub incoming queries against the admin-only field list.
Naively, it's not too hard to conceive of doing this, but I envisage a trickle
of edge cases that work around the protection. For instance, to protect
'hostId' one might take the incoming dictionary and look for all instances of
'hostId', returning a 403 if it's found. This produces false positives (e.g.
another type has a non-admin field called hostId) and, worse, false negatives:
a query such as {"query_string": {"query": "hostId:a*"}} would escape it. Even
scrubbing the entire input string would have holes ({"multi_match": {"fields":
["hostI*"], "query": "aaabbccc"}}). We would probably be able to identify many
of the issues, but I'd always worry about finding more holes. Shield has the
advantage of operating after the query has been parsed.
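A minimal sketch of the two naive scrubbers described in option 3, to show where they leak. The function names are hypothetical and this is not a proposed implementation; the second and third queries are the examples from the paragraph above, wrapped in a top-level "query" key.

```python
import json


def mentions_field(query, field):
    """Naive scrub 1: is `field` used as a dict key anywhere in the query?"""
    if isinstance(query, dict):
        return any(k == field or mentions_field(v, field)
                   for k, v in query.items())
    if isinstance(query, list):
        return any(mentions_field(item, field) for item in query)
    return False


def mentions_in_string(query, field):
    """Naive scrub 2: does the serialized query contain `field` at all?"""
    return field in json.dumps(query)


wildcard = {"query": {"wildcard": {"hostId": "a*"}}}
query_string = {"query": {"query_string": {"query": "hostId:a*"}}}
multi_match = {"query": {"multi_match": {"fields": ["hostI*"],
                                         "query": "aaabbccc"}}}

for q in (wildcard, query_string, multi_match):
    print(mentions_field(q, "hostId"), mentions_in_string(q, "hostId"))
```

The key-based check only catches the first query. The string scan also catches the query_string case, but the wildcarded field name in multi_match gets past both.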
4. ???
Conclusion:
My view is that a separate index is the only sensible way to do this, but I am
willing to be swayed.
[1] https://bugs.launchpad.net/searchlight/+bug/1504399
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev