Re: Dynamically restricting graph access at SPARQL query time

2022-01-21 Thread Martynas Jusevičius
WebAccessControl ontology might be relevant here:
https://www.w3.org/wiki/WebAccessControl
We're using a request filter that controls access against
authorizations using SPARQL.
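
As a rough illustration of checking access against WebAccessControl authorizations with SPARQL, here is a minimal Python sketch that builds an ASK query. It is not our actual request filter. The function name and the example IRIs are hypothetical, and it assumes the authorizations are stored as acl:Authorization resources with acl:agent, acl:accessTo and acl:mode.

```python
ACL_NS = "http://www.w3.org/ns/auth/acl#"  # WebAccessControl vocabulary namespace


def build_read_check(agent_iri: str, graph_iri: str) -> str:
    """Return an ASK query that is true if agent_iri may read graph_iri.

    Hypothetical helper: assumes one acl:Authorization per grant, with the
    graph IRI as the acl:accessTo target.
    """
    for iri in (agent_iri, graph_iri):
        # Refuse characters that could break out of the <...> IRI syntax.
        if any(ch in iri for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in iri):
            raise ValueError(f"unsafe IRI: {iri!r}")
    return (
        f"PREFIX acl: <{ACL_NS}>\n"
        "ASK {\n"
        "  ?auth a acl:Authorization ;\n"
        f"        acl:agent <{agent_iri}> ;\n"
        f"        acl:accessTo <{graph_iri}> ;\n"
        "        acl:mode acl:Read .\n"
        "}\n"
    )


if __name__ == "__main__":
    print(build_read_check("http://example.org/users/alice",
                           "http://example.org/graphs/g1"))
```

The filter would run such a query against the authorization data before letting the request through. Group-based grants (acl:agentClass, acl:agentGroup) would need extra patterns.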

On Fri, Jan 21, 2022 at 4:13 PM Vilnis Termanis wrote:
>
> Hi,
>
> For a SPARQL query via Fuseki, we are trying to restrict visibility of
> groups of triples (each with multiple subjects) dynamically, in order
> to allow for generic queries to be executed by users (instead of
> providing tinned ones).
>
> Looking at the available ACL mechanisms in Jena/Fuseki, I assume
> storing each of these groups as a distinct graph might be the way
> forward. (The expectation is to be able to support 10^5 or higher
> number of these.)
>
> I.e.: Given a user (external to Fuseki, e.g. presented via shiro via
> LDAP/other), only consider triples from the set of graphs 1..N during
> the query. (Where the allowed list of 1..N graphs is to be looked up
> at the point of the query.)
>
> From my limited understanding, some potential routes are:
>
> a) jena-fuseki-access - Filters triples at storage level via "TDB Quad
> Filter" support in TDB.
> However, the configuration of allowed graphs per user is static at runtime.
>
> b) jena-permissions - Extends the SPARQL query engine with an Op
> rewriter which allows a user-defined evaluator implementation to
> allow/deny access to a graph/triple, given a specific user/principal.
> (The specific yes/no evaluation responses are cached for the duration
> of a query/operation.)
> However, this can only be applied to a single graph as it stands.
>
> c) Parse & re-write the query to e.g. scope it using a fixed set of
> "FROM" clauses. From some minimal testing (with ~200 FROM clauses)
> this does not appear to perform well (compared to a tinned query which
> explicitly restricts access via knowledge of the ontologies involved).
> I appreciate that maybe having a large list of FROM clauses is an
> anti-pattern.
>
> My questions are:
>
> 1) Does filtering to a subset of graphs (from a large set of
> graphs) to restrict access sound like a sensible thing to do? (Note
> that each of these graphs would contain a set of multiple subjects -
> i.e. we are not trying to filter by specific predicate/object values.)
>
> 2) Would extending either jena-fuseki-access to support the
> user-graph-list lookup dynamically OR extending jena-permissions to work
> at dataset level be sensible things to do?
>
> 3) If the answer to either of (2) is yes - I'd be interested in
> getting a better understanding of what would be involved to gauge the
> size/effort of such an extension. I have had a look at the codebases for the
> aforementioned projects, but my knowledge of TDB/ARQ/etc is very
> limited. (We'd potentially be interested in taking this on, time &
> priorities permitting.)
>
> I didn't know which mailing list to send this to but I thought the
> users list would probably be a better starting point.
>
> Regards,
> Vilnis
>
> --
> Vilnis Termanis
> Senior Software Developer
>
> e | vilnis.terma...@iotics.com
> www.iotics.com


Dynamically restricting graph access at SPARQL query time

2022-01-21 Thread Vilnis Termanis
Hi,

For a SPARQL query via Fuseki, we are trying to restrict visibility of
groups of triples (each with multiple subjects) dynamically, in order
to allow for generic queries to be executed by users (instead of
providing tinned ones).

Looking at the available ACL mechanisms in Jena/Fuseki, I assume
storing each of these groups as a distinct graph might be the way
forward. (The expectation is to be able to support 10^5 or higher
number of these.)

I.e.: Given a user (external to Fuseki, e.g. presented via shiro via
LDAP/other), only consider triples from the set of graphs 1..N during
the query. (Where the allowed list of 1..N graphs is to be looked up
at the point of the query.)
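
To make the "looked up at the point of the query" requirement concrete, here is a minimal Python sketch of a per-user lookup wrapper. Everything in it is hypothetical: the class name, the `lookup` callable (standing in for whatever LDAP/other source holds the user-to-graph mapping) and the short cache lifetime, which is an assumption added only to avoid repeating the lookup for every query.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


class AllowedGraphs:
    """Resolve the graphs a user may see, caching results for a short time."""

    def __init__(
        self,
        lookup: Callable[[str], Iterable[str]],  # hypothetical external lookup
        ttl_seconds: float = 30.0,               # assumed staleness tolerance
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def for_user(self, user: str) -> FrozenSet[str]:
        now = self._clock()
        cached = self._cache.get(user)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Cache miss or expired entry: ask the external source again.
        graphs = frozenset(self._lookup(user))
        self._cache[user] = (now, graphs)
        return graphs


if __name__ == "__main__":
    resolver = AllowedGraphs(lambda user: ["http://example.org/graphs/g1"])
    print(resolver.for_user("alice"))
```

With `ttl_seconds=0` every query triggers a fresh lookup, which matches the strict reading of the requirement. A larger value trades freshness of revocations for fewer lookups.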

From my limited understanding, some potential routes are:

a) jena-fuseki-access - Filters triples at storage level via "TDB Quad
Filter" support in TDB.
However, the configuration of allowed graphs per user is static at runtime.

b) jena-permissions - Extends the SPARQL query engine with an Op
rewriter which allows a user-defined evaluator implementation to
allow/deny access to a graph/triple, given a specific user/principal.
(The specific yes/no evaluation responses are cached for the duration
of a query/operation.)
However, this can only be applied to a single graph as it stands.

c) Parse & re-write the query to e.g. scope it using a fixed set of
"FROM" clauses. From some minimal testing (with ~200 FROM clauses)
this does not appear to perform well (compared to a tinned query which
explicitly restricts access via knowledge of the ontologies involved).
I appreciate that maybe having a large list of FROM clauses is an
anti-pattern.
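
For reference, the kind of rewrite meant in (c) can be sketched in Python as below. This is a deliberately naive, string-based version rather than a real SPARQL parser. The function name is hypothetical, it assumes the query has an explicit WHERE keyword, and it rejects any query that already mentions FROM so that a user cannot add graphs of their own.

```python
import re
from typing import Sequence


def scope_query(query: str, allowed_graphs: Sequence[str]) -> str:
    """Insert one FROM clause per allowed graph before the first WHERE.

    Naive sketch: a WHERE or FROM inside a literal, IRI or variable name
    before the real WHERE keyword will confuse it.
    """
    match = re.search(r"\bWHERE\b", query, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("expected an explicit WHERE keyword")
    head = query[:match.start()]
    if re.search(r"\bFROM\b", head, flags=re.IGNORECASE):
        raise ValueError("query must not carry its own FROM clauses")
    for iri in allowed_graphs:
        # Refuse characters that could break out of the <...> IRI syntax.
        if any(ch in iri for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in iri):
            raise ValueError(f"unsafe graph IRI: {iri!r}")
    from_clauses = "".join(f"FROM <{iri}>\n" for iri in allowed_graphs)
    return head + from_clauses + query[match.start():]


if __name__ == "__main__":
    print(scope_query(
        "SELECT ?s WHERE { ?s ?p ?o }",
        ["http://example.org/graphs/g1", "http://example.org/graphs/g2"],
    ))
```

With several FROM clauses the default graph of the query becomes the merge of the listed graphs, so the cost grows with the length of the list.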

My questions are:

1) Does filtering to a subset of graphs (from a large set of
graphs) to restrict access sound like a sensible thing to do? (Note
that each of these graphs would contain a set of multiple subjects -
i.e. we are not trying to filter by specific predicate/object values.)

2) Would extending either jena-fuseki-access to support the
user-graph-list lookup dynamically OR extending jena-permissions to work
at dataset level be sensible things to do?

3) If the answer to either of (2) is yes - I'd be interested in
getting a better understanding of what would be involved to gauge the
size/effort of such an extension. I have had a look at the codebases for the
aforementioned projects, but my knowledge of TDB/ARQ/etc is very
limited. (We'd potentially be interested in taking this on, time &
priorities permitting.)

I didn't know which mailing list to send this to but I thought the
users list would probably be a better starting point.

Regards,
Vilnis

--
Vilnis Termanis
Senior Software Developer

e | vilnis.terma...@iotics.com
www.iotics.com