Re: Interaction between Text indexing, Fuseki services & Data Access Control

Vilnis Termanis Tue, 03 May 2022 07:57:26 -0700

On Sat, 16 Apr 2022 at 21:29, Andy Seaborne <[email protected]> wrote:

>
>
> On 14/04/2022 20:57, Vilnis Termanis wrote:
> > Hi,
> >
> > Specifying the following in each service (which is to ignore text
> > indexing) now works (tested against 4.4.0):
> >
> > ja:context [ ja:cxtName "http://jena.apache.org/text#index" ;
> > ja:cxtValue false ] ;
>
> Doesn't that cause warnings in the Fuseki log?
>


Yes it does - three (expected) warnings from TextQueryPF:
1) Context setting 'symbol:http://jena.apache.org/text#index'is not a
TextIndex
2) Failed to find the text index : tried context and as a text-enabled
dataset
3) No text index - no text search performed

(Note that I also tried specifying the override as "undef" - but that
doesn't work because it basically unsets the existing value and the dataset
one then rightly is used.)
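
For reference, a minimal sketch of where that override sits in a service
definition. The service, endpoint and dataset names here are hypothetical
placeholders, not taken from our actual configuration; only the ja:context
block is what was tested above.

```turtle
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX :       <http://example/config#>

# Hypothetical service that should NOT see the text index.
:service_read_all a fuseki:Service ;
    fuseki:name "read-all" ;
    fuseki:endpoint [
        fuseki:operation fuseki:query ;
        fuseki:name "sparql"
    ] ;
    # :dataset_tdb is a placeholder for the shared underlying dataset.
    fuseki:dataset :dataset_tdb ;
    # Override the text index symbol with a non-TextIndex value, so that
    # TextQueryPF finds no usable index (and logs the warnings listed above).
    ja:context [
        ja:cxtName "http://jena.apache.org/text#index" ;
        ja:cxtValue false
    ] .
```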

> Is the stack of datasets still the same as earlier?
>

Yes it is: One as TextDataset(DatasetTDB) and the other as
AccessControlledDataset(DatasetTDB) (where DatasetTDB is the same dataset
for both).


> If you are saying that is necessary, it looks like the
> text context contaminates the base dataset, but the fix may break the
> reverse case (if anyone uses it) of direct access to the storage DB. The
> fix isn't a quick one, but it so happens that the code (a version of Context
> that keeps changes) does exist, albeit not in that codebase.
>

My understanding is that, for updates, the stack of datasets is respected
when deciding whether to access the text index, whereas for querying the
dataset context is used, which is not stacked. (The context merging is for
server + dataset + endpoint, but there is only one of each.)
I think the proposed fix would cause issues only if there is an expectation
for AccessControlledDataset to behave differently when it comes to handling
endpoint context: right now it is ignored.


> >
> > ... but not if the associated dataset is an AccessControlledDataset.
> >
> >  From my understanding, the issue is to do with the fact that
> > fuseki-access uses QueryExecutionFactory whilst fuseki-core uses
> > QueryExecDatasetBuilder. The latter takes the HttpAction's context
> > into account (which presumably leads to the inclusion of the service
> > context values) while the former does not. (In addition, it would
> > appear that fuseki-access does not honour query-specific timeouts due
> > to a similar reason.)
>
> Timeouts ignored because the context is skipped?
>

The timeout is not stored in the context, but instead on the
SPARQLQueryProcessor instance (via its setTimeouts method).


> > This patch seems to fix the issue:
> >
> > https://github.com/vtermanis/jena/commit/e5cb112f829f305c1f76c8f5305f4394d8e9b04f
>
> Would you be able to turn that into a PR?
>

https://github.com/apache/jena/pull/1291

As I said, I'm aware there's probably a better way to expose the common
"merge endpoint context with DS + server ones" + "honour timeout
parameter". But hopefully this is a good starting point to throw rocks at.

> > I know that this most likely is not the right way to address it.
> > (Should fuseki-access re-use some common QueryExecution building code
> > from fuseki-core?) I also wasn't sure how to add an automated (to
> > jena-integration-tests or with mocking in fuseki-access?) test-case
> > for this, but I can provide minimal manual steps.
> >
> > Should I create a Jira ticket for this?
>
> There are several "this" here.
>
> > Regards,
> > Vilnis
>
>      Andy
>
> >
> >
> > On Tue, 12 Apr 2022 at 21:44, Vilnis Termanis
> > <[email protected]> wrote:
> >>
> >> Hi Andy,
> >>
> >> Thank you for the suggestion of in-config context overrides - I had
> >> not realised that was possible (with the newer style of defining
> >> Fuseki services) - that's really useful.
> >> We'll re-test the aforementioned 2b & 3b cases.
> >>
> >> Regards,
> >> Vilnis
> >>
> >> On Fri, 8 Apr 2022 at 11:51, Andy Seaborne <[email protected]> wrote:
> >>>
> >>> Hi Vilnis,
> >>>
> >>> On 07/04/2022 11:10, Vilnis Termanis wrote:
> >>>> Hi,
> >>>>
> >>>> In brief: Can Fuseki Data ACL be applied to text indexing?
> >>>
> >>> As a general point - a text index itself is not ACL aware. It is set up
> >>> ahead of time and does not index triples directly. The GeoSPARQL cache
> >>> is probably similar (I'm less familiar with the GeoSPARQL code).
> >>>
> >>> When the query is under the control of a trusted client, the pattern:
> >>>
> >>> WHERE {
> >>>       ?s a ex:Product ;
> >>>          text:query (rdfs:label 'printer') ;
> >>>          rdfs:label ?lbl
> >>> }
> >>>
> >>> can act as a check of the triple.
> >>>
> >>> If the query isn't controlled, then that won't work.
> >>>
> >>> (Has your usage style changed in the last year?)
> >>>
> >>>> And is it
> >>>> possible to selectively expose text index access per service for a
> >>>> shared dataset?
> >>>
> >>> Yes.
> >>>
> >>> The context setting can be set per dataset, per service or per endpoint
> >>> with ja:context [ ja:cxtName "NAME" ;  ja:cxtValue "VALUE" ] ;
> >>>
> >>> E.g.
> >>>       fuseki:endpoint [
> >>>           fuseki:operation fuseki:query ;
> >>>           fuseki:name "sparql" ;
> >>>           ja:context [
> >>>              ja:cxtName "NAME" ;  ja:cxtValue "VALUE"
> >>>           ] ;
> >>>       ] ;
> >>>
> >>>>
> >>>> In detail:
> >>>>
> >>>> We're using a single TDB dataset (in unionDefaultGraph mode) with
> >>>> multiple services, wrapped with both ACL (AccessControlledDataset) as
> >>>> well as text indexing (TextDataset) and are hoping to provide the
> >>>> following Fuseki services:
> >>>>
> >>>> 1. "full access" - a) Read/write everything b) including text index
> >>>> 2. "selected graphs only" - a) Read only from selected graphs b) no
> >>>> index access
> >>>> 3. "read all" - a) Read everything b) no index access
> >>>>
> >>>> In the assembler configuration, datasets for the above services are
> >>>> respectively defined as (where all use the same underlying dataset):
> >>>> 1. TextDataset(DatasetTDB)
> >>>> 2. AccessControlledDataset(DatasetTDB)
> >>>> 3. DatasetTDB
> >>>>
> >>>> 1a & 1b work as expected, as do 2a & 3a. 2b & 3b however still allow
> >>>> access to text indexing, despite not being explicitly configured as
> >>>> such in their respective services.
> >>>
> >>> re: 2b/3b: That could be a bug or a configuration error.
> >>>
> >>> The context value is set on the text dataset. So if the server
> >>> configuration has a service that does not go through the text dataset,
> >>> the index should not be visible. There will be an entry in the server
> >>> log.
> >>>
> >>> You don't actually need the DatasetGraphText if the index is only read
> >>> (i.e. preloaded and no runtime updates).
> >>>
> >>>>   From looking at code, I can see that index availability is based on
> >>>> the TextQuery.textIndex symbol in the execution context
> >>>> (TextQueryPF.java). This means that, as long as at least one service
> >>>> enabled text indexing on a dataset, any other services referencing the
> >>>> same underlying store will also use it.
> >>>> (Judging by comments in the code, the "instanceof DatasetGraphText"
> >>>> check is deprecated, even if the logic for now remains in
> >>>> chooseTextIndex()).
> >>>>
> >>>> So our questions are:
> >>>>
> >>>> I) Is it currently possible to disallow access to the text index for
> >>>> some services but not others (using the same underlying dataset)?
> >>>
> >>> Should be - see above.
> >>>
> >>>> II) If not, what might be best approach to implement such a
> >>>> restriction? (Would traversal of DatasetGraphWrapper to explicitly
> >>>> find a DatasetGraphText instance make sense?)
> >>>> III) Or: Is there a different/better approach to solve the index
> >>>> visibility need described above?
> >>>>
> >>>> In addition, regarding spatial lookups:
> >>>> IV) Would GeoSPARQL querying (and its online caching) respect
> >>>> AccessControlledDataset restrictions (when querying is performed over
> >>>> multiple services with different levels of ACL)?
> >>>
> >>> The GeoSPARQL cache is like the text index - not request principal
> >>> sensitive. (see caveat!)
> >>>
> >>>> Regards,
> >>>> Vilnis
> >>>
> >>>       Andy
> >>
> >>
> >>
> >> --
> >> Vilnis Termanis
> >> Senior Software Developer
> >>
> >> m | +44 (0) 7521 012309
> >> e | [email protected]
> >> www.iotics.com
> >>
> >> The information contained in this email is strictly confidential and
> >> intended only for the parties noted. If this email was not intended
> >> for your use, please contact Iotics. For more on our Privacy Policy
> >> please visit https://www.iotics.com/legal/
> >
> >
> >
>


-- 
Vilnis Termanis
Technical Specialist

e | [email protected]
www.iotics.com

