Re: Re: Query - Update

2021-12-10 Thread Lorenz Buehmann
A minor fix: the regex pattern for ?county should be "east sussex" in the 
first two options, and for the property path solution it should be


regex(?X, "east sussex|lewes", "i")

On 10.12.21 12:44, Andy Seaborne wrote:

Matt - thanks for the update

Other ways to speed the query up are:

* Use regex - I know you don't like regex but the regex is compiled 
only once


FILTER ( regex(?county, "lewes", "i")
   || regex(?district, "lewes", "i")
   || regex(?parish, "lewes", "i")
   )

* Use UNION:

This exploits the data shape: each optional is independent, and the 
overall filter means that a row with no match at all never gets through.


{
  ?s heng-schema:county ?county .
  FILTER ( regex(?county, "lewes", "i") )
} UNION {
  ?s heng-schema:district ?district .
  FILTER ( regex(?district, "lewes", "i") )
} UNION {
  ?s heng-schema:parish   ?parish .
  FILTER ( regex(?parish, "lewes", "i") )
}

* Use a property path:

where {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "lewes", "i") )
}

although that might play badly with the LIMIT - depends on the data
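
The "compiled only once" point can be sketched outside SPARQL. This is a minimal Python illustration, not Jena code: the rows are made-up sample data and `matches` is a hypothetical helper. It follows SPARQL's argument order for regex (text first, then pattern, then flags) and treats a missing value like an unbound OPTIONAL variable.

```python
import re

# Hypothetical sample rows standing in for ?county / ?district / ?parish values.
rows = [
    {"s": "s1", "county": "East Sussex", "district": "Lewes", "parish": "Ringmer"},
    {"s": "s2", "county": "Kent", "district": "Dover", "parish": None},
    {"s": "s3", "county": None, "district": None, "parish": "LEWES"},
]

# Compiled once and reused for every value, like regex(?X, "lewes", "i").
LEWES = re.compile("lewes", re.IGNORECASE)


def matches(row):
    # A missing (None) value cannot match, mirroring an unbound variable.
    return any(
        value is not None and LEWES.search(value)
        for value in (row["county"], row["district"], row["parish"])
    )


selected = [row["s"] for row in rows if matches(row)]
print(selected)
```

The case-insensitive flag replaces the per-value lcase() call, so no lowered copy of each string is built before testing it.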

See below about comparing to GraphDB:

On 10/12/2021 07:38, Lorenz Buehmann wrote:

Yeah, as expected, putting FILTER into OPTIONAL can help.

Just as a comment, the semantics is a bit different between


SELECT ?s ?o {
?s a :C .
OPTIONAL {
 ?s  ?o
}
FILTER(?o = "val")
}

and

SELECT ?s ?o {
?s a :C .
OPTIONAL {
 ?s  ?o
 FILTER(?o = "val")
}
}

In the first query, the FILTER raises an error (treated as false) when 
there is no ?o at all, so those ?s bindings are dropped. In the second 
you'll always get all ?s bindings. Because the results differ, no 
optimizer will push the filter into the OPTIONAL pattern.
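
The difference can be made concrete with a small simulation. This is only a sketch in Python under assumed data (two subjects of type :C, one of which has no value for the property); the function names are hypothetical and it models the left-join behaviour, not any engine's implementation.

```python
# Hypothetical data: "a" has two values for the property, "b" has none.
subjects = ["a", "b"]
values = {"a": ["val", "other"]}


def filter_outside_optional():
    rows = []
    for s in subjects:
        # OPTIONAL keeps ?s even when nothing matches, with ?o unbound (None).
        candidates = values.get(s) or [None]
        for o in candidates:
            # FILTER applied after the OPTIONAL: an unbound ?o fails the test.
            if o == "val":
                rows.append((s, o))
    return rows


def filter_inside_optional():
    rows = []
    for s in subjects:
        # The FILTER only limits which ?o may extend ?s; ?s itself survives.
        candidates = [o for o in values.get(s, []) if o == "val"] or [None]
        for o in candidates:
            rows.append((s, o))
    return rows


print(filter_outside_optional())
print(filter_inside_optional())
```

With this data the first form returns only the row for "a", while the second also returns "b" with ?o unbound.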



Can you give some numbers on the current runtime of the query now? 
Did you try the fulltext index?


I also saw your thread on SO where you tried GraphDB as well. Any 
comparison numbers so far?


To be comparable the query needs to be with the LIMIT or turned into a 
SELECT (COUNT(*) AS ?C) ...


because the order items come off the indexes will have an effect on 
time for LIMIT 10.




On 09.12.21 23:17, Matt Whitby wrote:

James was kind enough to spend some time talking me through the query.

My original query (which timed out) was:

PREFIX xsd: 
select ?s ?name
where {

?s  
?name .


OPTIONAL {?s 
?county}.
OPTIONAL {?s 
?district}.
OPTIONAL {?s 
?parish}.

FILTER (CONTAINS(lcase(?county),"east sussex") || CONTAINS(
lcase(?district),"lewes") || CONTAINS( lcase(?parish),"lewes"))

}
limit 10

Putting the FILTER under each statement helped it immensely.

select ?s
where {
 ?s 
?name.

?s  ?parish .
   FILTER (CONTAINS(lcase(?parish),"lewes"))
?s  
?district .

   FILTER (CONTAINS(lcase(?district),"lewes"))
?s  ?county .
   FILTER (CONTAINS(lcase(?county),"east sussex"))
}

Putting back the OPTIONAL and running it a third time slowed it down
(though not as badly as the first iteration).



M



Re: [3.16.0] Repeating identical queries from SERVICE

2021-12-10 Thread Martynas Jusevičius
Moving SERVICE down in the joins seems to have helped quite a bit:

PREFIX  rdfs: 
PREFIX  acl:  
PREFIX  lacl: 
PREFIX  foaf: 
PREFIX  sioc: 

DESCRIBE ?auth
FROM 
WHERE
  {   { ?auth  acl:mode  ?Mode
  { ?auth  acl:accessTo  ?this }
UNION
  {   { ?auth  acl:accessToClass  ?Type }
UNION
  { ?auth  acl:accessToClass  ?Class .
?Type (rdfs:subClassOf)* ?Class
  }
SERVICE ?endpoint
  { { GRAPH ?g
{ ?this  a  ?Type }
}
  }
  }
{   { ?auth  acl:agent  ?agent }
  UNION
{ ?auth   acl:agentGroup  ?Group .
  ?Group  foaf:member ?agent
}
}
  }
UNION
  { ?auth  acl:mode  ?Mode
  { ?auth  acl:agentClass  foaf:Agent }
UNION
  { ?auth  acl:agentClass  ?AuthenticatedAgentClass }
  { ?auth  acl:accessTo  ?this }
UNION
  {   { ?auth  acl:accessToClass  ?Type }
UNION
  { ?auth  acl:accessToClass  ?Class .
?Type (rdfs:subClassOf)* ?Class
  }
SERVICE ?endpoint
  { { GRAPH ?g
{ ?this  a  ?Type }
}
  }
  }
  }
  }

On Sat, Dec 11, 2021 at 12:39 AM Martynas Jusevičius
 wrote:
>
> Hi,
>
> I have a query that federates between 2 Fuseki instances (the "remote"
> one is fuseki-end-user):
>
> PREFIX  rdfs: 
> PREFIX  acl:  
> PREFIX  lacl: 
> PREFIX  foaf: 
> PREFIX  sioc: 
>
> DESCRIBE ?auth
> FROM 
> WHERE
>   {   { ?auth  acl:mode  acl:Read
>   { ?auth  acl:accessTo
>  }
> UNION
>   { SERVICE 
>   { { GRAPH ?g
> { 
> 
> a  ?Type
> }
> }
>   }
>   { ?auth  acl:accessToClass  ?Type }
> UNION
>   { ?auth  acl:accessToClass  ?Class .
> ?Type (rdfs:subClassOf)* ?Class
>   }
>   }
> {   { ?auth  acl:agent  rdfs:Resource }
>   UNION
> { ?auth   acl:agentGroup  ?Group .
>   ?Group  foaf:member rdfs:Resource
> }
> }
>   }
> UNION
>   { ?auth  acl:mode  acl:Read
>   { ?auth  acl:agentClass  foaf:Agent }
> UNION
>   { ?auth  acl:agentClass  rdfs:Resource }
>   { ?auth  acl:accessTo
>  }
> UNION
>   { SERVICE 
>   { { GRAPH ?g
> { 
> 
> a  ?Type
> }
> }
>   }
>   { ?auth  acl:accessToClass  ?Type }
> UNION
>   { ?auth  acl:accessToClass  ?Class .
> ?Type (rdfs:subClassOf)* ?Class
>   }
>   }
>   }
>   }
>
> What I see in the fuseki-end-user log following this query is a bunch
> (200+ in this case) identical requests with this query:
>
> SELECT  *
> WHERE
>   { GRAPH ?g
>   { 
>   a  ?Type
>   }
>   }
>
> I understand this is due to federation and know that Fuseki does not
> cache the results, but this strikes me as terribly inefficient.
> Each SERVICE request to fuseki-end-user takes around 10 ms but 200+ of
> them add up to over 2 seconds.
>
> Is there an opportunity for optimization here? Either of the query or of Jena :)
>
> Martynas
> atomgraph.com


[3.16.0] Repeating identical queries from SERVICE

2021-12-10 Thread Martynas Jusevičius
Hi,

I have a query that federates between 2 Fuseki instances (the "remote"
one is fuseki-end-user):

PREFIX  rdfs: 
PREFIX  acl:  
PREFIX  lacl: 
PREFIX  foaf: 
PREFIX  sioc: 

DESCRIBE ?auth
FROM 
WHERE
  {   { ?auth  acl:mode  acl:Read
  { ?auth  acl:accessTo
 }
UNION
  { SERVICE 
  { { GRAPH ?g
{ 
a  ?Type
}
}
  }
  { ?auth  acl:accessToClass  ?Type }
UNION
  { ?auth  acl:accessToClass  ?Class .
?Type (rdfs:subClassOf)* ?Class
  }
  }
{   { ?auth  acl:agent  rdfs:Resource }
  UNION
{ ?auth   acl:agentGroup  ?Group .
  ?Group  foaf:member rdfs:Resource
}
}
  }
UNION
  { ?auth  acl:mode  acl:Read
  { ?auth  acl:agentClass  foaf:Agent }
UNION
  { ?auth  acl:agentClass  rdfs:Resource }
  { ?auth  acl:accessTo
 }
UNION
  { SERVICE 
  { { GRAPH ?g
{ 
a  ?Type
}
}
  }
  { ?auth  acl:accessToClass  ?Type }
UNION
  { ?auth  acl:accessToClass  ?Class .
?Type (rdfs:subClassOf)* ?Class
  }
  }
  }
  }

What I see in the fuseki-end-user log following this query is a bunch
(200+ in this case) identical requests with this query:

SELECT  *
WHERE
  { GRAPH ?g
  { 
  a  ?Type
  }
  }

I understand this is due to federation and know that Fuseki does not
cache the results, but this strikes me as terribly inefficient.
Each SERVICE request to fuseki-end-user takes around 10 ms but 200+ of
them add up to over 2 seconds.

Is there an opportunity for optimization here? Either of the query or of Jena :)

Martynas
atomgraph.com
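
The cost described in this message (200+ identical requests at about 10 ms each) can be sketched as a join that calls the remote side once per left-hand binding. This Python simulation is an illustration only, not how Jena evaluates SERVICE: `remote_types`, the bindings and the query string are hypothetical stand-ins, and the cache merely shows how much a repeated identical request costs.

```python
from functools import lru_cache

calls = 0  # counts simulated round trips to the remote endpoint


def remote_types(query):
    """Stand-in for one HTTP request to the remote endpoint (hypothetical)."""
    global calls
    calls += 1
    return ("TypeA", "TypeB")


def join_per_binding(left_bindings, query):
    # One remote request per left-hand binding, although the query never changes.
    return [(b, t) for b in left_bindings for t in remote_types(query)]


@lru_cache(maxsize=None)
def remote_types_cached(query):
    # Identical query text is sent only once; later calls reuse the result.
    return remote_types(query)


def join_cached(left_bindings, query):
    return [(b, t) for b in left_bindings for t in remote_types_cached(query)]


left = [f"auth{i}" for i in range(200)]
QUERY = "fixed remote query text"  # placeholder, not the real query

naive = join_per_binding(left, QUERY)
naive_calls = calls

calls = 0
cached = join_cached(left, QUERY)
cached_calls = calls

# At roughly 10 ms per request, 200 requests come to about 2000 ms.
print(naive_calls, cached_calls, naive_calls * 10, "ms")
```

Both joins produce the same rows; only the number of round trips differs, which is why evaluating the SERVICE block fewer times shortens the query.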


Re: Information about Apache Jena and Log4j2 vulnerability.

2021-12-10 Thread Brandon Sara
Andy, will you be releasing an RDF-Delta update that uses 4.3.1 soon as well?



Re: Access control for federated queries

2021-12-10 Thread Andy Seaborne




On 10/12/2021 13:53, Mikael Pesonen wrote:


Hi,
is there a recommended way to handle access control for servers which 
are serving federated queries?
So is it possible to configure which servers may access which graphs or 
datasets, for example?





Yes. Allocate a "user id" to the servers, have them make HTTP requests 
using authentication, and use the API and graph access security.


Andy


Information about Apache Jena and Log4j2 vulnerability.

2021-12-10 Thread Andy Seaborne

This message is about the effect of CVE-2021-44228 (log4j2) on Fuseki.

https://nvd.nist.gov/vuln/detail/CVE-2021-44228

Jena ships log4j2 in Fuseki and the command line tools.

The log4j2 vulnerability does affect Fuseki 3.15 - 3.17 and 4.x.

Remote execution is only possible with older versions of Java.

Java 8u121, Java 11.0.1, and later versions set 
"com.sun.jndi.rmi.object.trustURLCodebase"

and
"com.sun.jndi.cosnaming.object.trustURLCodebase"

to "false", protecting against remote code execution by default.


The workaround of setting "-Dlog4j2.formatMsgNoLookups=true" works with 
all affected Fuseki versions:


JVM_ARGS="-Dlog4j2.formatMsgNoLookups=true" ./fuseki-server 


Note that Apache Jena 4.2.0 addresses an unrelated Jena-specific CVE
https://nvd.nist.gov/vuln/detail/CVE-2021-39239

We will release Jena 4.3.1 with upgraded log4j2.

Andy
on behalf of the Jena PMC


Access control for federated queries

2021-12-10 Thread Mikael Pesonen



Hi,
is there a recommended way to handle access control for servers which 
are serving federated queries?
So is it possible to configure which servers may access which graphs or 
datasets, for example?





Re: Query - Update

2021-12-10 Thread Andy Seaborne

Matt - thanks for the update

Other ways to speed the query up are:

* Use regex - I know you don't like regex but the regex is compiled only 
once


FILTER ( regex(?county, "lewes", "i")
   || regex(?district, "lewes", "i")
   || regex(?parish, "lewes", "i")
   )

* Use UNION:

This exploits the data shape: each optional is independent, and the 
overall filter means that a row with no match at all never gets through.


{
  ?s heng-schema:county ?county .
  FILTER ( regex(?county, "lewes", "i") )
} UNION {
  ?s heng-schema:district ?district .
  FILTER ( regex(?district, "lewes", "i") )
} UNION {
  ?s heng-schema:parish   ?parish .
  FILTER ( regex(?parish, "lewes", "i") )
}

* Use a property path:

where {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "lewes", "i") )
}

although that might play badly with the LIMIT - depends on the data

See below about comparing to GraphDB:

On 10/12/2021 07:38, Lorenz Buehmann wrote:

Yeah, as expected, putting FILTER into OPTIONAL can help.

Just as a comment, the semantics is a bit different between


SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s  ?o
}
FILTER(?o = "val")
}

and

SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s  ?o
     FILTER(?o = "val")
}
}

In the first query, the FILTER raises an error (treated as false) when 
there is no ?o at all, so those ?s bindings are dropped. In the second 
you'll always get all ?s bindings. Because the results differ, no 
optimizer will push the filter into the OPTIONAL pattern.



Can you give some numbers on the current runtime of the query now? Did 
you try the fulltext index?


I also saw your thread on SO where you tried GraphDB as well. Any 
comparison numbers so far?


To be comparable the query needs to be with the LIMIT or turned into a 
SELECT (COUNT(*) AS ?C) ...


because the order items come off the indexes will have an effect on time 
for LIMIT 10.




On 09.12.21 23:17, Matt Whitby wrote:

James was kind enough to spend some time talking me through the query.

My original query (which timed out) was:

PREFIX xsd: 
select ?s ?name
where {

?s  
?name .


OPTIONAL {?s 
?county}.
OPTIONAL {?s 
?district}.
OPTIONAL {?s 
?parish}.

FILTER (CONTAINS(lcase(?county),"east sussex") || CONTAINS(
lcase(?district),"lewes") || CONTAINS( lcase(?parish),"lewes"))

}
limit 10

Putting the FILTER under each statement helped it immensely.

select ?s
where {
 ?s 
?name.

?s  ?parish .
   FILTER (CONTAINS(lcase(?parish),"lewes"))
?s  ?district .
   FILTER (CONTAINS(lcase(?district),"lewes"))
?s  ?county .
   FILTER (CONTAINS(lcase(?county),"east sussex"))
}

Putting back the OPTIONAL and running it a third time slowed it down
(though not as badly as the first iteration).



M