Re: Sparql Query

Harri Kiiskinen Wed, 08 Dec 2021 02:35:00 -0800

I may be wrong, but I think the problem could be with the filters. Now you are always filtering on content from the optionals, even when there is nothing to filter on. The SPARQL recommendation says that if the contents of an OPTIONAL do not match, the solution is kept but no bindings are created. I guess this _could_ mean that in these cases the filters are run on unbound variables, which in my experience leads to very slow queries.

I would try putting the filters inside the optionals; but the purpose of the query would then be different, since each filter would only matter inside its own optional. Perhaps a HAVING filter at the end?
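
As a rough, untested sketch of that idea, using the predicates from the query quoted further down (the district IRI is assumed to follow the same pattern as county and parish): each FILTER moves inside its OPTIONAL, and a final BOUND check restores the "at least one field matches" requirement that would otherwise be lost. Whether this is actually faster on TDB2 would need measuring.

```sparql
SELECT ?s ?name
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

  # Each OPTIONAL binds its variable only when the value contains "lewes".
  OPTIONAL {
    ?s <http://www.historicengland.org.uk/data/schema/county> ?county .
    FILTER (CONTAINS(LCASE(?county), "lewes"))
  }
  OPTIONAL {
    # Assumed IRI: the original query's district IRI is damaged.
    ?s <http://www.historicengland.org.uk/data/schema/district> ?district .
    FILTER (CONTAINS(LCASE(?district), "lewes"))
  }
  OPTIONAL {
    ?s <http://www.historicengland.org.uk/data/schema/parish> ?parish .
    FILTER (CONTAINS(LCASE(?parish), "lewes"))
  }

  # Keep only solutions where at least one OPTIONAL matched.
  FILTER (BOUND(?county) || BOUND(?district) || BOUND(?parish))
}
LIMIT 10
```

This should select the same subjects as the original query, although the number of rows can differ when a subject has several values for one property.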



Harri Kiiskinen

On 8.12.2021 12.07, Lorenz Buehmann wrote:

Even if it's not the strings leading to performance issues, using the Jena text index might well be more efficient.
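
A minimal sketch of what a text-index query could look like, assuming the Fuseki dataset has been configured with a Jena text (Lucene) index whose entity map covers the county, district, and parish properties. Nothing in this thread says such an index exists, and the district IRI is assumed to follow the same pattern as the other two. Matching is by token rather than by substring, so the results are not guaranteed to be identical to CONTAINS.

```sparql
PREFIX text: <http://jena.apache.org/text#>

SELECT DISTINCT ?s ?name
WHERE {
  # One index lookup per property; the UNION stands in for the || in the FILTER.
  { ?s text:query (<http://www.historicengland.org.uk/data/schema/county> "lewes") }
  UNION
  { ?s text:query (<http://www.historicengland.org.uk/data/schema/district> "lewes") }
  UNION
  { ?s text:query (<http://www.historicengland.org.uk/data/schema/parish> "lewes") }

  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
}
LIMIT 10
```

DISTINCT is there because a subject matching in more than one field would otherwise appear once per matching branch.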


On 08.12.21 10:38, Matt Whitby wrote:

Fuseki. No inference. TDB2.

M

On Wed, 8 Dec 2021 at 09:25, Andy Seaborne <[email protected]> wrote:

Lots of questions! Details matter!!

On 08/12/2021 09:05, Matt Whitby wrote:

It's hosted in a container in Azure.

(Jena storage layer)

Using TDB1? TDB2?

I test it via Postman (though we're writing a RESTful API to sit on top).

So this is Fuseki? Is there any inference being used?

      Andy

On Wed, 8 Dec 2021 at 09:00, Andy Seaborne <[email protected]> wrote:

Hi Matt,

That query does not look couple-of-minutes expensive.

Could you run it removing parts to see what happens? e.g. Remove one
OPTIONAL and its associated part of the filter.

Which storage layer are you using?

       Andy

On 07/12/2021 20:18, [email protected] wrote:

On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <[email protected]> wrote:

I dare say running an lcase against each field doesn't help matters, but with no other way of doing a case-insensitive search (well, Regex - but who likes Regex?) I'm not sure.

On this point alone, if it does turn out that string processing is what is costing you time, you might adjust your data to include a convenience property with county, district, and parish in lowercase. Then you could do a more direct (and cheaper) match.
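
One way such a convenience property might be populated is a one-off SPARQL Update, sketched here for county only. The `http://example.org/schema/countyLower` IRI is a made-up placeholder, not part of the Historic England schema, and the same statement would need repeating for district and parish.

```sparql
# Hypothetical property IRI; pick one that fits your own schema.
INSERT { ?s <http://example.org/schema/countyLower> ?countyLower }
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/county> ?county .
  # STR() drops any language tag so the stored value is a plain lowercase string.
  BIND (LCASE(STR(?county)) AS ?countyLower)
}
```

A query could then match `?s <http://example.org/schema/countyLower> "lewes"` directly when the stored value is exactly "lewes". A substring search would still need CONTAINS, just without the per-row LCASE.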

That having been said, it seems unlikely to me that timed-out queries are due to something as cheap as lowercasing. Have you tried peeling off some of those OPTIONALs to see how much they cost?

Adam


On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <[email protected]> wrote:

I have a Sparql question if that's okay.

There are only around 8m triples in our test data, so pretty small.

The query takes a good couple of minutes to run (and sometimes just times out).

I dare say running an lcase against each field doesn't help matters, but with no other way of doing a case-insensitive search (well, Regex - but who likes Regex?) I'm not sure.

Any obvious ways to make it less bad?

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?name
where {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/county> ?county } .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/district> ?district } .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/parish> ?parish } .
  FILTER (CONTAINS(lcase(?county), "lewes") || CONTAINS(lcase(?district), "lewes") || CONTAINS(lcase(?parish), "lewes"))
}
limit 10



--
Tutkijatohtori / post-doctoral researcher
Viral Culture in the Early Nineteenth-Century Europe (ViCE)

Movie Making Finland: Finnish fiction films as audiovisual big data, 1907–2017 (MoMaF)

Turun yliopisto / University of Turku
