Interesting, Lorenz; thanks for that pointer!

Nit: it looks like the compatibility matrix may need to be updated for recent (>4.0) versions of Jena?
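
For anyone following along, here is roughly what Matt's filter might look like against the text index. This is only a sketch. It assumes a text index has already been configured for the dataset with county, district, and parish mapped as indexed fields, which nothing in this thread confirms. The property IRIs are copied from the quoted query below, including the trailing slash on district. Also note that text:query matches analyzed tokens rather than arbitrary substrings, so it is not a drop-in replacement for CONTAINS, although with Lucene's standard analyzer the match should be case-insensitive.

```sparql
PREFIX text: <http://jena.apache.org/text#>

# Sketch only: assumes the three properties below are mapped in the
# text index's entity definition (an assumption, not from the thread).
SELECT DISTINCT ?s ?name
WHERE {
  # One text lookup per indexed property; UNION replaces the three
  # OPTIONALs plus the FILTER over lcase(...) from the original query.
  {
    ?s text:query (<http://www.historicengland.org.uk/data/schema/county> "lewes") .
  } UNION {
    ?s text:query (<http://www.historicengland.org.uk/data/schema/district/> "lewes") .
  } UNION {
    ?s text:query (<http://www.historicengland.org.uk/data/schema/parish> "lewes") .
  }
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
}
LIMIT 10
```

The DISTINCT is there because a subject whose county and parish both mention "lewes" would otherwise come back once per matching branch.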

On Wed, Dec 8, 2021 at 3:42 AM Lorenz Buehmann <[email protected]> wrote:

> It does indeed, you just have to set it up initially, see docs:
> https://jena.apache.org/documentation/query/text-query.html
>
> On 08.12.21 11:47, Matt Whitby wrote:
> > Jena has a text index?
> >
> > On Wed, 8 Dec 2021 at 10:07, Lorenz Buehmann <[email protected]> wrote:
> >
> >> Even if it's not the strings leading to performance issues, using the
> >> Jena text index might be definitely more efficient
> >>
> >> On 08.12.21 10:38, Matt Whitby wrote:
> >>> Fuseki. No inference. TDB2.
> >>>
> >>> M
> >>>
> >>> On Wed, 8 Dec 2021 at 09:25, Andy Seaborne <[email protected]> wrote:
> >>>
> >>>> Lots of questions! Details matter!!
> >>>>
> >>>> On 08/12/2021 09:05, Matt Whitby wrote:
> >>>>> It's hosted in a container in Azure.
> >>>> (Jena storage layer)
> >>>>
> >>>> Using TDB1? TDB2?
> >>>>
> >>>>> I test it via Postman (though we're writing a RESTFul API to sit on top).
> >>>> So this is Fuseki? Is there any inference being used?
> >>>>
> >>>> Andy
> >>>>
> >>>>> On Wed, 8 Dec 2021 at 09:00, Andy Seaborne <[email protected]> wrote:
> >>>>>
> >>>>>> Hi Matt,
> >>>>>>
> >>>>>> That query does not look couple-of-minutes expensive.
> >>>>>>
> >>>>>> Could you run it removing parts to see what happens? e.g. Remove one
> >>>>>> OPTIONAL and it's associated part of the filter.
> >>>>>>
> >>>>>> Which storage layer are you using?
> >>>>>>
> >>>>>> Andy
> >>>>>>
> >>>>>> On 07/12/2021 20:18, [email protected] wrote:
> >>>>>>> On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <[email protected]> wrote:
> >>>>>>> I dare say running an lcase against each field doesn't help matters, but with
> >>>>>>> no other way of doing a case-insensitive search (well, Regex - but who likes
> >>>>>>> Regex?) I'm not sure.
> >>>>>>>
> >>>>>>>
> >>>>>>> On this point alone, if it does turn out that string processing is what is
> >>>>>>> costing you time, you might adjust your data to include a convenience
> >>>>>>> property with county, district, and parish in lowercase. Then you could do
> >>>>>>> a more direct (and cheaper) match.
> >>>>>>>
> >>>>>>> That having been said, it seems unlikely to me that timed-out queries are
> >>>>>>> due to something as cheap as lowercasing. Have you tried peeling off some
> >>>>>>> of those OPTIONALs to see how much they cost?
> >>>>>>>
> >>>>>>> Adam
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <[email protected]> wrote:
> >>>>>>>> I have a Sparql question if that's okay.
> >>>>>>>>
> >>>>>>>> There are only around 8m triples in our test data, so pretty small.
> >>>>>>>>
> >>>>>>>> The query takes a good couple of minutes to run (and sometimes just times
> >>>>>>>> out).
> >>>>>>>>
> >>>>>>>> I dare say running an lcase against each field doesn't help matters, but
> >>>>>>>> with no other way of doing a case-insensitive search (well, Regex - but who
> >>>>>>>> likes Regex?) I'm not sure.
> >>>>>>>>
> >>>>>>>> Any obvious ways to make it less bad?
> >>>>>>>>
> >>>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>>>>>> select ?s ?name
> >>>>>>>> where {
> >>>>>>>>
> >>>>>>>> ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
> >>>>>>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/county> ?county}.
> >>>>>>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/district/> ?district}.
> >>>>>>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/parish> ?parish}.
> >>>>>>>>
> >>>>>>>> FILTER (CONTAINS(lcase(?county),"lewes") || CONTAINS(lcase(?district),"lewes") || CONTAINS(lcase(?parish),"lewes"))
> >>>>>>>>
> >>>>>>>> }
> >>>>>>>> limit 10
