Re: Lucene Faceted search

2023-02-13 Thread Élie Roux
Dear David, I think extending Jena Lucene is a good idea, but I'm not exactly sure what you mean (partly because I'm not very familiar with Lucene's faceted search). Can you give an example? Best, -- Elie

Re: Git friendly RDF output?

2023-01-25 Thread Élie Roux
Dear François-Paul, I have developed the following library for that purpose and I'm using it in production: https://github.com/buda-base/jena-stable-turtle Blank nodes are always an issue in this exercise (which is partially why I've decided against their use as a policy). I hope this helps,

Re: GraphMem in 4.6

2022-10-18 Thread Élie Roux
Perhaps this is an instance of https://github.com/apache/jena/issues/1533 ? What triple reordering optimization are you using? Best, -- Elie

Re: "oh dear, already have a slot" exception

2022-03-17 Thread Élie Roux
> Because that is the source being copied from. > > Is it TIM or TDB1? Oh it's all in memory, I read a dataset in a trig file, extract a graph, modify it, create a new dataset, add the new modified graph and save the file in trig. Best -- Elie

Re: "oh dear, already have a slot" exception

2022-03-17 Thread Élie Roux
> addGraph is copying data from the m.getGraph into the new dataset. > > "createGeneral" is the version that does not copy, and makes a link to > the original. > > By the way : DatasetGraphFactory Thanks, that actually made the bug disappear! It's still quite unclear to me why it happened but

"oh dear, already have a slot" exception

2022-03-17 Thread Élie Roux
Dear all, I have some code that writes hundreds of thousands of trig files (converting some XML data to RDF). I recently introduced a relatively minor change: some literals are now in a custom datatype, defined here:

Re: puzzling performance issue

2021-10-19 Thread Élie Roux
> As others pointed out semantically the evaluation of the two query forms > yields very different intermediate results. It's only the presence of the > post-processing CONSTRUCT stage that happens to strip out the > duplicates/unusable results. Any optimizations MUST preserve the overall >

Re: puzzling performance issue

2021-10-07 Thread Élie Roux
> Overall, it is whether the WHERE answer is 16*26*2636 rows (all one BGP) or > 16+26+2636 rows (union). Yes, I understand better now, thanks! Do you think there might be some optimization at some point for that case? I suspect this is very common in SPARQL queries out there... Best, -- Elie
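The two row counts quoted above can be checked with a little arithmetic. This is only a sketch, assuming the three patterns share no variables and match 16, 26 and 2636 solutions respectively (the variable names are illustrative, not from the thread):

```python
# Solution counts for the three independent patterns, taken from the quoted message.
counts = [16, 26, 2636]

# One BGP whose patterns share no variables: every combination becomes a row.
bgp_rows = 1
for n in counts:
    bgp_rows *= n

# UNION: each branch contributes only its own rows.
union_rows = sum(counts)

print(bgp_rows)    # 1096576
print(union_rows)  # 2678
```

The CONSTRUCT template then removes duplicate triples in both cases, which is why the two forms can return the same graph while doing very different amounts of work.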

Re: puzzling performance issue

2021-10-07 Thread Élie Roux
> if you take this expression
>
> WHERE
> {
>   {
>     bdr:MW23703_1183 ?instp ?insto . # 200ms alone
>   } union {
>     bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone
>   } union {
>     bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart ?ancestorPart . # 200ms alone
>   }
> }
>
>

Re: puzzling performance issue

2021-10-07 Thread Élie Roux
Thanks a lot for your very informative answer Richard, it's really helpful to know when writing queries! It seems this is a case where some optimizations might be implemented? (I'm afraid this isn't something I could contribute though, sorry) Best, -- Elie

Re: puzzling performance issue

2021-10-06 Thread Élie Roux
After long hours of anxiety, I discovered that using unions as in
CONSTRUCT {
  bdr:MW23703_1183 ?instp ?insto .
  ?t ?tp ?to .
  ?ancestor :hasPart ?ancestorPart .
} WHERE {
  {
    bdr:MW23703_1183 ?instp ?insto . # 200ms alone
  } union {
    bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to .

puzzling performance issue

2021-10-06 Thread Élie Roux
Dear all, I'm experiencing a performance issue that I can't understand... I'm using:
- Jena 3.14.0, Fuseki (I'm testing in the web interface)
- TDB1
- none.opt
- this configuration: https://github.com/buda-base/buda-base/blob/master/conf/fuseki/ttl.erb (with some variable substitutions)
- the

Re: shortcut for querying dates fast?

2020-09-04 Thread Élie Roux
> select ?le ?la where {
>   ?le adm:logDate ?sdate .
>   FILTER(?sdate > "2021-08-20T00:00:00"^^xsd:dateTime)
>   BIND(?le AS ?la)
> }
In that trivial case yes, the longer query doesn't allow that: construct { ?eadm tmp:lastSync ?d . ?eadm tmp:dateCreated ?sdate . } WHERE {

Re: shortcut for querying dates fast?

2020-09-02 Thread Élie Roux
P.S.: Here's another aspect of the problem (although in a different aspect of the code). If I make the same query + filter on dates in unions, there seems to be no cache mechanism, as the following takes 3.5s instead of 1.7s: select ?le ?Nla where { { ?le adm:logDate ?sdate .

Re: shortcut for querying dates fast?

2020-09-02 Thread Élie Roux
> This is what was discovered before - the cost of scanning and filtering > isn't that high and while outlier cases may be measurably faster, the bulk > of queries will be marginally faster. There are always a lot of things > that can be done; it comes down to contributions and priorities. > > And

Re: shortcut for querying dates fast?

2020-08-30 Thread Élie Roux
> > Checking whether there is one first. > > Ok, I'll do that Turns out there's already a 2011 issue about that: https://issues.apache.org/jira/browse/JENA-144 I'm wondering if opening another issue about a request for a new API function would be relevant? Something along the lines of what Andy

Re: shortcut for querying dates fast?

2020-08-29 Thread Élie Roux
> Checking whether there is one first. Ok, I'll do that > The reality is also that your case seems to be a bit unusual. To be 3.5s > I'd guess you are hitting the POS (or quad equivalent) index cold. Or > something else is interacting with it (named graphs+union?) Yes, I noticed after my

Re: shortcut for querying dates fast?

2020-08-29 Thread Élie Roux
Hi all, would opening an issue on JIRA be the right thing to do? Best, -- Elie

Re: shortcut for querying dates fast?

2020-08-28 Thread Élie Roux
> (in memory or TDB?) TDB1 > > - are there other ways to write this query to make it more performant? > Not in ARQ unless there are fewer adm:logEntry triples. No > The access to data triples is (S,P,O) where any of S/P/O can be ANY. > So you have (ANY, adm:logDate, ANY) > > Ideally, that
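The access pattern described in the quoted text, a lookup with wildcards followed by a separate filter step, can be illustrated with a toy in-memory version. This is only a sketch of the idea, not how TDB is implemented; the `find` helper, the `ex:` identifiers and the sample dates are all hypothetical:

```python
ANY = None  # wildcard marker, standing in for "ANY" in the quoted message


def find(triples, s=ANY, p=ANY, o=ANY):
    """Yield every triple matching the pattern; ANY matches anything."""
    for triple in triples:
        if all(want is ANY or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Toy data with hypothetical identifiers. The dates are ISO strings in a single
# format, so plain string comparison orders them chronologically.
triples = [
    ("ex:le1", "adm:logDate", "2020-08-01T10:00:00"),
    ("ex:le2", "adm:logDate", "2020-08-25T09:30:00"),
    ("ex:le2", "adm:logMessage", "updated"),
    ("ex:le3", "adm:logDate", "2020-08-27T18:45:00"),
]


def logged_after(triples, threshold):
    # Step 1: (ANY, adm:logDate, ANY) visits every logDate triple.
    # Step 2: the filter is applied to each candidate afterwards.
    return [s for s, _, d in find(triples, p="adm:logDate") if d > threshold]


print(logged_after(triples, "2020-08-20T00:00:00"))  # ['ex:le2', 'ex:le3']
```

In this shape the cost grows with the number of `adm:logDate` triples rather than with the number of matches, which is the situation the thread is about.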

Re: shortcut for querying dates fast?

2020-08-28 Thread Élie Roux
> I'm wondering whether or not using xsd:long instead of xsd:dateTime with > timestamps mapped to milliseconds in numerical form would not perform > better. Well, I think it's convenient to have date times represented as xsd:dateTime in RDF... now yes, ideally they could be mapped to long
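For illustration, the mapping from an xsd:dateTime lexical form to a millisecond timestamp suggested in the quoted text could look like the sketch below. The function name is hypothetical, and treating values without an explicit offset as UTC is an assumption made here, not something stated in the thread:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_millis(lexical):
    """Convert an xsd:dateTime lexical form to milliseconds since the Unix epoch.

    Assumption: values without an explicit offset are treated as UTC.
    """
    if lexical.endswith("Z"):
        # Older Python versions do not accept the "Z" suffix in fromisoformat.
        lexical = lexical[:-1] + "+00:00"
    dt = datetime.fromisoformat(lexical)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(datetime_to_millis("2021-08-20T00:00:00"))  # 1629417600000
```

Integers produced this way compare in the same order as the instants they encode, so a date interval becomes a plain numeric range.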

shortcut for querying dates fast?

2020-08-27 Thread Élie Roux
Dear all, I have a dataset with (among other things) about 400,000 triples in the form ?a adm:logDate ?d where ?d is an xsd:dateTime. I'm writing a query to get all the triples that have a ?d in a certain interval. There are usually very few of them (around say 200). I'm writing a query that

Re: super slow filter

2020-01-22 Thread Élie Roux
> Try changing the query to put in a no-op that stops expansion.
>
> SELECT * {
>   :G844 ?rel ?res .
>   ?res a :Place .
>   ?res ?resp ?reso .
>   BIND(1 AS ?X)
>   FILTER (?resp = skos:altLabel || ?resp = skos:prefLabel || ?resp = skos:placeEvent || ?resp = bdo:placeLat ||

Re: super slow filter

2020-01-22 Thread Élie Roux
> * documenting that fixed.opt is the default when there is no file > * documenting that --tdb should be preferred over --loc in most cases > in tdbquery > > These you can do yourselves, find the relevant part of the website and hit > the Improve this Page button at the top and

Re: super slow filter

2020-01-22 Thread Élie Roux
Thanks a lot, after some investigation, here are a few results: - the problem was that I had no .opt file and that the default behavior was fixed.opt (or so it seems); after adding a none.opt (or a stats.opt), the query time went from 1200 to 250ms (for the version with the big filter) -
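Adding the marker file mentioned above can be scripted. A minimal sketch, assuming that TDB1 only checks for the presence of `none.opt` in the database directory and does not read its contents (my understanding of the TDB optimizer documentation, worth verifying), with a temporary directory standing in for the real database location:

```python
import tempfile
from pathlib import Path


def disable_reordering(db_dir):
    """Create an empty none.opt marker file in a TDB1 database directory."""
    marker = Path(db_dir) / "none.opt"
    marker.touch(exist_ok=True)
    return marker


# A temporary directory stands in for the real database location here.
db_dir = tempfile.mkdtemp()
marker = disable_reordering(db_dir)
print(marker.name)  # none.opt
```

The server presumably has to be restarted for the new strategy to be picked up. A `stats.opt` file is different: it carries actual statistics and is not just an empty marker.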

Re: tdbquery on UnionGraph

2020-01-22 Thread Élie Roux
> You should use > > --tdb=PATH_TO_ASSEMBLER_FILE > > instead of --loc if you have such a file with more complex settings than > just a location. Ah excellent, thanks! The two being mutually exclusive I had wrongly guessed they were equivalent. Best, -- Elie

tdbquery on UnionGraph

2020-01-22 Thread Élie Roux
Dear all, I'm trying to run tdbquery on my triple store which has a UnionGraph (assembler attached), but it seems that even the most simple query doesn't work: SELECT ?subject ?predicate ?object WHERE { ?subject ?predicate ?object } LIMIT 25 Looking at the results, it seems that tdbquery is

Re: super slow filter

2020-01-22 Thread Élie Roux
> Which describes the various options for optimization. I have seen the various options. I have not seen what the default one is when there is no .opt file in the directory (which is my case and seems to be the initial state). Or did I miss it? Best, -- Elie

Re: super slow filter

2020-01-22 Thread Élie Roux
Thanks for your answers! I'm trying to understand why tdbquery doesn't return any result but in the meantime: > Apart from using stats.opt, with the option to manually tune the rules, > you have the option to use none.opt to stop any reordering of triple > patterns in bgps. That allows you to

Re: super slow filter

2020-01-21 Thread Élie Roux
I'm starting to see what's going on, it seems that the optimization (according to sparql.org) gives
(disjunction
  (assign ((?resp skos:altLabel))
    (bgp
      (triple bdr:G844 ?rel ?res)
      (triple ?res rdf:type :Place)
      (triple ?res

Re: super slow filter

2020-01-21 Thread Élie Roux
> Have you tried using VALUES instead of FILTER ? I have to say I was expecting it to give different results (in terms of output) but you're right: construct { ?res ?resp ?reso . } where { { bdr:G844 ?rel ?res . ?res a bdo:Place . VALUES ?res { skos:altLabel skos:prefLabel

Re: super slow filter

2020-01-21 Thread Élie Roux
> What’s “relatively large”? 100 ms doesn’t sound that bad. That depends on how the system is used... In my initial query I have a union of 6 subqueries that look a bit like that. This brings the query time to 6s. If I run a few queries every day that doesn't matter but our system is used in

Re: super slow filter

2020-01-21 Thread Élie Roux
P.S.: some triples were missing for the dataset to work, here's an updated version. I've noted that the performance is directly proportional to the number of tests in the FILTER, it's about 100ms by comparison... that seems a little excessive... Best, -- Elie @prefix :

super slow filter

2020-01-21 Thread Élie Roux
Dear all, I have a (relatively large) dataset in Fuseki (default optimization settings) for which I have attached the relevant triples. The following query takes around 160ms (according to Fuseki logs): construct { ?res ?resp ?reso . } where { { bdr:G844 ?rel ?res . ?res a bdo:Place .

Re: JENA-1620 and Query timeout overrides

2019-08-30 Thread Élie Roux
> > I see, thanks for the answer! There seems to be some way to have a timeout > > in the Lucene API, but it doesn't look very straightforward... should > > I open a separate issue to track that? > > Yes Ok, I'll do that > Are you able to contribute for this additional feature? Perhaps in the

Re: JENA-1620 and Query timeout overrides

2019-08-29 Thread Élie Roux
Dear All, We actually have a very similar plan: - an endpoint for general queries, with a shortish timeout - an endpoint for admin queries, with a much longer one, using the same dataset The first thing I wanted to know is if and how Fuseki can be asked to completely stop a query thread (the

Re: bottom-up semantics

2019-02-01 Thread Élie Roux
Hello, Thanks for your answer
> (conditional
>   (bgp
>     (triple ?loc :locatedInWork ex:Work1)
>     (triple ?loc :startVolume ?startvol)
>   )
>   (bgp (triple ?loc :endVolume ?endvol)))
Am I right in understanding that in that case ?loc

bottom-up semantics

2019-02-01 Thread Élie Roux
Dear Jena users, In short, I'm wondering if there could be an option somewhere for a top-down SPARQL evaluation mechanism. Long version: the dataset I'm dealing with contains data in the following form: ex:Loc1 a :Location ; :locatedInWork ex:Work1 ; :startPage 123 ;

Re: InsertPic_(12-07(12-07-21-26-31)

2018-12-07 Thread Élie Roux
Hello, I'm curious if slightly changing the SPARQL query can have an impact on performance, maybe you could try something like: where { ?object MKG:English_name 'Pyrilamine' . ?RelAttr owl:annotatedTarget ?object ; owl:annotatedSource ?subject ;

reading a JSON string from sparql results

2018-03-26 Thread Élie Roux
Dear All, I'm trying to transform a String in the format of a SPARQL Select JSON Result into a org.apache.jena.query.ResultSet, but it proves much more difficult than I anticipated. What I'm trying to achieve is really starting with a json string, I can't use a QueryExecution.execSelect(). I

Re: Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread Élie Roux
In the case of inference then yes there is also an upfront cost of computing the inferences. Once computed these are typically cached (though this depends on the rule set) and any changes to the data might invalidate that cache. You can call prepare() on the InfModel to incur the initial

Re: more predictable Turtle output

2017-08-24 Thread Élie Roux
On 02/08/2017 at 18:39, Élie Roux wrote: Subclassing ShellGraph and overriding the methods like writePredicateObjectList would be my approach. Too much is private to really subclass - you'll need to copy the class at the moment. Along with registration, then at least you can have both

Turtle output for lists of blank nodes

2017-08-02 Thread Élie Roux
I'm currently looking more closely at Jena's ttl output and I see things like: bdr:G844 a :Place ; :placeEvent [ a :PlaceFounded ; :onOrAbout "13uu/17uu" ] ; :placeEvent [ a

Re: more predictable Turtle output

2017-08-02 Thread Élie Roux
Subclassing ShellGraph and overriding the methods like writePredicateObjectList would be my approach. Too much is private to really subclass - you'll need to copy the class at the moment. Along with registration, then at least you can have both in the same JVM and it helps testing. Thank you

Re: more predictable Turtle output

2017-08-02 Thread Élie Roux
On 02/08/2017 at 14:13, Jean-Marc Vanel wrote: Élie, I would use N-Triples format, sorted in alphanumerical order. Thank you very much for your answer! I thought about this approach but I see two problems: - N-Triples is hardly readable and I would prefer having my data stored as Turtle for
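The sorted N-Triples suggestion quoted above is easy to sketch, since N-Triples puts one triple on each line. The helper below is a hypothetical illustration rather than part of Jena, and it assumes the input contains no blank nodes, whose labels may change between serializations:

```python
def stable_ntriples(ntriples_text):
    """Return N-Triples content with duplicates removed and lines sorted.

    Assumption: no blank nodes, since their labels are not stable across runs.
    """
    lines = {
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return "\n".join(sorted(lines)) + "\n"


# Two serializations of the same graph in different orders (hypothetical IRIs).
run1 = (
    '<http://example.org/G844> <http://example.org/name> "b" .\n'
    '<http://example.org/G844> <http://example.org/name> "a" .\n'
)
run2 = (
    '<http://example.org/G844> <http://example.org/name> "a" .\n'
    '<http://example.org/G844> <http://example.org/name> "b" .\n'
)
print(stable_ntriples(run1) == stable_ntriples(run2))  # True
```

This gives a byte-identical file for the same set of triples, which keeps git diffs small, at the cost of the readability concern raised in the reply.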

more predictable Turtle output

2017-08-02 Thread Élie Roux
Hello, I'm currently trying to solve a problem I have in Turtle: I would like my output to stay stable, so that it can live in a git repository without generating too much diff noise every time the data is regenerated. One example would be something like: bdr:G844 a :Place ;

Re: persistent inference on named graphs in Fuseki

2017-04-02 Thread Élie Roux
Hello, > Jena's inference is purely in memory so running over a TDB store is > possible but doesn't give you any scalability and is slower than > running over an in-memory copy of the same data. Plus, as you already > know, it's not named-graphs-aware. Thank you for your clarifying answer! I

persistent inference on named graphs in Fuseki

2017-03-31 Thread Élie Roux
Hello, I am currently setting up a Fuseki server with the following specs in mind: - everything persistent in TDB (it works) - many different named graphs, with the default graph being the union of them (it works, but without inference) - very simple inferencing (works but not with named

persistent inference on named graphs in Fuseki

2017-03-30 Thread Élie Roux
Hello, I am currently setting up a Fuseki server with the following specs in mind: - everything persistent in TDB (it works) - many different named graphs, with the default graph being the union of them (it works, but without inference) - very simple inferencing (works but not with named