Dear David,
I think extending Jena Lucene is a good idea, but I'm not exactly sure
what you mean (partly because I'm not very familiar with Lucene's
faceted search). Can you give an example?
Best,
--
Elie
Dear François-Paul,
I have developed the following library for that purpose and I'm using
it in production:
https://github.com/buda-base/jena-stable-turtle
Blank nodes are always an issue in this exercise (which is partially
why I've decided against their use as a policy).
I hope this helps,
Perhaps this is an instance of
https://github.com/apache/jena/issues/1533 ? What triple reordering
optimization are you using?
Best,
--
Elie
> Because that is the source being copied from.
>
> Is it TIM or TDB1?
Oh, it's all in memory: I read a dataset from a trig file, extract a
graph, modify it, create a new dataset, add the new modified graph and
save the file in trig.
Best
--
Elie
> addGraph is copying data from the m.getGraph into the new dataset.
>
> "createGeneral" is the version that does not copy, and makes a link to
> the original.
>
> By the way : DatasetGraphFactory
Thanks, that actually made the bug disappear! It's still quite unclear
to me why it happened but
Dear all,
I have some code that writes hundreds of thousands of trig files
(converting some XML data to RDF). I recently introduced a relatively
minor change: some literals are now in a custom datatype, defined
here:
> As others pointed out semantically the evaluation of the two query forms
> yields very different intermediate results. It's only the presence of the
> post-processing CONSTRUCT stage that happens to strip out the
> duplicates/unusable results. Any optimizations MUST preserve the overall
>
> Overall, it is whether the WHERE answer is 16*26*2636 rows (all one BGP) or
> 16+26+2636 rows (union).
Yes, I understand better now, thanks!
Do you think there might be some optimization at some point for that
case? I suspect this is very common in SPARQL queries out there...
Best,
--
Elie
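To make the row counts quoted above concrete, here is a small Python sketch of the arithmetic (plain Python, not Jena code). The three counts 16, 26 and 2636 are taken from the quoted reply; the function and variable names are only illustrative, and the sketch assumes the three patterns share no variables, so that putting them in one BGP amounts to a cross product.

```python
# Number of solutions matched by each of the three sub-patterns,
# as given in the quoted reply.
pattern_rows = [16, 26, 2636]


def rows_as_single_bgp(counts):
    """Patterns with no shared variables in one BGP: rows multiply."""
    total = 1
    for n in counts:
        total *= n
    return total


def rows_as_union(counts):
    """Each UNION branch contributes its own rows: rows add up."""
    return sum(counts)


bgp_rows = rows_as_single_bgp(pattern_rows)
union_rows = rows_as_union(pattern_rows)
print(bgp_rows, union_rows)  # 1096576 2678
```

The CONSTRUCT template then removes duplicate triples in both cases, which is why the two query forms can give the same final graph while doing very different amounts of work in the WHERE clause.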
> if you take this expression
>
> WHERE
> {
> {
> bdr:MW23703_1183 ?instp ?insto . # 200ms alone
> } union {
> bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone
> } union {
> bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart
> ?ancestorPart . # 200ms alone
> }
> }
>
>
Thanks a lot for your very informative answer Richard, it's really
helpful to know when writing queries!
It seems this is a case where some optimizations might be implemented?
(I'm afraid this isn't something I could contribute though, sorry)
Best,
--
Elie
After long hours of anxiety, I discovered that using unions as in
CONSTRUCT
{
bdr:MW23703_1183 ?instp ?insto .
?t ?tp ?to .
?ancestor :hasPart ?ancestorPart .
}
WHERE
{
{
bdr:MW23703_1183 ?instp ?insto . # 200ms alone
} union {
bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to .
Dear all,
I'm experiencing a performance issue that I can't understand... I'm using:
- Jena 3.14.0 , Fuseki (I'm testing in the web interface)
- TDB1
- none.opt
- this configuration:
https://github.com/buda-base/buda-base/blob/master/conf/fuseki/ttl.erb
(with some variable substitutions)
- the
> select ?le ?la where {
> ?le adm:logDate ?sdate .
> FILTER(?sdate > "2021-08-20T00:00:00"^^xsd:dateTime)
> BIND(?le AS ?la)
> }
In that trivial case yes, the longer query doesn't allow that:
construct {
?eadm tmp:lastSync ?d .
?eadm tmp:dateCreated ?sdate .
}
WHERE
{
P.S.: Here's another aspect of the problem (although in a different
part of the code). If I make the same query + filter on dates in
unions, there seems to be no cache mechanism, as the following takes
3.5s instead of 1.7s:
select ?le ?Nla where {
{
?le adm:logDate ?sdate .
> This is what was discovered before - the cost of scanning and filtering
> isn't that high and, while outlier cases may be measurably faster, the bulk
> of queries will be marginally faster. There are always a lot of things
> that can be done; it comes down to contributions and priorities.
>
> And
> > Checking whether there is one first.
>
> Ok, I'll do that
Turns out there's already a 2011 issue about that:
https://issues.apache.org/jira/browse/JENA-144
I'm wondering if opening another issue about a request for a new API
function would be relevant? Something along the lines of what Andy
> Checking whether there is one first.
Ok, I'll do that
> The reality is also that your case seems to be a bit unusual. To be 3.5s
> I'd guess you are hitting the POS (or quad equivalent) index cold. Or
> something else is interacting with it (named graphs+union?)
Yes, I noticed after my
Hi all,
would opening an issue on JIRA be the right thing to do?
Best,
--
Elie
> (in memory or TDB?)
TDB1
> > - are there other ways to write this query to make it more performant?
> Not in ARQ unless there are fewer adm:logEntry triples.
No
> The access to data triples is (S,P,O) where any of S/P/O can be ANY.
> So you have (ANY, adm:logDate, ANY)
>
> Ideally, that
> I'm wondering whether or not using xsd:long instead of xsd:dateTime with
> timestamps mapped to milliseconds in numerical form would not perform
> better.
well, I think it's convenient to have date times represented as
xsd:dateTime in RDF... now yes, ideally they could be mapped to long
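As a rough illustration of the mapping discussed here, the following Python sketch converts xsd:dateTime lexical forms to epoch milliseconds so that an interval test becomes an integer comparison. It is not Jena code; the function name and the sample values are hypothetical, and it assumes that values without an explicit offset are in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(lexical: str) -> int:
    """Map an xsd:dateTime lexical form to milliseconds since the Unix epoch.

    Assumption: a value without a timezone offset is interpreted as UTC.
    """
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    dt = datetime.fromisoformat(lexical)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Hypothetical log dates, filtered against a threshold as plain integers.
threshold = to_epoch_millis("2021-08-20T00:00:00")
log_dates = ["2021-08-19T23:59:59", "2021-08-20T00:00:01", "2021-09-01T12:00:00Z"]
recent = [d for d in log_dates if to_epoch_millis(d) > threshold]
print(recent)
```

Whether integer comparison is actually faster inside a given store depends on how it encodes and indexes values, so this only shows the mapping itself.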
Dear all,
I have a dataset with (among other things) about 400,000 triples in the form
?a adm:logDate ?d
where ?d is an xsd:dateTime. I'm writing a query to get all the
triples that have a ?d in a certain interval. There are usually very
few of them (around say 200). I'm writing a query that
> Try changing the query to put in a no-op that stops expansion.
>
> SELECT * {
> :G844 ?rel ?res .
> ?res a :Place .
> ?res ?resp ?reso .
> BIND(1 AS ?X)
> FILTER (?resp = skos:altLabel || ?resp = skos:prefLabel || ?resp =
> skos:placeEvent || ?resp = bdo:placeLat ||
> * documenting that fixed.opt is the default when there is no file
> * documenting that --tdb should be preferred over --loc in most cases
> in tdbquery
>
> These you can do yourselves, find the relevant part of the website and hit
> the Improve this Page button at the top and
Thanks a lot, after some investigation, here are a few results:
- the problem was that I had no .opt file and that the default
behavior was fixed.opt (or so it seems); when adding a none.opt (or a
stats.opt), the query time went from 1200 to 250ms (with the
big-filter version of the query)
-
> You should use
>
> --tdb=PATH_TO_ASSEMBLER_FILE
>
> instead of --loc if you have such a file with more complex settings than
> just a location.
Ah excellent, thanks! The two being mutually exclusive, I had wrongly
guessed they were equivalent.
Best,
--
Elie
Dear all,
I'm trying to run tdbquery on my triple store which has a UnionGraph
(assembler attached), but it seems that even the most simple query
doesn't work:
SELECT ?subject ?predicate ?object
WHERE {
?subject ?predicate ?object
}
LIMIT 25
Looking at the results, it seems that tdbquery is
> Which describes the various options for optimization.
I have seen the various options. I have not seen what the default one
is when there is no .opt file in the directory (which is my case and
seems to be the initial state). Or did I miss it?
Best,
--
Elie
Thanks for your answers! I'm trying to understand why tdbquery doesn't
return any result but in the meantime:
> Apart from using stats.opt, with the option to manually tune the rules,
> you have the option to use none.opt to stop any reordering of triple
> patterns in bgps. That allows you to
I'm starting to see what's going on, it seems that the optimization
(according to sparql.org) gives
(disjunction
  (assign ((?resp skos:altLabel))
    (bgp
      (triple bdr:G844 ?rel ?res)
      (triple ?res rdf:type :Place)
      (triple ?res
> Have you tried using VALUES instead of FILTER ?
I have to say I was expecting it to give different results (in terms
of output) but you're right:
construct {
?res ?resp ?reso .
}
where {
{
bdr:G844 ?rel ?res .
?res a bdo:Place .
VALUES ?res { skos:altLabel skos:prefLabel
> What’s “relatively large”? 100 ms doesn’t sound that bad.
That depends on how the system is used... In my initial query I have a
union of 6 subqueries that look a bit like that. This brings the query
time to 6s. If I run a few queries every day that doesn't matter but
our system is used in
P.S.: some triples were missing for the dataset to work, here's an
updated version. I've noticed that the query time is directly
proportional to the number of tests in the FILTER: it's about 100ms per
comparison... that seems a little excessive...
Best,
--
Elie
@prefix :
Dear all,
I have a (relatively large) dataset in Fuseki (default optimization
settings), for which I have attached the relevant triples. The following query
takes around 160ms (according to Fuseki logs):
construct {
?res ?resp ?reso .
}
where {
{
bdr:G844 ?rel ?res .
?res a bdo:Place .
> > I see, thanks for the answer! There seems to be some way to have a timeout
> > in the Lucene API, but it doesn't look very straightforward... should
> > I open a separate issue to track that?
>
> Yes
Ok, I'll do that
> Are you able to contribute for this additional feature?
Perhaps in the
Dear All,
We actually have a very similar plan:
- an endpoint for general queries, with a shortish timeout
- an endpoint for admin queries, with a much longer one, using the same dataset
The first thing I wanted to know is if and how Fuseki can be asked to
completely stop a query thread (the
Hello,
Thanks for your answer
> (conditional
> (bgp
> (triple ?loc :locatedInWork ex:Work1)
> (triple ?loc :startVolume ?startvol)
> )
> (bgp (triple ?loc :endVolume ?endvol)))
Am I right in understanding that in that case ?loc
Dear Jena users,
In short, I'm wondering if there could be an option somewhere for a
top-down SPARQL evaluation mechanism.
Long version: the dataset I'm dealing with contains data in the following form:
ex:Loc1 a :Location ;
:locatedInWork ex:Work1 ;
:startPage 123 ;
Hello,
I'm curious if slightly changing the SPARQL query can have an impact
on performance, maybe you could try something like:
where {
?object MKG:English_name 'Pyrilamine' .
?RelAttr owl:annotatedTarget ?object ;
owl:annotatedSource ?subject ;
Dear All,
I'm trying to transform a String in the format of a SPARQL Select JSON
Result into a org.apache.jena.query.ResultSet, but it proves much more
difficult than I anticipated. What I'm trying to achieve is really
starting with a json string, I can't use a QueryExecution.execSelect().
I
In the case of inference then yes there is also an upfront cost of
computing the inferences. Once computed these are typically cached
(though this depends on the rule set) and any changes to the data
might invalidate that cache. You can call prepare() on the InfModel
to incur the initial
On 02/08/2017 at 18:39, Élie Roux wrote:
Subclassing ShellGraph and overriding the methods like
writePredicateObjectList would be my approach. Too much is private
to really subclass - you'll need to copy the class at the moment.
Along with registration, then at least you can have both
I'm currently looking more closely at Jena's ttl output and I see things
like:
bdr:G844 a :Place ;
:placeEvent [ a :PlaceFounded ;
:onOrAbout "13uu/17uu"
] ;
:placeEvent [ a
Subclassing ShellGraph and overriding the methods like
writePredicateObjectList would be my approach. Too much is private to
really subclass - you'll need to copy the class at the moment.
Along with registration, then at least you can have both in the same
JVM and it helps testing.
Thank you
On 02/08/2017 at 14:13, Jean-Marc Vanel wrote:
Élie,
I would use N-Triples format, sorted in alphanumerical order.
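A minimal Python sketch of that suggestion, assuming the data has already been serialized as N-Triples with one triple per line. The function name and the example.org triples are hypothetical. The sketch does nothing about blank node labels, which can still differ between runs and produce diff noise.

```python
def stable_ntriples(ntriples_text: str) -> str:
    """Return the N-Triples lines deduplicated and sorted alphanumerically.

    Assumption: one triple per line; comment and empty lines are dropped.
    """
    unique_lines = {line.strip() for line in ntriples_text.splitlines()}
    kept = sorted(l for l in unique_lines if l and not l.startswith("#"))
    return "".join(line + "\n" for line in kept)


# Two serializations of the same (hypothetical) data in different orders.
run_a = (
    '<http://example.org/G844> <http://example.org/label> "b" .\n'
    '<http://example.org/G844> <http://example.org/label> "a" .\n'
)
run_b = (
    '<http://example.org/G844> <http://example.org/label> "a" .\n'
    '<http://example.org/G844> <http://example.org/label> "b" .\n'
)
print(stable_ntriples(run_a) == stable_ntriples(run_b))  # True
```

Because the output depends only on the set of lines, regenerating the data in a different order leaves the file byte-for-byte identical.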
Thank you very much for your answer! I thought about this approach but I
see two problems:
- N-Triples is hardly readable and I would prefer having my data stored as
TURTLE for
Hello,
I'm currently trying to solve a problem I have in Turtle: I would like
my output to stay stable, so that it can live in a git repository without
generating too much diff noise every time the data is regenerated. One
example would be something like:
bdr:G844 a :Place ;
Hello,
> Jena's inference is purely in memory so running over a TDB store is
> possible but doesn't give you any scalability and is slower than
> running over an in-memory copy of the same data. Plus, as you already
> know, it's not named-graphs-aware.
Thank you for your clarifying answer! I
Hello,
I am currently setting up a Fuseki server with the following specs in mind:
- everything persistent in TDB (it works)
- many different named graphs, with the default graph being the union of
them (it works, but without inference)
- very simple inferencing (works but not with named