On 29/11/17 12:07, anuj kumar wrote:
Hi,
So I am working on a performance issue with our Triple Store (which is
based on HBase).
To give some background, the query I am executing looks like:
SELECT ?s
WHERE {
?s a file:File .
?s ex:modified ?modified .
FILTER(?modified >= "2017-11-05T00:00:00.00000"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
}
Looking at the ARQ Execution plan, it is like this:
It's an algebra expression - it may or may not have been through the
optimizer. In this case the high-level (algebra) optimizer doesn't do
much with this query.
This does not stop your system doing some more optimization in its own
OpExecutor.
(slice 0 1000
Not in your query.
(project (?s)
(filter (>= ?modified "2017-11-05T00:00:00.00000"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(bgp
(triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/File#File>)
(triple ?s <http://www.example.com/common#modified> ?modified)
))))
And I have around 45000 File objects in my Triple Store.
As you can see from the above execution plan, I first get the Subject ID
for these 45000 File objects and then I fire a query per File Id to get the
modified date for the same. This clearly is not performant.
Not good for two reasons:
All the round trips to get the "ex:modified" when it should be done server
side (OK - that means putting something on the HBase machine)
And also, it could do a range scan:
(think of that as a physical execution plan and the algebra as a logical
execution plan)
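
The gap between the two physical plans can be sketched with a toy sorted map standing in for the store. This is only an illustration, not the HBase client API: the class RangeScanSketch, the "timestamp|subject" row key layout and the storeCalls counter are all assumptions made for the example. It also assumes every timestamp is written in one fixed layout and timezone, so that string order matches chronological order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory stand-in for a sorted key-value store; not the HBase client API.
class RangeScanSketch {
    // subject -> modified timestamp: what each per-subject lookup reads.
    private final Map<String, String> bySubject = new HashMap<>();
    // Hypothetical index row key "timestamp|subject", kept in sorted key order.
    private final NavigableMap<String, String> byModified = new TreeMap<>();
    // Counts simulated store accesses (round trips).
    int storeCalls = 0;

    void put(String subject, String modified) {
        bySubject.put(subject, modified);
        byModified.put(modified + "|" + subject, subject);
    }

    // Naive plan: one lookup per candidate subject, filter afterwards.
    List<String> perSubjectLookup(Collection<String> subjects, String lowerBound) {
        List<String> result = new ArrayList<>();
        for (String s : subjects) {
            storeCalls++;
            String modified = bySubject.get(s);
            if (modified != null && modified.compareTo(lowerBound) >= 0) {
                result.add(s);
            }
        }
        return result;
    }

    // Range-scan plan: a single pass over the sorted keys from the lower bound.
    List<String> rangeScan(String lowerBound) {
        storeCalls++;
        return new ArrayList<>(byModified.tailMap(lowerBound, true).values());
    }

    public static void main(String[] args) {
        RangeScanSketch store = new RangeScanSketch();
        store.put("file:a", "2017-11-04T23:59:59");
        store.put("file:b", "2017-11-05T00:00:00");
        store.put("file:c", "2017-12-01T08:30:00");
        System.out.println(store.rangeScan("2017-11-05T00:00:00"));
    }
}
```

With 45000 File subjects the first method makes 45000 store calls and the second makes one, which is the point of keeping the filter next to the triple pattern.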
My Questions:
1. Is there a better way to create a SELECT query to have a good execution
plan?
Ideally, no, but try this:
SELECT ?s
WHERE {
?s ex:modified ?modified .
FILTER(?modified >= "2017-11-05T00:00:00.00000"^^xsd:dateTime)
?s a file:File .
}
changing the BGP order and doing filter placement to get:
(project (?s)
(sequence
(filter (>= ?modified "2017-11-05T00:00:00.00000"^^xsd:dateTime)
(bgp (triple ?s ex:modified ?modified)))
(bgp (triple ?s rdf:type file:File))))
then in your code do:
(filter (>= ?modified "2017-11-05T00:00:00.00000"^^xsd:dateTime)
(bgp (triple ?s ex:modified ?modified)))
all in HBase (it's a single range scan).
Subclass OpExecutor and override its handling of OpFilter to spot such cases.
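
As a rough sketch of what "spotting such cases" could look like, the check below recognises a lower-bound filter sitting directly on a one-triple BGP and turns it into a range-scan request. The types here (Op, Triple, Bgp, Filter, RangeScan) are simplified stand-ins invented for the example, not Jena's classes. In a real OpExecutor subclass the same test would presumably sit in the method that executes OpFilter, falling back to the superclass when the pattern does not match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FilterPushdownSketch {
    // Minimal stand-ins for algebra nodes; these are NOT Jena's Op classes.
    interface Op {}

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static final class Bgp implements Op {
        final List<Triple> triples;
        Bgp(Triple... triples) { this.triples = Arrays.asList(triples); }
    }

    // Only models filters of the form (cmp ?var constant).
    static final class Filter implements Op {
        final String cmp, var, constant;
        final Op sub;
        Filter(String cmp, String var, String constant, Op sub) {
            this.cmp = cmp; this.var = var; this.constant = constant; this.sub = sub;
        }
    }

    // Hypothetical description of the scan to hand to the storage layer.
    static final class RangeScan {
        final String predicate, lowerBound;
        final boolean inclusive;
        RangeScan(String predicate, String lowerBound, boolean inclusive) {
            this.predicate = predicate; this.lowerBound = lowerBound; this.inclusive = inclusive;
        }
    }

    // Matches (filter (>= ?v c) (bgp (triple ?s <p> ?v))); anything else yields empty.
    static Optional<RangeScan> asRangeScan(Op op) {
        if (!(op instanceof Filter)) return Optional.empty();
        Filter f = (Filter) op;
        if (!(f.sub instanceof Bgp)) return Optional.empty();
        Bgp bgp = (Bgp) f.sub;
        if (bgp.triples.size() != 1) return Optional.empty();
        Triple t = bgp.triples.get(0);
        boolean lowerBound = f.cmp.equals(">=") || f.cmp.equals(">");
        // The predicate must be fixed and the filtered variable must be the object.
        if (!lowerBound || t.p.startsWith("?") || !t.o.equals(f.var)) return Optional.empty();
        return Optional.of(new RangeScan(t.p, f.constant, f.cmp.equals(">=")));
    }

    public static void main(String[] args) {
        Op op = new Filter(">=", "?modified", "2017-11-05T00:00:00.00000",
                new Bgp(new Triple("?s", "ex:modified", "?modified")));
        System.out.println(asRangeScan(op).isPresent());
    }
}
```

Returning an empty Optional for everything else keeps the general execution path untouched, so only the recognised shape is routed to the store.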
2. If not, then can I somehow change the generation of execution plan?
3. Is it advisable to re-write the ARQ Execution Plan to suit our needs and
how complicated might this be?
How sophisticated do you want it to be?!
It's an open-ended question - more work, better optimization!
Thanks and please let me know if you need more information.
Thanks,
Anuj Kumar