You can try forcing the scope of the filter to match your second query,
so that it is applied before the optional part is evaluated:
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?mID ?e ?nf ?desc ?wikipedia_url
where
{
  {
    ?mID fb:type.object.type fb:people.person .
    ?mID fb:type.object.name ?e .
    ?mID fb:common.topic.notable_for ?notab_for .
    ?notab_for fb:common.notable_for.display_name ?nf .
    ?mID fb:common.topic.description ?desc .
    FILTER (langMatches(lang(?e), "en") &&
            langMatches(lang(?nf), "en") &&
            langMatches(lang(?desc), "en"))
  }
  OPTIONAL
  {
    ?mID fb:common.topic.topic_equivalent_webpage ?wikipedia_url .
    FILTER (regex (str(?wikipedia_url), "en.wikipedia", "i") &&
            !regex (str(?wikipedia_url), "curid=", "i")) .
  }
}
This may perform more like your 3h query.
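If you want to check the shape of the results before committing to a full run, one option is to put the required part in a subquery with a LIMIT and apply the OPTIONAL to that slice only. This is just a sketch: the value 1000 is an arbitrary assumption, and timings on a small slice will not necessarily predict the full run.

```sparql
prefix fb: <http://rdf.freebase.com/ns/>
select ?mID ?e ?nf ?desc ?wikipedia_url
where
{
  # Required part, filtered first and cut down to a small sample.
  {
    select ?mID ?e ?nf ?desc
    where
    {
      ?mID fb:type.object.type fb:people.person .
      ?mID fb:type.object.name ?e .
      ?mID fb:common.topic.notable_for ?notab_for .
      ?notab_for fb:common.notable_for.display_name ?nf .
      ?mID fb:common.topic.description ?desc .
      FILTER (langMatches(lang(?e), "en") &&
              langMatches(lang(?nf), "en") &&
              langMatches(lang(?desc), "en"))
    }
    limit 1000   # arbitrary sample size, adjust as needed
  }
  # Optional part, evaluated only for the sampled rows.
  OPTIONAL
  {
    ?mID fb:common.topic.topic_equivalent_webpage ?wikipedia_url .
    FILTER (regex (str(?wikipedia_url), "en.wikipedia", "i") &&
            !regex (str(?wikipedia_url), "curid=", "i"))
  }
}
```

Removing the "limit" line gives the same structure as the query above.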
There are improvements in progress for this, but they haven't reached
TDB yet.
The hardware you are running on will be a big factor.
Andy
On 07/03/14 11:22, Paton, Diego wrote:
Hi,
I am working with the Freebase ontology stored in Apache Jena TDB and executing
queries using Fuseki.
What I want to retrieve is the mID, entity name, description and, optionally, the
Wikipedia URL if present (I expect to obtain more than 6M results). The
problem is that the query takes more than 24h to run.
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?mID ?e ?nf ?desc ?wikipedia_url
where
{
  {
    ?mID fb:type.object.type fb:people.person .
    ?mID fb:type.object.name ?e .
    ?mID fb:common.topic.notable_for ?notab_for .
    ?notab_for fb:common.notable_for.display_name ?nf .
    ?mID fb:common.topic.description ?desc .
    OPTIONAL
    {
      ?mID fb:common.topic.topic_equivalent_webpage ?wikipedia_url .
      FILTER (regex (str(?wikipedia_url), "en.wikipedia", "i") &&
              !regex (str(?wikipedia_url), "curid=", "i")) .
    }
    FILTER (langMatches(lang(?e), "en") &&
            langMatches(lang(?nf), "en") &&
            langMatches(lang(?desc), "en"))
  }
}
ORDER BY ?mID
The modified query below, with the optional part removed, takes 3h:
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?mID ?e ?nf ?desc
where
{
  {
    ?mID_raw fb:type.object.type fb:people.person .
    ?mID_raw fb:type.object.name ?e .
    ?mID_raw fb:common.topic.notable_for ?notab_for .
    ?notab_for fb:common.notable_for.display_name ?nf .
    ?mID_raw fb:common.topic.description ?desc .
    FILTER (langMatches(lang(?e), "en") &&
            langMatches(lang(?nf), "en") &&
            langMatches(lang(?desc), "en"))
  }
  BIND(REPLACE(str(?mID_raw), "http://rdf.freebase.com/ns/", "") as ?mID)
}
ORDER BY ?mID
And the modified query below, with the filter removed from the optional clause,
takes more than 20h (still running):
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?mID ?e ?nf ?desc ?wikipedia_url
where
{
  {
    ?mID fb:type.object.type fb:people.person .
    ?mID fb:type.object.name ?e .
    ?mID fb:common.topic.notable_for ?notab_for .
    ?notab_for fb:common.notable_for.display_name ?nf .
    ?mID fb:common.topic.description ?desc .
    OPTIONAL
    {
      ?mID fb:common.topic.topic_equivalent_webpage ?wikipedia_url .
    }
    FILTER (langMatches(lang(?e), "en") &&
            langMatches(lang(?nf), "en") &&
            langMatches(lang(?desc), "en"))
  }
}
ORDER BY ?mID
Do you have any ideas on how to improve the performance of the first query,
which is the one that meets my requirements?
Regards,
Diego.