OK, I investigated further.
I had a backup of the Virtuoso DB taken just after loading the DBpedia
dumps (en & de) and installing the DBpedia and rdf_mappers packages.
I restored this backup and executed all 3 queries:
On Thursday 19 August 2010, Jörn Hees wrote:
> On Thursday 19 August 2010, Hugh Williams wrote:
> > SPARQL SELECT ?g count(*) WHERE { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g
> > ORDER BY DESC 2;
> >
> > SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g {?s ?p ?o.} };
> >
> > select * from SPARQL_SELECT_KNOWN_GRAPHS_T;
All three return the same 26 graphs.
I then went to the Conductor's RDF / Schemas tab and imported
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf .
Then I went to the Graphs tab, deleted the http://www.w3.org/2004/02/skos/core
graph (worked fine), and renamed
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf to
http://www.w3.org/2004/02/skos/core .
After this, the first query returns 27 graphs, while the other two return 26.
(In the result of the first one, the
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf graph still exists.)
I went back to the Schemas tab and deleted the
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf schema. The behavior
stays the same: 27 vs. 26 graphs.
When I now delete the http://www.w3.org/2004/02/skos/core graph from the
Graphs tab, the first query returns 26 graphs, the second and third 25. The
http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf graph is still there.
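To pin down exactly which graph differs between two of the listings, the
results can be compared as sets. A minimal sketch in Python, assuming the graph
IRIs returned by each query were copied by hand into lists. The function name
graph_diff and the shortened sample lists are made up for illustration and are
not part of Virtuoso:

```python
def graph_diff(first_listing, second_listing):
    """Return the graph IRIs that appear in only one of the two listings."""
    first, second = set(first_listing), set(second_listing)
    only_in_first = sorted(first - second)
    only_in_second = sorted(second - first)
    return only_in_first, only_in_second


if __name__ == "__main__":
    # Hypothetical, shortened sample: the real listings hold 26 and 25 graphs.
    group_by_result = [
        "http://dbpedia.org",
        "http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf",
    ]
    known_graphs_result = [
        "http://dbpedia.org",
    ]
    extra, missing = graph_diff(group_by_result, known_graphs_result)
    print("only in first query:", extra)
    print("only in other query:", missing)
```

With the sample lists above, only the 2006-04-18.rdf IRI is reported as present
in the first listing and absent from the other.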
I then tried to *manually delete* the persisting graph:
sparql clear graph
<http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf>;
The result is even stranger:
If I do
sparql select count(*) from
<http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf>
where {?s ?p ?o};
I get 0 rows.
BUT: my first SPARQL query still tells me that there are 1196 triples in that
graph!
Also: if I execute this SQL query:
select distinct id_to_iri(g) from rdf_quad;
or this one:
select distinct id_to_iri(g), count(*) from rdf_quad group by g;
The *graph still exists*.
A control query:
sparql select count(*) from <http://dbpedia.org> where {?s ?p ?o} ;
tells me that there are 258867871 rows in my DBpedia dump.
As a side question: why do all those queries take so long? Isn't there a
primary key index on the rdf_quad table for g,s,p,o which they should be able
to use to return the count in a split second?
If you want, I can provide a detailed description of how I imported the
DBpedia dumps (en & de), and could even give you the backup (11 GB gzipped).
Jörn