Hi All,

I have been trying to understand how Virtuoso's crawler (content import) and sponging features work. I'm currently evaluating Virtuoso using VOS 07.20.3214.

I set up three crawl jobs for three different HTML/RDFa files and received no errors. When I attempt to query the data through the SPARQL interface, it doesn't show up. For example, http://w3id.org/xapi/adb/verbs/ is the target URL of a crawl job I set up in Conductor under Content Imports. I am using the xhtml/HTML5 variants cartridge with the following options:

    fallback-mode=no
    rdfa=yes
    reify_html5md=0
    reify_rdfa=1
    reify_jsonld=0
    reify_all_grddl=0
    reify_html=0
    passthrough_mode=yes
    loose=yes
    reify_html_misc=no
    reify_turtle=no

If I go to http://54.152.125.100:8890/sparql and run the following SPARQL query, it returns no results:

    #Query all Verb IRIs
    PREFIX xapi: <https://w3id.org/xapi/ontology#>
    SELECT DISTINCT ?Verb
    WHERE {
      ?Verb a xapi:Verb .
    }

However, the data does start to show up in this query if I subsequently add http://w3id.org/xapi/adb/verbs/ as the default data set name / graph IRI in the SPARQL interface and also select the sponging option to download all RDF resources.

Is this sponging option in the SPARQL interface actually adding/downloading the triples? Wouldn't this allow anyone who has access to the SPARQL interface to add triples? The faceted search interface seems to indicate so, as I did this with the following graph IRI, http://adlnet.gov/expapi/verbs:

http://54.152.125.100:8890/describe/?url=http%3A%2F%2Fadlnet.gov%2Fexpapi%2Fverbs&sid=4

I tried to set up this IRI as a crawl job and it never populated Virtuoso's data store. But as soon as I add it as a graph IRI using the SPARQL interface with sponging, it shows up. Is this the expected behavior / by design for this SPARQL sponging option? I thought graphs and triples could only be added with special SPARQL permissions and using INSERT.

I still don't think the crawler feature is working for HTML/RDFa. It appears to be processing the HTML file and storing it in the repository/locally in Virtuoso, but it doesn't seem to actually add the graph or triples to the database.
Thanks in advance for your patience and help!

J Haag

-------------------------------------------------------

On Wed, Oct 28, 2015 at 5:17 AM, Tim Haynes <thay...@openlinksw.com> wrote:
>
> On 27 October 2015 at 20:49, Haag, Jason <jhaa...@gmail.com> wrote:
>
>> I think I know the answer to my last two questions. I had additional html
>> files below the /verbs/ directory. I believe that is where the duplicates
>> came from. I'm guessing sponger also looks for any html files at the
>> specified path, not just the "index.html" file that was specified as a
>> target URL. Can anyone verify this?
>
> Hi,
>
> It's unlikely - I don't know of anything in the Sponger that implements
> directory browsing, but it may well be following e.g. <link
> rel="alternate" href="...." /> to RSS/Atom feeds, etc.
>
> As Kingsley says, Faceted Browser will show you what graphs the triples
> appear in.
>
> When a page is sponged, its URL becomes 1:1 the graph IRI in which data
> from/about/in that resource is stored. Multiple graphs implies multiple
> sponging events.
>
> HTH,
>
> ~Tim
> --
> Tim Haynes
> Product Development Consultant
> OpenLink Software
> <http://www.openlinksw.com/>
> <http://twitter.com/openlink>
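
Following Tim's point that a sponged page's URL becomes the graph IRI holding its data, one way to check where the xapi:Verb triples ended up is to ask the endpoint which named graphs contain them. The sketch below only builds the request URL using the standard SPARQL Protocol parameters `query` and `default-graph-uri`; the helper name `build_query_url` is hypothetical, the endpoint address is the one from the message above, and nothing here is specific to Virtuoso or confirmed against this server.

```python
from urllib.parse import urlencode

# Endpoint quoted in the message above; replace with your own server.
ENDPOINT = "http://54.152.125.100:8890/sparql"

# Lists every named graph holding at least one xapi:Verb typing triple.
GRAPHS_QUERY = """
PREFIX xapi: <https://w3id.org/xapi/ontology#>
SELECT DISTINCT ?g
WHERE { GRAPH ?g { ?Verb a xapi:Verb . } }
"""


def build_query_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET URL (hypothetical helper, not a Virtuoso API)."""
    params = [("query", query.strip())]
    if default_graph is not None:
        # Standard SPARQL Protocol parameter for the default graph.
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Paste the printed URL into a browser or fetch it with any HTTP client.
    print(build_query_url(ENDPOINT, GRAPHS_QUERY))
```

If the crawl job had stored the triples, the crawled URL (for example http://w3id.org/xapi/adb/verbs/) would be expected to appear among the returned `?g` values; an empty result before sponging would be consistent with the behaviour described in the original message.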
------------------------------------------------------------------------------
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users