Hi All,

I have been trying to understand how Virtuoso's crawler (content import) and
Sponger features work. I'm currently evaluating Virtuoso Open Source (VOS)
07.20.3214.

I set up three crawl jobs for three different HTML/RDFa files and received
no errors.

However, when I query the data through the SPARQL interface, it doesn't
show up.

For example, http://w3id.org/xapi/adb/verbs/ is the target URL of a crawl
job I set up in Conductor under Content Imports. I am using the xHTML/HTML5
variants cartridge with the following options:

fallback-mode=no
rdfa=yes
reify_html5md=0
reify_rdfa=1
reify_jsonld=0
reify_all_grddl=0
reify_html=0
passthrough_mode=yes
loose=yes
reify_html_misc=no
reify_turtle=no

If I go to http://54.152.125.100:8890/sparql and run the following SPARQL
query, it returns no results:

#Query all Verb IRIs
PREFIX xapi: <https://w3id.org/xapi/ontology#>

SELECT DISTINCT ?Verb

WHERE {
   ?Verb a xapi:Verb .

}
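To see which named graph each verb IRI is stored in, the query can bind the graph with a `GRAPH ?g` pattern. Below is a minimal Python sketch, assuming the endpoint returns the standard SPARQL 1.1 JSON results format; the helper `graphs_by_verb` and the sample IRIs are hypothetical. It builds the query and groups returned rows so that a verb stored in more than one graph stands out.

```python
from collections import defaultdict

# Variant of the query above that also returns the named graph (?g)
# in which each xapi:Verb triple is stored.
VERBS_WITH_GRAPH = """PREFIX xapi: <https://w3id.org/xapi/ontology#>
SELECT DISTINCT ?Verb ?g
WHERE {
  GRAPH ?g { ?Verb a xapi:Verb . }
}"""


def graphs_by_verb(results: dict) -> dict[str, set[str]]:
    """Group SPARQL JSON result bindings into {verb IRI: {graph IRIs}}.

    `results` is assumed to be a parsed SPARQL 1.1 JSON results document:
    {"results": {"bindings": [{"Verb": {...}, "g": {...}}, ...]}}.
    """
    grouped = defaultdict(set)
    for row in results["results"]["bindings"]:
        grouped[row["Verb"]["value"]].add(row["g"]["value"])
    return dict(grouped)


# Hypothetical result rows, for illustration only.
sample = {
    "results": {
        "bindings": [
            {"Verb": {"type": "uri", "value": "http://example.org/verb/a"},
             "g": {"type": "uri", "value": "http://example.org/graph/1"}},
            {"Verb": {"type": "uri", "value": "http://example.org/verb/a"},
             "g": {"type": "uri", "value": "http://example.org/graph/2"}},
        ]
    }
}

# Verbs that appear in more than one graph.
duplicates = {v: gs for v, gs in graphs_by_verb(sample).items() if len(gs) > 1}
print(duplicates)
```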


However, the data does start to show up in this query if I subsequently
enter http://w3id.org/xapi/adb/verbs/ as the default data set name (graph
IRI) in the SPARQL interface and also select the sponging option to download
all RDF resources.
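The form settings described above travel to the endpoint as ordinary request parameters. A minimal sketch that builds such a GET URL: `query` and `default-graph-uri` are standard SPARQL 1.1 Protocol parameters, but the name `should-sponge` and the value `grab-all` are assumptions about what the Virtuoso /sparql form submits for its sponging option, so check them against the request your browser actually sends. The helper `sparql_get_url` is hypothetical, and the sketch makes no network request.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://54.152.125.100:8890/sparql"

QUERY = """PREFIX xapi: <https://w3id.org/xapi/ontology#>
SELECT DISTINCT ?Verb
WHERE { ?Verb a xapi:Verb . }"""


def sparql_get_url(endpoint: str, query: str,
                   default_graph: Optional[str] = None,
                   sponge: Optional[str] = None) -> str:
    """Build a GET URL for a SPARQL endpoint.

    `query` and `default-graph-uri` are standard SPARQL 1.1 Protocol
    parameters. `should-sponge` is an ASSUMPTION about the parameter the
    Virtuoso /sparql form sends for its sponging option.
    """
    params = {"query": query}
    if default_graph:
        params["default-graph-uri"] = default_graph
    if sponge:
        params["should-sponge"] = sponge  # assumed Virtuoso-specific name
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(ENDPOINT, QUERY,
                     default_graph="http://w3id.org/xapi/adb/verbs/",
                     sponge="grab-all")  # assumed value for "download all"
# Decode the URL again to show which parameters the request carries.
print(parse_qs(urlsplit(url).query))
```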

Is this sponging option in the SPARQL interface actually downloading and
adding the triples? Wouldn't that allow anyone with access to the SPARQL
interface to add triples? The Faceted Browser seems to indicate so. I did
this with the following graph IRI, http://adlnet.gov/expapi/verbs:

http://54.152.125.100:8890/describe/?url=http%3A%2F%2Fadlnet.gov%2Fexpapi%2Fverbs&sid=4

I tried to set up this IRI as a crawl job, and it never populated Virtuoso's
data store. But as soon as I add it as a graph IRI in the SPARQL interface
with sponging enabled, it shows up. Is this the expected behavior, by design,
for this SPARQL sponging option? I thought graphs and triples could only be
added with special SPARQL permissions, using INSERT.
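For comparison, Virtuoso also lets a query itself ask the Sponger to load a resource, via a `define get:soft` pragma plus a `FROM` clause naming the graph IRI. The sketch below only builds such a query string; my understanding of the two modes ("soft" loads the resource only if the graph is not already present, "replace" re-fetches it) is an assumption to check against the Virtuoso Sponger documentation. The helper `sponge_query` and the example IRIs in the INSERT are hypothetical. An explicit SPARQL Update `INSERT DATA` request is shown alongside for contrast.

```python
def sponge_query(graph_iri: str, mode: str = "soft") -> str:
    """Build a query asking Virtuoso's Sponger to load `graph_iri` first.

    ASSUMES the Virtuoso `define get:soft` pragma with modes "soft"
    (load only if the graph is absent) and "replace" (re-fetch). The FROM
    clause names the graph IRI the sponged data is loaded into.
    """
    if mode not in ("soft", "replace"):
        raise ValueError("mode must be 'soft' or 'replace'")
    return (
        f'DEFINE get:soft "{mode}"\n'
        "PREFIX xapi: <https://w3id.org/xapi/ontology#>\n"
        "SELECT DISTINCT ?Verb\n"
        f"FROM <{graph_iri}>\n"
        "WHERE { ?Verb a xapi:Verb . }"
    )


# An explicit write, by contrast, is a SPARQL Update request naming the
# target graph and the triples (hypothetical example IRIs).
INSERT_EXAMPLE = """PREFIX xapi: <https://w3id.org/xapi/ontology#>
INSERT DATA {
  GRAPH <http://example.org/graph> { <http://example.org/verb/a> a xapi:Verb . }
}"""

print(sponge_query("http://adlnet.gov/expapi/verbs", mode="replace"))
```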

I still don't think the crawler feature is working for HTML/RDFa. It
appears to process the HTML file and store it locally in Virtuoso's
repository, but it doesn't seem to actually add the graph or triples to the
database.
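One way to narrow this down is to count the triples in the graph each crawl job would be expected to write. A minimal sketch that only builds the query strings; it assumes, without confirmation, that each job's graph IRI equals its target URL. If the counts come back zero, the graph-listing query shows which graph names actually exist. The helper `triple_count_query` is hypothetical.

```python
# Crawl targets taken from this thread; the third job's URL is not given.
CRAWL_TARGETS = [
    "http://w3id.org/xapi/adb/verbs/",
    "http://adlnet.gov/expapi/verbs",
]

# Lists every named graph holding at least one triple, to find the graph
# name a crawl job actually used.
LIST_GRAPHS = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"


def triple_count_query(graph_iri: str) -> str:
    """Count the triples stored in the named graph `graph_iri`.

    Graph IRIs are compared exactly, so a trailing slash or http vs https
    names a different graph.
    """
    return f"SELECT (COUNT(*) AS ?n) WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"


for target in CRAWL_TARGETS:
    print(triple_count_query(target))
```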

Thanks in advance for your patience and help!

J Haag

-------------------------------------------------------



On Wed, Oct 28, 2015 at 5:17 AM, Tim Haynes <thay...@openlinksw.com> wrote:

>
> On 27 October 2015 at 20:49, Haag, Jason <jhaa...@gmail.com> wrote:
>
>> I think I know the answer to my last two questions. I had additional HTML
>> files below the /verbs/ directory, and I believe that is where the
>> duplicates came from. I'm guessing the Sponger also picks up any HTML
>> files at the specified path, not just the "index.html" file that was
>> specified as the target URL. Can anyone verify this?
>
>
> Hi,
>
> It's unlikely. I don't know of anything in the Sponger that implements
> directory browsing, but it may well be following, e.g., <link
> rel="alternate" href="...." /> elements pointing to RSS/Atom feeds, etc.
>
> As Kingsley says, the Faceted Browser will show you which graphs the
> triples appear in.
>
> When a page is sponged, its URL becomes, 1:1, the graph IRI in which data
> from/about/in that resource is stored. Multiple graphs imply multiple
> sponging events.
>
> HTH,
>
> ~Tim
> --
> Tim Haynes
> Product Development Consultant
> OpenLink Software
> <http://www.openlinksw.com/>
> <http://twitter.com/openlink>
>
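To check the suggestion that the Sponger may be following `<link rel="alternate">` elements rather than browsing the directory, one could list those links in the page that was crawled. A minimal sketch using Python's standard `html.parser`; the names `AlternateLinkFinder` and `alternate_links` and the sample page are hypothetical, fetching the page is left out, and it does not reproduce the Sponger's own link-following rules, which may cover other elements too.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate" ...> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also arrives here via the default
        # handle_startendtag implementation.
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "alternate feed".
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def alternate_links(html: str) -> list:
    finder = AlternateLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


page = """<html><head>
<link rel="alternate" type="application/rss+xml" href="feed.rss" />
<link rel="stylesheet" href="style.css" />
</head><body></body></html>"""
print(alternate_links(page))  # ['feed.rss']
```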
------------------------------------------------------------------------------
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
