I think I know the answer to my last two questions. I had additional HTML files below the /verbs/ directory, and I believe that is where the duplicates came from. I'm guessing the sponger also picks up any HTML files at the specified path, not just the "index.html" file that was specified as the target URL. Can anyone verify this?
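
One way to check this guess would be to list every named graph under the /verbs/ path along with its triple count, so that any graph created from an extra HTML file becomes visible. The following is a minimal sketch, not a tested recipe: the endpoint URL (a default local Virtuoso install on port 8890) and both helper function names are assumptions for illustration. It only builds the query and the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: a local Virtuoso SPARQL endpoint; adjust to match your install.
ENDPOINT = "http://localhost:8890/sparql"


def graphs_for_prefix_query(prefix: str) -> str:
    """Build a SPARQL query counting triples per named graph under `prefix`."""
    return (
        "SELECT ?g (COUNT(*) AS ?triples)\n"
        "WHERE { GRAPH ?g { ?s ?p ?o } "
        f'FILTER(STRSTARTS(STR(?g), "{prefix}")) }}\n'
        "GROUP BY ?g\n"
        "ORDER BY ?g"
    )


def request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical usage: print the query and the URL to paste into a browser.
    q = graphs_for_prefix_query("http://xapi.vocab.pub/datasets/adl/verbs/")
    print(q)
    print(request_url(q))
```

If more graphs appear than the single index.html graph, that would support the idea that the other HTML files were sponged as well.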
-------------------------------------------------------
+1.850.266.7100(office)
+1.850.471.1300 (mobile)
jhaag75 (skype)
http://jasonhaag.com (Web)
http://twitter.com/mobilejson (Twitter)
http://linkedin.com/in/jasonhaag (LinkedIn)

On Tue, Oct 27, 2015 at 2:44 PM, Haag, Jason <jhaa...@gmail.com> wrote:

> Thanks again for all of the help Kingsley and Hugh.
>
> The sponger appears to be working. Initially, I ran into some errors:
>
> XM003: XML parser detected an error: ERROR : Tag nesting error: name 'div'
> of end tag does not match the name 'br' of start tag at line 125 column 8
> at line 479 column 8 of source text </div>
>
> XM003: XML parser detected an error: ERROR : Tag nesting error: name
> 'body' of end tag does not match the name 'br' of start tag at line 90
> column 6 at line 489 column 9 of source text </body> -------^
>
> It appears as though the sponger is looking for strict xHTML. I updated my
> HTML to remove all instances of <meta>, <link> and <br> elements and it
> finally imported without any errors. None of these elements require end
> tags, but for some reason it was not sponging/importing unless I moved
> them.
>
> At first the RDF data did not seem to be there. But when I specified a
> graph IRI (e.g., http://xapi.vocab.pub/datasets/adl/verbs/index.html) in
> my SPARQL queries using the SPARQL endpoint in virtuoso the RDF data began
> showing up. If I first didn't specify a graph IRI in SPARQL nothing was
> returned. Does RDF data also get added to virtuoso through SPARQL graph
> queries? I thought this seemed odd. Perhaps there is a length of time to
> wait for the sponging process to end?
>
> I'm noticing some duplicates and that it is sponging more than just
> HTML/RDFa, but also the turtle and RDF/XML files on the server.
>
> Is there configuration setting for the sponger to only look for HTML/RDFa?
> Otherwise, it appears to be adding duplicates. Thanks again.
>
> On Tue, Oct 27, 2015 at 10:27 AM, Haag, Jason <jhaa...@gmail.com> wrote:
>
>> Thank you for the clarification Kingsley! I will try configuring xHTML
>> with the settings you suggested now:
>>
>> add-html-meta=yes
>> get-feeds=no
>> preview-length=512
>> fallback-mode=no
>> rdfa=yes
>> reify_html5md=0
>> reify_rdfa=0
>> reify_jsonld=0
>> reify_all_grddl=0
>> reify_html=0
>> passthrough_mode=yes
>> loose=yes
>> reify_html_misc=no
>> reify_turtle=no
>>
>> On Tue, Oct 27, 2015 at 10:14 AM, Kingsley Idehen <kide...@openlinksw.com>
>> wrote:
>>
>>> On 10/27/15 10:02 AM, Haag, Jason wrote:
>>>
>>> Thank you Hugh for the follow up! This is what I have installed. It
>>> looks like the same version of the cartridges, but not conductor. Should
>>> that matter?
>>>
>>> name        title                   version      build_date        install_date
>>> VARCHAR     VARCHAR                 VARCHAR      VARCHAR           VARCHAR
>>> cartridges  Linked Data Cartridges  1.99_git747  2015-06-18 08:12  2015-10-21 18:39
>>> conductor   Virtuoso Conductor      1.00.8752    2015-10-19 16:40  2015-10-20 15:10
>>>
>>> Here's a link to what I'm seeing under extractor cartridges:
>>> https://drive.google.com/file/d/0BxhK5TH2EsphX0ppc3VjUXExdmM/view?usp=sharing
>>>
>>> I don't see HTML and variants listed, but do see xHTML and something
>>> called HTML Table.
>>>
>>> There's a naming issue (being addressed), hence the confusion. The XHTML
>>> Cartridge is what's now known as HTML (and variants).
>>>
>>> --
>>> Regards,
>>>
>>> Kingsley Idehen
>>> Founder & CEO
>>> OpenLink Software
>>> Company Web: http://www.openlinksw.com
>>> Personal Weblog 1: http://kidehen.blogspot.com
>>> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>>> Twitter Profile: https://twitter.com/kidehen
>>> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
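
The XM003 errors quoted above are what a strict XML parser typically reports when it meets an unclosed void element such as <br>. As an illustration only, the sketch below reproduces the same class of error with Python's standard-library XML parser, which is not Virtuoso's parser. The sample markup strings and the helper name are made up for this example. It also shows that writing the element in self-closing form (<br/>) is well-formed XML, which suggests an alternative to removing the elements outright, although that has not been verified against the sponger here.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup: str):
    """Return None if `markup` parses as XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as exc:
        return str(exc)


# HTML-style void element: <br> is opened but never closed, so the
# following </p> end tag does not match the open <br> start tag.
html_style = "<div><p>one<br>two</p></div>"

# XHTML-style: the same element written in self-closing form.
xhtml_style = "<div><p>one<br/>two</p></div>"

if __name__ == "__main__":
    print("HTML style: ", well_formedness_error(html_style))
    print("XHTML style:", well_formedness_error(xhtml_style))
```

The same reasoning would apply to <meta> and <link>: in XML terms each needs either an explicit end tag or the self-closing form.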
------------------------------------------------------------------------------
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users