On 10/27/15 4:49 PM, Haag, Jason wrote:
> I think I know the answer to my last two questions. I had additional
> html files below the /verbs/ directory. I believe that is where the
> duplicates came from. I'm guessing sponger also looks for any html
> files at the specified path, not just the "index.html" file that was
> specified as a target URL. Can anyone verify this? 
>
> -------------------------------------------------------
> +1.850.266.7100 (office)
> +1.850.471.1300 (mobile)
> jhaag75 (skype)
> http://jasonhaag.com (Web)
> http://twitter.com/mobilejson (Twitter)
> http://linkedin.com/in/jasonhaag (LinkedIn)

You need to ensure that you aren't dealing with the same triples across
multiple named graphs. Depending on your config, the crawled and sponged
data should end up in a designated named graph.
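
A rough sketch of how one might check for that, offered as an
illustration rather than anything shipped with Virtuoso. The query text
is plain SPARQL 1.1 and can be pasted into the SPARQL endpoint (it scans
every graph, so it may be slow on a large store). The Python helper does
the same grouping over in-memory (graph, subject, predicate, object)
tuples. The names DUPLICATES_QUERY and triples_in_multiple_graphs are
made up for this example.

```python
from collections import defaultdict

# Standard SPARQL 1.1: list every triple that is stored in more than one
# named graph, together with the number of graphs that contain it.
DUPLICATES_QUERY = """
SELECT ?s ?p ?o (COUNT(DISTINCT ?g) AS ?graphs)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?s ?p ?o
HAVING (COUNT(DISTINCT ?g) > 1)
"""


def triples_in_multiple_graphs(quads):
    """Return {(s, p, o): [graph IRIs]} for triples found in more than one graph.

    `quads` is any iterable of (graph, subject, predicate, object) tuples,
    for example rows exported from a `SELECT ?g ?s ?p ?o` query.
    """
    graphs_by_triple = defaultdict(set)
    for graph, s, p, o in quads:
        graphs_by_triple[(s, p, o)].add(graph)
    # Keep only the triples that appear in at least two distinct graphs.
    return {
        triple: sorted(graphs)
        for triple, graphs in graphs_by_triple.items()
        if len(graphs) > 1
    }
```

If the helper returns an empty dict for the exported rows, the apparent
duplicates are more likely distinct triples than the same triple stored
in several graphs.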

If you install the Faceted Browser VAD you'll have a tool for looking up
entity URIs and associated provenance (via the Metadata tab).

[1] http://kidehen.blogspot.com/2015/03/experiencing-power-of-virtuoso-in-5.html
-- Experience the power of Virtuoso in 5 simple steps.

Kingsley
>
>
> On Tue, Oct 27, 2015 at 2:44 PM, Haag, Jason <jhaa...@gmail.com> wrote:
>
>     Thanks again for all of the help Kingsley and Hugh. 
>
>     The sponger appears to be working. Initially, I ran into some errors:
>
>     XM003: XML parser detected an error: ERROR : Tag nesting error:
>     name 'div' of end tag does not match the name 'br' of start tag at
>     line 125 column 8 at line 479 column 8 of source text </div>
>
>     XM003: XML parser detected an error: ERROR : Tag nesting error:
>     name 'body' of end tag does not match the name 'br' of start tag
>     at line 90 column 6 at line 489 column 9 of source text </body>
>     -------^
>
>     It appears as though the sponger expects strict XHTML. I
>     updated my HTML to remove all instances of <meta>, <link> and <br>
>     elements and it finally imported without any errors. None of these
>     elements require end tags in HTML, but for some reason it was not
>     sponging/importing until I removed them.
>
>     At first the RDF data did not seem to be there. But when I
>     specified a graph IRI (e.g.,
>     http://xapi.vocab.pub/datasets/adl/verbs/index.html) in my SPARQL
>     queries against the Virtuoso SPARQL endpoint, the RDF data began
>     showing up. If I didn't specify a graph IRI in SPARQL,
>     nothing was returned. Does RDF data also get added to Virtuoso
>     through SPARQL graph queries? I thought this seemed odd. Perhaps
>     there is a length of time to wait for the sponging process to end?
>
>     I'm noticing some duplicates, and it is sponging not just
>     HTML/RDFa but also the Turtle and RDF/XML files on the server.
>
>     Is there a configuration setting for the sponger to only look for
>     HTML/RDFa? Otherwise, it appears to be adding duplicates. Thanks
>     again.
>
>
>
>
>
>     On Tue, Oct 27, 2015 at 10:27 AM, Haag, Jason <jhaa...@gmail.com> wrote:
>
>         Thank you for the clarification Kingsley! I will try
>         configuring xHTML with the settings you suggested now:
>
>         add-html-meta=yes
>         get-feeds=no
>         preview-length=512
>         fallback-mode=no
>         rdfa=yes
>         reify_html5md=0
>         reify_rdfa=0
>         reify_jsonld=0
>         reify_all_grddl=0
>         reify_html=0
>         passthrough_mode=yes
>         loose=yes
>         reify_html_misc=no
>         reify_turtle=no
>
>
>
>
>         On Tue, Oct 27, 2015 at 10:14 AM, Kingsley Idehen <kide...@openlinksw.com> wrote:
>
>             On 10/27/15 10:02 AM, Haag, Jason wrote:
>>             Thank you Hugh for the follow up! This is what I have
>>             installed. It looks like the same version of the
>>             cartridges, but not conductor. Should that matter?
>>
>>             name        title                   version      build_date        install_date
>>             cartridges  Linked Data Cartridges  1.99_git747  2015-06-18 08:12  2015-10-21 18:39
>>             conductor   Virtuoso Conductor      1.00.8752    2015-10-19 16:40  2015-10-20 15:10
>>             (all columns are VARCHAR)
>>
>>             Here's a link to what I'm seeing under extractor
>>             cartridges: 
>> https://drive.google.com/file/d/0BxhK5TH2EsphX0ppc3VjUXExdmM/view?usp=sharing
>>
>>              I don't see HTML and variants listed, but do see xHTML
>>             and something called HTML Table. 
>>
>
>
>             There's a naming issue (being addressed), hence the
>             confusion. The XHTML Cartridge is what's now known as HTML
>             (and variants).
>
>             -- 
>             Regards,
>
>             Kingsley Idehen         
>             Founder & CEO 
>             OpenLink Software     
>             Company Web: http://www.openlinksw.com
>             Personal Weblog 1: http://kidehen.blogspot.com
>             Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>             Twitter Profile: https://twitter.com/kidehen
>             Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>             LinkedIn Profile: http://www.linkedin.com/in/kidehen
>             Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
>
>
>
>
>
>
> ------------------------------------------------------------------------------
>
>
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users


