I think I know the answer to my last two questions. I had additional html
files below the /verbs/ directory. I believe that is where the duplicates
came from. I'm guessing sponger also looks for any html files at the
specified path, not just the "index.html" file that was specified as a
target URL. Can anyone verify this?
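
One way to check this is to ask the SPARQL endpoint which named graphs exist under the /verbs/ path and how many triples each one holds. The following is only a minimal sketch: the endpoint address (localhost on Virtuoso's default HTTP port) is an assumption, as are the helper names graphs_query and request_url, and it presumes each sponged document ends up in a graph named after its own URL. It builds the request URL only and sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance on its default HTTP port.
# Replace with the address of your own SPARQL endpoint.
ENDPOINT = "http://localhost:8890/sparql"

# Path prefix of the graph IRI mentioned in this thread.
GRAPH_PREFIX = "http://xapi.vocab.pub/datasets/adl/verbs/"


def graphs_query(prefix: str) -> str:
    """Build a SPARQL query that counts triples per named graph under prefix."""
    return (
        "SELECT ?g (COUNT(*) AS ?triples)\n"
        "WHERE {\n"
        "  GRAPH ?g { ?s ?p ?o }\n"
        f'  FILTER(STRSTARTS(STR(?g), "{prefix}"))\n'
        "}\n"
        "GROUP BY ?g\n"
        "ORDER BY ?g"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the URL; open it in a browser or fetch it with any HTTP client.
    print(request_url(ENDPOINT, graphs_query(GRAPH_PREFIX)))
```

If the result lists graphs for HTML files other than index.html, or for the Turtle and RDF/XML files, that would explain the duplicates.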

-------------------------------------------------------
+1.850.266.7100(office)
+1.850.471.1300 (mobile)
jhaag75 (skype)
http://jasonhaag.com (Web)
http://twitter.com/mobilejson (Twitter)
http://linkedin.com/in/jasonhaag (LinkedIn)


On Tue, Oct 27, 2015 at 2:44 PM, Haag, Jason <jhaa...@gmail.com> wrote:

> Thanks again for all of the help Kingsley and Hugh.
>
> The sponger appears to be working. Initially, I ran into some errors:
>
> XM003: XML parser detected an error: ERROR : Tag nesting error: name 'div'
> of end tag does not match the name 'br' of start tag at line 125 column 8
> at line 479 column 8 of source text </div>
>
> XM003: XML parser detected an error: ERROR : Tag nesting error: name
> 'body' of end tag does not match the name 'br' of start tag at line 90
> column 6 at line 489 column 9 of source text </body> -------^
>
> It appears as though the sponger is looking for strict XHTML. I updated my
> HTML to remove all instances of <meta>, <link> and <br> elements and it
> finally imported without any errors. None of these elements require end
> tags in HTML, but for some reason it was not sponging/importing until I
> removed them.
>
> At first the RDF data did not seem to be there. But when I specified a
> graph IRI (e.g., http://xapi.vocab.pub/datasets/adl/verbs/index.html) in
> my SPARQL queries against the Virtuoso SPARQL endpoint, the RDF data began
> showing up. Without a graph IRI, the same queries returned nothing. Does
> RDF data also get added to Virtuoso through SPARQL graph queries? That
> seemed odd to me. Perhaps I need to wait some time for the sponging
> process to finish?
>
> I'm noticing some duplicates, and the sponger is picking up more than just
> HTML/RDFa: it also sponges the Turtle and RDF/XML files on the server.
>
> Is there a configuration setting to make the sponger look only for
> HTML/RDFa? Otherwise, it appears to be adding duplicates. Thanks again.
>
>
>
> On Tue, Oct 27, 2015 at 10:27 AM, Haag, Jason <jhaa...@gmail.com> wrote:
>
>> Thank you for the clarification Kingsley! I will try configuring xHTML
>> with the settings you suggested now:
>>
>> add-html-meta=yes
>> get-feeds=no
>> preview-length=512
>> fallback-mode=no
>> rdfa=yes
>> reify_html5md=0
>> reify_rdfa=0
>> reify_jsonld=0
>> reify_all_grddl=0
>> reify_html=0
>> passthrough_mode=yes
>> loose=yes
>> reify_html_misc=no
>> reify_turtle=no
>>
>>
>>
>> On Tue, Oct 27, 2015 at 10:14 AM, Kingsley Idehen
>> <kide...@openlinksw.com> wrote:
>>
>>> On 10/27/15 10:02 AM, Haag, Jason wrote:
>>>
>>> Thank you Hugh for the follow up! This is what I have installed. It
>>> looks like the same version of the cartridges, but not conductor. Should
>>> that matter?
>>>
>>> name        title                   version      build_date        install_date
>>> VARCHAR     VARCHAR                 VARCHAR      VARCHAR           VARCHAR
>>> ----------  ----------------------  -----------  ----------------  ----------------
>>> cartridges  Linked Data Cartridges  1.99_git747  2015-06-18 08:12  2015-10-21 18:39
>>> conductor   Virtuoso Conductor      1.00.8752    2015-10-19 16:40  2015-10-20 15:10
>>>
>>> Here's a link to what I'm seeing under extractor cartridges:
>>> https://drive.google.com/file/d/0BxhK5TH2EsphX0ppc3VjUXExdmM/view?usp=sharing
>>>
>>>  I don't see HTML and variants listed, but do see xHTML and something
>>> called HTML Table.
>>>
>>>
>>>
>>> There's a naming issue (being addressed), hence the confusion. The XHTML
>>> Cartridge is what's now known as HTML (and variants).
>>>
>>> --
>>> Regards,
>>>
>>> Kingsley Idehen     
>>> Founder & CEO
>>> OpenLink Software
>>> Company Web: http://www.openlinksw.com
>>> Personal Weblog 1: http://kidehen.blogspot.com
>>> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>>> Twitter Profile: https://twitter.com/kidehen
>>> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
>>>
>>>
>>
>
------------------------------------------------------------------------------
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
