Following up on my original inquiry:

What is the best option for automating the import and update of
RDFa/HTML data on a regular basis into the Virtuoso DB?
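
One non-crawler option I've been considering is a scheduled script
(run from cron, for example) that pushes a dump to the server over the
SPARQL 1.1 Graph Store HTTP Protocol. A minimal Python sketch is below.
The endpoint URL, graph IRI, and function names are my own placeholders
and assumptions, not taken from the documentation, and authentication
is left out entirely:

```python
import urllib.parse
import urllib.request

# Assumed values for illustration only; adjust to the actual installation.
ENDPOINT = "http://localhost:8890/sparql-graph-crud-auth"
GRAPH_IRI = "http://example.org/graph/dataset1"


def build_put_request(endpoint: str, graph_iri: str, turtle: bytes) -> urllib.request.Request:
    """Build a Graph Store Protocol PUT that replaces the named graph."""
    url = endpoint + "?" + urllib.parse.urlencode({"graph": graph_iri})
    return urllib.request.Request(
        url,
        data=turtle,
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )


def upload(path: str) -> int:
    """Send the Turtle dump at `path` and return the HTTP status code."""
    with open(path, "rb") as fh:
        request = build_put_request(ENDPOINT, GRAPH_IRI, fh.read())
    # No credentials are sent here; a protected endpoint would reject this.
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because PUT replaces the whole graph, rerunning the script daily or
weekly would keep the graph in step with the latest dump rather than
accumulating stale triples.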

I'm able to use the crawler to import RDF/XML graphs directly from
.rdf URIs, but I receive the following error page when I point it at
an HTML/RDFa URI:

This page contains the following errors:

error on line 22 at column 8: Opening and ending tag mismatch: link
line 0 and head

Below is a rendering of the page up to the first error.


There are no further details. There are no HTML validation errors. The
options I checked during import include:

Semantic Web Crawling
Follow URLs outside target host
Accept RDF

When looking at the source of the HTML page, line 22 is where the
</head> ends. This is as far as the page rendered (to the
<title></title>). There are no errors with the HTML (I'm using HTML5),
so I'm wondering whether Virtuoso only works with XHTML doctype
declarations. I'd appreciate any ideas or experience you all have to
share on support for RDFa.
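
To illustrate what I suspect is happening, here is a small sketch. It
uses Python's standard XML parser, not whatever Virtuoso uses
internally, and the two pages are made-up minimal examples. An HTML5
head with an unclosed <link> is not well-formed XML, so a strict XML
parser fails when it reaches </head>, while the same markup with a
self-closed <link /> parses:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal pages; only the <link> serialization differs.
HTML5_PAGE = """<html>
<head>
<link rel="stylesheet" href="style.css">
<title>Example</title>
</head>
<body></body>
</html>"""

XHTML_STYLE_PAGE = HTML5_PAGE.replace('href="style.css">', 'href="style.css" />')


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("HTML5 void <link>:", is_well_formed_xml(HTML5_PAGE))
    print("Self-closed <link />:", is_well_formed_xml(XHTML_STYLE_PAGE))
```

If that is the cause, serving the page with self-closed void elements
might be enough for the import to succeed, but I have not confirmed
this against Virtuoso.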




-------------------------------------------------------
+1.850.266.7100(office)
+1.850.471.1300 (mobile)
jhaag75 (skype)
http://jasonhaag.com (Web)
http://twitter.com/mobilejson (Twitter)
http://linkedin.com/in/jasonhaag (LinkedIn)



On Tue, Sep 29, 2015 at 9:57 AM, Haag, Jason <jhaa...@gmail.com> wrote:
> Following up on my original inquiry: I currently have several RDF
> datasets available on my server. Each dataset has an RDF dump
> available as RDF/XML, JSON-LD, and Turtle. These dumps are generated
> automatically, without Virtuoso, from an HTML page marked up using RDFa.
>
> What is the best option for automating the import of this data on a
> regular basis into the Virtuoso DB? Ideally I would like to import
> the RDFa data automatically, but RDF/XML or Turtle files would be
> fine too. I tried this with the attached settings, but the data
> doesn't appear in the database. What do I need to enable or change in
> my settings in order to automatically import RDF data? See attached
> screen captures. Thanks for any tips or advice!
>
>
>
> On Mon, Sep 28, 2015 at 4:20 PM, Haag, Jason <jhaa...@gmail.com> wrote:
>> What would the steps/instructions be to set up an automatic import for
>> 7.2.1? The instructions and screens here don't match the new interface
>> and field options:
>> http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler
>>
>> For example, there is no longer a field for "Local WebDAV Identifier"
>> which was previously required.
>>
>>
>>
>> On Sat, Sep 26, 2015 at 5:39 PM, Paul Houle <ontolo...@gmail.com> wrote:
>>> I like the cloud solution of creating a new virtuoso system,  doing the
>>> load,  having plenty of time to test it,  then replacing the production
>>> instance with the new instance and retiring the production instance.
>>>
>>> The main advantage here is that there is no way a screw-up in the load
>>> procedure can trash the production system. Even if Virtuoso were entirely
>>> reliable, as the data sources grow the rate of exceptional events (say you
>>> fill the disk) goes up. The temporary server approach eliminates a lot of
>>> headaches and it is good cloud economics (if you run a server at AMZN for
>>> 1 hour a day to update, the cost of your system only goes up by about 4%).
>>>
>>> I was having good luck with this approach until Virtuoso 7.2.0 came along.
>>> Since then I've had problems similar in severity to what the N.I.H. was
>>> reporting; it really looked like massive corruption of the data structures,
>>> and 7.2.1 did not help.
>>>
>>> I don't know if these issues are fixed in the current TRUNK but if they are
>>> it would be nice to get an official release.
>>>
>>> On Fri, Sep 25, 2015 at 1:31 PM, Haag, Jason <jhaa...@gmail.com> wrote:
>>>>
>>>>
>>>> Hi Users,
>>>>
>>>> I'm trying to determine the best option for my situation for importing RDF
>>>> data into Virtuoso. Here's my situation:
>>>>
>>>> I currently have several RDF datasets available on my server. Each
>>>> dataset has an RDF dump available as RDF/XML, JSON-LD, and Turtle. These
>>>> dumps are generated automatically, without Virtuoso, from an HTML page
>>>> marked up using RDFa.
>>>>
>>>> What is the best option for automating the import of this data on a
>>>> regular basis into the Virtuoso DB? The datasets may grow, so it should
>>>> not just import the data once, but import on a regular basis, perhaps
>>>> daily or weekly.
>>>>
>>>> Based on what I've read in the documentation, this crawler option seems
>>>> like the most appropriate option for my situation:
>>>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSetCrawlerJobsGuideDirectories
>>>>
>>>> Can anyone verify if this would be the best approach? Does anyone know if
>>>> the crawler supports RDFa/HTML or should it point to a specific directory
>>>> with only the RDF dump files?
>>>>
>>>> Thanks in advance!
>>>>
>>>> J Haag
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Virtuoso-users mailing list
>>>> Virtuoso-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>>
>>>
>>>
>>>
>>> --
>>> Paul Houle
>>>
>>> Applying Schemas for Natural Language Processing, Distributed Systems,
>>> Classification and Text Mining and Data Lakes
>>>
>>> (607) 539 6254    paul.houle on Skype   ontolo...@gmail.com
>>>
>>> :BaseKB -- Query Freebase Data With SPARQL
>>> http://basekb.com/gold/
>>>
>>> Legal Entity Identifier Lookup
>>> https://legalentityidentifier.info/lei/lookup/
>>>
>>> Join our Data Lakes group on LinkedIn
>>> https://www.linkedin.com/grp/home?gid=8267275
>>>

