I'm loading the complete Wikidata RDF dump in Turtle, split into pieces and
loaded with 10 active readers. About half the time the load fails with one
or more of these SR197 errors. The errors always occur near the beginning of
the load: in the first group of 10 files to be loaded, and near the beginning
of those files (generally in the first couple of hundred lines of a file that
is well over 1 GB). No errors occur for any files beyond the first ten.

I could provide the files, but they total about 340 GB.

It sure looks as if there is a bug when loading RDF language-tagged strings:
a race condition in which two threads try to insert the same language tag
into DB.DBA.RDF_LANGUAGE. This would explain why the problem occurs only at
the beginning of the load, when the language tags are being added to
DB.DBA.RDF_LANGUAGE, and not later. It would also explain why the errors
differ from run to run. (The only other explanation would be hardware
errors, but that doesn't seem plausible.)
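
To make the suspected race concrete, here is a minimal sketch in Python. It
is not Virtuoso's code; the LanguageTable class, the DuplicateKey exception,
and the simulate_readers function are all hypothetical names made up for
illustration. Each reader looks a tag up and then does a plain insert if the
tag was missing, and a barrier forces every reader to finish the lookup
before any of them inserts, which is the unlucky interleaving.

```python
import threading


class DuplicateKey(Exception):
    """Raised by the toy table when a key is inserted twice."""


class LanguageTable:
    """Toy stand-in for a table with a unique key on the language tag."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def contains(self, tag):
        with self._lock:
            return tag in self._rows

    def insert(self, tag):
        with self._lock:
            if tag in self._rows:
                raise DuplicateKey(f"Non unique primary key: {tag}")
            self._rows[tag] = len(self._rows) + 1


def simulate_readers(first_tags):
    """Run one reader thread per tag and return the errors they hit."""
    table = LanguageTable()
    barrier = threading.Barrier(len(first_tags))
    errors = []

    def reader(tag):
        already_known = table.contains(tag)  # step 1: look the tag up
        barrier.wait()  # force every reader to finish step 1 first
        if not already_known:
            try:
                table.insert(tag)  # step 2: plain insert, may collide
            except DuplicateKey as exc:
                errors.append(str(exc))

    threads = [threading.Thread(target=reader, args=(tag,)) for tag in first_tags]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    # Two readers meet the same new tag at the same time: one insert fails.
    print(simulate_readers(["ki", "ki"]))
    # Two readers meet different new tags: no collision.
    print(simulate_readers(["ki", "kg"]))
```

In a real load the interleaving depends on timing, which would fit the
errors showing up in only about half the runs and on different lines.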

It seems to me that a quick patch for this problem would be to change the
insert into a soft insert, but I don't know where to make this change in the 
code.
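
As a sketch of what a soft insert changes, the following Python uses SQLite's
INSERT OR IGNORE as a stand-in; it is an analogy only, not Virtuoso's loader
code. The table layout (tag, code) and the function name
compare_plain_and_soft_insert are assumptions made up for this example.

```python
import sqlite3


def compare_plain_and_soft_insert():
    """Insert the same key twice, first plainly and then 'softly'."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: one row per language tag, keyed on the tag.
    conn.execute("CREATE TABLE rdf_language (tag TEXT PRIMARY KEY, code INTEGER)")
    conn.execute("INSERT INTO rdf_language VALUES ('ki', 1)")

    # A second plain insert of the same tag violates the primary key.
    try:
        conn.execute("INSERT INTO rdf_language VALUES ('ki', 2)")
        plain_insert_failed = False
    except sqlite3.IntegrityError:
        plain_insert_failed = True

    # The soft variant silently keeps the existing row instead of failing.
    conn.execute("INSERT OR IGNORE INTO rdf_language VALUES ('ki', 2)")
    rows = conn.execute("SELECT tag, code FROM rdf_language").fetchall()
    conn.close()
    return plain_insert_failed, rows


if __name__ == "__main__":
    print(compare_plain_and_soft_insert())
```

Note that after a soft insert the thread that lost the race would still have
to read the row back to learn which code the winner assigned, so the patch
would presumably need a lookup after the insert as well.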

peter




On 12/11/18 7:11 PM, Hugh Williams wrote:
> Hi Peter,
> 
> The triple values do indeed appear to be valid, but the problem could be
> somewhere else in the dataset file and not necessarily on the reported line or
> the line before it.
> 
> Is it a public dataset you are loading, and if so, can you provide a copy for
> local testing?
> 
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software
> 
> 
> 
> 
>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider <pfpschnei...@gmail.com
>> <mailto:pfpschnei...@gmail.com>> wrote:
>>
>> I'm loading a bunch of Turtle files and I'm getting the error
>>
>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>> DB.DBA.RDF_LANGUAGE
>>
>> The line in question looks fine:
>>
>>   "Wikimedia template"@ki,
>>
>> The line before it may indicate the issue:
>>
>>   "Wikimedia template"@kg,
>>
>> Nonetheless, this should be valid RDF, so there appears to be a bug in
>> Virtuoso here.
>>
>> Is there any workaround?
>>
>>
>> This is in Virtuoso 07.20.3230.
>>
>> peter
>>
>>
>> _______________________________________________
>> Virtuoso-users mailing list
>> Virtuoso-users@lists.sourceforge.net
>> <mailto:Virtuoso-users@lists.sourceforge.net>
>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
> 

