I created some synthetic data that tickles the bug reliably on my machine with
a standard virtuoso.ini (just adding the directory for the files to the
allowed list).  I'm attaching the generator program for the files and a
loading script.

peter

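A minimal sketch of the check-then-insert race described in the quoted message
below, and of the soft-insert-then-query fix it proposes.  This is not Virtuoso
code: SQLite (through Python's sqlite3 module) stands in for the database, the
table `lang` and all function names are hypothetical, and SQLite's
`INSERT OR IGNORE` is assumed to be a reasonable analog of a soft insert.  The
two loaders are interleaved by hand rather than run as real threads, so the
failure is deterministic.

```python
import sqlite3


def new_table():
    # Hypothetical stand-in for a language-tag table with a unique key.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lang (tag TEXT PRIMARY KEY, id INTEGER)")
    return conn


def tag_exists(conn, tag):
    row = conn.execute("SELECT 1 FROM lang WHERE tag = ?", (tag,)).fetchone()
    return row is not None


def hard_insert(conn, tag, new_id):
    # Fails with IntegrityError if the tag is already present.
    conn.execute("INSERT INTO lang (tag, id) VALUES (?, ?)", (tag, new_id))


def soft_insert_then_query(conn, tag, new_id):
    # The insert is silently skipped if the tag exists; the query then
    # returns whichever id won.
    conn.execute("INSERT OR IGNORE INTO lang (tag, id) VALUES (?, ?)", (tag, new_id))
    return conn.execute("SELECT id FROM lang WHERE tag = ?", (tag,)).fetchone()[0]


def racy_interleaving():
    conn = new_table()
    a_missing = not tag_exists(conn, "ki")  # loader A checks
    b_missing = not tag_exists(conn, "ki")  # loader B checks before A inserts
    if a_missing:
        hard_insert(conn, "ki", 1)
    try:
        if b_missing:
            hard_insert(conn, "ki", 2)      # B still believes the tag is new
    except sqlite3.IntegrityError:
        return "duplicate key"
    return "ok"


def soft_interleaving():
    conn = new_table()
    id_a = soft_insert_then_query(conn, "ki", 1)
    id_b = soft_insert_then_query(conn, "ki", 2)
    return id_a, id_b


if __name__ == "__main__":
    print(racy_interleaving())
    print(soft_interleaving())
```

In the hard-insert variant the second loader acts on a stale check and hits the
unique key; in the soft variant both loaders end up with the same id, whichever
of them inserted first.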

On 12/18/18 9:46 AM, Peter F. Patel-Schneider wrote:
> I did a bit of digging and it sure looks as if there is a race condition in
> rdf_rl_lang_id in ttlpv.sql.   This code appears to check to see if the
> language tag is already in DB.DBA.RDF_LANGUAGE and adds it if not.  But
> another thread could do the same insert between the check and the insert, as
> far as I can tell.
> 
> It looks to me as if the right solution is to do a soft insert and a
> subsequent query instead of a hard insert.
> 
> However, I don't understand how locking works in SQL so there may be something
> that prevents another thread from interfering.
> 
> peter
> 
> 
> On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
>> I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
>> loaded with 10 active readers.   About half the time the load fails with one
>> or more of these errors.  The errors are always near the beginning of the
>> load---in the first group of 10 files to be loaded and near the beginning of
>> the files (generally in the first couple of hundred lines in a file of size
>> well over 1 GB).  No errors occur for any files beyond the first ten.
>>
>> I could provide the files, but they total to about 340GB.
>>
>> It sure looks as if there is some sort of bug when loading RDF language-tagged
>> strings, where a race condition means that two threads are trying to load the
>> same language tag into DB.DBA.RDF_LANGUAGE.  This would explain why the
>> problem occurs only at the beginning of the load, when the language tags are
>> being added to DB.DBA.RDF_LANGUAGE, and not later.  It would also explain why
>> the errors are different between different runs.  (The only other explanation
>> would be hardware errors, but this doesn't seem to be viable.)
>>
>> It seems to me that a quick patch for this problem would be to change the
>> insert into a soft insert, but I don't know where to make this change in the code.
>>
>> peter
>>
>>
>>
>>
>> On 12/11/18 7:11 PM, Hugh Williams wrote:
>>> Hi Peter,
>>>
>>> The triple values do indeed appear to be valid, but the problem could be
>>> somewhere else in the dataset file and not necessarily on the reported line
>>> or the line before it.
>>>
>>> Is it a public dataset you are loading, and if so, can you provide a copy for
>>> local testing?
>>>
>>> Best Regards
>>> Hugh Williams
>>> Professional Services
>>> OpenLink Software
>>> Home Page: http://www.openlinksw.com
>>> Community Support: https://community.openlinksw.com
>>> Weblogs (Blogs):
>>> Company Blog: https://medium.com/openlink-software-blog
>>> Virtuoso Blog: https://medium.com/virtuoso-blog
>>> Data Access Drivers
>>> Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>>> LinkedIn -- http://www.linkedin.com/company/openlink-software/
>>> Twitter  -- http://twitter.com/OpenLink
>>> Google+  -- http://plus.google.com/100570109519069333827/
>>> Facebook -- http://www.facebook.com/OpenLinkSoftware
>>> Universal Data Access, Integration, and Management Technology Providers
>>>
>>>
>>>
>>>
>>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider <pfpschnei...@gmail.com
>>>> <mailto:pfpschnei...@gmail.com>> wrote:
>>>>
>>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>>
>>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>>>> DB.DBA.RDF_LANGUAGE
>>>>
>>>> The line in question looks fine:
>>>>
>>>>   "Wikimedia template"@ki,
>>>>
>>>> The line before it may indicate the issue
>>>>
>>>>    "Wikimedia template"@kg,
>>>>
>>>> Nonetheless this should be valid RDF so there appears to be a bug in Virtuoso
>>>> here.
>>>>
>>>> Is there any workaround?
>>>>
>>>>
>>>> This is in Virtuoso 07.20.3230.
>>>>
>>>> peter
>>>>
>>>>
>>>> _______________________________________________
>>>> Virtuoso-users mailing list
>>>> Virtuoso-users@lists.sourceforge.net
>>>> <mailto:Virtuoso-users@lists.sourceforge.net>
>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>
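#!/usr/local/bin/python2.7

# Generate 20 Turtle files (test00.ttl .. test19.ttl).  Each file has 10
# subjects, each with 676 labels carrying the language tags @laa .. @lzz, so
# every file introduces the same new language tags in the same order.

for x in range(0, 20):
    with open('test{:0>2d}.ttl'.format(x), 'w') as out:
        for k in range(0, 10):
            # Subject and predicate; the comma-separated object list follows.
            out.write('<http://www.wikidata.org/entity/Q{:0>2d}{:0>2d}> <http://www.w3.org/2000/01/rdf-schema#label>\n'.format(x, k))
            for y in range(ord('a'), ord('z') + 1):
                for z in range(ord('a'), ord('z') + 1):
                    out.write('    "description {:0>2d}{:0>3d}{:0>3d}"@l{:s}{:s},\n'.format(x, y, z, chr(y), chr(z)))
            # A final untagged literal terminates the object list.
            out.write('  "JUNK".\n')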

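As a rough check on why this data exercises the language-tag path so heavily,
the tags the generator emits can be enumerated directly.  The sketch below only
mirrors the loops of the generator above; the names `generated_tags` and
`lines_per_file` are introduced here for illustration.

```python
import string


def generated_tags():
    # Same tag scheme as the generator: 'l' followed by two lowercase letters.
    letters = string.ascii_lowercase
    return ['l' + a + b for a in letters for b in letters]


def lines_per_file(subjects=10):
    # Per subject: one subject/predicate line, one line per tag, one "JUNK" line.
    return subjects * (1 + len(generated_tags()) + 1)


if __name__ == "__main__":
    tags = generated_tags()
    print("{} distinct tags, {} .. {}".format(len(tags), tags[0], tags[-1]))
    print("{} lines per file".format(lines_per_file()))
```

Every file repeats the same 676 tags, all of them new to the database, and the
first few hundred lines of each file already cover most of them.
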
Attachment: test.sh
Description: application/shellscript

