I did a bit of digging, and it sure looks as if there is a race condition in rdf_rl_lang_id in ttlpv.sql. The procedure appears to check whether the language tag is already in DB.DBA.RDF_LANGUAGE and to insert it if it is not. As far as I can tell, another thread could insert the same tag between the check and the insert.
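To make the suspected interleaving concrete, here is a minimal Python sketch. It uses SQLite purely as a stand-in: it is not Virtuoso, and it ignores whatever locking Virtuoso's engine does. The `lang` table, its columns, and all helper functions are hypothetical. Instead of real threads, it replays one bad interleaving deterministically. Both loaders check before either inserts, so the second insert hits the uniqueness constraint, much like the SR197 error. The sketch then replays the same interleaving using SQLite's `INSERT OR IGNORE` as a stand-in for a soft insert, followed by a query, and both loaders get the same id back.

```python
import sqlite3

# Hypothetical, simplified stand-in for DB.DBA.RDF_LANGUAGE: every language
# tag must be unique, and SQLite assigns each new tag a small integer id.
SCHEMA = "CREATE TABLE lang (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE)"


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def lookup(db: sqlite3.Connection, tag: str):
    """Return the id stored for tag, or None if the tag is not there yet."""
    row = db.execute("SELECT id FROM lang WHERE tag = ?", (tag,)).fetchone()
    return None if row is None else row[0]


def racy_interleaving(db: sqlite3.Connection, tag: str) -> None:
    """Replay check-then-insert for two loaders, A and B, with both checks first."""
    a_found = lookup(db, tag)  # A: tag is missing
    b_found = lookup(db, tag)  # B: still missing, because A has not inserted yet
    if a_found is None:
        db.execute("INSERT INTO lang (tag) VALUES (?)", (tag,))  # A inserts
    if b_found is None:
        # B acts on its stale check and raises sqlite3.IntegrityError.
        db.execute("INSERT INTO lang (tag) VALUES (?)", (tag,))


def soft_get_id(db: sqlite3.Connection, tag: str) -> int:
    """Soft insert, then query: gives the same id whoever inserted first."""
    # INSERT OR IGNORE silently does nothing if the tag already exists.
    db.execute("INSERT OR IGNORE INTO lang (tag) VALUES (?)", (tag,))
    # Either way the row now exists, so read its id back.
    return lookup(db, tag)


def soft_interleaving(db: sqlite3.Connection, tag: str) -> tuple:
    """The same A/B interleaving, but with soft inserts before the lookups."""
    for _loader in ("A", "B"):
        db.execute("INSERT OR IGNORE INTO lang (tag) VALUES (?)", (tag,))
    return lookup(db, tag), lookup(db, tag)


if __name__ == "__main__":
    try:
        racy_interleaving(make_db(), "ki")
    except sqlite3.IntegrityError as exc:
        print("check-then-insert failed:", exc)
    print("soft insert + query:", soft_interleaving(make_db(), "ki"))  # (1, 1)
```

The point of re-querying after the soft insert is that the lookup no longer depends on which loader won the race. Any real fix would have to go into the Virtuoso loader itself. This only illustrates the pattern.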
It looks to me as if the right solution is to do a soft insert followed by a query instead of a hard insert. However, I don't understand how locking works in SQL, so there may be something that prevents another thread from interfering.

peter

On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
> I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
> loaded with 10 active readers. About half the time the load fails with one
> or more of these errors. The errors are always near the beginning of the
> load: in the first group of 10 files to be loaded and near the beginning of
> the files (generally in the first couple of hundred lines in a file of size
> well over 1 GB). No errors occur for any files beyond the first ten.
>
> I could provide the files, but they total about 340 GB.
>
> It sure looks as if there is some sort of bug when loading RDF language-tagged
> strings, where a race condition means that two threads are trying to load the
> same language tag into DB.DBA.RDF_LANGUAGE. This would explain why the
> problem occurs only at the beginning of the load, when the language tags are
> being added to DB.DBA.RDF_LANGUAGE, and not later. It would also explain why
> the errors differ between runs. (The only other explanation would be
> hardware errors, but this doesn't seem to be viable.)
>
> It seems to me that a quick patch for this problem would be to change the
> insert into a soft insert, but I don't know where to make this change in the
> code.
>
> peter
>
> On 12/11/18 7:11 PM, Hugh Williams wrote:
>> Hi Peter,
>>
>> The triple values do indeed appear to be valid, but the problem could be
>> somewhere else in the dataset file and not necessarily on the reported line
>> or the line before it.
>>
>> Is it a public dataset you are loading, and if so, can you provide a copy
>> for local testing?
>>
>> Best Regards
>> Hugh Williams
>> Professional Services
>> OpenLink Software
>> Home Page: http://www.openlinksw.com
>> Community Support: https://community.openlinksw.com
>>
>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider <pfpschnei...@gmail.com> wrote:
>>>
>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>
>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>>> DB.DBA.RDF_LANGUAGE
>>>
>>> The line in question looks fine:
>>>
>>> "Wikimedia template"@ki,
>>>
>>> The line before it may indicate the issue:
>>>
>>> "Wikimedia template"@kg,
>>>
>>> Nonetheless, this should be valid RDF, so there appears to be a bug in
>>> Virtuoso here.
>>>
>>> Is there any workaround?
>>>
>>> This is in Virtuoso 07.20.3230.
>>>
>>> peter
>>>
>>> _______________________________________________
>>> Virtuoso-users mailing list
>>> Virtuoso-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users