Hi Peter,

I generated the datasets with your Python script and loaded them into a local Virtuoso Open Source instance several times, but did not see any occurrences of the error:

SQL> select * from load_list;
ll_file                ll_graph                ll_state  ll_started                    ll_done                       ll_host  ll_work_time  ll_error
VARCHAR NOT NULL       VARCHAR                 INTEGER   TIMESTAMP                     TIMESTAMP                     INTEGER  INTEGER       VARCHAR
_______________________________________________________________________________

./wikidata/test00.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.54 983316000  0        NULL          NULL
./wikidata/test01.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 105660000  0        NULL          NULL
./wikidata/test02.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 233562000  0        NULL          NULL
./wikidata/test03.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 371457000  0        NULL          NULL
./wikidata/test04.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 483846000  0        NULL          NULL
./wikidata/test05.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 621974000  0        NULL          NULL
./wikidata/test06.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 742255000  0        NULL          NULL
./wikidata/test07.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 860062000  0        NULL          NULL
./wikidata/test08.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 993561000  0        NULL          NULL
./wikidata/test09.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.56 140431000  0        NULL          NULL
./wikidata/test10.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.54 985386000  0        NULL          NULL
./wikidata/test11.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 109072000  0        NULL          NULL
./wikidata/test12.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 230846000  0        NULL          NULL
./wikidata/test13.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 375427000  0        NULL          NULL
./wikidata/test14.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 486963000  0        NULL          NULL
./wikidata/test15.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 624303000  0        NULL          NULL
./wikidata/test16.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 745760000  0        NULL          NULL
./wikidata/test17.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 862932000  0        NULL          NULL
./wikidata/test18.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.55 995704000  0        NULL          NULL
./wikidata/test19.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 827790000  2018.12.19 0:47.56 144745000  0        NULL          NULL

20 Rows. -- 3 msec.
SQL> sparql select count(*) from <http://test.nuance.com> where {?s ?p ?o};
callret-0
INTEGER
_______________________________________________________________________________

135402

1 Rows. -- 85 msec.
SQL> status('');
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso Server
Version 07.20.3230-pthreads for Darwin as of Nov 10 2018
Started on: 2018-12-19 00:36 GMT+0

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter -- http://twitter.com/OpenLink
Google+ -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

> On 18 Dec 2018, at 17:24, Peter F.
> Patel-Schneider <pfpschnei...@gmail.com> wrote:
>
> I created some synthetic data that tickles the bug reliably on my machine with
> a standard virtuoso.ini (just adding the directory for the files to the
> allowed list). I'm attaching the generator program for the files and a
> loading script.
>
> peter
>
>
> On 12/18/18 9:46 AM, Peter F. Patel-Schneider wrote:
>> I did a bit of digging and it sure looks as if there is a race condition in
>> rdf_rl_lang_id in ttlpv.sql. This code appears to check to see if the
>> language tag is already in DB.DBA.RDF_LANGUAGE and adds it if not. But
>> another thread could do the same insert between the check and the insert, as
>> far as I can tell.
>>
>> It looks to me as if the right solution is to do a soft insert and a
>> subsequent query instead of a hard insert.
>>
>> However, I don't understand how locking works in SQL so there may be
>> something that prevents another thread from interfering.
>>
>> peter
>>
>>
>> On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
>>> I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
>>> loaded with 10 active readers. About half the time the load fails with one
>>> or more of these errors. The errors are always near the beginning of the
>>> load---in the first group of 10 files to be loaded and near the beginning of
>>> the files (generally in the first couple of hundred lines in a file of size
>>> well over 1 GB). No errors occur for any files beyond the first ten.
>>>
>>> I could provide the files, but they total to about 340GB.
>>>
>>> It sure looks as if there is some sort of bug when loading RDF
>>> language-tagged strings, where a race condition means that two threads are
>>> trying to load the same language tag into DB.DBA.RDF_LANGUAGE. This would
>>> explain why the problem occurs only at the beginning of the load, when the
>>> language tags are being added to DB.DBA.RDF_LANGUAGE, and not later. It
>>> would also explain why the errors are different between different runs.
>>> (The only other explanation would be hardware errors, but this doesn't seem
>>> to be viable.)
>>>
>>> It seems to me that a quick patch for this problem would be to change the
>>> insert into a soft insert, but I don't know where to make this change in
>>> the code.
>>>
>>> peter
>>>
>>>
>>> On 12/11/18 7:11 PM, Hugh Williams wrote:
>>>> Hi Peter,
>>>>
>>>> The triple values do indeed appear to be valid, but the problem could be
>>>> somewhere else in the dataset file and not necessarily on the reported
>>>> line or the line before it.
>>>>
>>>> Is it a public dataset you are loading, and if so can you provide a copy
>>>> for local testing?
>>>>
>>>> Best Regards
>>>> Hugh Williams
>>>> Professional Services
>>>> OpenLink Software
>>>>
>>>>
>>>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider
>>>>> <pfpschnei...@gmail.com> wrote:
>>>>>
>>>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>>>
>>>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>>>>> DB.DBA.RDF_LANGUAGE
>>>>>
>>>>> The line in question looks fine:
>>>>>
>>>>> "Wikimedia template"@ki,
>>>>>
>>>>> The line before it may indicate the issue
>>>>>
>>>>> "Wikimedia template"@kg,
>>>>>
>>>>> Nonetheless this should be valid RDF so there appears to be a bug in
>>>>> Virtuoso here.
>>>>>
>>>>> Is there any workaround?
>>>>>
>>>>>
>>>>> This is in Virtuoso 07.20.3230.
>>>>>
>>>>> peter
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Virtuoso-users mailing list
>>>>> Virtuoso-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>>
> <generate.py><test.sh>
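
The check-then-insert race suspected in the thread, and the proposed "soft insert followed by a query" fix, can be sketched outside Virtuoso. The following is a minimal illustration only, assuming an in-memory SQLite table as a stand-in for DB.DBA.RDF_LANGUAGE. The table name `lang`, its columns, and every function name here are hypothetical, and SQLite's `INSERT OR IGNORE` is used as an analogue of a soft insert. It is not the code in ttlpv.sql. The two "threads" are simulated by interleaving their steps by hand so the outcome is deterministic.

```python
import sqlite3


def make_db():
    """Create an in-memory table keyed on the language tag (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lang (tag TEXT PRIMARY KEY, id INTEGER)")
    return conn


def exists(conn, tag):
    """Check step: is the tag already registered?"""
    row = conn.execute("SELECT 1 FROM lang WHERE tag = ?", (tag,)).fetchone()
    return row is not None


def hard_insert(conn, tag, new_id):
    """Plain insert: raises IntegrityError if the key is already present."""
    conn.execute("INSERT INTO lang (tag, id) VALUES (?, ?)", (tag, new_id))


def soft_insert_then_lookup(conn, tag, new_id):
    """Insert only if absent, then read back whichever id won."""
    conn.execute("INSERT OR IGNORE INTO lang (tag, id) VALUES (?, ?)", (tag, new_id))
    return conn.execute("SELECT id FROM lang WHERE tag = ?", (tag,)).fetchone()[0]


def racy_interleaving():
    """Loaders A and B both run the check before either one inserts."""
    conn = make_db()
    a_missing = not exists(conn, "kg")  # A checks: tag not present
    b_missing = not exists(conn, "kg")  # B checks: tag still not present
    outcome = "ok"
    if a_missing:
        hard_insert(conn, "kg", 1)  # A inserts first
    if b_missing:
        try:
            hard_insert(conn, "kg", 2)  # B inserts the same key
        except sqlite3.IntegrityError:
            outcome = "duplicate key"
    return outcome


def safe_interleaving():
    """Same two loaders, but each uses soft insert followed by a lookup."""
    conn = make_db()
    id_a = soft_insert_then_lookup(conn, "kg", 1)
    id_b = soft_insert_then_lookup(conn, "kg", 2)
    return id_a, id_b


if __name__ == "__main__":
    print(racy_interleaving())
    print(safe_interleaving())
```

In this sketch the second loader's hard insert fails on the primary key, which resembles the "Non unique primary key" error reported above, while the soft-insert variant lets both loaders agree on the id written by whichever insert landed first. Whether Virtuoso's locking already rules out this interleaving is exactly the open question raised in the thread.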