Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-20 Thread Peter F. Patel-Schneider
Changing ttlpv.sql to use insert soft (and also making sure to use the
stored id) appears to fix the problem I am encountering.
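
As a rough illustration of the pattern (not the actual ttlpv.sql code), the sketch below uses Python's sqlite3 module, with INSERT OR IGNORE standing in for Virtuoso's insert soft. The table `lang`, its columns, and the function `lang_id` are hypothetical names chosen for this example. The part that matters is the second step: after the soft insert, the id is read back from the table instead of being trusted from the caller.

```python
import sqlite3


def lang_id(conn, tag, proposed_id):
    """Return the id stored for `tag`, inserting (tag, proposed_id) only if absent."""
    # Soft insert: a row whose primary key already exists is silently skipped.
    conn.execute(
        "INSERT OR IGNORE INTO lang (tag, id) VALUES (?, ?)", (tag, proposed_id)
    )
    # Always read the id back: another loader may have inserted the tag first.
    row = conn.execute("SELECT id FROM lang WHERE tag = ?", (tag,)).fetchone()
    return row[0]


# Hypothetical stand-in for the language table (not Virtuoso's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lang (tag TEXT PRIMARY KEY, id INTEGER NOT NULL)")

first = lang_id(conn, "ki", 1)   # row is inserted
second = lang_id(conn, "ki", 2)  # insert is skipped, stored id is returned
print(first, second)
```

Both calls return the id that is actually stored, so a caller that lost the insert still ends up with a consistent id.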

I prepared a modified ttlpv.sql (attached) and reinstalled Virtuoso as
follows:

cd /home/virtuoso/test
killall virtuoso-t
rm -rf vos
cp ttlpv.sql ./virtuoso-opensource/libsrc/Wi
cd virtuoso-opensource/
make
make install prefix=/home/virtuoso/test/vos
cd ..
cp generate.py vos/share/virtuoso/vad
( cd vos/share/virtuoso/vad ; python2.7 ./generate.py )
./test.sh

With this change everything worked.

I expect that when using insert soft that rdf_rl_lang_id could be somewhat
shortened but I wanted to make the minimal change.


peter


ttlpv.sql
Description: application/sql
___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users


Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-19 Thread Peter F. Patel-Schneider
Aah.  It's likely then that something in MacOS doesn't switch between threads
within ttlpv.sql.

It is also possible that there is a similar issue in ttlpv.sql having to do
with type ids (rdf_rl_type_id).  That code is very similar to the code where I
found the bug.

peter


On 12/19/18 11:05 AM, Hugh Williams wrote:
> Hi Peter,
> 
> My original testing was on Darwin (macOS), I have switched to CentOS 7 and
> have been able to recreate the issue on first run:
> 
> [hwilliams@localhost database]$ sh test.sh 
> Wed 19 Dec 15:43:33 GMT 2018: 3/ Start up Virtuoso with an empty database
> OpenLink Virtuoso Interactive SQL (Virtuoso)
> Version 07.20.3230 as of Oct 18 2018
> Type HELP; for help and EXIT; to exit.
> Connected to OpenLink Virtuoso
> Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
> Wed 19 Dec 15:43:39 GMT 2018: 4/ Load the dump files into Virtuoso
> test.sh: line 35: warning: here-document at line 18 delimited by end-of-file
> (wanted `EOF')
> ll_file                                                                 ll_state  ll_error
> ___
>
> /home/hwilliams/virtuoso/vos72/virtuoso-opensource/wikidata/test00.ttl  2         23000 TURTLE RDF loader, line 507: SR197: Non unique primary key on DB.DBA.RDF_LANGUAGE.
> [hwilliams@localhost database]$ 
> 
> and sometimes does not occur ...
> 
> I shall report to development to look into and also provide them with the
> “ttlpv.sql” script with your proposed fix ...
> 
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software
> Home Page: http://www.openlinksw.com
> Community Support: https://community.openlinksw.com
> Weblogs (Blogs):
> Company Blog: https://medium.com/openlink-software-blog
> Virtuoso Blog: https://medium.com/virtuoso-blog
> Data Access Drivers
> Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter  -- http://twitter.com/OpenLink
> Google+  -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
> 
> 
> 
> 
>> On 19 Dec 2018, at 13:51, Peter F. Patel-Schneider wrote:
>>
>> That's not my experience on several machines.  I've experienced this bug on
>> two servers, one running CentOS release 6.9 (Final) and one running Ubuntu
>> 16.04.5 LTS.  The bug depends at least on loading in parallel *and*
>> encountering new language tags.  Loading the files a second time will almost
>> certainly result in no errors as the language tags will already have been
>> encountered and added in the previous partially completed run.
>>
>>
>> To check things out a bit further I tested on a new machine.  I did a fresh
>> download and install of Virtuoso Open Source (Version 07.20.3230 as of Dec
>> 19 2018) on my laptop (a four-core, eight-thread machine running Fedora 29
>> with kernel 4.19.8-300.fc29.x86_64).
>>
>> I downloaded, compiled, and installed Virtuoso as follows:
>>
>> cd /home/virtuoso/test
>> git clone git://github.com/openlink/virtuoso-opensource.git
>> cd virtuoso-opensource
>> ./autogen.sh
>> ./configure
>> make
>> make install prefix=/home/virtuoso/test/vos
>> cd ..
>>
>> I put my generation code in the directory .../share/virtuoso/vad/ and
>> created the download files there so that I did not have to change
>> virtuoso.ini at all.   I then ran a slightly modified version of the test
>> script (attached).
>>
>> cp generate.py vos/share/virtuoso/vad
>> ( cd vos/share/virtuoso/vad ; python2.7 ./generate.py )
>> ./test.sh
>>
>> The test script produced
>>
>> #!/bin/bash -v
>> # Installation directories for Virtuoso open source
>> vrun=/home/virtuoso/test/vos
>> vbin=${vrun}/bin
>> vdb=${vrun}/var/lib/virtuoso/db
>>
>> # Start up virtuoso as a daemon, waiting for initialization
>> ( cd ${vdb} ; ${vbin}/virtuoso-t +wait )
>>
>> ${vbin}/isql  dba dba PROMPT=OFF <<EOF
>> ld_dir ('/home/virtuoso/test/vos/share/virtuoso/vad', 'test*.ttl',
>> 'http://test.nuance.com');
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> rdf_loader_run() &
>> wait_for_children;
>> checkpoint;
>> SELECT * FROM DB.DBA.load_list;
>> SPARQL SELECT count(*) FROM <http://test.nuance.com> WHERE {?s ?p ?o};
>> exit;
>> EOF
>> OpenLink Virtuoso Interactive SQL (Virtuoso)
>> Version 07.20.3230 as of Dec 19 2018
>> Type HELP; for help and EXIT; to exit.
>> Connected to OpenLink Virtuoso
>> Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
>>
>> Done. -- 1 msec.
>> OpenLink Virtuoso Interactive SQL (Virtuoso)
>> OpenLink Virtuoso Interactive SQL (Virtuoso)
>> OpenLink Virtuoso Interactive SQL (Virtuoso)
>> OpenLink Virtuoso Interactive SQL (Virtuoso)
>> Version 

Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-19 Thread Hugh Williams
Hi Peter,

My original testing was on Darwin (macOS), I have switched to CentOS 7 and have 
been able to recreate the issue on first run:

[hwilliams@localhost database]$ sh test.sh 
Wed 19 Dec 15:43:33 GMT 2018: 3/ Start up Virtuoso with an empty database
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Oct 18 2018
Type HELP; for help and EXIT; to exit.
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Wed 19 Dec 15:43:39 GMT 2018: 4/ Load the dump files into Virtuoso
test.sh: line 35: warning: here-document at line 18 delimited by end-of-file 
(wanted `EOF')
ll_file                                                                 ll_state  ll_error
___

/home/hwilliams/virtuoso/vos72/virtuoso-opensource/wikidata/test00.ttl  2         23000 TURTLE RDF loader, line 507: SR197: Non unique primary key on DB.DBA.RDF_LANGUAGE.
[hwilliams@localhost database]$ 

and sometimes does not occur ...

I shall report to development to look into and also provide them with the 
“ttlpv.sql” script with your proposed fix ...

Best Regards
Hugh Williams

> On 19 Dec 2018, at 13:51, Peter F. Patel-Schneider wrote:
> 
> That's not my experience on several machines.  I've experienced this bug on
> two servers, one running CentOS release 6.9 (Final) and one running Ubuntu
> 16.04.5 LTS.  The bug depends at least on loading in parallel *and*
> encountering new language tags.  Loading the files a second time will almost
> certainly result in no errors as the language tags will already have been
> encountered and added in the previous partially completed run.
> 
> 
> To check things out a bit further I tested on a new machine.  I did a fresh
> download and install of Virtuoso Open Source (Version 07.20.3230 as of Dec
> 19 2018) on my laptop (a four-core, eight-thread machine running Fedora 29
> with kernel 4.19.8-300.fc29.x86_64).
> 
> I downloaded, compiled, and installed Virtuoso as follows:
> 
> cd /home/virtuoso/test
> git clone git://github.com/openlink/virtuoso-opensource.git
> cd virtuoso-opensource
> ./autogen.sh
> ./configure
> make
> make install prefix=/home/virtuoso/test/vos
> cd ..
> 
> I put my generation code in the directory .../share/virtuoso/vad/ and
> created the download files there so that I did not have to change
> virtuoso.ini at all.   I then ran a slightly modified version of the test
> script (attached).
> 
> cp generate.py vos/share/virtuoso/vad
> ( cd vos/share/virtuoso/vad ; python2.7 ./generate.py )
> ./test.sh
> 
> The test script produced
> 
> #!/bin/bash -v
> # Installation directories for Virtuoso open source
> vrun=/home/virtuoso/test/vos
> vbin=${vrun}/bin
> vdb=${vrun}/var/lib/virtuoso/db
> 
> # Start up virtuoso as a daemon, waiting for initialization
> ( cd ${vdb} ; ${vbin}/virtuoso-t +wait )
> 
> ${vbin}/isql  dba dba PROMPT=OFF <<EOF
> ld_dir ('/home/virtuoso/test/vos/share/virtuoso/vad', 'test*.ttl',
> 'http://test.nuance.com');
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> rdf_loader_run() &
> wait_for_children;
> checkpoint;
> SELECT * FROM DB.DBA.load_list;
> SPARQL SELECT count(*) FROM <http://test.nuance.com> WHERE {?s ?p ?o};
> exit;
> EOF
> OpenLink Virtuoso Interactive SQL (Virtuoso)
> Version 07.20.3230 as of Dec 19 2018
> Type HELP; for help and EXIT; to exit.
> Connected to OpenLink Virtuoso
> Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
> 
> Done. -- 1 msec.
> OpenLink Virtuoso Interactive SQL (Virtuoso)
> OpenLink Virtuoso Interactive SQL (Virtuoso)
> OpenLink Virtuoso Interactive SQL (Virtuoso)
> OpenLink Virtuoso Interactive SQL (Virtuoso)
> Version 07.20.3230 as of Dec 19 2018
> Version 07.20.3230 as of Dec 19 2018
> Version 07.20.3230 as of Dec 19 2018
> Type HELP; for help and EXIT; to exit.
> Type HELP; for help and EXIT; to exit.
> Type HELP; for help and EXIT; to exit.
> OpenLink Virtuoso Interactive SQL (Virtuoso)
> Version 07.20.3230 

Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-19 Thread Peter F. Patel-Schneider
That's not my experience on several machines.  I've experienced this bug on
two servers, one running CentOS release 6.9 (Final) and one running Ubuntu
16.04.5 LTS.  The bug depends at least on loading in parallel *and*
encountering new language tags.  Loading the files a second time will almost
certainly result in no errors as the language tags will already have been
encountered and added in the previous partially completed run.
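
To see which language tags a set of files will introduce, a small scan can help. The following is only a rough sketch: the regular expression is a simplification that does not handle every Turtle quoting form, and the function name `language_tags` is made up for this example. Tags that occur near the start of several files loaded at the same time are the likely candidates for the collision.

```python
import re

# Simplified pattern: a closing double quote followed by @ and a language tag.
LANG_TAG = re.compile(r'"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)')


def language_tags(lines):
    """Collect the language tags found in an iterable of Turtle lines."""
    tags = set()
    for line in lines:
        tags.update(LANG_TAG.findall(line))
    return tags


sample = [
    '  "Wikimedia template"@kg,',
    '  "Wikimedia template"@ki,',
    '  "JUNK".',
]
found = language_tags(sample)
print(sorted(found))
```

For a real file, `language_tags(open(path))` would scan it line by line.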


To check things out a bit further I tested on a new machine.  I did a fresh
download and install of Virtuoso Open Source (Version 07.20.3230 as of Dec
19 2018) on my laptop (a four-core, eight-thread machine running Fedora 29
with kernel 4.19.8-300.fc29.x86_64).

I downloaded, compiled, and installed Virtuoso as follows:

cd /home/virtuoso/test
git clone git://github.com/openlink/virtuoso-opensource.git
cd virtuoso-opensource
./autogen.sh
./configure
make
make install prefix=/home/virtuoso/test/vos
cd ..

I put my generation code in the directory .../share/virtuoso/vad/ and
created the download files there so that I did not have to change
virtuoso.ini at all.   I then ran a slightly modified version of the test
script (attached).

cp generate.py vos/share/virtuoso/vad
( cd vos/share/virtuoso/vad ; python2.7 ./generate.py )
./test.sh

The test script produced

#!/bin/bash -v
# Installation directories for Virtuoso open source
vrun=/home/virtuoso/test/vos
vbin=${vrun}/bin
vdb=${vrun}/var/lib/virtuoso/db

# Start up virtuoso as a daemon, waiting for initialization
( cd ${vdb} ; ${vbin}/virtuoso-t +wait )

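${vbin}/isql  dba dba PROMPT=OFF <<EOF
ld_dir ('/home/virtuoso/test/vos/share/virtuoso/vad', 'test*.ttl',
'http://test.nuance.com');
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
rdf_loader_run() &
wait_for_children;
checkpoint;
SELECT * FROM DB.DBA.load_list;
SPARQL SELECT count(*) FROM <http://test.nuance.com> WHERE {?s ?p ?o};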
exit;
EOF
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver

Done. -- 1 msec.
OpenLink Virtuoso Interactive SQL (Virtuoso)
OpenLink Virtuoso Interactive SQL (Virtuoso)
OpenLink Virtuoso Interactive SQL (Virtuoso)
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Version 07.20.3230 as of Dec 19 2018
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
Type HELP; for help and EXIT; to exit.
Type HELP; for help and EXIT; to exit.
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Connected to OpenLink Virtuoso
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
OpenLink Virtuoso Interactive SQL (Virtuoso)
Version 07.20.3230 as of Dec 19 2018
Type HELP; for help and EXIT; to exit.
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver
Connected to OpenLink Virtuoso
Driver: 07.20.3230 OpenLink Virtuoso ODBC Driver

Done. -- 0 msec.

Done. -- 0 msec.




Done. -- 42 msec.
Done. -- 42 msec.
Done. -- 43 msec.
Done. -- 42 msec.

Done. -- 41 msec.

Done. -- 41 msec.

Done. -- 58 msec.

Done. -- 265 msec.

Done. -- 68 msec.
ll_file                                                ll_graph                ll_state  ll_started                    ll_done                       ll_host  ll_work_time  ll_error
VARCHAR NOT NULL                                       VARCHAR                 INTEGER   TIMESTAMP                     TIMESTAMP                     INTEGER  INTEGER       VARCHAR
___

/home/virtuoso/test/vos/share/virtuoso/vad/test00.ttl  http://test.nuance.com  2         2018.12.19 8:35.33 146481000  2018.12.19 8:35.33 148859000  0        NULL          23000 TURTLE RDF loader, line 17: SR197: Non unique primary key on DB.DBA.RDF_LANGUAGE.

Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-18 Thread Hugh Williams
Hi Peter,

I generated the datasets from your python script and loaded them into a local 
Virtuoso open source multiple times but did not see any occurrences of the 
error:

SQL> select * from load_list;
ll_file                ll_graph                ll_state  ll_started                    ll_done                       ll_host  ll_work_time  ll_error
VARCHAR NOT NULL       VARCHAR                 INTEGER   TIMESTAMP                     TIMESTAMP                     INTEGER  INTEGER       VARCHAR
___

./wikidata/test00.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.54 983316000  0        NULL          NULL
./wikidata/test01.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 10566      0        NULL          NULL
./wikidata/test02.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 233562000  0        NULL          NULL
./wikidata/test03.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 371457000  0        NULL          NULL
./wikidata/test04.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 483846000  0        NULL          NULL
./wikidata/test05.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 621974000  0        NULL          NULL
./wikidata/test06.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 742255000  0        NULL          NULL
./wikidata/test07.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 860062000  0        NULL          NULL
./wikidata/test08.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.55 993561000  0        NULL          NULL
./wikidata/test09.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 826749000  2018.12.19 0:47.56 140431000  0        NULL          NULL
./wikidata/test10.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 82779      2018.12.19 0:47.54 985386000  0        NULL          NULL
./wikidata/test11.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 82779      2018.12.19 0:47.55 109072000  0        NULL          NULL
./wikidata/test12.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 82779      2018.12.19 0:47.55 230846000  0        NULL          NULL
./wikidata/test13.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 82779      2018.12.19 0:47.55 375427000  0        NULL          NULL
./wikidata/test14.ttl  http://test.nuance.com  2         2018.12.19 0:47.54 82779      2018.12.19 0:47.55 486963000  0        NULL          NULL
./wikidata/test15.ttl  http://test.nuance.com  2

Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-18 Thread Peter F. Patel-Schneider
I created some synthetic data that tickles the bug reliably on my machine with
a standard virtuoso.ini (just adding the directory for the files to the
allowed list).  I'm attaching the generator program for the files and a
loading script.
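
For a sense of scale, the arithmetic behind the synthetic data can be sketched as follows. The counts are read off the loops in the attached generator (20 files, 10 subjects per file, one literal per two-letter suffix after "l"); the variable names are only for this illustration.

```python
from itertools import product
from string import ascii_lowercase

# Language tags produced by the generator: 'l' followed by two lowercase letters.
tags = ["l" + a + b for a, b in product(ascii_lowercase, repeat=2)]

files = 20
subjects_per_file = 10
# One tagged literal per tag and subject (the trailing "JUNK" literal is untagged).
literals_per_file = subjects_per_file * len(tags)
total_literals = files * literals_per_file

print(len(tags), literals_per_file, total_literals)
```

Every file starts with the same run of previously unseen tags, which is what makes concurrent loaders likely to try inserting the same tag at nearly the same moment.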

peter


On 12/18/18 9:46 AM, Peter F. Patel-Schneider wrote:
> I did a bit of digging and it sure looks as if there is a race condition in
> rdf_rl_lang_id in ttlpv.sql.   This code appears to check to see if the
> language tag is already in DB.DBA.RDF_LANGUAGE and adds it if not.  But
> another thread could do the same insert between the check and the insert, as
> far as I can tell.
> 
> It looks to me as if the right solution is to do a soft insert and a
> subsequent query instead of a hard insert.
> 
> However, I don't understand how locking works in SQL so there may be something
> that prevents another thread from interfering.
> 
> peter
> 
> 
> On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
>> I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
>> loaded with 10 active readers.   About half the time the load fails with one
>> or more of these errors.  The errors are always near the beginning of the
>> load---in the first group of 10 files to be loaded and near the beginning of
>> the files (generally in the first couple of hundred lines in a file of size
>> well over 1 GB).  No errors occur for any files beyond the first ten.
>>
>> I could provide the files, but they total to about 340GB.
>>
>> It sure looks as if there is some sort of bug when loading RDF language-tagged
>> strings, where a race condition means that two threads are trying to load the
>> same language tag into DB.DBA.RDF_LANGUAGE.  This would explain why the
>> problem occurs only at the beginning of the load, when the language tags are
>> being added to DB.DBA.RDF_LANGUAGE, and not later.  It would also explain why
>> the errors are different between different runs.  (The only other explanation
>> would be hardware errors, but this doesn't seem to be viable.)
>>
>> It seems to me that a quick patch for this problem would be to change the
>> insert into a soft insert, but I don't know where to make this change in the 
>> code.
>>
>> peter
>>
>>
>>
>>
>> On 12/11/18 7:11 PM, Hugh Williams wrote:
>>> Hi Peter,
>>>
>>> The triple values do indeed appear to be valid, but the problem could be
>>> somewhere else in the dataset file and not necessarily on the reported line or
>>> line before it.
>>>
>>> Is it a public dataset you are loading and if so can you provide a copy for
>>> local testing ?
>>>
>>> Best Regards
>>> Hugh Williams
>>>
>>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider wrote:
>>>>
>>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>>
>>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>>>> DB.DBA.RDF_LANGUAGE
>>>>
>>>> The line in question looks fine:
>>>>
>>>>   "Wikimedia template"@ki,
>>>>
>>>> The line before it may indicate the issue
>>>>
>>>>    "Wikimedia template"@kg,
>>>>
>>>> Nonetheless this should be valid RDF so there appears to be a bug in Virtuoso
>>>> here.
>>>>
>>>> Is there any workaround?
>>>>
>>>>
>>>> This is in Virtuoso 07.20.3230.
>>>>
>>>> peter
>>>>
>>>>
>>>> ___
>>>> Virtuoso-users mailing list
>>>> Virtuoso-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>>
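#!/usr/local/bin/python2.7

# Write 20 Turtle files, each with 10 subjects that carry 26*26 language-tagged
# literals, so that parallel loaders meet many new language tags at once.
# NOTE: the subject and predicate IRIs were lost from the archived copy of this
# script; the http://test.nuance.com/... IRIs below are placeholders.
for x in range(0, 20):
    out = open('test{:0>2d}.ttl'.format(x), 'w')
    for k in range(0, 10):
        # Subject and predicate; the object list follows on the next lines.
        out.write('<http://test.nuance.com/s{:0>2d}{:0>2d}> <http://test.nuance.com/p>\n'.format(x, k))
        for y in range(ord('a'), ord('z') + 1):
            for z in range(ord('a'), ord('z') + 1):
                out.write('"description {:0>2d}{:0>3d}{:0>3d}"@l{:s}{:s},\n'.format(x, y, z, chr(y), chr(z)))
        # A final untagged literal closes the object list.
        out.write('  "JUNK".\n')
    out.close()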


test.sh
Description: application/shellscript

Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-18 Thread Peter F. Patel-Schneider
I did a bit of digging and it sure looks as if there is a race condition in
rdf_rl_lang_id in ttlpv.sql.   This code appears to check to see if the
language tag is already in DB.DBA.RDF_LANGUAGE and adds it if not.  But
another thread could do the same insert between the check and the insert, as
far as I can tell.
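
The interleaving can be replayed deterministically outside Virtuoso. The sketch below is an assumption-laden stand-in, not the ttlpv.sql logic: it uses Python's sqlite3 module, a hypothetical table `lang`, and two simulated loaders "A" and "B" whose steps are interleaved by hand so that both checks happen before either insert.

```python
import sqlite3

# Hypothetical stand-in for the language table (not Virtuoso's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lang (tag TEXT PRIMARY KEY, id INTEGER NOT NULL)")


def is_known(tag):
    """Check step: is the tag already in the table?"""
    row = conn.execute("SELECT 1 FROM lang WHERE tag = ?", (tag,)).fetchone()
    return row is not None


# Step 1: loaders A and B both check before either has inserted.
a_sees = is_known("kg")
b_sees = is_known("kg")

# Step 2: both conclude the tag is new and do a hard insert.
errors = []
for loader, seen, new_id in (("A", a_sees, 1), ("B", b_sees, 2)):
    if not seen:
        try:
            conn.execute("INSERT INTO lang (tag, id) VALUES (?, ?)", ("kg", new_id))
        except sqlite3.IntegrityError as exc:
            # The second hard insert hits the primary key constraint.
            errors.append((loader, str(exc)))

print(errors)
```

Loader A's insert succeeds and loader B's fails on the primary key, which mirrors the "Non unique primary key" error reported by the loader.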

It looks to me as if the right solution is to do a soft insert and a
subsequent query instead of a hard insert.

However, I don't understand how locking works in SQL so there may be something
that prevents another thread from interfering.

peter


On 12/18/18 8:55 AM, Peter F. Patel-Schneider wrote:
> I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
> loaded with 10 active readers.   About half the time the load fails with one
> or more of these errors.  The errors are always near the beginning of the
> load---in the first group of 10 files to be loaded and near the beginning of
> the files (generally in the first couple of hundred lines in a file of size
> well over 1 GB).  No errors occur for any files beyond the first ten.
> 
> I could provide the files, but they total to about 340GB.
> 
> It sure looks as if there is some sort of bug when loading RDF language-tagged
> strings, where a race condition means that two threads are trying to load the
> same language tag into DB.DBA.RDF_LANGUAGE.  This would explain why the
> problem occurs only at the beginning of the load, when the language tags are
> being added to DB.DBA.RDF_LANGUAGE, and not later.  It would also explain why
> the errors are different between different runs.  (The only other explanation
> would be hardware errors, but this doesn't seem to be viable.)
> 
> It seems to me that a quick patch for this problem would be to change the
> insert into a soft insert, but I don't know where to make this change in the 
> code.
> 
> peter
> 
> 
> 
> 
> On 12/11/18 7:11 PM, Hugh Williams wrote:
>> Hi Peter,
>>
>> The triple values do indeed appear to be valid, but the problem could be
>> somewhere else in the dataset file and not necessarily on the reported line or
>> line before it.
>>
>> Is it a public dataset you are loading and if so can you provide a copy for
>> local testing ?
>>
>> Best Regards
>> Hugh Williams
>>
>>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider wrote:
>>>
>>> I'm loading a bunch of Turtle files and I'm getting the error
>>>
>>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>>> DB.DBA.RDF_LANGUAGE
>>>
>>> The line in question looks fine:
>>>
>>>   "Wikimedia template"@ki,
>>>
>>> The line before it may indicate the issue
>>>
>>>    "Wikimedia template"@kg,
>>>
>>> Nonetheless this should be valid RDF so there appears to be a bug in Virtuoso
>>> here.
>>>
>>> Is there any workaround?
>>>
>>>
>>> This is in Virtuoso 07.20.3230.
>>>
>>> peter
>>>
>>>
>>> ___
>>> Virtuoso-users mailing list
>>> Virtuoso-users@lists.sourceforge.net
>>> 
>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>




Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-18 Thread Peter F. Patel-Schneider
I'm loading the Turtle Wikidata RDF complete dump, split into pieces and
loaded with 10 active readers.   About half the time the load fails with one
or more of these errors.  The errors are always near the beginning of the
load---in the first group of 10 files to be loaded and near the beginning of
the files (generally in the first couple of hundred lines in a file of size
well over 1 GB).  No errors occur for any files beyond the first ten.

I could provide the files, but they total to about 340GB.

It sure looks as if there is some sort of bug when loading RDF language-tagged
strings, where a race condition means that two threads are trying to load the
same language tag into DB.DBA.RDF_LANGUAGE.  This would explain why the
problem occurs only at the beginning of the load, when the language tags are
being added to DB.DBA.RDF_LANGUAGE, and not later.  It would also explain why
the errors are different between different runs.  (The only other explanation
would be hardware errors, but this doesn't seem to be viable.)

It seems to me that a quick patch for this problem would be to change the
insert into a soft insert, but I don't know where to make this change in the 
code.

peter




On 12/11/18 7:11 PM, Hugh Williams wrote:
> Hi Peter,
> 
> The triple values do indeed appear to be valid, but the problem could be
> somewhere else in the dataset file and not necessarily on the reported line or
> line before it.
> 
> Is it a public dataset you are loading and if so can you provide a copy for
> local testing ?
> 
> Best Regards
> Hugh Williams
> 
>> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider wrote:
>>
>> I'm loading a bunch of Turtle files and I'm getting the error
>>
>> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
>> DB.DBA.RDF_LANGUAGE
>>
>> The line in question looks fine:
>>
>>   "Wikimedia template"@ki,
>>
>> The line before it may indicate the issue
>>
>>    "Wikimedia template"@kg,
>>
>> Nonetheless this should be valid RDF so there appears to be a bug in Virtuoso
>> here.
>>
>> Is there any workaround?
>>
>>
>> This is in Virtuoso 07.20.3230.
>>
>> peter
>>
>>
>> ___
>> Virtuoso-users mailing list
>> Virtuoso-users@lists.sourceforge.net
>> 
>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
> 


___


Re: [Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-11 Thread Hugh Williams
Hi Peter,

The triple values do indeed appear to be valid, but the problem could be
somewhere else in the dataset file and not necessarily on the reported line or 
line before it.

Is it a public dataset you are loading and if so can you provide a copy for 
local testing ?

Best Regards
Hugh Williams

> On 11 Dec 2018, at 17:45, Peter F. Patel-Schneider wrote:
> 
> I'm loading a bunch of Turtle files and I'm getting the error
> 
> 2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
> DB.DBA.RDF_LANGUAGE
> 
> The line in question looks fine:
> 
>   "Wikimedia template"@ki,
> 
> The line before it may indicate the issue
> 
>"Wikimedia template"@kg,
> 
> Nonetheless this should be valid RDF so there appears to be a bug in Virtuoso
> here.
> 
> Is there any workaround?
> 
> 
> This is in Virtuoso 07.20.3230.
> 
> peter
> 
> 
> ___
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

___


[Virtuoso-users] strange error when bulk-loading Turtle files

2018-12-11 Thread Peter F. Patel-Schneider
I'm loading a bunch of Turtle files and I'm getting the error

2300 TURTLE RDF loader, line 1012: SR197: Non unique primary key on
DB.DBA.RDF_LANGUAGE

The line in question looks fine:

   "Wikimedia template"@ki,

The line before it may indicate the issue

"Wikimedia template"@kg,

Nonetheless this should be valid RDF so there appears to be a bug in Virtuoso
here.

Is there any workaround?


This is in Virtuoso 07.20.3230.

peter


___