Hi Hugh,

On 10/03/2017 at 14:01, Hugh Williams wrote:
> Hi Thomas,
>
> Is the ORCID dataset the only RDF dataset in the Virtuoso RDF Quad Store
> currently, or are there others?
>
> What is the size of the ORCID dataset, i.e. its triple count?

I gave you wrong information because I misunderstood the process.
Below are the correct details of our INSERT procedure from the Oracle DB:

- the dataset is the ORCID 2016 XML download available on this page:
https://orcid.org/content/download-file ("The file contains the public
information associated with each user's ORCID record. Each record is
included as a separate file in both JSON and XML.",
https://figshare.com/articles/ORCID_Public_Data_File_2016/4134027).

The records are loaded into Oracle as XML records.

- then, in an Oracle PL/SQL procedure, we apply an XSLT stylesheet
"on the fly" (using Oracle's efficient XMLTRANSFORM XSLT engine)
to produce an RDF/XML file for each ORCID XML record in the Oracle table

- next, we use Jena tools to generate triples, also "on the fly",
from these RDF/XML results

- these are the triples we finally insert via a JDBC "SPARQL
INSERT DATA INTO GRAPH..." call to Virtuoso from the Oracle PL/SQL
procedure, using the Virtuoso JDBC driver (and not the Oracle JDBC driver;
my mistake, as you guessed)

- checking the release of the JDBC driver gives:
    java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]

(the driver is embedded inside the Oracle Java VM)
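
As an illustration of the last step in the list above, here is a minimal sketch of how several triples could be grouped into one statement instead of one statement per record. This is an assumption about a possible variation, not a description of what the procedure does today. The statement prefix is the one quoted in this thread; the function names `chunked` and `build_insert`, the batch size, and the example.org IRIs are hypothetical. Python is used only for brevity; the real pipeline runs as PL/SQL and Java inside Oracle.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Statement prefix taken from the query quoted in this thread.
PREFIX = "sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH"


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_insert(graph_iri: str, triples: Iterable[str]) -> str:
    """Build one INSERT DATA statement from N-Triples lines.

    Blank lines are skipped; the triples are assumed to be valid
    N-Triples already (no escaping or validation is done here).
    """
    body = " ".join(t.strip() for t in triples if t.strip())
    return f"{PREFIX} <{graph_iri}> {{ {body} }}"


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the ORCID dataset.
    sample = [
        '<http://example.org/a> <http://example.org/p> "x" .',
        '<http://example.org/b> <http://example.org/p> "y" .',
        '<http://example.org/c> <http://example.org/p> "z" .',
    ]
    for batch in chunked(sample, 2):
        print(build_insert("http://example.org/graph", batch))
```

Each string produced this way would be passed to a single JDBC statement execution, ideally over a connection that is kept open between calls. Whether larger statements actually help here would need to be measured.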

Thanks in advance if you have suggestions.

The latest "statistics" on the graph size give: 182 405 784 triples


this:Dataset a void:Dataset ;
  rdfs:seeAlso <http://hub.abes.fr/referentiel/ORCID/2016> ;
  rdfs:label "" ;
  void:sparqlEndpoint <http://idrefplus.v102.abes.fr:8890/sparql> ;
  void:triples 182405784 ;
  void:classes 13 ;
  void:entities 43946633 ;
  void:distinctSubjects 43922470 ;
  void:properties 32 ;
  void:distinctObjects 56509541 .

this:sameAsLinks a void:Linkset ;
  void:inDataset this:Dataset ;
  void:triples 759462 .



>
> I would definitely suggest setting swappiness to 10 to reduce swapping to
> disk, which should speed up insert rates.
done
>
> Looking at your status() command output I see "Clients: 4177045 connects, max
> 3 concurrent", indicating that more than 4 million SQL connections have been
> made to Virtuoso since it was started on 9th Mar. What is making that many
> connections? Is it this insertion process
yes, it is
> or are there other clients reading from the instance also ?
none for the moment, instance is private
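
To put the connection count in perspective, here is a rough calculation, assuming the two figures from the earlier message quoted below (4177045 connects in the status() output and 110 932 303 triples loaded) describe roughly the same moment. The function name is hypothetical, and the connect counter also includes a few interactive sessions such as the one that ran status().

```python
def triples_per_connect(triples: int, connects: int) -> float:
    """Average number of triples loaded per client connection."""
    if connects <= 0:
        raise ValueError("connects must be positive")
    return triples / connects


if __name__ == "__main__":
    # Both figures are copied from the earlier message in this thread.
    ratio = triples_per_connect(110_932_303, 4_177_045)
    print(f"{ratio:.1f} triples per connection")
```

Under that assumption each connection carries only a few dozen triples, which is consistent with one connection being opened per inserted record.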
>   Apart from that the status() output looks fine, with plenty of unused
> buffers, so the database working set size can be increased and still fit in
> memory,
I don't really understand the point about buffers, but I also noticed that
their use is not "maximized"; I suppose that is because no other clients are
reading from the instance?
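
The point about buffers can be made concrete with the numbers in the status() output quoted below ("2720000 buffers, 447219 used, 112398 dirty"). The sketch assumes that one buffer caches one 8 KiB database page, which is the page size given in Virtuoso's tuning documentation; the constant and function names are hypothetical.

```python
# Figures copied from the "Database Status" part of the status() output.
TOTAL_BUFFERS = 2_720_000
USED_BUFFERS = 447_219
DIRTY_BUFFERS = 112_398

# Assumption: one buffer holds one 8 KiB database page.
BUFFER_BYTES = 8 * 1024


def gib(buffers: int) -> float:
    """Size of `buffers` pages in GiB under the 8 KiB assumption."""
    return buffers * BUFFER_BYTES / 2**30


def fraction(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`."""
    return part / whole


if __name__ == "__main__":
    print(f"pool size : {gib(TOTAL_BUFFERS):.2f} GiB")
    print(f"in use    : {gib(USED_BUFFERS):.2f} GiB "
          f"({fraction(USED_BUFFERS, TOTAL_BUFFERS):.1%} of the pool)")
    print(f"dirty     : {gib(DIRTY_BUFFERS):.2f} GiB")
```

Read this way, only about a sixth of the configured buffer pool was holding pages at that moment, so the working set can grow a good deal before pages have to be evicted from memory.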
> no deadlock and only one pending transaction which is one of your inserts.
>
> You talk about the Oracle JDBC Driver but I still don't see its relevance, as
> ultimately your insertions to Virtuoso must be done via one of its client
> interfaces / services, i.e. either the /sparql endpoint or the Virtuoso JDBC
> driver I would presume, thus which is it?
my mistake; as I said, the driver is the Virtuoso one:
    java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]
>
> The "DEFINE sql:log-enable 2" pragma being passed in the SPARQL insert
> queries does set row-by-row auto-commit and turn off transaction logging,
> which is the fastest transaction mode for write operations, see:
>
>       http://docs.openlinksw.com/virtuoso/fn_log_enable/
ok, thanks, a good point

Thomas
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.      //              http://www.openlinksw.com/
> Weblog   -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter  -- http://twitter.com/OpenLink
> Google+  -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
>
>
>> On 10 Mar 2017, at 10:54, Thomas Michaux <mich...@abes.fr> wrote:
>>
>> Hi,
>>
>> thanks Hugh, we reached 110 932 303 triples loaded from our ORCID dataset 
>> since yesterday, and still loading...
>>
>>
>>
>> Virtuoso process use VmSize: 32227664kB 32708 of memory of :
>>
>> KiB Mem : 32780296 total,   243972 free, 29985320 used,  2551004 buff/cache
>> KiB Swap:  2097148 total,  1734244 free,   362904 used.  2241196 avail Mem
>>
>> previous 4h logs :
>>
>> ...
>>
>> 06:03:28 Checkpoint started
>> 06:04:11 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310055817.trx
>> 06:28:41 Checkpoint started
>> 06:28:44 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310062412.trx
>> 06:52:58 Checkpoint started
>> 06:53:16 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310064844.trx
>> 07:17:14 Checkpoint started
>> 07:17:18 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310071317.trx
>> 07:39:58 Write load high relative to disk write throughput.  Flushing at     
>>   5.5 MB/s while application is making dirty pages at       1.5 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 07:41:10 Checkpoint started
>> 07:41:17 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310073719.trx
>> 08:04:53 Checkpoint started
>> 08:04:56 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310080117.trx
>> 08:27:35 Write load high relative to disk write throughput.  Flushing at     
>>   5.7 MB/s while application is making dirty pages at       1.7 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 08:28:45 Checkpoint started
>> 08:29:02 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310082457.trx
>> 08:51:43 Write load high relative to disk write throughput.  Flushing at     
>>   5.4 MB/s while application is making dirty pages at       1.7 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 08:52:57 Checkpoint started
>> 08:53:01 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310084902.trx
>> 09:15:40 Write load high relative to disk write throughput.  Flushing at     
>>   5.6 MB/s while application is making dirty pages at       1.9 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 09:16:59 Checkpoint started
>> 09:17:13 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310091301.trx
>> 09:39:57 Write load high relative to disk write throughput.  Flushing at     
>>   5.4 MB/s while application is making dirty pages at       1.7 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 09:41:13 Checkpoint started
>> 09:41:16 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310093714.trx
>> 10:04:13 Write load high relative to disk write throughput.  Flushing at     
>>   5.2 MB/s while application is making dirty pages at       1.6 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 10:05:38 Checkpoint started
>> 10:05:52 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310100118.trx
>> 10:28:52 Write load high relative to disk write throughput.  Flushing at     
>>   5.1 MB/s while application is making dirty pages at       1.8 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 10:30:31 Checkpoint started
>> 10:30:34 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310102554.trx
>> 10:53:32 Write load high relative to disk write throughput.  Flushing at     
>>   5.2 MB/s while application is making dirty pages at       1.4 MB/s. Doing 
>> a second flushing pass before checkpoint
>> 10:54:43 Checkpoint started
>> 10:55:03 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx
>> 11:19:29 Checkpoint started
>> 11:20:01 Checkpoint finished, new log is 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310111504.trx
>>
>>
>> here is the output of "status()" :
>>
>> SQL> status();
>> REPORT
>> VARCHAR
>> _______________________________________________________________________________
>>
>> OpenLink Virtuoso  Server
>> Version 07.20.3217-pthreads for Linux as of Feb 10 2017
>> Started on: 2017-03-09 12:33 GMT+1
>>
>> Database Status:
>>   File size 0, 1000960 pages, 247031 free.
>>   2720000 buffers, 447219 used, 112398 dirty 4 wired down, repl age 13435443 
>> 0 w. io 3 w/crsr.
>>   Disk Usage: 2212080 reads avg 0 msec, 0% r 0% w last  176 s, 12791013 
>> writes flush       8.82 MB,
>>     1221 read ahead, batch = 156.  Autocompact 722034 in 631152 out, 12% 
>> saved col ac: 7230338 in 3% saved.
>> Gate:  5993 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
>> Log = 
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx,
>>  90073727 bytes
>> 558107 pages have been changed since last backup (in checkpoint state)
>> Current backup timestamp: 0x0000-0x00-0x00
>> Last backup date: unknown
>> Clients: 4177045 connects, max 3 concurrent
>> RPC: 25061533 calls, -4177308 pending, 2 max until now, 0 queued, 37 burst 
>> reads (0%), 0 second 5M large, 298M max
>> Checkpoint Remap 132107 pages, 0 mapped back. 554 s atomic time.
>>     DB master 1000960 total 247030 free 132107 remap 44169 mapped back
>>    temp  165120 total 160375 free
>>
>> Lock Status: 0 deadlocks of which 0 2r1w, 28 waits,
>>    Currently 2 threads running 0 threads waiting 0 threads in vdb.
>> Pending:
>>   1100: IER 10.34.10.171
>>       1: IER 10.34.10.171
>>
>> Client 1111:4175445:  Account: dba, 364 bytes in, 359 bytes out, 1 stmts.
>> PID: 25646, OS: unix, Application: unknown, IP#: 127.0.0.1
>> Transaction status: PENDING, 1 threads.
>> Locks:
>>
>> Client 1111:4177046:  Account: ABES, 2728 bytes in, 361 bytes out, 2 stmts.
>> Transaction status: PENDING, 0 threads.
>> Locks:
>>
>>
>> Running Statements:
>> Time (msec) Text
>>            8 sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH 
>> <http://hub.abes.fr/refere
>>           76 status()
>>
>>
>> Hash indexes
>>
>>
>> 44 Rows. -- 77 msec.
>>
>>
>>
>> On 10/03/2017 at 02:03, Hugh Williams wrote:
>>> Hi Thomas,
>>>
>>> What is this JDBC Connector from Oracle that is being used for the inserts 
>>> in RDF/XML form ?
>> Oracle 12.1 brings its own JDK 1.6.0_37, so if I'm right it is the
>> ojdbc6.jar Thin Driver or OCI Driver:
>>
>> "Oracle JDBC Drivers release 12.1.0.1.0 production Readme.txt :
>> Driver Versions
>> ---------------
>>
>> These are the driver versions in the 12R1 release:
>>
>>   - JDBC Thin Driver 12R1
>>     100% Java client-side JDBC driver for use in client applications,
>>     middle-tier servers and applets.
>>
>>   - JDBC OCI Driver 12R1
>>     Client-side JDBC driver for use on a machine where OCI 12R1
>>     is installed.
>>
>>   - JDBC Thin Server-side Driver 12R1
>>     JDBC driver for use in Java program in the database to access
>>     remote Oracle databases.
>>
>>   - JDBC Server-side Internal Driver 12R1
>>     Server-side JDBC driver for use by Java Stored procedures.  This
>>     driver used to be called the "JDBC Kprb Driver".
>>
>>
>>
>>> What is the ORCID dataset being used as the only one I see is in N-Triple 
>>> format from 2014 at:
>>>
>>>     https://datahub.io/dataset/orcid_2014_dataset
>> will ask for this
>>> Performing inserts with transaction would consume more memory maintaining 
>>> the transaction than with log_enable(2) which auto commits without 
>>> transaction logging in memory.
>> Is it possible to have autocommit enabled with the way we perform SPARQL
>> INSERTs? We used DEFINE sql:log-enable 2 in the query.
>>> The  O_DIRECT param set in your INI file is an old param for which no real 
>>> benefit has been seen on current OS’es and on a Linux system setting 
>>> swappiness as detailed at:
>>>
>>>     
>>> https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#Linux-only%20--%20”swappiness";
>>>
>>> Would give better results.
>> OK, I knew about this and thought it was done, but when I checked it had
>> gone back to 30; I will find a way to fix it at 10.
>>
>>> There is also no real need to set ColumnStore = 1, as the RDF_QUAD
>>> table is column store by default in Virtuoso 7, so that setting would
>>> only have an effect on default SQL table creation.
>>>
>>> If you still have problems, can you provide a copy of your virtuoso.log 
>>> file and the output of the “status();” command for review ...
>>>
>>> Best Regards
>>> Hugh Williams
>>>
>>>
>>>
>>>> On 9 Mar 2017, at 17:28, Thomas Michaux <mich...@abes.fr> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We are loading ORCID 2016 in a V7 instance (Version 07.20.3217-pthreads 
>>>> for Linux as of Feb 10 2017), we DO NOT want to use the bulk loader, 
>>>> instead we are providing SPARQL inserts of RDF/XML files via JDBC 
>>>> connector from Oracle.
>>>>
>>>> Virtuoso is hosted on an 8-core, 32 GB platform.
>>>>
>>>> We successfully inserted 75 633 079 triples before virtuoso.log signalled
>>>> performance problems with "disk write throughput". Is there something else
>>>> to optimize in virtuoso.ini while we are in this "loading" phase (no
>>>> SPARQL "read" queries from clients at the moment)?
>>>>
>>>> We've already done :
>>>>
>>>> - full-text indexing has been delayed ( DB.DBA.VT_BATCH_UPDATE (
>>>> 'DB.DBA.RDF_OBJ', 'ON', 8640 ); )
>>>> - MaxCheckpointRemap = 505856 ( it's larger than 25% of total pages)
>>>> - UnremapQuota       = 0
>>>> - DefaultIsolation   = 2
>>>> - O_DIRECT                 = 1 (we are on XFS filesystem)
>>>> - ColumnStore              = 1 (we started from a new, fresh .db, deleted 
>>>> all previous existing .db, .trx)
>>>>
>>>> Can we do something at the transaction level? We keep each JDBC
>>>> transaction as short as possible (1 insert -> 1 commit); the query is:
>>>>
>>>> "'sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH '||graphe ||' { 
>>>> '|| var_clob_line|| ' }'"
>>>>
>>>> I can see that free memory slowly decreases, and finally the server hangs.
>>>>
>>>> Thanks for your help ! (Attached is virtuoso.ini)
>>>>
>>>> Thomas
>>>> <virtuoso.ini>


_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users