Hi Hugh,

On 10/03/2017 at 14:01, Hugh Williams wrote:
> Hi Thomas,
>
> Is the ORCID dataset the only RDF dataset in the Virtuoso RDF Quad Store
> currently, or are there others?
>
> What is the size of the ORCID dataset, i.e. the triple count?
I gave you wrong information because I misunderstood the process. Below are the
correct details of our INSERT procedure from the ORACLE db:

- The dataset is the ORCID 2016 XML download available on this page:
  https://orcid.org/content/download-file ("The file contains the public
  information associated with each user's ORCID record. Each record is included
  as a separate file in both JSON and XML."
  https://figshare.com/articles/ORCID_Public_Data_File_2016/4134027). The
  records are uploaded into ORACLE as XML records.
- Then, in an ORACLE PL/SQL procedure, we apply an XSLT stylesheet "on the fly"
  (using Oracle's efficient XMLTRANSFORM XSLT engine) to produce an RDF/XML
  file for each ORCID XML record in the ORACLE table.
- Next, we use Jena tools to generate triples, also "on the fly", from these
  RDF/XML results.
- These are the triples we finally insert via a JDBC "SPARQL INSERT DATA INTO
  GRAPH..." call to Virtuoso from the Oracle PL/SQL procedure, via the Virtuoso
  JDBC driver (and not the ORACLE JDBC driver, my mistake, as you guessed).
- Checking the release of the JDBC driver gives:

    > java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
    OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]

  (the driver is embedded inside the ORACLE Java JVM)

Thanks in advance if you have suggestions.

The latest "statistics" on the graph size give 182 405 784 triples:

this:Dataset a void:Dataset ;
    rdfs:seeAlso <http://hub.abes.fr/referentiel/ORCID/2016> ;
    rdfs:label "" ;
    void:sparqlEndpoint <http://idrefplus.v102.abes.fr:8890/sparql> ;
    void:triples 182405784 ;
    void:classes 13 ;
    void:entities 43946633 ;
    void:distinctSubjects 43922470 ;
    void:properties 32 ;
    void:distinctObjects 56509541 .
this:sameAsLinks a void:Linkset ;
    void:inDataset this:Dataset ;
    void:triples 759462 ;

> I would definitely suggest setting swappiness to 10 to reduce swapping to
> disk, which should speed up insert rates.
done

> Looking at your status() command output I see "Clients: 4177045 connects, max
> 3 concurrent", indicating more than 4 million SQL connections have been made
> to Virtuoso since it was started on 9th Mar. What is making that many
> connections, is it this insertion process

yes, it is

> or are there other clients reading from the instance also?

none for the moment, the instance is private

> Apart from that the status() output looks fine, with plenty of unused
> buffers for the database working set size to be increased and still fit in
> memory,

I don't really understand the point about buffers, but I also noticed that
their use is not "maximized", because there are no other clients reading from
the instance, I suppose?

> no deadlock and only one pending transaction, which is one of your inserts.
>
> You talk about the Oracle JDBC Driver but I still don't see its relevance, as
> ultimately your insertions to Virtuoso must be done via one of its client
> interfaces / services, i.e. either the /sparql endpoint or the Virtuoso JDBC
> driver I would presume, thus which is it?

my mistake, as I said the driver is

    > java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
    OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]

> The "DEFINE sql:log-enable 2" pragma being passed in the SPARQL insert
> queries does set row by row auto-commit and turn off transaction logging,
> which is the fastest transaction mode for write operations, see:
>
> http://docs.openlinksw.com/virtuoso/fn_log_enable/

ok, thanks, a good point

Thomas

> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.
> // http://www.openlinksw.com/
> Weblog -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter -- http://twitter.com/OpenLink
> Google+ -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
>> On 10 Mar 2017, at 10:54, Thomas Michaux <mich...@abes.fr> wrote:
>>
>> Hi,
>>
>> thanks Hugh, we reached 110 932 303 triples loaded from our ORCID dataset
>> since yesterday, and still loading...
>>
>> Virtuoso process use VmSize: 32227664kB 32708 of memory of :
>>
>> KiB Mem : 32780296 total, 243972 free, 29985320 used, 2551004 buff/cache
>> KiB Swap: 2097148 total, 1734244 free, 362904 used. 2241196 avail Mem
>>
>> previous 4h logs :
>>
>> ...
>>
>> 06:03:28 Checkpoint started
>> 06:04:11 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310055817.trx
>> 06:28:41 Checkpoint started
>> 06:28:44 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310062412.trx
>> 06:52:58 Checkpoint started
>> 06:53:16 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310064844.trx
>> 07:17:14 Checkpoint started
>> 07:17:18 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310071317.trx
>> 07:39:58 Write load high relative to disk write throughput. Flushing at
>> 5.5 MB/s while application is making dirty pages at 1.5 MB/s. Doing
>> a second flushing pass before checkpoint
>> 07:41:10 Checkpoint started
>> 07:41:17 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310073719.trx
>> 08:04:53 Checkpoint started
>> 08:04:56 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310080117.trx
>> 08:27:35 Write load high relative to disk write throughput. Flushing at
>> 5.7 MB/s while application is making dirty pages at 1.7 MB/s. Doing
>> a second flushing pass before checkpoint
>> 08:28:45 Checkpoint started
>> 08:29:02 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310082457.trx
>> 08:51:43 Write load high relative to disk write throughput. Flushing at
>> 5.4 MB/s while application is making dirty pages at 1.7 MB/s. Doing
>> a second flushing pass before checkpoint
>> 08:52:57 Checkpoint started
>> 08:53:01 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310084902.trx
>> 09:15:40 Write load high relative to disk write throughput. Flushing at
>> 5.6 MB/s while application is making dirty pages at 1.9 MB/s. Doing
>> a second flushing pass before checkpoint
>> 09:16:59 Checkpoint started
>> 09:17:13 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310091301.trx
>> 09:39:57 Write load high relative to disk write throughput. Flushing at
>> 5.4 MB/s while application is making dirty pages at 1.7 MB/s. Doing
>> a second flushing pass before checkpoint
>> 09:41:13 Checkpoint started
>> 09:41:16 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310093714.trx
>> 10:04:13 Write load high relative to disk write throughput. Flushing at
>> 5.2 MB/s while application is making dirty pages at 1.6 MB/s. Doing
>> a second flushing pass before checkpoint
>> 10:05:38 Checkpoint started
>> 10:05:52 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310100118.trx
>> 10:28:52 Write load high relative to disk write throughput. Flushing at
>> 5.1 MB/s while application is making dirty pages at 1.8 MB/s. Doing
>> a second flushing pass before checkpoint
>> 10:30:31 Checkpoint started
>> 10:30:34 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310102554.trx
>> 10:53:32 Write load high relative to disk write throughput. Flushing at
>> 5.2 MB/s while application is making dirty pages at 1.4 MB/s. Doing
>> a second flushing pass before checkpoint
>> 10:54:43 Checkpoint started
>> 10:55:03 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx
>> 11:19:29 Checkpoint started
>> 11:20:01 Checkpoint finished, new log is
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310111504.trx
>>
>> here is the output of "status()" :
>>
>> SQL> status();
>> REPORT
>> VARCHAR
>> _______________________________________________________________________________
>>
>> OpenLink Virtuoso Server
>> Version 07.20.3217-pthreads for Linux as of Feb 10 2017
>> Started on: 2017-03-09 12:33 GMT+1
>>
>> Database Status:
>> File size 0, 1000960 pages, 247031 free.
>> 2720000 buffers, 447219 used, 112398 dirty 4 wired down, repl age 13435443
>> 0 w. io 3 w/crsr.
>> Disk Usage: 2212080 reads avg 0 msec, 0% r 0% w last 176 s, 12791013
>> writes flush 8.82 MB,
>> 1221 read ahead, batch = 156. Autocompact 722034 in 631152 out, 12%
>> saved col ac: 7230338 in 3% saved.
>> Gate: 5993 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
>> Log =
>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx,
>> 90073727 bytes
>> 558107 pages have been changed since last backup (in checkpoint state)
>> Current backup timestamp: 0x0000-0x00-0x00
>> Last backup date: unknown
>> Clients: 4177045 connects, max 3 concurrent
>> RPC: 25061533 calls, -4177308 pending, 2 max until now, 0 queued, 37 burst
>> reads (0%), 0 second 5M large, 298M max
>> Checkpoint Remap 132107 pages, 0 mapped back. 554 s atomic time.
>> DB master 1000960 total 247030 free 132107 remap 44169 mapped back
>> temp 165120 total 160375 free
>>
>> Lock Status: 0 deadlocks of which 0 2r1w, 28 waits,
>> Currently 2 threads running 0 threads waiting 0 threads in vdb.
>> Pending:
>> 1100: IER 10.34.10.171
>> 1: IER 10.34.10.171
>>
>> Client 1111:4175445: Account: dba, 364 bytes in, 359 bytes out, 1 stmts.
>> PID: 25646, OS: unix, Application: unknown, IP#: 127.0.0.1
>> Transaction status: PENDING, 1 threads.
>> Locks:
>>
>> Client 1111:4177046: Account: ABES, 2728 bytes in, 361 bytes out, 2 stmts.
>> Transaction status: PENDING, 0 threads.
>> Locks:
>>
>>
>> Running Statements:
>> Time (msec) Text
>> 8 sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH
>> <http://hub.abes.fr/refere
>> 76 status()
>>
>>
>> Hash indexes
>>
>>
>> 44 Rows. -- 77 msec.
>>
>>
>> On 10/03/2017 at 02:03, Hugh Williams wrote:
>>> Hi Thomas,
>>>
>>> What is this JDBC Connector from Oracle that is being used for the inserts
>>> in RDF/XML form?
>> Oracle 12.1 brings its own jdk 1.6.0_37, so if I'm right, the ojdbc6.jar
>> Thin Driver or OCI Driver:
>>
>> "Oracle JDBC Drivers release 12.1.0.1.0 production Readme.txt :
>> Driver Versions
>> ---------------
>>
>> These are the driver versions in the 12R1 release:
>>
>> - JDBC Thin Driver 12R1
>> 100% Java client-side JDBC driver for use in client applications,
>> middle-tier servers and applets.
>>
>> - JDBC OCI Driver 12R1
>> Client-side JDBC driver for use on a machine where OCI 12R1
>> is installed.
>>
>> - JDBC Thin Server-side Driver 12R1
>> JDBC driver for use in Java program in the database to access
>> remote Oracle databases.
>>
>> - JDBC Server-side Internal Driver 12R1
>> Server-side JDBC driver for use by Java Stored procedures. This
>> driver used to be called the "JDBC Kprb Driver".
>>
>>
>>> What is the ORCID dataset being used, as the only one I see is in N-Triple
>>> format from 2014 at:
>>>
>>> https://datahub.io/dataset/orcid_2014_dataset
>> will ask for this
>>> Performing inserts within a transaction would consume more memory
>>> maintaining the transaction than with log_enable(2), which auto-commits
>>> without transaction logging in memory.
>> is it possible to have autocommit enabled the way we perform SPARQL INSERTs?
>> we used DEFINE sql:log-enable 2 in the query
>>> The O_DIRECT param set in your INI file is an old param for which no real
>>> benefit has been seen on current OSes, and on a Linux system setting
>>> swappiness as detailed at:
>>>
>>>
>>> https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#Linux-only%20--%20”swappiness"
>>>
>>> would give better results.
>> ok, I knew this and thought it was done, but after checking it had gone back
>> up to 30; will find a way to fix it at 10.
>>
>>> There is also no real need to set ColumnStore = 1, as the RDF_QUAD
>>> table is column store by default in Virtuoso 7, so that setting would
>>> only have an effect on default SQL table creation.
>>>
>>> If you still have problems, can you provide a copy of your virtuoso.log
>>> file and the output of the "status();" command for review ...
>>>
>>> Best Regards
>>> Hugh Williams
>>> Professional Services
>>> OpenLink Software, Inc.
>>>
>>>> On 9 Mar 2017, at 17:28, Thomas Michaux <mich...@abes.fr> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We are loading ORCID 2016 into a V7 instance (Version 07.20.3217-pthreads
>>>> for Linux as of Feb 10 2017). We DO NOT want to use the bulk loader;
>>>> instead we are providing SPARQL inserts of RDF/XML files via JDBC
>>>> connector from Oracle.
>>>>
>>>> Virtuoso is hosted on an 8-core, 32 GB platform.
>>>>
>>>> We successfully inserted 75 633 079 triples before virtuoso.log signalled
>>>> performance problems with "disk write throughput". Is there something else
>>>> to optimize in virtuoso.ini while we are in this "loading" phase (no
>>>> SPARQL "read" queries from clients at the moment)?
>>>>
>>>> We've already done:
>>>>
>>>> - full-text indexing has been delayed ( DB.DBA.VT_BATCH_UPDATE (
>>>> 'DB.DBA.RDF_OBJ', 'ON', 8640 ); )
>>>> - MaxCheckpointRemap = 505856 (it's larger than 25% of total pages)
>>>> - UnremapQuota = 0
>>>> - DefaultIsolation = 2
>>>> - O_DIRECT = 1 (we are on an XFS filesystem)
>>>> - ColumnStore = 1 (we started from a new, fresh .db, deleted
>>>> all previously existing .db, .trx)
>>>>
>>>> Can we do something at the transaction level? We keep each JDBC insert as
>>>> short as possible (1 insert -> 1 commit); the query is:
>>>>
>>>> "'sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH '||graphe ||' {
>>>> '|| var_clob_line|| ' }'"
>>>>
>>>> I can see that free memory slowly decreases, and finally the server hangs.
>>>>
>>>> Thanks for your help!
>>>> (Attached is virtuoso.ini)
>>>>
>>>> Thomas
>>>> <virtuoso.ini>

_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
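
A minimal sketch of how the statement text quoted in the original message of this thread ('sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH ...') could be assembled. It is written in Python rather than the PL/SQL used in the thread, and it only builds the string; it does not open a connection to Virtuoso. The function name build_insert_statement, the example.org IRIs, and the assumption that the triples arrive as N-Triples lines are illustrative and not taken from the thread.

```python
def build_insert_statement(graph_iri, triples, log_enable=2):
    """Return the text of one 'sparql ... INSERT DATA INTO GRAPH' statement.

    graph_iri  -- graph IRI without angle brackets (hypothetical input shape)
    triples    -- iterable of N-Triples lines, each ending in ' .'
    log_enable -- value for the 'DEFINE sql:log-enable' pragma; 2 is the
                  value used in the thread
    """
    # Refuse IRIs that could break out of the <...> delimiters.
    if any(ch in graph_iri for ch in '<>"{} \t\n'):
        raise ValueError(f"unsafe graph IRI: {graph_iri!r}")
    # Drop blank lines and join the remaining triples with single spaces.
    body = " ".join(line.strip() for line in triples if line.strip())
    if not body:
        raise ValueError("no triples to insert")
    return (
        f"sparql DEFINE sql:log-enable {int(log_enable)} "
        f"INSERT DATA INTO GRAPH <{graph_iri}> {{ {body} }}"
    )


if __name__ == "__main__":
    statement = build_insert_statement(
        "http://example.org/graph",  # hypothetical graph IRI
        ['<http://example.org/s> <http://example.org/p> "o" .'],
    )
    print(statement)
```

How many triples go into one statement is left to the caller; the loader described in the thread sends one insert per commit.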