Hello,

FYI, Virtuoso is still loading, but we needed to increase memory resources; the process now uses almost 40 GB of RAM:
[devel@tulipe-test2 ~]$ ./memcheck-virtuoso.sh
2017-03-15T17:54 VmSize: 41273424kB 5883

Stats for the graph <http://hub.abes.fr/referentiel/ORCID/2016> (I forgot to mention: it is the only graph in the DB): 239 451 028 triples

this:Dataset a void:Dataset ;
    rdfs:seeAlso <http://hub.abes.fr/referentiel/ORCID/2016> ;
    rdfs:label "" ;
    void:sparqlEndpoint <http://idrefplus.v102.abes.fr:8890/sparql> ;
    void:triples 239451028 ;
    void:classes 13 ;
    void:entities 57692917 ;
    void:distinctSubjects 57650847 ;
    void:properties 32 ;
    void:distinctObjects 72219514 .

this:sameAsLinks a void:Linkset ;
    void:inDataset this:Dataset ;
    void:triples 997389 ;
    void:linkPredicate owl:sameAs .

On 14/03/2017 at 10:05, Thomas Michaux wrote:
> Hi Hugh,
>
> On 10/03/2017 at 14:01, Hugh Williams wrote:
>> Hi Thomas,
>>
>> Is the ORCID dataset the only RDF dataset in the Virtuoso RDF Quad Store
>> currently, or are there others?
>>
>> What is the size of the ORCID dataset, i.e. the triple count?
> I gave you wrong information, because I misunderstood the process.
> Below are the correct details of our INSERT procedure from the Oracle DB:
>
> - The dataset is the ORCID 2016 XML download available on this page:
>   https://orcid.org/content/download-file ("The file contains the public
>   information associated with each user's ORCID record. Each record is
>   included as a separate file in both JSON and XML."
>   https://figshare.com/articles/ORCID_Public_Data_File_2016/4134027).
>
>   The records are uploaded into Oracle as XML records.
>
> - Then, in an Oracle PL/SQL procedure, we apply an XSLT stylesheet "on the fly"
>   (using Oracle's efficient XMLTRANSFORM XSLT engine) to produce an RDF/XML
>   file for each ORCID XML record in the Oracle table.
>
> - Next in the process, we use Jena tools to generate triples, also "on the fly",
>   from these RDF/XML results.
>
> - These are the triples we finally insert via a JDBC "SPARQL INSERT DATA INTO GRAPH..."
>   call to Virtuoso from the Oracle PL/SQL procedure, via the Virtuoso JDBC
>   driver (and not the Oracle JDBC driver; my mistake, as you guessed).
>
> - Checking the release of the JDBC driver gives:
>   java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
>   OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]
>
>   (The driver is embedded in Oracle's Java VM.)
>
> Thanks in advance if you have suggestions.
>
> The latest "statistics" on the graph size give: 182 405 784 triples
>
> this:Dataset a void:Dataset ;
>     rdfs:seeAlso <http://hub.abes.fr/referentiel/ORCID/2016> ;
>     rdfs:label "" ;
>     void:sparqlEndpoint <http://idrefplus.v102.abes.fr:8890/sparql> ;
>     void:triples 182405784 ;
>     void:classes 13 ;
>     void:entities 43946633 ;
>     void:distinctSubjects 43922470 ;
>     void:properties 32 ;
>     void:distinctObjects 56509541 .
>
> this:sameAsLinks a void:Linkset ;
>     void:inDataset this:Dataset ;
>     void:triples 759462 ;
>
>> I would definitely suggest setting swappiness to 10 to reduce swapping to
>> disk, which should speed up insert rates.
> Done.
>> Looking at your status() command output, I see "Clients: 4177045 connects, max
>> 3 concurrent", indicating that more than 4 million SQL connections have been
>> made to Virtuoso since it was started on 9 March. What is making that many
>> connections? Is it this insertion process
> Yes, it is.
>> or are there other clients reading from the instance as well?
> None for the moment; the instance is private.
>> Apart from that, the status() output looks fine, with plenty of unused
>> buffers for the database working set to grow and still fit in
>> memory,
> I don't really understand the point about buffers, but I also noticed the
> usage is not "maximized"; I suppose that is because there are no other
> clients reading from the instance?
>> no deadlocks, and only one pending transaction, which is one of your inserts.
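Thomas's question about buffers can be made more concrete with a little arithmetic on the status() line "2720000 buffers, 447219 used, 112398 dirty" from the 10 March output quoted in this thread. The Python sketch below assumes that each Virtuoso buffer holds one 8 KiB database page; `buffer_summary` is a hypothetical helper for illustration, not part of Virtuoso.

```python
# Back-of-the-envelope view of the Virtuoso buffer pool from status() figures.
# Assumption: each buffer holds one 8 KiB database page.
PAGE_BYTES = 8 * 1024
GIB = 1024 ** 3


def buffer_summary(total: int, used: int, dirty: int) -> dict:
    """Return pool size and used size (GiB) plus usage/dirty percentages."""
    return {
        "pool_gib": total * PAGE_BYTES / GIB,
        "used_gib": used * PAGE_BYTES / GIB,
        "used_pct": 100.0 * used / total,
        "dirty_pct_of_used": 100.0 * dirty / used,
    }


# Figures from: "2720000 buffers, 447219 used, 112398 dirty"
summary = buffer_summary(total=2_720_000, used=447_219, dirty=112_398)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
# pool_gib: 20.75
# used_gib: 3.41
# used_pct: 16.44
# dirty_pct_of_used: 25.13
```

On these numbers only about 16% of a roughly 20.75 GiB pool was in use when status() was taken, which is consistent with Hugh's remark that the working set still had room to grow in memory. The 2720000 total presumably reflects the NumberOfBuffers value in the attached virtuoso.ini.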
>>
>> You talk about the Oracle JDBC driver, but I still don't see its relevance, as
>> ultimately your insertions into Virtuoso must be done via one of its client
>> interfaces/services, i.e. either the /sparql endpoint or the Virtuoso JDBC
>> driver, I would presume. So which is it?
> My mistake; as I said, the driver is:
>   java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
>   OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]
>> The "DEFINE sql:log-enable 2" pragma being passed in the SPARQL insert
>> queries sets row-by-row auto-commit and turns off transaction logging,
>> which is the fastest transaction mode for write operations; see:
>>
>> http://docs.openlinksw.com/virtuoso/fn_log_enable/
> OK, thanks, a good point.
>
> Thomas
>> Best Regards
>> Hugh Williams
>> Professional Services
>> OpenLink Software, Inc. // http://www.openlinksw.com/
>> Weblog -- http://www.openlinksw.com/blogs/
>> LinkedIn -- http://www.linkedin.com/company/openlink-software/
>> Twitter -- http://twitter.com/OpenLink
>> Google+ -- http://plus.google.com/100570109519069333827/
>> Facebook -- http://www.facebook.com/OpenLinkSoftware
>> Universal Data Access, Integration, and Management Technology Providers
>>
>>> On 10 Mar 2017, at 10:54, Thomas Michaux <mich...@abes.fr> wrote:
>>>
>>> Hi,
>>>
>>> Thanks Hugh, we have reached 110 932 303 triples loaded from our ORCID
>>> dataset since yesterday, and it is still loading...
>>>
>>> The Virtuoso process uses VmSize: 32227664kB 32708, out of:
>>>
>>> KiB Mem : 32780296 total, 243972 free, 29985320 used, 2551004 buff/cache
>>> KiB Swap: 2097148 total, 1734244 free, 362904 used. 2241196 avail Mem
>>>
>>> Logs from the previous 4 hours:
>>>
>>> ...
>>>
>>> 06:03:28 Checkpoint started
>>> 06:04:11 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310055817.trx
>>> 06:28:41 Checkpoint started
>>> 06:28:44 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310062412.trx
>>> 06:52:58 Checkpoint started
>>> 06:53:16 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310064844.trx
>>> 07:17:14 Checkpoint started
>>> 07:17:18 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310071317.trx
>>> 07:39:58 Write load high relative to disk write throughput. Flushing at 5.5 MB/s while application is making dirty pages at 1.5 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 07:41:10 Checkpoint started
>>> 07:41:17 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310073719.trx
>>> 08:04:53 Checkpoint started
>>> 08:04:56 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310080117.trx
>>> 08:27:35 Write load high relative to disk write throughput. Flushing at 5.7 MB/s while application is making dirty pages at 1.7 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 08:28:45 Checkpoint started
>>> 08:29:02 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310082457.trx
>>> 08:51:43 Write load high relative to disk write throughput. Flushing at 5.4 MB/s while application is making dirty pages at 1.7 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 08:52:57 Checkpoint started
>>> 08:53:01 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310084902.trx
>>> 09:15:40 Write load high relative to disk write throughput. Flushing at 5.6 MB/s while application is making dirty pages at 1.9 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 09:16:59 Checkpoint started
>>> 09:17:13 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310091301.trx
>>> 09:39:57 Write load high relative to disk write throughput. Flushing at 5.4 MB/s while application is making dirty pages at 1.7 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 09:41:13 Checkpoint started
>>> 09:41:16 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310093714.trx
>>> 10:04:13 Write load high relative to disk write throughput. Flushing at 5.2 MB/s while application is making dirty pages at 1.6 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 10:05:38 Checkpoint started
>>> 10:05:52 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310100118.trx
>>> 10:28:52 Write load high relative to disk write throughput. Flushing at 5.1 MB/s while application is making dirty pages at 1.8 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 10:30:31 Checkpoint started
>>> 10:30:34 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310102554.trx
>>> 10:53:32 Write load high relative to disk write throughput. Flushing at 5.2 MB/s while application is making dirty pages at 1.4 MB/s.
>>> Doing a second flushing pass before checkpoint
>>> 10:54:43 Checkpoint started
>>> 10:55:03 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx
>>> 11:19:29 Checkpoint started
>>> 11:20:01 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310111504.trx
>>>
>>> Here is the output of status():
>>>
>>> SQL> status();
>>> REPORT
>>> VARCHAR
>>> _______________________________________________________________________________
>>>
>>> OpenLink Virtuoso Server
>>> Version 07.20.3217-pthreads for Linux as of Feb 10 2017
>>> Started on: 2017-03-09 12:33 GMT+1
>>>
>>> Database Status:
>>> File size 0, 1000960 pages, 247031 free.
>>> 2720000 buffers, 447219 used, 112398 dirty 4 wired down, repl age 13435443 0 w. io 3 w/crsr.
>>> Disk Usage: 2212080 reads avg 0 msec, 0% r 0% w last 176 s, 12791013 writes flush 8.82 MB,
>>> 1221 read ahead, batch = 156. Autocompact 722034 in 631152 out, 12% saved col ac: 7230338 in 3% saved.
>>> Gate: 5993 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
>>> Log = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx, 90073727 bytes
>>> 558107 pages have been changed since last backup (in checkpoint state)
>>> Current backup timestamp: 0x0000-0x00-0x00
>>> Last backup date: unknown
>>> Clients: 4177045 connects, max 3 concurrent
>>> RPC: 25061533 calls, -4177308 pending, 2 max until now, 0 queued, 37 burst reads (0%), 0 second 5M large, 298M max
>>> Checkpoint Remap 132107 pages, 0 mapped back. 554 s atomic time.
>>> DB master 1000960 total 247030 free 132107 remap 44169 mapped back
>>> temp 165120 total 160375 free
>>>
>>> Lock Status: 0 deadlocks of which 0 2r1w, 28 waits,
>>> Currently 2 threads running 0 threads waiting 0 threads in vdb.
>>> Pending:
>>> 1100: IER 10.34.10.171
>>> 1: IER 10.34.10.171
>>>
>>> Client 1111:4175445: Account: dba, 364 bytes in, 359 bytes out, 1 stmts.
>>> PID: 25646, OS: unix, Application: unknown, IP#: 127.0.0.1
>>> Transaction status: PENDING, 1 threads.
>>> Locks:
>>>
>>> Client 1111:4177046: Account: ABES, 2728 bytes in, 361 bytes out, 2 stmts.
>>> Transaction status: PENDING, 0 threads.
>>> Locks:
>>>
>>> Running Statements:
>>> Time (msec) Text
>>> 8 sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH <http://hub.abes.fr/refere
>>> 76 status()
>>>
>>> Hash indexes
>>>
>>> 44 Rows. -- 77 msec.
>>>
>>> On 10/03/2017 at 02:03, Hugh Williams wrote:
>>>> Hi Thomas,
>>>>
>>>> What is this JDBC connector from Oracle that is being used for the inserts
>>>> in RDF/XML form?
>>> Oracle 12.1 ships with its own JDK 1.6.0_37, so if I'm right it is the
>>> ojdbc6.jar Thin Driver or the OCI Driver:
>>>
>>> "Oracle JDBC Drivers release 12.1.0.1.0 production Readme.txt :
>>> Driver Versions
>>> ---------------
>>>
>>> These are the driver versions in the 12R1 release:
>>>
>>> - JDBC Thin Driver 12R1
>>> 100% Java client-side JDBC driver for use in client applications,
>>> middle-tier servers and applets.
>>>
>>> - JDBC OCI Driver 12R1
>>> Client-side JDBC driver for use on a machine where OCI 12R1
>>> is installed.
>>>
>>> - JDBC Thin Server-side Driver 12R1
>>> JDBC driver for use in Java program in the database to access
>>> remote Oracle databases.
>>>
>>> - JDBC Server-side Internal Driver 12R1
>>> Server-side JDBC driver for use by Java Stored procedures. This
>>> driver used to be called the "JDBC Kprb Driver".
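Thomas's first message reports virtuoso.log warnings about disk write throughput, and the 10 March log excerpt in this thread shows the "Write load high" message before most checkpoints from 07:39 onward. A small parser could pull the flush and dirty-page rates out of virtuoso.log to see how often the warning fires and how the two rates compare. This is a hypothetical Python helper, not a Virtuoso tool, and its regular expression is written against the message text quoted in this thread; other Virtuoso builds may phrase it differently.

```python
import re
from typing import List, Tuple

# Written against the "Write load high" message quoted in this thread.
# \s+ between message fragments also tolerates the line wraps seen in the e-mail.
WRITE_LOAD_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"Write load high relative to disk write throughput\.\s+"
    r"Flushing at\s+(?P<flush>[\d.]+)\s+MB/s\s+"
    r"while application is making dirty pages at\s+(?P<dirty>[\d.]+)\s+MB/s"
)


def write_load_warnings(log_text: str) -> List[Tuple[str, float, float]]:
    """Return (time, flush MB/s, dirty-page MB/s) for each write-load warning."""
    return [
        (m["time"], float(m["flush"]), float(m["dirty"]))
        for m in WRITE_LOAD_RE.finditer(log_text)
    ]


# Two warnings copied from the 10 March excerpt, in their e-mail-wrapped form.
sample = """\
07:39:58 Write load high relative to disk write throughput. Flushing at
5.5 MB/s while application is making dirty pages at 1.5 MB/s.
Doing a second flushing pass before checkpoint
07:41:10 Checkpoint started
08:27:35 Write load high relative to disk write throughput. Flushing at
5.7 MB/s while application is making dirty pages at 1.7 MB/s.
"""

for when, flush, dirty in write_load_warnings(sample):
    print(f"{when}: flushing {flush} MB/s vs dirtying {dirty} MB/s")
```

Across the eight warnings in the quoted excerpt, the flush rate ranges from 5.1 to 5.7 MB/s and the dirty-page rate from 1.4 to 1.9 MB/s.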
>>>
>>>> What ORCID dataset is being used? The only one I see is in N-Triples
>>>> format, from 2014, at:
>>>>
>>>> https://datahub.io/dataset/orcid_2014_dataset
>>> I will ask about this.
>>>> Performing inserts within a transaction would consume more memory
>>>> maintaining the transaction than with log_enable(2), which auto-commits
>>>> without transaction logging in memory.
>>> Is it possible to have autocommit enabled with the way we perform SPARQL
>>> INSERTs? We used DEFINE sql:log-enable 2 in the query.
>>>> The O_DIRECT param set in your INI file is an old param for which no real
>>>> benefit has been seen on current OSes; on a Linux system, setting
>>>> swappiness as detailed at:
>>>>
>>>> https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#Linux-only%20--%20”swappiness"
>>>>
>>>> would give better results.
>>> OK, I knew this. I thought it was done, but on checking it had gone back
>>> up to 30; I will find a way to keep it at 10.
>>>>
>>>> There is also no real need to set ColumnStore = 1, as the RDF_QUAD
>>>> table is column-store by default in Virtuoso 7, so that setting would
>>>> only affect default SQL table creation.
>>>>
>>>> If you still have problems, can you provide a copy of your virtuoso.log
>>>> file and the output of the "status();" command for review?
>>>>
>>>> Best Regards
>>>> Hugh Williams
>>>> Professional Services
>>>> OpenLink Software, Inc.
>>>>
>>>>> On 9 Mar 2017, at 17:28, Thomas Michaux <mich...@abes.fr> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We are loading ORCID 2016 into a V7 instance (Version 07.20.3217-pthreads
>>>>> for Linux as of Feb 10 2017). We DO NOT want to use the bulk loader;
>>>>> instead, we are issuing SPARQL inserts of RDF/XML files via a JDBC
>>>>> connector from Oracle.
>>>>>
>>>>> Virtuoso is hosted on an 8-core, 32 GB platform.
>>>>>
>>>>> We successfully inserted 75 633 079 triples before virtuoso.log started
>>>>> reporting performance problems with "disk write throughput". Is there
>>>>> anything else to optimize in virtuoso.ini while we are in this "loading"
>>>>> phase (no SPARQL "read" queries from clients at the moment)?
>>>>>
>>>>> What we have already done:
>>>>>
>>>>> - full-text indexing has been deferred (DB.DBA.VT_BATCH_UPDATE (
>>>>>   'DB.DBA.RDF_OBJ', 'ON', 8640 ); )
>>>>> - MaxCheckpointRemap = 505856 (larger than 25% of total pages)
>>>>> - UnremapQuota = 0
>>>>> - DefaultIsolation = 2
>>>>> - O_DIRECT = 1 (we are on an XFS filesystem)
>>>>> - ColumnStore = 1 (we started from a new, fresh .db, after deleting
>>>>>   all previously existing .db and .trx files)
>>>>>
>>>>> Can we do something at the transaction level? We keep each JDBC insert
>>>>> transaction as short as possible (1 insert -> 1 commit); the query is:
>>>>>
>>>>> "'sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH '||graphe ||' {
>>>>> '|| var_clob_line|| ' }'"
>>>>>
>>>>> I can see that free memory slowly decreases, and eventually the server hangs.
>>>>>
>>>>> Thanks for your help!
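The PL/SQL query above sends one INSERT DATA statement per record, and Hugh's status() reading of more than 4 million connects suggests a great deal of per-statement and per-connection overhead. One general way to reduce that overhead may be to keep a single JDBC connection open and send fewer, larger INSERT DATA statements. The Python sketch below shows only the statement-building side of that idea, in the same shape as the PL/SQL concatenation. It assumes that Jena's output is available as N-Triples lines grouped per ORCID record and that the graph IRI is passed without angle brackets; the function name and default batch size are hypothetical, and executing the statements over the Virtuoso JDBC driver is not shown.

```python
from typing import Iterable, Iterator, List


def batched_insert_statements(
    graph_iri: str,
    records: Iterable[List[str]],
    records_per_statement: int = 100,
) -> Iterator[str]:
    """Yield one Virtuoso 'sparql ... INSERT DATA INTO GRAPH' statement per batch.

    Hypothetical helper: each record is assumed to be a list of complete
    N-Triples lines (each ending in ' .') for one ORCID record. Records are
    never split across statements, so blank-node labels used inside one
    record stay within a single INSERT DATA statement.
    """
    if records_per_statement < 1:
        raise ValueError("records_per_statement must be >= 1")
    # Same statement shape as the PL/SQL query, with <...> added around the graph IRI.
    head = f"sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH <{graph_iri}> {{ "
    batch: List[str] = []
    count = 0
    for record in records:
        batch.extend(line.strip() for line in record)
        count += 1
        if count == records_per_statement:
            if batch:  # skip statements made only of empty records
                yield head + " ".join(batch) + " }"
            batch, count = [], 0
    if batch:  # flush the last, partial batch
        yield head + " ".join(batch) + " }"


if __name__ == "__main__":
    demo_records = [
        ['<http://example.org/r1> <http://example.org/name> "A" .'],
        ['<http://example.org/r2> <http://example.org/name> "B" .'],
        ['<http://example.org/r3> <http://example.org/name> "C" .'],
    ]
    for stmt in batched_insert_statements(
        "http://hub.abes.fr/referentiel/ORCID/2016", demo_records, records_per_statement=2
    ):
        print(stmt)
```

Batching by record rather than by individual triple matters because blank-node labels are not shared between separate INSERT DATA statements: splitting one record's triples across two statements could turn a single blank node into two. Whether larger statements actually help on this setup, and which batch size works best, would need to be measured.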
>>>>> (Attached is virtuoso.ini)
>>>>>
>>>>> Thomas
>>>>> <virtuoso.ini>
>>>>> ------------------------------------------------------------------------------
>>>>> Announcing the Oxford Dictionaries API! The API offers world-renowned
>>>>> dictionary content that is easy and intuitive to access. Sign up for an
>>>>> account today to start using our lexical data to power your apps and
>>>>> projects. Get started today and enter our developer competition.
>>>>> http://sdm.link/oxford
>>>>> _______________________________________________
>>>>> Virtuoso-users mailing list
>>>>> Virtuoso-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users