Hello,

FYI, Virtuoso is still loading, but we needed to increase memory resources;
the process now uses almost 40 GB of RAM:

[devel@tulipe-test2 ~]$ ./memcheck-virtuoso.sh
2017-03-15T17:54 VmSize: 41273424kB 5883
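
For reference, the VmSize figure above can be converted to GiB as in the minimal Python sketch below. It assumes the script prints the VmSize field as it appears in /proc/<pid>/status (the script itself is not shown here, so this is a guess), and the helper name vmsize_gib is made up for illustration. Note that VmSize is the virtual address space size; resident memory is reported separately as VmRSS.

```python
import re


def vmsize_gib(line):
    """Extract the VmSize value (reported in kB) from a line and convert it to GiB."""
    match = re.search(r"VmSize:\s*(\d+)\s*kB", line)
    if match is None:
        raise ValueError(f"no VmSize field in: {line!r}")
    kib = int(match.group(1))
    # /proc reports kB in units of 1024 bytes, so divide by 1024 * 1024 for GiB.
    return kib / (1024 * 1024)


# Line copied from the memcheck output above.
sample = "2017-03-15T17:54 VmSize: 41273424kB 5883"
print(f"{vmsize_gib(sample):.2f} GiB")
```

This gives a little over 39 GiB, which matches the "almost 40 GB" above.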

Stats for the graph <http://hub.abes.fr/referentiel/ORCID/2016> (I forgot
to mention that it's the only graph in the DB):

239 451 028 triples


this:Dataset a void:Dataset ;
  rdfs:seeAlso <http://hub.abes.fr/referentiel/ORCID/2016> ;
  rdfs:label "" ;
  void:sparqlEndpoint <http://idrefplus.v102.abes.fr:8890/sparql> ;
  void:triples 239451028 ;
  void:classes 13 ;
  void:entities 57692917 ;
  void:distinctSubjects 57650847 ;
  void:properties 32 ;
  void:distinctObjects 72219514 .

this:sameAsLinks a void:Linkset ;
  void:inDataset this:Dataset ;
  void:triples 997389 ;
  void:linkPredicate owl:sameAs .
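
In case it helps to reproduce some of these figures, here is a rough Python sketch that builds the SPARQL counting queries usually associated with the VoID properties above. It is not necessarily how the statistics in this mail were produced; the function name void_count_queries is hypothetical, void:entities is left out because its definition depends on the dataset's URI pattern, and COUNT(DISTINCT ...) over a graph of this size may take a long time.

```python
def void_count_queries(graph_iri):
    """Return SPARQL SELECT queries keyed by the VoID property they would feed."""
    g = f"<{graph_iri}>"
    # (aggregate expression, triple pattern) for each statistic.
    patterns = {
        "void:triples": ("COUNT(*)", "?s ?p ?o"),
        "void:distinctSubjects": ("COUNT(DISTINCT ?s)", "?s ?p ?o"),
        "void:distinctObjects": ("COUNT(DISTINCT ?o)", "?s ?p ?o"),
        "void:properties": ("COUNT(DISTINCT ?p)", "?s ?p ?o"),
        "void:classes": ("COUNT(DISTINCT ?c)", "?s a ?c"),
    }
    return {
        name: f"SELECT ({agg} AS ?n) FROM {g} WHERE {{ {pattern} }}"
        for name, (agg, pattern) in patterns.items()
    }


# Graph IRI taken from the stats above.
graph = "http://hub.abes.fr/referentiel/ORCID/2016"
for name, query in void_count_queries(graph).items():
    print(f"{name}: {query}")
```

Each query can be pasted into the SPARQL endpoint, or run from isql by prefixing it with the "sparql" keyword.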


On 14/03/2017 at 10:05, Thomas Michaux wrote:
> Hi Hugh,
>
> On 10/03/2017 at 14:01, Hugh Williams wrote:
>> Hi Thomas,
>>
>> Is the ORCID dataset the only RDF dataset in the Virtuoso RDF Quad Store 
>> currently, or are there others?
>>
>> What is the size of the ORCID dataset, i.e. the triple count?
> I gave you wrong information because I misunderstood the process.
> Below are the correct details of our INSERT procedure from the Oracle DB:
>
> - dataset is from ORCID 2016 XML download available on this page
> https://orcid.org/content/download-file ("The file contains the public
> information associated with each user's ORCID record. Each record is
> included as a separate file in both JSON and XML. "
> https://figshare.com/articles/ORCID_Public_Data_File_2016/4134027).
>
> They are uploaded into Oracle as XML records.
>
> - then, in an Oracle PL/SQL procedure, we apply an XSLT stylesheet "on the
> fly" (using Oracle's efficient XMLTRANSFORM XSLT engine) to produce an
> RDF/XML file for each ORCID XML record in the Oracle table
>
> - next, we use Jena tools to generate triples from these RDF/XML results,
> also "on the fly"
>
> - these are the triples we finally insert via a JDBC "SPARQL
> INSERT DATA INTO GRAPH..." call to Virtuoso from the Oracle PL/SQL
> procedure, through the Virtuoso JDBC driver (and not the Oracle JDBC
> driver, my mistake, as you guessed)
>
> - checking the release of the JDBC driver gives:
> java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
> OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]
>
> (the driver is embedded inside the Oracle Java JVM)
>
> Thanks in advance if you have suggestions.
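
As a rough illustration of the last step above, the Python sketch below groups triples into batches and wraps each batch in the same "sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH" statement used in this thread. The real procedure is PL/SQL, so this only shows the shape of the statements. The function name, the batch size of 1000 and the example.org triples are all made up, and whether larger batches would actually help this load has not been established here.

```python
from itertools import islice


def batched_insert_statements(graph_iri, triples, batch_size=1000):
    """Yield one INSERT DATA statement per batch of N-Triples lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(triples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        body = "\n".join(batch)
        # Same statement prefix as in the thread; the triples go between braces.
        yield (
            "sparql DEFINE sql:log-enable 2 "
            f"INSERT DATA INTO GRAPH <{graph_iri}> {{\n{body}\n}}"
        )


# Hypothetical triples, only to show the output shape.
example_triples = [
    f'<http://example.org/s{i}> <http://example.org/p> "o{i}" .' for i in range(5)
]
for statement in batched_insert_statements(
    "http://hub.abes.fr/referentiel/ORCID/2016", example_triples, batch_size=2
):
    print(statement)
    print("---")
```

Each yielded string would be executed as one statement over a single, reused JDBC connection, rather than opening a new connection per insert.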
>
> The latest statistics on the graph size give: 182 405 784 triples
>
>
> this:Dataset a void:Dataset ;
>    rdfs:seeAlso <http://hub.abes.fr/referentiel/ORCID/2016> ;
>    rdfs:label "" ;
>    void:sparqlEndpoint <http://idrefplus.v102.abes.fr:8890/sparql> ;
>    void:triples 182405784 ;
>    void:classes 13 ;
>    void:entities 43946633 ;
>    void:distinctSubjects 43922470 ;
>    void:properties 32 ;
>    void:distinctObjects 56509541 .
>
> this:sameAsLinks a void:Linkset ;
>    void:inDataset this:Dataset ;
>    void:triples 759462 ;
>
>
>
>> I would definitely suggest setting swappiness to 10 to reduce swapping to 
>> disk, which should speed up insert rates.
> done
>> Looking at your status() command output I see "Clients: 4177045 connects, max 
>> 3 concurrent", indicating that more than 4 million SQL connections have been 
>> made to Virtuoso since it was started on 9th Mar. What is making that many 
>> connections? Is it this insertion process,
> yes, it is
>> or are there other clients reading from the instance also ?
> none for the moment, instance is private
>>    Apart from that the status() output looks fine, with plenty of unused 
>> buffers for the database working set size to be increased and still fit in 
>> memory,
> I don't really understand the point about buffers, but I also noticed that
> their use is not "maximized"; because there are no other clients reading
> from the instance, I suppose?
>> no deadlock and only one pending transaction which is one of your inserts.
>>
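
On the buffers point, the line "2720000 buffers, 447219 used" in the status() output further down can be turned into more familiar units with the small Python sketch below. It assumes 8 KB per buffer (Virtuoso's database page size); the helper name buffer_summary is hypothetical, and the real memory footprint per buffer is somewhat higher than the raw page size.

```python
PAGE_BYTES = 8 * 1024  # assumed size of one buffered database page (8 KB)


def buffer_summary(total_buffers, used_buffers):
    """Summarise buffer pool usage from the two counters shown by status()."""
    gib = 2 ** 30
    return {
        "used_fraction": used_buffers / total_buffers,
        "pool_gib": total_buffers * PAGE_BYTES / gib,
        "used_gib": used_buffers * PAGE_BYTES / gib,
    }


# Counters copied from the status() output quoted in this thread.
summary = buffer_summary(total_buffers=2720000, used_buffers=447219)
print(f"pool size : {summary['pool_gib']:.2f} GiB")
print(f"in use    : {summary['used_gib']:.2f} GiB")
print(f"used share: {summary['used_fraction']:.1%}")
```

With these numbers only about a sixth of the configured pool holds data, which is what "plenty of unused buffers" refers to: the working set can still grow a lot before pages have to be evicted.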
>> You talk about the Oracle JDBC Driver but I still don’t see its relevance, as 
>> ultimately your insertions to Virtuoso must be done via one of its client 
>> interfaces / services, i.e. either the /sparql endpoint or the Virtuoso JDBC 
>> driver I would presume, thus which is it ?
> my mistake, as I said the driver is:
> java -cp virtjdbc3.jar virtuoso.jdbc3.Driver
> OpenLink Virtuoso(TM) Driver for JDBC(TM) Version 3.x [Build 3.62]
>> The "DEFINE sql:log-enable 2" pragma being passed in the SPARQL insert 
>> queries sets row-by-row auto-commit and turns off transaction logging, 
>> which is the fastest transaction mode for write operations, see:
>>
>>      http://docs.openlinksw.com/virtuoso/fn_log_enable/
> ok, thanks, a good point
>
> Thomas
>> Best Regards
>> Hugh Williams
>> Professional Services
>> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>> Weblog   -- http://www.openlinksw.com/blogs/
>> LinkedIn -- http://www.linkedin.com/company/openlink-software/
>> Twitter  -- http://twitter.com/OpenLink
>> Google+  -- http://plus.google.com/100570109519069333827/
>> Facebook -- http://www.facebook.com/OpenLinkSoftware
>> Universal Data Access, Integration, and Management Technology Providers
>>
>>
>>
>>> On 10 Mar 2017, at 10:54, Thomas Michaux <mich...@abes.fr> wrote:
>>>
>>> Hi,
>>>
>>> thanks Hugh, we reached 110 932 303 triples loaded from our ORCID dataset 
>>> since yesterday, and still loading...
>>>
>>>
>>>
>>> The Virtuoso process shows VmSize: 32227664kB 32708, on a machine with:
>>>
>>> KiB Mem : 32780296 total,   243972 free, 29985320 used,  2551004 buff/cache
>>> KiB Swap:  2097148 total,  1734244 free,   362904 used.  2241196 avail Mem
>>>
>>> previous 4h logs :
>>>
>>> ...
>>>
>>> 06:03:28 Checkpoint started
>>> 06:04:11 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310055817.trx
>>> 06:28:41 Checkpoint started
>>> 06:28:44 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310062412.trx
>>> 06:52:58 Checkpoint started
>>> 06:53:16 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310064844.trx
>>> 07:17:14 Checkpoint started
>>> 07:17:18 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310071317.trx
>>> 07:39:58 Write load high relative to disk write throughput.  Flushing at    
>>>    5.5 MB/s while application is making dirty pages at       1.5 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 07:41:10 Checkpoint started
>>> 07:41:17 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310073719.trx
>>> 08:04:53 Checkpoint started
>>> 08:04:56 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310080117.trx
>>> 08:27:35 Write load high relative to disk write throughput.  Flushing at    
>>>    5.7 MB/s while application is making dirty pages at       1.7 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 08:28:45 Checkpoint started
>>> 08:29:02 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310082457.trx
>>> 08:51:43 Write load high relative to disk write throughput.  Flushing at    
>>>    5.4 MB/s while application is making dirty pages at       1.7 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 08:52:57 Checkpoint started
>>> 08:53:01 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310084902.trx
>>> 09:15:40 Write load high relative to disk write throughput.  Flushing at    
>>>    5.6 MB/s while application is making dirty pages at       1.9 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 09:16:59 Checkpoint started
>>> 09:17:13 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310091301.trx
>>> 09:39:57 Write load high relative to disk write throughput.  Flushing at    
>>>    5.4 MB/s while application is making dirty pages at       1.7 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 09:41:13 Checkpoint started
>>> 09:41:16 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310093714.trx
>>> 10:04:13 Write load high relative to disk write throughput.  Flushing at    
>>>    5.2 MB/s while application is making dirty pages at       1.6 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 10:05:38 Checkpoint started
>>> 10:05:52 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310100118.trx
>>> 10:28:52 Write load high relative to disk write throughput.  Flushing at    
>>>    5.1 MB/s while application is making dirty pages at       1.8 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 10:30:31 Checkpoint started
>>> 10:30:34 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310102554.trx
>>> 10:53:32 Write load high relative to disk write throughput.  Flushing at    
>>>    5.2 MB/s while application is making dirty pages at       1.4 MB/s. 
>>> Doing a second flushing pass before checkpoint
>>> 10:54:43 Checkpoint started
>>> 10:55:03 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx
>>> 11:19:29 Checkpoint started
>>> 11:20:01 Checkpoint finished, new log is 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310111504.trx
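
To see how long each checkpoint in the log above took, a small Python sketch like the following could be used. It assumes each log entry sits on a single line in virtuoso.log (the wrapping above comes from the mail client) and that a start/finish pair never crosses midnight; the function name checkpoint_durations is made up.

```python
import re
from datetime import datetime

CHECKPOINT = re.compile(r"^(\d{2}:\d{2}:\d{2}) Checkpoint (started|finished)")


def checkpoint_durations(lines):
    """Return (start_time, seconds) for each started/finished checkpoint pair."""
    durations = []
    start_text = None
    for line in lines:
        match = CHECKPOINT.match(line)
        if match is None:
            continue  # ignore flush warnings and other entries
        stamp, event = match.groups()
        if event == "started":
            start_text = stamp
        elif start_text is not None:
            begin = datetime.strptime(start_text, "%H:%M:%S")
            end = datetime.strptime(stamp, "%H:%M:%S")
            durations.append((start_text, int((end - begin).total_seconds())))
            start_text = None
    return durations


# First two checkpoints from the log above, joined back onto single lines.
log = [
    "06:03:28 Checkpoint started",
    "06:04:11 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310055817.trx",
    "06:28:41 Checkpoint started",
    "06:28:44 Checkpoint finished, new log is /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310062412.trx",
]
for started, seconds in checkpoint_durations(log):
    print(f"{started}: {seconds} s")
```

Watching these durations over time gives a quick view of whether checkpoints get slower as the database grows.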
>>>
>>>
>>> here is the output of "status()" :
>>>
>>> SQL> status();
>>> REPORT
>>> VARCHAR
>>> _______________________________________________________________________________
>>>
>>> OpenLink Virtuoso  Server
>>> Version 07.20.3217-pthreads for Linux as of Feb 10 2017
>>> Started on: 2017-03-09 12:33 GMT+1
>>>
>>> Database Status:
>>>    File size 0, 1000960 pages, 247031 free.
>>>    2720000 buffers, 447219 used, 112398 dirty 4 wired down, repl age 
>>> 13435443 0 w. io 3 w/crsr.
>>>    Disk Usage: 2212080 reads avg 0 msec, 0% r 0% w last  176 s, 12791013 
>>> writes flush       8.82 MB,
>>>      1221 read ahead, batch = 156.  Autocompact 722034 in 631152 out, 12% 
>>> saved col ac: 7230338 in 3% saved.
>>> Gate:  5993 2nd in reads, 0 gate write waits, 0 in while read 0 busy scrap.
>>> Log = 
>>> /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170310105036.trx,
>>>  90073727 bytes
>>> 558107 pages have been changed since last backup (in checkpoint state)
>>> Current backup timestamp: 0x0000-0x00-0x00
>>> Last backup date: unknown
>>> Clients: 4177045 connects, max 3 concurrent
>>> RPC: 25061533 calls, -4177308 pending, 2 max until now, 0 queued, 37 burst 
>>> reads (0%), 0 second 5M large, 298M max
>>> Checkpoint Remap 132107 pages, 0 mapped back. 554 s atomic time.
>>>      DB master 1000960 total 247030 free 132107 remap 44169 mapped back
>>>     temp  165120 total 160375 free
>>>
>>> Lock Status: 0 deadlocks of which 0 2r1w, 28 waits,
>>>     Currently 2 threads running 0 threads waiting 0 threads in vdb.
>>> Pending:
>>>    1100: IER 10.34.10.171
>>>        1: IER 10.34.10.171
>>>
>>> Client 1111:4175445:  Account: dba, 364 bytes in, 359 bytes out, 1 stmts.
>>> PID: 25646, OS: unix, Application: unknown, IP#: 127.0.0.1
>>> Transaction status: PENDING, 1 threads.
>>> Locks:
>>>
>>> Client 1111:4177046:  Account: ABES, 2728 bytes in, 361 bytes out, 2 stmts.
>>> Transaction status: PENDING, 0 threads.
>>> Locks:
>>>
>>>
>>> Running Statements:
>>> Time (msec) Text
>>>             8 sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH 
>>> <http://hub.abes.fr/refere
>>>            76 status()
>>>
>>>
>>> Hash indexes
>>>
>>>
>>> 44 Rows. -- 77 msec.
>>>
>>>
>>>
>>> On 10/03/2017 at 02:03, Hugh Williams wrote:
>>>> Hi Thomas,
>>>>
>>>> What is this JDBC Connector from Oracle that is being used for the inserts 
>>>> in RDF/XML form ?
>>> Oracle 12.1 brings its own JDK 1.6.0_37, so if I'm right it is the 
>>> ojdbc6.jar Thin Driver or OCI Driver:
>>>
>>> "Oracle JDBC Drivers release 12.1.0.1.0 production Readme.txt :
>>> Driver Versions
>>> ---------------
>>>
>>> These are the driver versions in the 12R1 release:
>>>
>>>    - JDBC Thin Driver 12R1
>>>      100% Java client-side JDBC driver for use in client applications,
>>>      middle-tier servers and applets.
>>>
>>>    - JDBC OCI Driver 12R1
>>>      Client-side JDBC driver for use on a machine where OCI 12R1
>>>      is installed.
>>>
>>>    - JDBC Thin Server-side Driver 12R1
>>>      JDBC driver for use in Java program in the database to access
>>>      remote Oracle databases.
>>>
>>>    - JDBC Server-side Internal Driver 12R1
>>>      Server-side JDBC driver for use by Java Stored procedures.  This
>>>      driver used to be called the "JDBC Kprb Driver".
>>>
>>>
>>>
>>>> What is the ORCID dataset being used as the only one I see is in N-Triple 
>>>> format from 2014 at:
>>>>
>>>>    https://datahub.io/dataset/orcid_2014_dataset
>>> will ask for this
>>>> Performing inserts with transaction would consume more memory maintaining 
>>>> the transaction than with log_enable(2) which auto commits without 
>>>> transaction logging in memory.
>>> Is it possible to have autocommit enabled with the way we perform SPARQL 
>>> INSERTs? We used DEFINE sql:log-enable 2 in the query.
>>>> The O_DIRECT param set in your INI file is an old param for which no real 
>>>> benefit has been seen on current OSes. On a Linux system, setting 
>>>> swappiness as detailed at:
>>>>
>>>>    
>>>> https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#Linux-only%20--%20”swappiness";
>>>>
>>>> Would give better results.
>>> OK, I knew about this and thought it was done, but on checking it had gone 
>>> back to 30; I will find a way to fix it at 10.
>>>
>>>> There is also no real need to set ColumnStore = 1, as the RDF_QUAD 
>>>> table is column store by default in Virtuoso 7, so that setting would 
>>>> only have an effect on default SQL table creation.
>>>>
>>>> If you still have problems, can you provide a copy of your virtuoso.log 
>>>> file and the output of the “status();” command for review ...
>>>>
>>>> Best Regards
>>>> Hugh Williams
>>>>
>>>>
>>>>
>>>>> On 9 Mar 2017, at 17:28, Thomas Michaux <mich...@abes.fr> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We are loading ORCID 2016 into a V7 instance (Version 07.20.3217-pthreads 
>>>>> for Linux as of Feb 10 2017). We DO NOT want to use the bulk loader; 
>>>>> instead we are issuing SPARQL inserts of RDF/XML files via a JDBC 
>>>>> connector from Oracle.
>>>>>
>>>>> Virtuoso is hosted on an 8-core, 32 GB platform.
>>>>>
>>>>> We successfully inserted 75 633 079 triples before virtuoso.log started 
>>>>> reporting performance problems with "disk write throughput". Is there 
>>>>> something else to optimize in virtuoso.ini while we are in this "loading" 
>>>>> phase (no SPARQL "read" queries from clients at the moment)?
>>>>>
>>>>> We have already done the following:
>>>>>
>>>>> - full-text indexing has been delayed ( DB.DBA.VT_BATCH_UPDATE ( 
>>>>> 'DB.DBA.RDF_OBJ', 'ON', 8640 ); )
>>>>> - MaxCheckpointRemap = 505856 ( it's larger than 25% of total pages)
>>>>> - UnremapQuota       = 0
>>>>> - DefaultIsolation   = 2
>>>>> - O_DIRECT                 = 1 (we are on XFS filesystem)
>>>>> - ColumnStore              = 1 (we started from a new, fresh .db, deleted 
>>>>> all previous existing .db, .trx)
>>>>>
>>>>> Can we do something at the transaction level? We keep each JDBC insert's 
>>>>> transaction as short as possible (1 insert -> 1 commit); the query is:
>>>>>
>>>>> "'sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH '||graphe ||' { 
>>>>> '|| var_clob_line|| ' }'"
>>>>>
>>>>> I can see that free memory slowly decreases, and finally the server hangs.
>>>>>
>>>>> Thanks for your help ! (Attached is virtuoso.ini)
>>>>>
>>>>> Thomas
>>>>> <virtuoso.ini>------------------------------------------------------------------------------
>>>>> Announcing the Oxford Dictionaries API! The API offers world-renowned
>>>>> dictionary content that is easy and intuitive to access. Sign up for an
>>>>> account today to start using our lexical data to power your apps and
>>>>> projects. Get started today and enter our developer competition.
>>>>> http://sdm.link/oxford_______________________________________________
>>>>> Virtuoso-users mailing list
>>>>> Virtuoso-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

