Re: [Virtuoso-users] Write load high relative to disk write throughput / intensive JDBC sparql INSERT DATA INTO GRAPH

2017-03-09 Thread Hugh Williams
Hi Thomas,

What is this JDBC Connector from Oracle that is being used for the inserts in
RDF/XML form?

Which ORCID dataset is being used? The only one I see is in N-Triples format,
from 2014, at:

https://datahub.io/dataset/orcid_2014_dataset 


Performing the inserts within a transaction would consume more memory, since
the transaction has to be maintained, than using log_enable(2), which
autocommits without keeping a transaction log in memory.
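
A minimal Python sketch of how a client could assemble statements of the form quoted further down in this thread. It assumes the payload lines are already triples in N-Triples syntax, and that DEFINE sql:log-enable 2 selects the same autocommit-without-logging mode as log_enable(2) for that statement. The helper names chunked and build_insert, the example IRIs and the chunk size are hypothetical; sending the string over JDBC is not shown.

```python
from typing import Iterable, Iterator, List


def chunked(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items from `lines`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # trailing partial batch
        yield batch


def build_insert(graph_iri: str, triples: List[str]) -> str:
    """Build one SQL-level statement in the form used in this thread.

    The leading 'sparql' keyword makes Virtuoso's SQL parser treat the rest
    as SPARQL; 'DEFINE sql:log-enable 2' is assumed to request autocommit
    without transaction logging for this statement.
    """
    # Hypothetical guard: expect a bare IRI and add the <> brackets here.
    if not graph_iri or any(ch in graph_iri for ch in '<> "{}'):
        raise ValueError(f"not a bare IRI: {graph_iri!r}")
    body = "\n".join(triples)
    return (
        "sparql DEFINE sql:log-enable 2 "
        f"INSERT DATA INTO GRAPH <{graph_iri}> {{ {body} }}"
    )


if __name__ == "__main__":
    payload = [
        '<http://example.org/s1> <http://example.org/p> "a" .',
        '<http://example.org/s2> <http://example.org/p> "b" .',
    ]
    # size=1 mirrors "1 insert -> 1 commit"; other sizes are a hypothetical knob.
    for batch in chunked(payload, size=1):
        print(build_insert("http://example.org/orcid", batch))
```

Keeping the bracket handling in one place avoids depending on whether the graph variable already carries the angle brackets.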

The O_DIRECT param set in your INI file is an old param for which no real
benefit has been seen on current OSes. On a Linux system, setting swappiness
as detailed at:

https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#Linux-only%20--%20%22swappiness%22

would give better results.
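
A minimal Python sketch for checking the current setting before changing it, assuming a Linux host where the kernel exposes it as /proc/sys/vm/swappiness. The helper names are hypothetical, and no target value is assumed here; that should come from the linked page.

```python
from pathlib import Path
from typing import Optional

# Standard location of the kernel's vm.swappiness setting on Linux.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the file contents, which is a single integer followed by a newline."""
    value = int(text.strip())
    # Valid range is 0-100 on older kernels and 0-200 on newer ones.
    if not 0 <= value <= 200:
        raise ValueError(f"unexpected swappiness value: {value}")
    return value


def read_swappiness(path: Path = SWAPPINESS_PATH) -> Optional[int]:
    """Return the current value, or None if the file is not available."""
    try:
        return parse_swappiness(path.read_text())
    except FileNotFoundError:  # not Linux, or /proc not mounted
        return None


if __name__ == "__main__":
    current = read_swappiness()
    print("vm.swappiness =", "unavailable" if current is None else current)
```

To change it, `sysctl -w vm.swappiness=<value>` applies a new value at runtime, and a `vm.swappiness` line in /etc/sysctl.conf makes it persistent across reboots.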

There is also no real need to set ColumnStore = 1, as the RDF_QUAD table is
column store by default in Virtuoso 7, so that setting would only affect
default SQL table creation.

If you still have problems, can you provide a copy of your virtuoso.log file
and the output of the “status();” command for review?

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.  //  http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers



> On 9 Mar 2017, at 17:28, Thomas Michaux wrote:
> 
> Hello,
> 
> We are loading ORCID 2016 into a V7 instance (Version 07.20.3217-pthreads for
> Linux as of Feb 10 2017). We DO NOT want to use the bulk loader; instead we
> are providing SPARQL inserts of RDF/XML files via a JDBC connector from Oracle.
> 
> Virtuoso is hosted on an 8-core, 32 GB platform.
> 
> We successfully inserted 75 633 079 triples before virtuoso.log started
> reporting performance problems with "disk write throughput". Is there anything
> else to optimize in virtuoso.ini while we are in this loading phase (no SPARQL
> read queries from clients at the moment)?
> 
> We've already done :
> 
> - full-text indexing has been deferred
>   (DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'ON', 8640);)
> - MaxCheckpointRemap = 505856 (it's larger than 25% of total pages)
> - UnremapQuota   = 0
> - DefaultIsolation   = 2
> - O_DIRECT = 1 (we are on XFS filesystem)
> - ColumnStore  = 1 (we started from a fresh .db, after deleting all
> previously existing .db and .trx files)
> 
> Can we do something at the transaction level? We keep each JDBC insert as
> short as possible (1 insert -> 1 commit); the query is:
> 
> "'sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH '||graphe ||' { '|| 
> var_clob_line|| ' }'"
> 
> I can see that free memory slowly decreases, and finally the server hangs.
> 
> Thanks for your help ! (Attached is virtuoso.ini)
> 
> Thomas
> --
> Announcing the Oxford Dictionaries API! The API offers world-renowned
> dictionary content that is easy and intuitive to access. Sign up for an
> account today to start using our lexical data to power your apps and
> projects. Get started today and enter our developer competition.
> http://sdm.link/oxford___
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users





Re: [Virtuoso-users] Geospatial query

2017-03-09 Thread Hugh Williams
Hi 

Your links load blank pages... I was expecting you to provide data and a query
to enable local recreation...
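
To illustrate the bounding-box behaviour discussed below, here is a minimal Python sketch with two hypothetical, non-overlapping units that share the same bounding box. A bounding-box test reports both units for a point inside only one of them, while an exact even-odd ray-casting test reports just one. This is a generic illustration of the two kinds of test, not Virtuoso's implementation of st_contains.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the first vertex is not repeated


def bbox_contains(poly: Polygon, p: Point) -> bool:
    """Coarse test: is p inside the polygon's axis-aligned bounding box?"""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)


def polygon_contains(poly: Polygon, p: Point) -> bool:
    """Exact test by even-odd ray casting (points on an edge may go either way)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of p
                inside = not inside
    return inside


def units_at(units: Dict[str, Polygon], p: Point,
             contains: Callable[[Polygon, Point], bool]) -> List[str]:
    """Names of all units whose geometry contains p under the given test."""
    return sorted(name for name, poly in units.items() if contains(poly, p))


# Two hypothetical units splitting the unit square along its diagonal:
# both have the same bounding box, but they do not overlap.
UNITS: Dict[str, Polygon] = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
}

if __name__ == "__main__":
    point = (0.25, 0.25)
    print("bbox: ", units_at(UNITS, point, bbox_contains))     # ['A', 'B']
    print("exact:", units_at(UNITS, point, polygon_contains))  # ['A']
```

A common pattern is to use the cheap bounding-box test as a prefilter and then confirm candidates with the exact test.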

Best Regards
Hugh Williams



> On 8 Mar 2017, at 09:10, Pasquale Di Donato wrote:
> 
> Hi Hugh,
> 
> we are publishing administrative units and the use case is "Give me the admin
> unit @POINT". Clearly, at @POINT there is only one administrative unit of a
> specific type.
> E.g. with the following query I'm asking for the admin unit "Canton" @POINT:
> 
> https://tinyurl.com/z24uapm 
> 
> Then I get one resource: correct.
> 
> Same query at another point, but it returns 2 resources because st_contains
> uses the BBOX: not correct for me
> 
> https://tinyurl.com/hh6o834 
> 
> Ciao
> Pasquale
> 
> 
> 
> 
> 
> 
> 
> On Fri, Mar 3, 2017 at 1:23 AM, Hugh Williams wrote:
> Hi 
> 
> Yes, the st_contains function does work within a bounding box, as detailed at:
> 
> https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtGeoSPARQLEnhancementDocs#Virtuoso%20Geo%20Spatial%20geometry%20functions
> 
> 
> Why is this a problem for you? Do you have a test case to demonstrate it?
> 
> Best Regards
> Hugh Williams
> 
> 
> 
>> On 27 Feb 2017, at 15:36, Pasquale Di Donato wrote:
>> 
>> Dear list,
>> 
>> I'm trying to select which spatial object is at a given point with the
>> function bif:st_contains.
>> The result is more than 1 geometry: does this mean that bif:st_contains
>> works with bounding boxes instead of the real geometries?
>> 
>> Cheers
>> Pasquale
>> 
>> 
> 
> 





[Virtuoso-users] Write load high relative to disk write throughput / intensive JDBC sparql INSERT DATA INTO GRAPH

2017-03-09 Thread Thomas Michaux

Hello,

We are loading ORCID 2016 into a V7 instance (Version 07.20.3217-pthreads
for Linux as of Feb 10 2017). We DO NOT want to use the bulk loader;
instead we are providing SPARQL inserts of RDF/XML files via a JDBC
connector from Oracle.

Virtuoso is hosted on an 8-core, 32 GB platform.

We successfully inserted 75 633 079 triples before virtuoso.log started
reporting performance problems with "disk write throughput". Is there
anything else to optimize in virtuoso.ini while we are in this loading
phase (no SPARQL read queries from clients at the moment)?


We've already done :

- full-text indexing has been deferred
  (DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'ON', 8640);)
- MaxCheckpointRemap = 505856 (it's larger than 25% of total pages)
- UnremapQuota   = 0
- DefaultIsolation   = 2
- O_DIRECT = 1 (we are on XFS filesystem)
- ColumnStore  = 1 (we started from a fresh .db, after deleting all
previously existing .db and .trx files)


Can we do something at the transaction level? We keep each JDBC insert as
short as possible (1 insert -> 1 commit); the query is:

"'sparql DEFINE sql:log-enable 2 INSERT DATA INTO GRAPH '||graphe ||' 
{ '|| var_clob_line|| ' }'"


I can see that free memory slowly decreases, and finally the server hangs.

Thanks for your help ! (Attached is virtuoso.ini)

Thomas
;
;  virtuoso.ini
;
;  Configuration file for the OpenLink Virtuoso VDBMS Server
;
;  To learn more about this product, or any other product in our
;  portfolio, please check out our web site at:
;
;  http://virtuoso.openlinksw.com/
;
;  or contact us at:
;
;  general.informat...@openlinksw.com
;
;  If you have any technical questions, please contact our support
;  staff at:
;
;  technical.supp...@openlinksw.com
;
;
;  Database setup
;
[Database]
DatabaseFile   = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.db
ErrorLogFile   = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.log
LockFile   = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.lck
TransactionFile= /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170309162914.trx
;TransactionFile= /LN_Hupe/virtuoso20151207171442.trx
xa_persistent_file = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.pxa
ErrorLogLevel  = 7
FileExtend = 200
MaxCheckpointRemap = 505856
UnremapQuota   = 0
DefaultIsolation   = 2
Striping   = 0
TempStorage= TempDatabase

[TempDatabase]
DatabaseFile   = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso-temp.db
TransactionFile= /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping   = 0

;
;  Server parameters
;
[Parameters]
ServerPort   = 
LiteMode = 0
DisableUnixSocket= 1
DisableTcpSocket = 0
;SSLServerPort  = 2111
;SSLCertificate = cert.pem
;SSLPrivateKey  = pk.pem
;X509ClientVerify   = 0
;X509ClientVerifyDepth  = 0
;X509ClientVerifyCAFile = ca.pem
MaxClientConnections = 10
CheckpointInterval   = 20
O_DIRECT = 1
CaseMode = 2
MaxStaticCursorRows  = 5000
CheckpointAuditTrail = 1
AllowOSCalls = 0
SchedulerInterval= 10
;DirsAllowed  = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /home/devel, /LN_Hupe, /LN_Hupe/dumpviaf
;production
DirsAllowed  = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /home/devel/logs
ThreadCleanupInterval= 1
ThreadThreshold  = 10
ResourcesCleanupInterval = 1
FreeTextBatchSize= 10
SingleCPU= 0
VADInstallDir= /usr/local/virtuoso-opensource/share/virtuoso/vad/
PrefixResultNames= 0
RdfFreeTextRulesSize = 100
IndexTreeMaps= 256
MaxMemPoolSize   = 2
PrefixResultNames= 0
MacSpotlight = 0
IndexTreeMaps= 64
MaxQueryMem  = 3G   ; memory allocated to query processor
VectorSize   = 1000 ; initial parallel query vector (array of query operations) size
MaxVectorSize= 100  ; query vector size threshold.
AdjustVectorSize = 0
ThreadsPerQuery  = 8
AsyncQueueMaxThreads = 10
ColumnStore  = 1
;server side query logging
;At run time, this may be enabled or disabled with prof_enable (), overriding the specification of the ini file
;QueryLog = virtuoso.qrl
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
;; Uncomment next two lines if there is 2 GB system memory free
;NumberOfBuffers  = 17
;MaxDirtyBuffers  = 13
;; Uncomment next two lines if there is 4 GB system memory free
;NumberOfBuffers  = 34
;
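
The comment in the ini above suggests letting the Virtuoso process use between 3/5 and 2/3 of free system memory. A minimal Python sketch for turning that into a NumberOfBuffers value, assuming one buffer per 8 KB database page and ignoring per-buffer bookkeeping overhead (so the results are upper bounds). The page size, helper names and example memory sizes are assumptions, not taken from this file.

```python
from fractions import Fraction
from typing import Tuple

# Assumption: each buffer caches one 8 KB database page; per-buffer
# bookkeeping overhead is ignored, so results are upper bounds.
PAGE_BYTES = 8192


def buffers_for(free_bytes: int, share: Fraction, page_bytes: int = PAGE_BYTES) -> int:
    """Pages that fit in `share` of `free_bytes`, rounded down (integer math only)."""
    if free_bytes < 0 or not (0 < share <= 1):
        raise ValueError("need free_bytes >= 0 and 0 < share <= 1")
    return (free_bytes * share.numerator) // (share.denominator * page_bytes)


def buffer_range(free_bytes: int) -> Tuple[int, int]:
    """NumberOfBuffers for 3/5 and 2/3 of free memory, per the ini comment."""
    return buffers_for(free_bytes, Fraction(3, 5)), buffers_for(free_bytes, Fraction(2, 3))


if __name__ == "__main__":
    gib = 2 ** 30
    for free_gib in (2, 4):  # example values only
        low, high = buffer_range(free_gib * gib)
        print(f"{free_gib} GB free -> NumberOfBuffers between {low} and {high}")
```

Integer arithmetic on the fraction's numerator and denominator avoids floating-point rounding at the boundaries.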