Hi Andrejs, 

So you are saying the Freebase RDF dataset dump has been split into files of 
10,000,000 triples each, which you are loading with the Virtuoso RDF Bulk 
Loader (rdf_loader_run) under the control of a Java application, and that 120 
files, i.e. 1.2 billion triples, are loaded before the "Virtuoso 
Communications Link Failure (timeout)" error occurs in the app?

You then indicate the Virtuoso server was restarted but the load failed after 
the first batch. Was this loading the 121st split file, i.e. continuing from 
where the load left off, as would be indicated by the ll_state column of the 
DB.DBA.load_list table, or was the load started from scratch?
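
One way to answer the resume question is to tally the load list by state. The 
following is a minimal Python sketch, assuming the rows have already been 
fetched from DB.DBA.load_list as (ll_file, ll_state) pairs and assuming the 
state encoding 0 = not started, 1 = in progress, 2 = done. The function names 
and sample rows are hypothetical, not part of Virtuoso:

```python
from collections import Counter

# Assumed encoding of the ll_state column in DB.DBA.load_list.
STATE_NAMES = {0: "not started", 1: "in progress", 2: "done"}


def summarize_load_list(rows):
    """Count files per state; rows are (ll_file, ll_state) pairs."""
    counts = Counter(STATE_NAMES.get(state, "unknown") for _, state in rows)
    return dict(counts)


def files_to_resume(rows):
    """Return files not marked done, i.e. what a resumed load must still process."""
    return [name for name, state in rows if state != 2]


# Hypothetical sample rows, not taken from the real load_list table.
sample = [
    ("part-0119.gz", 2),
    ("part-0120.gz", 2),
    ("part-0121.gz", 1),
    ("part-0122.gz", 0),
]
print(summarize_load_list(sample))
print(files_to_resume(sample))
```

A file left "in progress" by a crash may need its state reset before the 
loader will pick it up again; the bulk loader documentation should be checked 
for the exact procedure.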

What does "status();" run from isql report, in particular the number of 
"buffers" allocated and used? I suspect the system is running out of memory 
loading the Freebase datasets, which seem to have larger than average literal 
values and a lot of duplicate data, and thus require more than the usual 10 
bytes per quad of storage. In testing we have performed internally, it was 
found that Freebase needs about 100 GB of RAM to load completely in memory for 
best performance, loading in about 7000 sec, i.e. about 2 hrs, on a single 
server machine.
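
The memory concern can be checked with some rough arithmetic. This is only a 
sketch: it takes NumberOfBuffers from the posted virtuoso.ini, assumes each 
buffer caches one 8 KB database page, and ignores any per-buffer overhead. The 
"bytes per quad" figure is simply the buffer pool divided by the triples 
loaded before the failure, not a measured value:

```python
PAGE_BYTES = 8 * 1024               # assumed: one 8 KB database page per buffer
NUMBER_OF_BUFFERS = 2_400_000       # from the posted virtuoso.ini
TRIPLES_LOADED = 120 * 10_000_000   # 120 files of 10,000,000 triples each


def buffer_pool_bytes(buffers, page_bytes=PAGE_BYTES):
    """Approximate memory covered by the buffer pool, without overhead."""
    return buffers * page_bytes


pool = buffer_pool_bytes(NUMBER_OF_BUFFERS)
print(f"buffer pool: {pool / 1e9:.2f} GB")
print(f"bytes per quad if the pool is full: {pool / TRIPLES_LOADED:.1f}")
print(f"7000 s = {7000 / 3600:.2f} h")
```

If the pool were exhausted at around 1.2 billion triples, that would 
correspond to roughly 16 bytes per quad, which is in line with the suspicion 
that Freebase needs more than the usual 10 bytes per quad.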

Having said that, the Virtuoso server should not crash; the load should just 
take a lot longer, as the data will continuously be swapped to and from disk 
once the memory (buffers) is all used up. If SSDs are available for faster 
access then the load will complete a lot quicker than with normal disks. Is 
any error reported in the "virtuoso.log" or /var/log/messages file as to why 
the server crashed/shut down unexpectedly?

I note you have disk striping enabled across disks with 2 stripes and 2 
segments. How many rdf_loader_run() processes do you have running during the 
load? With 8 VCPUs you should be running 3 or 4 of these for best parallel 
loading of the datasets.
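
The "3 or 4" figure can be reproduced with a simple heuristic. This sketch 
assumes a rule of thumb of roughly one loader per 2.5 cores; the ratio and the 
function name are assumptions for illustration, not a Virtuoso API:

```python
import math


def suggested_loader_range(cores, cores_per_loader=2.5):
    """Return (low, high) counts of parallel rdf_loader_run() sessions to try."""
    ratio = cores / cores_per_loader
    return max(1, math.floor(ratio)), max(1, math.ceil(ratio))


print(suggested_loader_range(8))
```

For the 8 VCPU instance this gives a range of 3 to 4 loaders; the best value 
within that range is worth confirming by measuring the load rate.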

I would also suggest the use of the Virtuoso LDMeter functions, which can 
monitor the load rates of the Virtuoso RDF Bulk Loader process, as detailed 
at:

        http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideLDMeterUtility

These should show how the load rates decrease as all the buffers are consumed 
and data has to be swapped to and from disk ...

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.      //              http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

On 7 Apr 2014, at 14:10, Andrejs Abele <andr...@sindicetech.com> wrote:

> Hi,
> 
> I'm running Virtuoso 07.10.3207 in Google Cloud.
> Instance: 8 VCPUs, 30 GB memory (n1-standard-8), 10 GB boot disk + 2x 1000 GB 
> storage disks.
> Using batch loading, I'm trying to load Freebase.
> One batch contains 10 000 000 triples. After loading 120 batches, loading 
> fails with this error.
> 
> Apr 04, 2014 22:10:50 PM org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
> INFO: Loading XML bean definitions from class path resource [org/springframework/jdbc/support/sql-error-codes.xml]
> Apr 04, 2014 22:10:51 PM org.springframework.jdbc.support.SQLErrorCodesFactory <init>
> INFO: SQLErrorCodes loaded: [DB2, Derby, H2, HSQL, Informix, MS-SQL, MySQL, Oracle, PostgreSQL, Sybase]
> 22:39:52.052 [main] ERROR c.s.f.l.j.VirtuosoJDBCHelperTemplateBased - error running loader
> org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [rdf_loader_run()]; nested exception is virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : Read timed out
>         at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:97) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:407) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:428) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at com.sindicetech.fbutils.loader.jdbc.VirtuosoJDBCHelperTemplateBased.runLoader(VirtuosoJDBCHelperTemplateBased.java:131) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at com.sindicetech.fbutils.loader.Miniloader.load(Miniloader.java:87) [loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at com.sindicetech.fbutils.loader.Miniloader.main(Miniloader.java:67) [loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> Caused by: virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : Read timed out
>         at virtuoso.jdbc4.VirtuosoFuture.nextResult(Unknown Source) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at virtuoso.jdbc4.VirtuosoResultSet.process_result(Unknown Source) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at virtuoso.jdbc4.VirtuosoResultSet.<init>(Unknown Source) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at virtuoso.jdbc4.VirtuosoStatement.sendQuery(Unknown Source) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at virtuoso.jdbc4.VirtuosoStatement.execute(Unknown Source) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.springframework.jdbc.core.JdbcTemplate$1ExecuteStatementCallback.doInStatement(JdbcTemplate.java:421) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:396) ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>         ... 4 common frames omitted
> an error accurred during load of file /data2/virtuoso-load/freebase-dump-2014-03-09-good-0121.gz : com.sindicetech.fbutils.loader.jdbc.VirtuosoJDBCException: error running loader
> 
> And in virtuoso logs I get
> ...
> 21:35:28 * Monitor: High disk read (2)
> 21:37:48 * Monitor: High disk read (2)
> 21:39:53 * Monitor: High disk read (2)
> 21:41:53 * Monitor: High disk read (2)
> 21:43:53 * Monitor: High disk read (2)
> 21:45:27 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:45:53 * Monitor: High disk read (2)
> 21:47:27 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:47:53 * Monitor: High disk read (2)
> 21:49:27 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:49:53 * Monitor: High disk read (2)
> 21:51:47 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:51:53 * Monitor: High disk read (2)
> 21:53:48 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:53:54 * Monitor: High disk read (2)
> 21:55:54 * Monitor: High disk read (2)
> 21:57:04 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:57:54 * Monitor: High disk read (2)
> 21:59:54 * Monitor: High disk read (2)
> 22:01:54 * Monitor: High disk read (2)
> 22:02:40 * Monitor: CPU% is low while there are large numbers of runnable threads
> ...
> 
> I tried to restart Virtuoso, but after the restart, loading failed at the 
> first batch.
> Based on the Google documentation, a 1000 GB disk should have:
> Read 300 IOPS 120 MB/s 180 MB/s
> Write 1500 IOPS 90 MB/s 120 MB/s
> 
> My virtuoso.ini
> 
> ;
> ;  virtuoso.ini
> 
> ;  Database setup
> ;
> [Database]
> DatabaseFile                    = /usr/local/virtuosoCtl/virtuoso.db
> ErrorLogFile                    = /usr/local/virtuosoCtl/virtuoso.log
> LockFile                        = /usr/local/virtuosoCtl/virtuoso.lck
> TransactionFile                 = /usr/local/virtuosoCtl/virtuoso.trx
> xa_persistent_file              = /usr/local/virtuosoCtl/virtuoso.pxa
> ErrorLogLevel                   = 7
> FileExtend                      = 200
> MaxCheckpointRemap              = 3000000
> Striping                        = 1
> TempStorage                     = TempDatabase
> 
> 
> [TempDatabase]
> DatabaseFile                    = /usr/local/virtuosoCtl/virtuoso-temp.db
> TransactionFile                 = /usr/local/virtuosoCtl/virtuoso-temp.trx
> MaxCheckpointRemap              = 15000
> Striping                        = 1
> 
> 
> ;
> ;  Server parameters
> ;
> [Parameters]
> ServerPort                      = 1111
> LiteMode                        = 0
> DisableUnixSocket               = 1
> DisableTcpSocket                = 0
> MaxClientConnections            = 10
> CheckpointInterval              = 60
> O_DIRECT                        = 0
> CaseMode                        = 2
> MaxStaticCursorRows             = 5000
> CheckpointAuditTrail            = 0
> AllowOSCalls                    = 0
> SchedulerInterval               = 10
> DirsAllowed                     = .,/data2/virtuoso-load
> ThreadCleanupInterval           = 0
> ThreadThreshold                 = 10
> ResourcesCleanupInterval        = 0
> FreeTextBatchSize               = 100000
> SingleCPU                       = 0
> VADInstallDir                   = /usr/local/virtuoso-opensource/share/virtuoso/vad/
> PrefixResultNames               = 0
> RdfFreeTextRulesSize            = 100
> IndexTreeMaps                   = 256
> MaxMemPoolSize                  = 200000000
> PrefixResultNames               = 0
> MacSpotlight                    = 0
> IndexTreeMaps                   = 64
> MaxQueryMem                     = 2G            ; memory allocated to query processor
> VectorSize                      = 1000          ; initial parallel query vector (array of query operations) size
> MaxVectorSize                   = 1000000       ; query vector size threshold.
> AdjustVectorSize                = 0
> ThreadsPerQuery                 = 4
> AsyncQueueMaxThreads            = 10
> ;;
> 
> 
> NumberOfBuffers          = 2400000
> MaxDirtyBuffers          = 1800000
> 
> 
> [HTTPServer]
> ServerPort                      = 8890
> ServerRoot                      = /usr/local/virtuoso-opensource/var/lib/virtuoso/vsp
> MaxClientConnections            = 10
> DavRoot                         = DAV
> EnabledDavVSP                   = 0
> HTTPProxyEnabled                = 0
> TempASPXDir                     = 0
> DefaultMailServer               = localhost:25
> ServerThreads                   = 10
> MaxKeepAlives                   = 10
> KeepAliveTimeout                = 10
> MaxCachedProxyConnections       = 10
> ProxyConnectionCacheTimeout     = 15
> HTTPThreadSize                  = 280000
> HttpPrintWarningsInOutput       = 0
> Charset                         = UTF-8
> ;HTTPLogFile                    = logs/http.log
> MaintenancePage                 = atomic.html
> EnabledGzipContent              = 1
> 
> 
> [AutoRepair]
> BadParentLinks                  = 0
> 
> [Client]
> SQL_PREFETCH_ROWS               = 100
> SQL_PREFETCH_BYTES              = 16000
> SQL_QUERY_TIMEOUT               = 0
> SQL_TXN_TIMEOUT                 = 0
> ;SQL_NO_CHAR_C_ESCAPE           = 1
> ;SQL_UTF8_EXECS                 = 0
> ;SQL_NO_SYSTEM_TABLES           = 0
> ;SQL_BINARY_TIMESTAMP           = 1
> ;SQL_ENCRYPTION_ON_PASSWORD     = -1
> 
> [VDB]
> ArrayOptimization               = 0
> NumArrayParameters              = 10
> VDBDisconnectTimeout            = 1000
> KeepConnectionOnFixedThread     = 0
> 
> [Replication]
> ServerName                      = db-VIRTUOSO-JOSEF
> ServerEnable                    = 1
> QueueMax                        = 50000
> ;
> ;  Striping setup
> 
> ;
> [Striping]
> Segment1                        = 56G, /data1/virtuosodb/db-seg1-1.db, /data2/virtuosodb/db-seg1-2.db
> Segment2                        = 56G, /data1/virtuosodb/db-seg2-1.db, /data2/virtuosodb/db-seg2-2.db
> 
> [TempStriping]
> Segment1                        = 840M, /data1/virtuosodb/tmp-seg1-1.db, /data2/virtuosodb/tmp-seg1-2.db
> 
> [Zero Config]
> ServerName                      = virtuoso (VIRTUOSO-JOSEF)
> ;ServerDSN                      = ZDSN
> ;SSLServerName                  =
> ;SSLServerDSN                   =
> 
> [URIQA]
> DynamicLocal                    = 0
> DefaultHost                     = localhost:8890
> 
> [SPARQL]
> ;ExternalQuerySource            = 1
> ;ExternalXsltSource             = 1
> ;DefaultGraph                   = http://localhost:8890/dataspace
> ;ImmutableGraphs                = http://localhost:8890/dataspace
> ResultSetMaxRows                = 10000
> MaxQueryCostEstimationTime      = 400   ; in seconds
> MaxQueryExecutionTime           = 60    ; in seconds
> DefaultQuery                    = select distinct ?Concept where {[] a ?Concept} LIMIT 100
> DeferInferenceRulesInit         = 0  ; controls inference rules loading
> ;PingService                    = http://rpc.pingthesemanticweb.com/
> 
> 
> [Plugins]
> LoadPath                        = /usr/local/virtuoso-opensource/lib/virtuoso/hosting
> Load1                           = plain, wikiv
> Load2                           = plain, mediawiki
> Load3                           = plain, creolewiki
> 
> Any help is highly appreciated.
> 
> Best regards,
> Andrejs
> ------------------------------------------------------------------------------
> Put Bad Developers to Shame
> Dominate Development with Jenkins Continuous Integration
> Continuously Automate Build, Test & Deployment 
> Start a new project now. Try Jenkins in the cloud.
> http://p.sf.net/sfu/13600_Cloudbees_APR
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

