Hi Andrejs,

So you are basically saying that the Freebase RDF dataset dump has been split into files of 10,000,000 triples each, which you are loading with the Virtuoso RDF Bulk Loader (rdf_loader_run()) under the control of a Java application, and that 120 files, i.e. 1.2 billion triples, of the dataset load before the "Virtuoso Communications Link Failure (timeout)" error occurs in the app?

You then indicate that the Virtuoso server was restarted but the load fails on the first batch. Is this the 121st split file, i.e. is the load continuing from where it left off, as would be indicated by the ll_state column of the DB.DBA.load_list table, or is the load starting from scratch?

What does "status();" run from isql report, in particular the number of buffers allocated and used? I suspect the system is running out of memory loading the Freebase datasets, which seem to have larger than average literal values and a lot of duplicate data. These consume more space than average and thus require more than 10 bytes per quad of storage. In testing we have performed internally, Freebase was found to need about 100 GB of RAM to load completely in memory for best performance, loading in about 7000 seconds (about 2 hours) on a single server machine.

Having said that, the Virtuoso server should not crash; the load should just take a lot longer, as data will continuously be swapped to and from disk once the memory (buffers) is all used up. If SSDs are available, the load will complete a lot quicker than with normal disks. Is any error reported in the virtuoso.log or /var/log/messages file as to why the server crashed or shut down unexpectedly?

I note you have disk striping enabled across the two disks, with 2 stripes and 2 segments. How many rdf_loader_run() processes do you have running during the load? With 8 vCPUs you should be running 3 or 4 of these for the best parallel loading of the datasets.

I would also suggest using the Virtuoso LDMeter functions, which can monitor the load rates of the Virtuoso RDF Bulk Loader process, as detailed at:

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideLDMeterUtility

These should show how the load rates decrease as all the buffers are consumed and data has to be swapped to and from disk ...
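
As a rough illustration of the resume check: the per-file state can be listed in isql with something like SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list ORDER BY ll_file; where ll_state is 0 for not yet loaded, 1 for loading in progress and 2 for done. The Java sketch below is hypothetical (LoadListCheck and its LoadRow type are illustrative names, not part of Virtuoso or its JDBC driver). It assumes the rows have already been fetched in file-name order and reports how many files are done and the first file not yet marked done.

```java
import java.util.List;

/**
 * Hypothetical helper: interprets rows fetched from DB.DBA.load_list
 * (ll_file, ll_state), assumed to be sorted by ll_file.
 * ll_state: 0 = not yet loaded, 1 = being loaded, 2 = done.
 */
public class LoadListCheck {

    /** One row of the load list: file name and its ll_state value. */
    static final class LoadRow {
        final String file;
        final int state;

        LoadRow(String file, int state) {
            this.file = file;
            this.state = state;
        }
    }

    /** Number of files whose ll_state is 2 (done). */
    static long countDone(List<LoadRow> rows) {
        return rows.stream().filter(r -> r.state == 2).count();
    }

    /** First file not yet marked done, or null if every file is done. */
    static String firstUnfinished(List<LoadRow> rows) {
        for (LoadRow r : rows) {
            if (r.state != 2) {
                return r.file;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative rows only; real values come from the load_list query.
        List<LoadRow> rows = List.of(
                new LoadRow("freebase-dump-2014-03-09-good-0120.gz", 2),
                new LoadRow("freebase-dump-2014-03-09-good-0121.gz", 1),
                new LoadRow("freebase-dump-2014-03-09-good-0122.gz", 0));
        System.out.println("files done: " + countDone(rows));
        System.out.println("first unfinished: " + firstUnfinished(rows));
    }
}
```

If the first unfinished file after the restart is the 0121 file named in the error in your message, the load continued from where it left off; if it is near the start of the list, it started again from scratch.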
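
A back-of-the-envelope check of the memory and loader questions, as a Java sketch under stated assumptions. It treats each Virtuoso buffer as one 8 KB (8192-byte) database page and ignores per-buffer overhead. It also uses the commonly quoted rule of thumb of about one rdf_loader_run() per 2.5 cores. LoadSizing and its methods are hypothetical names.

```java
/**
 * Back-of-the-envelope sizing for a Virtuoso bulk load.
 * Assumptions: one buffer caches one 8192-byte page (overhead ignored),
 * and the loader count follows a "cores / 2.5" rule of thumb.
 */
public class LoadSizing {

    /** Assumed size of the page held by one buffer. */
    static final long PAGE_BYTES = 8192L;

    /** Approximate RAM taken by NumberOfBuffers, in bytes. */
    static long bufferBytes(long numberOfBuffers) {
        return numberOfBuffers * PAGE_BYTES;
    }

    /** Suggested parallel rdf_loader_run() calls: cores / 2.5, rounded, at least 1. */
    static int suggestedLoaders(int cores) {
        return Math.max(1, (int) Math.round(cores / 2.5));
    }

    public static void main(String[] args) {
        // NumberOfBuffers value taken from the virtuoso.ini in this thread.
        long bytes = bufferBytes(2_400_000L);
        System.out.printf("buffers use about %.1f GB%n", bytes / 1e9);
        System.out.println("loaders for 8 vCPUs: " + suggestedLoaders(8));
    }
}
```

For NumberOfBuffers = 2400000 this comes to 19,660,800,000 bytes, roughly 19.7 GB of the 30 GB instance, well short of the ~100 GB mentioned above for a fully in-memory Freebase load. For 8 vCPUs, 8 / 2.5 = 3.2, which rounds to 3 (or up to 4).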

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc. // http://www.openlinksw.com/
Weblog -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter -- http://twitter.com/OpenLink
Google+ -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

On 7 Apr 2014, at 14:10, Andrejs Abele <andr...@sindicetech.com> wrote:

> Hi,
>
> I'm running Virtuoso 07.10.3207 in Google Cloud.
> Instance: 8 VCPU, 30 GB memory (n1-standard-8), 10 GB boot disk + 2x 1000 GB storage disks.
> Using batch loading, I'm trying to load Freebase.
> One batch contains 10 000 000 triples. After loading 120 batches, loading fails with this error.
>
> Apr 04, 2014 22:10:50 PM org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
> INFO: Loading XML bean definitions from class path resource [org/springframework/jdbc/support/sql-error-codes.xml]
> Apr 04, 2014 22:10:51 PM org.springframework.jdbc.support.SQLErrorCodesFactory <init>
> INFO: SQLErrorCodes loaded: [DB2, Derby, H2, HSQL, Informix, MS-SQL, MySQL, Oracle, PostgreSQL, Sybase]
> 22:39:52.052 [main] ERROR c.s.f.l.j.VirtuosoJDBCHelperTemplateBased - error running loader
> org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [rdf_loader_run()]; nested exception is virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : Read timed out
> at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:97)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:407)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:428)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at com.sindicetech.fbutils.loader.jdbc.VirtuosoJDBCHelperTemplateBased.runLoader(VirtuosoJDBCHelperTemplateBased.java:131)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at com.sindicetech.fbutils.loader.Miniloader.load(Miniloader.java:87)
> [loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at com.sindicetech.fbutils.loader.Miniloader.main(Miniloader.java:67)
> [loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> Caused by: virtuoso.jdbc4.VirtuosoException: Virtuoso Communications Link Failure (timeout) : Read timed out
> at virtuoso.jdbc4.VirtuosoFuture.nextResult(Unknown Source)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at virtuoso.jdbc4.VirtuosoResultSet.process_result(Unknown Source)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at virtuoso.jdbc4.VirtuosoResultSet.<init>(Unknown Source)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at virtuoso.jdbc4.VirtuosoStatement.sendQuery(Unknown Source)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at virtuoso.jdbc4.VirtuosoStatement.execute(Unknown Source)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at org.springframework.jdbc.core.JdbcTemplate$1ExecuteStatementCallback.doInStatement(JdbcTemplate.java:421)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:396)
> ~[loader-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> ... 4 common frames omitted
> an error accurred during load of file /data2/virtuoso-load/freebase-dump-2014-03-09-good-0121.gz : com.sindicetech.fbutils.loader.jdbc.VirtuosoJDBCException: error running loader
>
> And in the Virtuoso log I get:
> ...
> 21:35:28 * Monitor: High disk read (2)
> 21:37:48 * Monitor: High disk read (2)
> 21:39:53 * Monitor: High disk read (2)
> 21:41:53 * Monitor: High disk read (2)
> 21:43:53 * Monitor: High disk read (2)
> 21:45:27 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:45:53 * Monitor: High disk read (2)
> 21:47:27 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:47:53 * Monitor: High disk read (2)
> 21:49:27 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:49:53 * Monitor: High disk read (2)
> 21:51:47 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:51:53 * Monitor: High disk read (2)
> 21:53:48 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:53:54 * Monitor: High disk read (2)
> 21:55:54 * Monitor: High disk read (2)
> 21:57:04 * Monitor: CPU% is low while there are large numbers of runnable threads
> 21:57:54 * Monitor: High disk read (2)
> 21:59:54 * Monitor: High disk read (2)
> 22:01:54 * Monitor: High disk read (2)
> 22:02:40 * Monitor: CPU% is low while there are large numbers of runnable threads
> ...
>
> I tried to restart Virtuoso, but after the restart, loading failed at the first batch.
> Based on the Google documentation, a 1000 GB disk should have:
> Read 300 IOPS 120 MB/s 180 MB/s
> Write 1500 IOPS 90 MB/s 120 MB/s
>
> My virtuoso.ini:
>
> ;
> ; virtuoso.ini
>
> ; Database setup
> ;
> [Database]
> DatabaseFile = /usr/local/virtuosoCtl/virtuoso.db
> ErrorLogFile = /usr/local/virtuosoCtl/virtuoso.log
> LockFile = /usr/local/virtuosoCtl/virtuoso.lck
> TransactionFile = /usr/local/virtuosoCtl/virtuoso.trx
> xa_persistent_file = /usr/local/virtuosoCtl/virtuoso.pxa
> ErrorLogLevel = 7
> FileExtend = 200
> MaxCheckpointRemap = 3000000
> Striping = 1
> TempStorage = TempDatabase
>
>
> [TempDatabase]
> DatabaseFile = /usr/local/virtuosoCtl/virtuoso-temp.db
> TransactionFile = /usr/local/virtuosoCtl/virtuoso-temp.trx
> MaxCheckpointRemap = 15000
> Striping = 1
>
>
> ;
> ; Server parameters
> ;
> [Parameters]
> ServerPort = 1111
> LiteMode = 0
> DisableUnixSocket = 1
> DisableTcpSocket = 0
> MaxClientConnections = 10
> CheckpointInterval = 60
> O_DIRECT = 0
> CaseMode = 2
> MaxStaticCursorRows = 5000
> CheckpointAuditTrail = 0
> AllowOSCalls = 0
> SchedulerInterval = 10
> DirsAllowed = .,/data2/virtuoso-load
> ThreadCleanupInterval = 0
> ThreadThreshold = 10
> ResourcesCleanupInterval = 0
> FreeTextBatchSize = 100000
> SingleCPU = 0
> VADInstallDir = /usr/local/virtuoso-opensource/share/virtuoso/vad/
> PrefixResultNames = 0
> RdfFreeTextRulesSize = 100
> IndexTreeMaps = 256
> MaxMemPoolSize = 200000000
> PrefixResultNames = 0
> MacSpotlight = 0
> IndexTreeMaps = 64
> MaxQueryMem = 2G ; memory allocated to query processor
> VectorSize = 1000 ; initial parallel query vector (array of query operations) size
> MaxVectorSize = 1000000 ; query vector size threshold.
> AdjustVectorSize = 0
> ThreadsPerQuery = 4
> AsyncQueueMaxThreads = 10
> ;;
>
>
> NumberOfBuffers = 2400000
> MaxDirtyBuffers = 1800000
>
>
> [HTTPServer]
> ServerPort = 8890
> ServerRoot = /usr/local/virtuoso-opensource/var/lib/virtuoso/vsp
> MaxClientConnections = 10
> DavRoot = DAV
> EnabledDavVSP = 0
> HTTPProxyEnabled = 0
> TempASPXDir = 0
> DefaultMailServer = localhost:25
> ServerThreads = 10
> MaxKeepAlives = 10
> KeepAliveTimeout = 10
> MaxCachedProxyConnections = 10
> ProxyConnectionCacheTimeout = 15
> HTTPThreadSize = 280000
> HttpPrintWarningsInOutput = 0
> Charset = UTF-8
> ;HTTPLogFile = logs/http.log
> MaintenancePage = atomic.html
> EnabledGzipContent = 1
>
>
> [AutoRepair]
> BadParentLinks = 0
>
> [Client]
> SQL_PREFETCH_ROWS = 100
> SQL_PREFETCH_BYTES = 16000
> SQL_QUERY_TIMEOUT = 0
> SQL_TXN_TIMEOUT = 0
> ;SQL_NO_CHAR_C_ESCAPE = 1
> ;SQL_UTF8_EXECS = 0
> ;SQL_NO_SYSTEM_TABLES = 0
> ;SQL_BINARY_TIMESTAMP = 1
> ;SQL_ENCRYPTION_ON_PASSWORD = -1
>
> [VDB]
> ArrayOptimization = 0
> NumArrayParameters = 10
> VDBDisconnectTimeout = 1000
> KeepConnectionOnFixedThread = 0
>
> [Replication]
> ServerName = db-VIRTUOSO-JOSEF
> ServerEnable = 1
> QueueMax = 50000
> ;
> ; Striping setup
>
> ;
> [Striping]
> Segment1 = 56G, /data1/virtuosodb/db-seg1-1.db, /data2/virtuosodb/db-seg1-2.db
> Segment2 = 56G, /data1/virtuosodb/db-seg2-1.db, /data2/virtuosodb/db-seg2-2.db
>
> [TempStriping]
> Segment1 = 840M, /data1/virtuosodb/tmp-seg1-1.db, /data2/virtuosodb/tmp-seg1-2.db
>
> [Zero Config]
> ServerName = virtuoso (VIRTUOSO-JOSEF)
> ;ServerDSN = ZDSN
> ;SSLServerName =
> ;SSLServerDSN =
>
> [URIQA]
> DynamicLocal = 0
> DefaultHost = localhost:8890
>
> [SPARQL]
> ;ExternalQuerySource = 1
> ;ExternalXsltSource = 1
> ;DefaultGraph = http://localhost:8890/dataspace
> ;ImmutableGraphs = http://localhost:8890/dataspace
> ResultSetMaxRows = 10000
> MaxQueryCostEstimationTime = 400 ; in seconds
> MaxQueryExecutionTime = 60 ; in seconds
> DefaultQuery = select distinct ?Concept where {[] a ?Concept} LIMIT 100
> DeferInferenceRulesInit = 0 ; controls inference rules loading
> ;PingService = http://rpc.pingthesemanticweb.com/
>
>
> [Plugins]
> LoadPath = /usr/local/virtuoso-opensource/lib/virtuoso/hosting
> Load1 = plain, wikiv
> Load2 = plain, mediawiki
> Load3 = plain, creolewiki
>
> Any help is highly appreciated.
>
> Best regards,
> Andrejs
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users