Hi Rob, seems like Fuseki doesn't handle gzip. I created the file with `tar -cvzf tar_test.ttl.gz large_input.ttl`, so it should be a standard gzip. Uploading fails with the following log:

[2018-09-13 11:23:33] Fuseki ERROR [line: 1, col: 1 ] Out of place: [KEYWORD:PaxHeader]
[2018-09-13 11:23:50] Fuseki INFO  [9] 400 Parse error: [line: 1, col: 1 ] Out of place: [KEYWORD:PaxHeader] (16.674 s)

Markus
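(For comparison, a plain gzip stream without a tar wrapper, sketched here assuming GNU tar and gzip. `PaxHeader` is the name tar gives its extended-header entries, which would mean the parser is seeing tar metadata rather than Turtle:)

    # Compress to a plain gzip stream, keeping the .ttl.gz naming convention.
    gzip -c large_input.ttl > large_input.ttl.gz

    # Sanity checks: `file` should report gzip data without mentioning tar,
    # and the first decompressed lines should be Turtle, not tar headers.
    file large_input.ttl.gz
    zcat large_input.ttl.gz | head -n 3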
> On 13.09.2018, at 11:59, Rob Vesse <[email protected]> wrote:
> 
> Markus
> 
> Jena in general should transparently recognize and handle files with a .gz
> extension, provided they follow the standard approach of appending this after
> the normal file extension, i.e. .ttl.gz. I checked the Fuseki code and
> GZipped uploads should be supported.
> 
> From Jena 3.8.0, support is also provided for BZip2 files with a .bz2
> extension and Snappy-compressed files with a .sz extension, although looking
> at the Fuseki code I am not sure this is wired up into Fuseki currently.
> 
> Rob
> 
> On 13/09/2018, 10:49, "Markus Neumann" <[email protected]> wrote:
> 
> Hi Rob,
> 
>> On 13.09.2018, at 11:41, Rob Vesse <[email protected]> wrote:
>> 
>> Markus
>> 
>> Comments inline:
>> 
>> On 12/09/2018, 16:09, "Markus Neumann" <[email protected]> wrote:
>> 
>> Hi,
>> 
>> we are running a Fuseki server that will eventually hold about 2.2 * 10^9
>> triples of meteorological data.
>> I currently run it with "-Xmx80GB" on a 128GB server. The database is TDB2
>> on a 900GB SSD.
>> 
>> Now I face several performance issues:
>> 1. Inserting data:
>> It takes more than one hour to upload the measurements of a month
>> (a 7.5GB .ttl file, ~16 million triples) using the data-upload web
>> interface of Fuseki.
>> Is there a way to do this faster?
>> 
>> At a minimum, try GZipping the file and uploading it in GZipped form to
>> reduce the amount of data transferred over the network. It is possible that
>> your bottleneck here is actually network upload bandwidth rather than
>> anything in Jena itself. I would expect GZip to substantially reduce the
>> file size and hopefully improve your load times.
> 
> I am uploading on the server itself to localhost, so the network should not
> be an issue (or am I missing something?).
> Can Fuseki handle gzipped .ttl files?
> 
>> Secondly, TDB is typically reported to achieve load speeds of up to around
>> 200k triples/second, although that is for offline bulk loads with SSDs.
>> Even if we assume you could achieve only 25k triples/second, that would
>> suggest a theoretical load time of approximately 11 minutes. If you can set
>> up your system so the TDB database is written to an SSD, that will improve
>> your performance to some extent.
> 
> We are on an SSD setup. 11 minutes would be something we could live with...
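(If loading can happen offline, with Fuseki stopped, the bulk-load rates mentioned above come from the TDB2 command-line loader. A minimal sketch, assuming the Jena command-line tools are on the PATH; the database location /data/tdb2 and dataset name /ds are made up:)

    # Offline bulk load into a TDB2 database; Fuseki must not be running
    # against this directory while the loader writes to it.
    tdb2.tdbloader --loc /data/tdb2 large_input.ttl.gz

    # Afterwards, start Fuseki against the same directory.
    fuseki-server --tdb2 --loc /data/tdb2 /ds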
>> Thirdly, TDB uses multi-reader single-writer (MRSW) concurrency, so if you
>> have a lot of reads happening while trying to upload, which is a write
>> operation, the write operation will be forced to wait for active readers to
>> finish before proceeding, which may introduce some delays.
> 
> I'm aware of that, but as we are not in production mode yet, there are no
> requests delaying the upload.
> 
>> So yes, I think you should be able to get faster load times.
>> 
>> 2. Updating data:
>> We get new model runs 5 times per day. This is data for the next 10 days,
>> which needs to be updated every time.
>> My idea was to create a named graph "forecast" that holds the latest
>> version of this data.
>> Every time a new model run arrives, I create a new temporary graph to
>> upload the data to. Once this is finished, I move the temporary graph to
>> "forecast".
>> This seems to do the work twice, as it takes 1 hour for the upload and
>> 1 hour for the move.
>> 
>> Yes, this is exactly what happens. The database that backs Fuseki, TDB, is
>> a quad store, so it is storing each triple as a quad of GSPO where G is the
>> graph name. So when you move the temporary graph it has to copy all the
>> quads from the source graph to the target graph and then delete the source
>> graph.
> 
> Thanks for that input. I will have to figure something else out here...
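(For reference, the move step described above over the standard SPARQL 1.1 protocol, plus one possible way to avoid the second pass. This is a sketch; the dataset path /ds and the graph IRIs are made up:)

    # The MOVE described above, sent to the dataset's update endpoint.
    # SPARQL MOVE is copy-then-delete over the quads, hence the second hour.
    curl -X POST http://localhost:3030/ds/update \
         --data-urlencode 'update=MOVE <http://example.org/temp> TO <http://example.org/forecast>'

    # Possible alternative: PUT the new data straight into the target graph;
    # a Graph Store Protocol PUT replaces the graph's previous contents.
    curl -X PUT -H 'Content-Type: text/turtle' \
         --data-binary @forecast.ttl \
         'http://localhost:3030/ds/data?graph=http://example.org/forecast'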
>> Rob
> 
> Markus