On 30/08/12 14:04, Andy Seaborne wrote:
On 30/08/12 03:01, Yuhan Zhang wrote:
Actually, the file I used was 260MB. I tried something smaller than 1MB
and it worked.

It seems like the s-put ruby script is stream-friendly. Do I have to break
large files into parts? What's the recommended way to load large files?

The Fuseki server is defensive - it reads the body of the PUT into a
temporary in-memory model to make sure it's going to be able to parse
everything, then, when the end of the body is reached, it adds the graph
to the model.

All the web operations are defensive and check their inputs so that a
bad request does not lead to half a request being serviced.

TDB transactions could overcome that to some extent, but they are not
fully scalable, and Fuseki does not (currently) know that TDB
transactions are fully ACID, which would let the transaction itself be
used to catch bad data without messing up the database.

A way to bulk load data is to load the database offline using the
bulkloader (tdbloader).  The bulkloader knows how to cheat - it
manipulates the internal tables directly.
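
As a rough sketch of that offline route: the helper below wraps tdbloader and assumes the tdbloader script from the Jena TDB distribution is on the PATH. The name bulk_load is hypothetical; the database location and file name in the usage comment are the ones from the original question. Only one process should have a TDB database open at a time, so the server should be stopped while the loader runs.

```shell
# Sketch only: assumes the tdbloader script from the Jena TDB distribution
# is on the PATH. bulk_load is a hypothetical helper name.
bulk_load() {
    # $1 = TDB database directory, remaining args = RDF files to load
    db_dir=$1
    shift
    tdbloader --loc="$db_dir" "$@"
}

# Usage, with the location and file from the original question
# (stop fuseki-server first so only one process has the database open):
#   bulk_load /tmp/ds geo_coordinates_bg.ttl
#   fuseki-server --update --loc=/tmp/ds /ds
```

Once the load has finished, restarting the server on the same --loc should make the data visible without any upload over HTTP.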

So either split the file up, or bulk load the database offline.

If you split it, use DELETE (or PUT an empty graph) first, then POST, POST, POST to append the data.
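
The split-and-append route could look roughly like the sketch below. It assumes the data is N-Triples (one triple per line), since splitting by line count is not safe for general Turtle with prefixes or multi-line statements. s-delete and s-post are the SOH scripts that ship alongside s-put; the function names, the chunk size and the chunk prefix are illustrative assumptions, and the command variables can be overridden (for example with "ruby s-post") to match how the scripts are invoked locally.

```shell
# Sketch only. Commands used for the HTTP calls; override if the SOH scripts
# are run differently, e.g. POST_CMD="ruby s-post".
DELETE_CMD=${DELETE_CMD:-s-delete}
POST_CMD=${POST_CMD:-s-post}

split_chunks() {
    # $1 = input N-Triples file, $2 = lines per chunk, $3 = output prefix
    # Assumes no files matching the prefix exist yet.
    split -l "$2" "$1" "$3" || return 1
    for part in "$3"*; do
        # Give each chunk an .nt suffix so its format is recognisable.
        mv "$part" "$part.nt" || return 1
    done
}

upload_chunks() {
    # $1 = graph store endpoint, remaining args = chunk files
    endpoint=$1
    shift
    # Clear the default graph first, then append each chunk with POST.
    $DELETE_CMD "$endpoint" default || return 1
    for chunk in "$@"; do
        $POST_CMD "$endpoint" default "$chunk" || return 1
    done
}

# Usage (hypothetical file names and chunk size):
#   split_chunks data.nt 100000 chunk-
#   upload_chunks http://localhost:3030/ds/data chunk-*.nt
```

Each POST is still parsed in memory by the server before it is added, so the chunk size needs to be small enough for the server's heap.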

     Andy

Did the log file have anything in it?

If it hangs for you, it might be because it has gone into GC hell: if memory is close to running out, the GC runs a lot and the system runs very slowly. A small amount of extra heap makes a big difference at this point.

If you are on a 64 bit system, you can try with a larger heap.

I loaded 1e6 triples of BSBM data (the input is 246M of N-triples) with a 2G heap.

Try (on Linux)

JVM_ARGS=-Xmx3000M fuseki-server  ...

Loading is not instantaneous, though. There are simply a lot of bytes to move around, let alone be careful about.

        Andy


Thank you.

Yuhan

On Wed, Aug 29, 2012 at 6:13 PM, Yuhan Zhang <[email protected]>
wrote:

Hi all,

I'm experimenting with Fuseki, and ran into some trouble loading data.

I followed the getting started tutorial successfully with the books.ttl
file, but when feeding it a small .ttl file (1.2MB) from DBpedia, the
script causes the system to halt:
http://downloads.dbpedia.org/3.8/bg/geo_coordinates_bg.ttl.bz2

The server was started with the following setting:

fuseki-server --update --loc=/tmp/ds /ds

The data was loaded in this way:

ruby s-put http://localhost:3030/ds/data default geo_coordinates_bg.ttl

I'm using the latest distribution: jena-fuseki-0.2.4


Thank you.

Yuhan





