Still waiting for the answer to:
[[
Does the size stabilise?
If not, do some files stabilise in size and others not?
]]
On 16/02/15 18:14, Trevor Donaldson wrote:
Hi, I think my question got lost. Is it correct to add millions of triples to
the model and then persist the model once using putModel? I didn't want to
get a timeout or anything like that.
Updates don't time out. Small numbers of millions of triples is no big deal
in a single append operation (INSERT DATA or POST / addModel).
In your loop you have:
Resource subject =
ResourceFactory.createResource("http://example.org/task/"+i);
so a new URI is generated every time. Nodes are not recovered on delete
(too expensive to reference count them - see earlier in the thread).
Batching updates may help performance.
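As a sketch of that batching idea (not from the original thread: the endpoint URL, the batch size of 50,000, and the example predicate are all assumptions; package names are Jena 3, Jena 2.x used com.hp.hpl.jena.*):

```java
import org.apache.jena.query.DatasetAccessor;
import org.apache.jena.query.DatasetAccessorFactory;
import org.apache.jena.rdf.model.*;

public class BatchedAppend {
    public static void main(String[] args) {
        // Assumed Fuseki graph-store endpoint (Fuseki2 serves it at /ds/data).
        DatasetAccessor accessor =
                DatasetAccessorFactory.createHTTP("http://localhost:3030/ds/data");
        // Illustrative predicate, not from the original code.
        Property label = ResourceFactory.createProperty("http://example.org/label");

        final int total = 1_000_000;   // illustrative volume
        final int batchSize = 50_000;  // illustrative batch size

        Model batch = ModelFactory.createDefaultModel();
        for (int i = 0; i < total; i++) {
            Resource subject =
                    ResourceFactory.createResource("http://example.org/task/" + i);
            batch.add(subject, label, "task " + i);
            if (batch.size() >= batchSize) {
                accessor.add(batch);   // POST appends; putModel would replace
                batch.removeAll();     // start the next batch
            }
        }
        if (!batch.isEmpty()) {
            accessor.add(batch);       // flush the final partial batch
        }
    }
}
```

Each `accessor.add(...)` is one HTTP request, and so one server-side transaction, which keeps any single update bounded in size.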
On Feb 13, 2015 5:11 PM, "Trevor Donaldson" <[email protected]> wrote:
I am using Fuseki2. I thought it manages the transactions for me. Is this
not the case?
"Manages" in the sense that each HTTP interaction is a transaction.
HTTP is a stateless protocol.
I was using datasetfactory to interact with fuseki.
Was that meant to be DatasetAccessorFactory?
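If it was DatasetAccessorFactory, the calls in question look roughly like this (the endpoint URL is an assumption; package names are Jena 3, Jena 2.x used com.hp.hpl.jena.*). Note the HTTP verbs: putModel replaces, add appends.

```java
import org.apache.jena.query.DatasetAccessor;
import org.apache.jena.query.DatasetAccessorFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class AccessorSketch {
    public static void main(String[] args) {
        // Each call below is one HTTP request, and so one server-side transaction.
        DatasetAccessor accessor =
                DatasetAccessorFactory.createHTTP("http://localhost:3030/ds/data");

        Model m = ModelFactory.createDefaultModel();
        // ... fill m ...

        accessor.putModel(m);              // HTTP PUT: replace the default graph
        accessor.add(m);                   // HTTP POST: append to the default graph
        Model copy = accessor.getModel();  // HTTP GET: fetch the default graph
    }
}
```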
Andy
On Feb 13, 2015 12:10 PM, "Andy Seaborne" <[email protected]> wrote:
This may be related:
https://issues.apache.org/jira/browse/JENA-804
I say "may" because the exact patterns of use deeply affect the outcome. In
JENA-804 it is across transaction boundaries, which your "putModel" isn't.
(Are you really running without transactions?)
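For comparison, when using TDB directly in-process rather than via Fuseki, the explicit transaction pattern is roughly this (the directory name is illustrative; Jena 3 package names, Jena 2.x used com.hp.hpl.jena.*):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.tdb.TDBFactory;

public class TdbTxnSketch {
    public static void main(String[] args) {
        Dataset ds = TDBFactory.createDataset("databases/test_store");
        ds.begin(ReadWrite.WRITE);
        try {
            Model m = ds.getDefaultModel();
            // ... replace or add data here (the delete-then-add of a "put") ...
            ds.commit();
        } finally {
            ds.end();   // release the transaction even if commit was not reached
        }
        ds.close();
    }
}
```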
Andy
On 13/02/15 16:56, Andy Seaborne wrote:
Does the size stabilise?
If not, do some files stabilise in size and others not?
There are two places for growth:
nodes - does the new data have new RDF terms in it? Old terms are not
deleted, just left around to be reused, so if you are adding terms, the
node table can grow. (Terms are not reference counted - that would be
very expensive for such a small data item.)
TDB (current version) does not properly reuse freed-up space in indexes
across transactions, but should do so within a transaction. A put is
delete-then-add, so some space should be reused.
A proper fix to reuse space across transactions may require a database
format change, but I haven't had time to work out the details. Off the
top of my head, much of the reuse should be doable by moving free-chain
management onto the main database, since a transaction has a single
active writer. The code is currently too cautious about old-generation
readers, which I now see it need not be.
Andy
On 12/02/15 17:51, Trevor Donaldson wrote:
Any thoughts, anyone? I change my model every hour with new data or data
to replace. Let's say, over a period of inserting years' worth of
triples, should I persist potentially millions of triples at one time
using putModel? Committing once seems to be the only way to mitigate
the directory growing exponentially.
On Thu, Feb 12, 2015 at 9:53 AM, Trevor Donaldson <[email protected]>
wrote:
Damian,
I am using du -ksh ./* on the databases directory.
I am getting
25M ./test_store
On Thu, Feb 12, 2015 at 9:35 AM, Damian Steer <[email protected]>
wrote:
On 12/02/15 13:49, Trevor Donaldson wrote:
On Thu, Feb 12, 2015 at 6:32 AM, Trevor Donaldson
<[email protected]
wrote:
Hi,
I am in the middle of updating our store from RDB to TDB. I have
noticed
a significant size increase in the amount of storage needed.
Currently RDB
is able to hold all the data I need (4 third party services and 4
years of
their data) and it equals ~ 12G. I started inserting data from 1
third
party service, only 4 months of their data into TDB and the TDB
database
size has already reached 15G. Is this behavior expected?
Hi Trevor,
How are you measuring the space used? TDB files tend to be sparse, so
the disk use reported can be unreliable. Example from my system:
6.2M [...] 264M [...] GOSP.dat
The first number (6.2M) is essentially the disk space actually taken;
the second (264M!) is the 'length' of the file.
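A quick way to see this effect, using a deliberately sparse file (GNU coreutils assumed; the filename is illustrative):

```shell
# Create a file whose length is 100 MB but which allocates (almost) no blocks
truncate -s 100M sparse.dat
ls -lh sparse.dat   # reports the length: 100M
du -h sparse.dat    # reports blocks actually allocated: near 0 on sparse-capable filesystems
rm sparse.dat
```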
Damian
--
Damian Steer
Senior Technical Researcher
Research IT
+44 (0) 117 928 7057