Re: [Neo4j] Size on disk, and number of properties

2011-09-08 Thread Chris Gioran
Hi Aseem

The logical log file is where all changes to be performed on the store
are written out before they are actually applied - it is also referred
to as the Write Ahead Log (WAL). The file that hosts it is bound to a
specific size, above which a rotation happens: a new file is created to
host the WAL, all pending transactions are moved over, the store files
are flushed, and the old file is marked as unused.
Setting the configuration option keep_logical_logs to false makes the
old log file be deleted, instead of kept around, when that size limit
is hit and a rotation is triggered.
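In concrete terms the option is just a line in the database's properties file. A minimal sketch, assuming a conf/neo4j.properties-style config file (the path depends on how the server is installed):

```shell
# Sketch: turn off retention of rotated logical logs.
# CONF is an assumed path -- adjust to your installation layout.
CONF="conf/neo4j.properties"
mkdir -p "$(dirname "$CONF")"
# Append the setting only if it is not already present.
grep -q '^keep_logical_logs=' "$CONF" 2>/dev/null \
  || echo 'keep_logical_logs=false' >> "$CONF"
```

After a restart, each rotation then deletes the previous log file instead of leaving a .v<N> file behind.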

Hope that clears things up.

cheers,
CG

On Wed, Sep 7, 2011 at 8:58 PM, Aseem Kishore aseem.kish...@gmail.com wrote:
 Thanks Johan!

 If you configure Neo4j with keep_logical_logs=false, the logs will
 be deleted automatically upon rotation.


 What does "upon rotation" mean here?

 Aseem


Re: [Neo4j] Size on disk, and number of properties

2011-09-08 Thread Aseem Kishore
That is awesomely helpful info. Thank you very much!

Aseem


Re: [Neo4j] Size on disk, and number of properties

2011-09-07 Thread Johan Svensson
Removing the log files ending with .v<version number> at runtime is
perfectly safe to do, but it will turn off the ability to do
incremental backups. You can, however, still perform live full backups.

If you configure Neo4j with keep_logical_logs=false, the logs will
be deleted automatically upon rotation.
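
Put together, the runtime cleanup above can be sketched like this. The
data/graph.db path is an assumption; the nioneo_logical.log.v<N> file
names match the directory listing shown later in this thread:

```shell
# Sketch: delete already-rotated logical logs (nioneo_logical.log.v<N>)
# from a store directory. Safe at runtime per the note above, but it
# removes the history needed for incremental backups.
DB_DIR="data/graph.db"   # assumption: adjust to your layout
mkdir -p "$DB_DIR"
# Only the versioned logs are removed; the .active marker and the
# in-use log (e.g. nioneo_logical.log.1) are left alone.
find "$DB_DIR" -maxdepth 1 -name 'nioneo_logical.log.v*' -delete
```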

-Johan

On Sat, Sep 3, 2011 at 1:49 AM, Aseem Kishore aseem.kish...@gmail.com wrote:
 Thanks for the insights Johan!

 Regarding the existing disk space then, by far the bulk of it is from the
 logs. Is there a way to prune or garbage collect them? Is simply deleting
 the files safe? Should the db be off if I do that? Etc.

 Thanks much!

 Aseem


Re: [Neo4j] Size on disk, and number of properties

2011-09-07 Thread Aseem Kishore
Thanks Johan!

If you configure Neo4j with keep_logical_logs=false, the logs will
 be deleted automatically upon rotation.


What does "upon rotation" mean here?

Aseem


[Neo4j] Size on disk, and number of properties

2011-08-30 Thread Aseem Kishore
Hey guys,

We do offline backups of our db on a semi-regular basis (every few days),
where we (1) stop the running db, (2) copy its data directory and (3)
restart the db.
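
[For reference, the three steps above amount to something like the
following sketch; the stop/start commands and paths are assumptions
that depend on how the server is launched:]

```shell
# Sketch of the offline backup cycle: stop, copy, restart.
DB_DIR="data/graph.db"                     # assumption
BACKUP="backups/graph.db.$(date +%Y%m%d)"  # assumption
mkdir -p "$DB_DIR" "$(dirname "$BACKUP")"
# (1) stop the running db, e.g.:  bin/neo4j stop
# (2) copy its data directory while nothing is writing to it
cp -R "$DB_DIR" "$BACKUP"
# (3) restart the db, e.g.:  bin/neo4j start
```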

A few times early on, we did running backups -- but not the proper online
way -- where we simply copied the data directory while the db was still
running. (We did this during times where we were confident no requests were
hitting the db.)

We noticed that every time we did the running backup, the number of
properties the web admin reported -- and the space on disk of the db --
would jump up quite a bit. We stopped doing that recently.

But even now, both these numbers have gotten quite a bit higher than we
expect, and strangely, they seem to differ significantly between the
running db and the copies.

What could be causing all of this?

Here are our current numbers:

*Production*
- 2,338 nodes
- 4,473 rels
- 114,231 props (higher than we would expect, but not by an order
of magnitude)
- *1.39 GB!* -- this is way unexpected, particularly since our db used to
be in the ~10 KB ballpark, and we certainly haven't experienced hockey stick
growth yet ;) The logical log only takes up 57 KB (0%) btw.


*Local snapshot*
- 2,338 nodes
- 4,473 rels
- *2,607,892 props!!!* -- ???
- *1.37 GB!* -- equally surprisingly high, but also interesting that it's
less than the production db's size. 0 KB logical logs.


I looked around the wiki and searched this mailing list but didn't find
many clues. But as requested on another thread, here's the output of
`ls -lh data/graph.db/`:

total 1474520
-rw-r--r--   1 aseemk  staff    11B Aug 30 00:46 active_tx_log
drwxr-xr-x  52 aseemk  staff   1.7K Aug 30 00:46 index/
-rw-r--r--   1 aseemk  staff   343B Aug 30 00:46 index.db
-rw-r--r--   1 aseemk  staff   854K Aug 30 00:46 messages.log
-rw-r--r--   1 aseemk  staff    36B Aug 30 00:46 neostore
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.id
-rw-r--r--   1 aseemk  staff    26K Aug 30 00:46 neostore.nodestore.db
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.nodestore.db.id
-rw-r--r--   1 aseemk  staff    62M Aug 30 00:46 neostore.propertystore.db
-rw-r--r--   1 aseemk  staff   133B Aug 30 00:46 neostore.propertystore.db.arrays
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.propertystore.db.arrays.id
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.propertystore.db.id
-rw-r--r--   1 aseemk  staff   1.0K Aug 30 00:46 neostore.propertystore.db.index
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.propertystore.db.index.id
-rw-r--r--   1 aseemk  staff   4.0K Aug 30 00:46 neostore.propertystore.db.index.keys
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.propertystore.db.index.keys.id
-rw-r--r--   1 aseemk  staff    69M Aug 30 00:46 neostore.propertystore.db.strings
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.propertystore.db.strings.id
-rw-r--r--   1 aseemk  staff   144K Aug 30 00:46 neostore.relationshipstore.db
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.relationshipstore.db.id
-rw-r--r--   1 aseemk  staff    55B Aug 30 00:46 neostore.relationshiptypestore.db
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.relationshiptypestore.db.id
-rw-r--r--   1 aseemk  staff   602B Aug 30 00:46 neostore.relationshiptypestore.db.names
-rw-r--r--   1 aseemk  staff     9B Aug 30 00:46 neostore.relationshiptypestore.db.names.id
-rw-r--r--   1 aseemk  staff    16B Aug 30 00:46 nioneo_logical.log.1
-rw-r--r--   1 aseemk  staff     4B Aug 30 00:46 nioneo_logical.log.active
-rw-r--r--   1 aseemk  staff   945K Aug 30 00:46 nioneo_logical.log.v0
-rw-r--r--   1 aseemk  staff    16B Aug 30 00:46 nioneo_logical.log.v1
-rw-r--r--   1 aseemk  staff    33K Aug 30 00:46 nioneo_logical.log.v10
-rw-r--r--   1 aseemk  staff    11K Aug 30 00:46 nioneo_logical.log.v11
-rw-r--r--   1 aseemk  staff    32K Aug 30 00:46 nioneo_logical.log.v12
-rw-r--r--   1 aseemk  staff    16B Aug 30 00:46 nioneo_logical.log.v13
-rw-r--r--   1 aseemk  staff    12M Aug 30 00:46 nioneo_logical.log.v14
-rw-r--r--   1 aseemk  staff   1.4M Aug 30 00:46 nioneo_logical.log.v15
-rw-r--r--   1 aseemk  staff   6.8M Aug 30 00:46 nioneo_logical.log.v16
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v17
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v18
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v19
-rw-r--r--   1 aseemk  staff   1.3M Aug 30 00:46 nioneo_logical.log.v2
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v20
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v21
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v22
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v23
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v24
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 nioneo_logical.log.v25
-rw-r--r--   1 aseemk  staff    25M Aug 30 00:46 

Re: [Neo4j] Size on disk, and number of properties

2011-08-30 Thread Johan Svensson
Hi Aseem,

This is actually expected behavior when performing a file copy of a
running db and starting up with the default configuration. If you
remove the files ending with .id in the db directory on the local
snapshot and start up with rebuild_idgenerators_fast=false set, you
should see the accurate number of nodes, relationships and properties.
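
A sketch of that recipe, run against a copied snapshot only, never the
live store (the paths and the config file location are assumptions):

```shell
# Sketch: drop the id-generator files from a *snapshot* so Neo4j
# rebuilds them by scanning the store on the next start, which yields
# accurate entity counts.
SNAPSHOT="snapshot/graph.db"           # assumption: path to the copy
mkdir -p "$SNAPSHOT"
find "$SNAPSHOT" -maxdepth 1 -name '*.id' -delete
# The slow-but-exact rebuild must also be requested at startup:
CONF="snapshot/neo4j.properties"       # assumption: config location
echo 'rebuild_idgenerators_fast=false' >> "$CONF"
```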

Regarding the number of properties not matching: this could be due to
a non-clean shutdown on the production system. We are planning to
improve this in the near future by allowing for more aggressive reuse
of ids for properties. This will specifically improve things for
workloads that perform a lot of property updates.

-Johan
