I've thought about this briefly, and somehow it actually seems easier (to
me) to consider a compacting (defragmenting) algorithm than a generic
import/export. In both cases you have to deal with the same issue: the
node/relationship IDs change. For the import/export this means you need
another way to store the connectedness, so you export the entire graph
into another format that maintains the connectedness in some way (perhaps
a whole new set of IDs), and then re-import it again. Getting a very
complex, large, cyclic graph through this seems hard to me, because you
have to keep the complete old-to-new identity map in memory during the
export, which makes the export unscalable.

But defragmenting can be done by changing IDs in batches, breaking the
problem down into smaller steps and never needing to deal with the entire
graph at any one point. For example, take the node table and scan from the
base, collecting free IDs. Once you have a decent block, pull that many
nodes down from higher up in the table. Since you keep only that batch in
memory, you can maintain the old-to-new mapping and use it to 'fix' the
relationship table as well. Rinse and repeat :-)
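
To make that concrete, here's a rough sketch in Java of what one batch
could look like. The NodeStore/RelationshipStore interfaces are made up
for illustration; this is not the real Neo4j store API.

import java.util.HashMap;
import java.util.Map;

// Hypothetical store interfaces, just to illustrate the idea --
// not the real Neo4j store API.
interface NodeStore {
    long highestIdInUse();
    boolean inUse(long id);
    void move(long fromId, long toId); // copy the record, free the old slot
}

interface RelationshipStore {
    // rewrite node references in all relationship records
    void remapNodeIds(Map<Long, Long> oldToNew);
}

public class CompactOneBatch {
    static final int BATCH_SIZE = 10_000;

    // Returns true if anything moved, i.e. another pass is worthwhile.
    public static boolean compactBatch(NodeStore nodes, RelationshipStore rels) {
        // 1. Scan from the base of the store, collecting free IDs.
        long[] freeIds = new long[BATCH_SIZE];
        int found = 0;
        for (long id = 0; id <= nodes.highestIdInUse() && found < BATCH_SIZE; id++) {
            if (!nodes.inUse(id)) freeIds[found++] = id;
        }
        // 2. Pull that many nodes down from the top of the store,
        //    remembering the old->new mapping for this batch only.
        Map<Long, Long> oldToNew = new HashMap<>();
        long top = nodes.highestIdInUse();
        for (int i = 0; i < found; i++) {
            while (top > freeIds[i] && !nodes.inUse(top)) top--;
            if (top <= freeIds[i]) break; // no in-use records above this hole
            nodes.move(top, freeIds[i]);
            oldToNew.put(top, freeIds[i]);
            top--;
        }
        // 3. Fix the relationship table using just this batch's mapping,
        //    then rinse and repeat until nothing moves.
        rels.remapNodeIds(oldToNew);
        return !oldToNew.isEmpty();
    }
}

Each pass holds at most BATCH_SIZE mappings in memory, which is what makes
this scale; you just keep calling it until a pass moves nothing.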

One option for a whole-graph export, which might work for most datasets
since they are predominantly tree-structured, is to export to a common
tree format like JSON (or, ... XML). This maintains most of the
relationships without requiring any in-memory table of ID mappings. The
less common cyclic connections can be maintained with temporary IDs, with
a table of just those IDs kept in memory (assuming it is much smaller than
the total graph). This would allow complete export of very large graphs as
long as the temp-ID table does indeed remain small, which is probably true
for many datasets.
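
Roughly, sketched in Java (the GraphNode type and the hand-rolled JSON
writing are made up for illustration; I'm reusing each node's existing
store ID as its temporary ID, and ignoring JSON string escaping):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal hypothetical node type, for illustration only.
class GraphNode {
    long id;                    // existing store ID, reused as the temp ID
    String name;
    List<GraphNode> children = new ArrayList<>();
}

public class TreeishJsonExport {
    // Tracks what has already been written, so revisits (cyclic/cross
    // edges) become {"ref": ...} entries. Could be a bitmap over store
    // IDs, which is far cheaper than an ID-to-ID mapping.
    private final Set<Long> exported = new HashSet<>();
    private final StringBuilder out = new StringBuilder();

    public String export(GraphNode root) {
        write(root);
        return out.toString();
    }

    private void write(GraphNode n) {
        exported.add(n.id);
        out.append("{\"tmpId\":").append(n.id)
           .append(",\"name\":\"").append(n.name)   // escaping omitted
           .append("\",\"children\":[");
        String sep = "";
        for (GraphNode c : n.children) {
            out.append(sep);
            sep = ",";
            if (exported.contains(c.id)) {
                // Non-tree edge: write a reference instead of inlining.
                out.append("{\"ref\":").append(c.id).append('}');
            } else {
                write(c);      // tree edge: nest the whole subtree inline
            }
        }
        out.append("]}");
    }
}

On re-import, only the tmpIds that actually show up as "ref" targets need
an old-to-new table in memory, so the memory cost tracks the number of
non-tree edges rather than the size of the graph.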

On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson <jo...@neotechnology.com> wrote:

> Alex,
>
> You are correct about the "holes" in the store file and I would
> suggest you export the data and then re-import it again. Neo4j is not
> optimized for the use case where more data is removed than added over
> time.
>
> It would be possible to write a compacting utility but since this is
> not a very common use case I think it is better to put that time into
> producing a generic export/import dump utility. The plan is to get an
> export/import utility in place as soon as possible so any input on how
> that should work, what format to use etc. would be great.
>
> -Johan
>
> On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch <alex.averb...@gmail.com>
> wrote:
> > Hey,
> > Is there a way to compact the data stores (relationships, nodes,
> properties)
> > in Neo4j?
> > I don't mind if it's a manual operation.
> >
> > I have some datasets that have had a lot of relationships removed from
> them
> > but the file is still the same size, so I'm guessing there are a lot of
> > holes in this file at the moment.
> >
> > Would this be hurting lookup performance?
> >
> > Cheers,
> > Alex