Stefan,
Thanks for taking a look at it.
I'm aware that every node is versionable and we've noticed some of the
issues with that. We had the perceived requirement of 'being able to
easily revert any set of changes', which led us down that path. We will
have to re-think our use of mix:versionable and look at other ways of
accomplishing our application goals, but haven't yet had a chance to do
so (along with other modelling changes).
I'm actually surprised that our data structure is so deep; I hadn't
intended it to be this way and suspect we have (or had) an application
bug that is causing this.
The _delete_me nodes are there because we were unable to delete corrupt
nodes.
What I think happened was that some edits were made to the repository
when it was brought up pointing at a different version store (either a
different repository.xml, or someone copied/deleted a workspace data dir
without the associated version store, or something else happened to
corrupt the nodes). We kept getting InvalidItemStateExceptions, and the
only way we could come up with to get rid of those nodes was to rename
them and then strip them out on an import.
What I ended up doing was writing a program that connected to the source
repository and my new destination repository.
It then walked the node tree, performing a non-recursive exportxml and
importxml one node at a time (stripping out the _delete_me nodes and
versionHistory properties), and performed a save after each node import.
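The core of it was roughly the following (a from-memory sketch rather
than the exact code we ran: the class and method names are made up, it
assumes a system-view export, and the rewriting of the exported XML to
strip the versionHistory properties is only noted in a comment):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import javax.jcr.ImportUUIDBehavior;
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Session;

public class NodeByNodeCopy {

    private final Session src;   // session on the source (corrupt) repository
    private final Session dst;   // session on the new destination repository

    public NodeByNodeCopy(Session src, Session dst) {
        this.src = src;
        this.dst = dst;
    }

    /** Copies the node at srcPath (and its subtree) under dstParentPath. */
    public void copyTree(String srcPath, String dstParentPath) throws Exception {
        Node node = (Node) src.getItem(srcPath);

        // skip the renamed junk nodes instead of copying them over
        if (node.getName().startsWith("_delete_me")) {
            return;
        }

        // export just this one node (noRecurse = true) ...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        src.exportSystemView(srcPath, out, false, true);

        // ... (in the real program the exported XML was also rewritten here
        // to drop the versionHistory properties) and import it under the
        // corresponding parent in the destination workspace
        dst.importXML(dstParentPath,
                new ByteArrayInputStream(out.toByteArray()),
                ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);
        dst.save();   // save after each node so the transient space stays small

        // recurse into the children, importing them under the node just created
        String dstPath = "/".equals(dstParentPath)
                ? "/" + node.getName()
                : dstParentPath + "/" + node.getName();
        for (NodeIterator it = node.getNodes(); it.hasNext();) {
            copyTree(it.nextNode().getPath(), dstPath);
        }
    }
}

Saving after every single node keeps the transient item state small,
which is what kept the memory usage down during the copy.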
Thanks for your help.
hi steve
On 7/30/07, Steven Singer <[EMAIL PROTECTED]> wrote:
How are people using importxml to restore or import anything but small
amounts of data into the repository? I have a 22meg xml file that I'm
unable to import because I keep running out of memory.
i analyzed the xml file that you sent me offline (thanks!).
i noticed the following:
1) system view xml export
2) file size: 22mb without whitespace,
=> 650mb with simple 2-space indentation (!)
3) 23k nodes and 202k properties
4) virtually every node is versionable
5) *very* deep structure: max depth is 2340... (!)
6) lots of junk data (e.g. thousands of _delete_me1234567890 nodes,
btw hundreds/thousands of levels deep and all versionable)
i'd say that the content model has lots of room for improvement ;)
it's mainly 5) that accounts for the excessive memory consumption during
import. while this could certainly be improved in jackrabbit, i can't
think of a really good use case for creating >2k-level-deep hierarchies.
i'd also suggest reviewing the use of mix:versionable. versionability
doesn't come for free since it implies a certain overhead: making 1 node
mix:versionable creates approx. 7 nodes and 13 properties in the version
store (version history, root version etc.). mix:versionable should
therefore only be used where needed.
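for illustration, this is all it takes to incur that overhead (just a
sketch; the node name and type here are arbitrary):

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

class VersionableOverhead {

    // assumes 'parent' is some existing node in a logged-in session
    static void makeVersionable(Session session, Node parent) throws RepositoryException {
        Node n = parent.addNode("document", "nt:unstructured");
        n.addMixin("mix:versionable");
        session.save();   // this alone creates the version history, root version
                          // etc. (the ~7 nodes / 13 properties) in the version store
        n.checkin();      // and every checkin adds another version on top of that
    }
}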
btw: by using a decorated content handler which performed a save every
200 nodes, i was able to import the data with a 512mb heap. it took about
30 minutes on a macbook pro (2ghz).
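the decoration was along these lines (a sketch of the idea only; the
names are made up, it assumes a system-view export file, and whether an
intermediate save succeeds mid-import depends on the content, e.g.
pending references):

import java.io.FileInputStream;

import javax.jcr.ImportUUIDBehavior;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * decorates the ContentHandler returned by Session.getImportContentHandler()
 * and calls session.save() after every N imported sv:node elements, so the
 * transient space never has to hold the whole import at once.
 */
class PeriodicSaveHandler extends XMLFilterImpl {

    private static final String SV_URI = "http://www.jcp.org/jcr/sv/1.0";

    private final Session session;
    private final int interval;
    private int nodeCount = 0;

    PeriodicSaveHandler(Session session, ContentHandler target, int interval) {
        this.session = session;
        this.interval = interval;
        setContentHandler(target);   // XMLFilterImpl forwards all SAX events to 'target'
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        if (SV_URI.equals(uri) && "node".equals(localName)
                && ++nodeCount % interval == 0) {
            try {
                session.save();
            } catch (RepositoryException e) {
                throw new SAXException(e);
            }
        }
    }

    // usage (sketch): stream the system view export through the decorated handler
    public static void importFile(Session session, String parentPath, String file)
            throws Exception {
        ContentHandler target = session.getImportContentHandler(
                parentPath, ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW);
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setContentHandler(new PeriodicSaveHandler(session, target, 200));
        reader.parse(new InputSource(new FileInputStream(file)));
        session.save();   // save whatever is left after the last interval
    }
}

the point is simply that the pending changes get flushed every 200 nodes
instead of accumulating until a single save at the very end.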
cheers
stefan
The importxml in JCR commands works fine, but when I go to save the data
the JVM memory usage goes up to 1GB and it eventually runs out of memory.
This was sort of discussed in
http://mail-archives.apache.org/mod_mbox/jackrabbit-users/200610.mbox/browser
but I didn't see any solutions proposed.
Does the backup tool suffer from the same problem (being unable to
restore content above a certain size)? How have other people handled
migrating data between different persistence managers, or changing a
node-type definition that seems to require a re-import?
Steven Singer
RAD International Ltd.