If you are importing a file that large, you should import directly into the workspace rather than into the session, so the content bypasses the transient space and doesn't use lots of memory. So use Workspace.getImportContentHandler or Workspace.importXML, not the Session methods. See the JSR-170 specification for the details of the difference.
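A minimal sketch of the workspace-level import, assuming you already have an authenticated JCR Session; the file name and target path are placeholders, and note this cannot run without a JCR implementation such as Jackrabbit on the classpath:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.jcr.ImportUUIDBehavior;
import javax.jcr.Session;

class WorkspaceImportSketch {
    // 'session' is assumed to be an already-logged-in JCR session;
    // "import.xml" is a placeholder for your 72.5 MB document.
    static void importLargeFile(Session session) throws Exception {
        try (InputStream in = new FileInputStream("import.xml")) {
            // Workspace.importXML writes directly to the persistent store,
            // bypassing the session's transient space, unlike Session.importXML.
            session.getWorkspace().importXML(
                "/",                                        // parent node path
                in,
                ImportUUIDBehavior.IMPORT_UUID_CREATE_NEW); // UUID clash handling
        }
    }
}
```

Workspace.getImportContentHandler works the same way if you prefer to push SAX events yourself instead of handing over a stream.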

Florent

chewy_fruit_loop wrote:
I'm currently trying to import an XML file into a bog-standard empty
repository.
The problem is that the file is 72.5 MB and contains around 200,000 elements
(yes, they are all required). This currently takes about 90 minutes (give or
take) to get into Derby, and that's with indexing off.

The time wouldn't be such an issue if it didn't use 1.7 GB of RAM.
I've decorated a ContentHandler so it calls:

root.update(<workspace name>)
root.save()

where root is the root node of the tree.
This is called after every 500 start elements. The save just doesn't seem
to flush the parsed content to the persistent store; the behaviour is the
same whether I use Derby or Oracle as storage. The only time things seem
to start being persisted is when endDocument is hit.

Have I missed something blindingly obvious here? I really don't mind
everyone having a bit of a chuckle at me; I just want to get this sorted
out.
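For reference, the batching pattern described above can be sketched as a plain SAX decorator. BatchingHandler and the flush callback are hypothetical names for illustration; in the real handler the callback would be the root.update(...)/root.save() pair:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical decorator: fires a flush callback every 'batchSize' start
// elements, mirroring the "save every 500 start elements" approach.
class BatchingHandler extends DefaultHandler {
    private final int batchSize;
    private final Runnable flush;  // e.g. () -> { root.update(ws); root.save(); }
    private int count;
    int flushes;                   // exposed for the demo below

    BatchingHandler(int batchSize, Runnable flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        if (++count % batchSize == 0) {
            flush.run();
            flushes++;
        }
    }

    public static void main(String[] args) throws Exception {
        // 5 start elements with a batch of 2: the flush fires twice.
        String xml = "<a><b/><b/><b/><b/></a>";
        BatchingHandler h = new BatchingHandler(2, () -> {});
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        System.out.println(h.flushes);  // prints 2
    }
}
```

Note that with Session-based writes, even a batched save like this still routes everything through the transient space, which is consistent with the memory growth described above.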


thanks



--
Florent Guillaume, Director of R&D, Nuxeo
Open Source Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
