2011/5/2 sam lee <[email protected]>:
> Yah, it looks like the fastest way of migrating data is to transport the
> entire repository filesystem.
> http://wiki.apache.org/jackrabbit/BackupAndMigration#Low%20Level%20Backup
>
> But, it'd be nice if there's a way to selectively migrate some path (of
> repository).
>
That is also what I aim for...
>
> Do you know of data transport API? JCR doesn't seem to define any.
> By transport API, I mean something like this:
> "transport /content/foo/bar/* from localhost:8080 to
> saml.com:3040/content/foo/bar/copy/"
>
> Would you use RMI for this?
>
I do not currently know of any transport API. Because of the
infrastructure I use Jackrabbit in, I would do it via EJBs. A naive
approach could be iterating through the subtree_to_copy on the source
machine and creating (via an EJB on the remote machine) the nodes with
properties/versions/... on the target machine.

I am sure you could do the same thing by accessing the remote repo via
RMI. I used RMI access some time ago and it was quite nice, but due to
security concerns I deactivated the RMI servlet in my setup.

Using a SyncFactory that returns either an RMI or an EJB transport
wrapper, this could be solved nicely: once both transports (RMI and EJB)
are done, people can use whichever they want. I would be willing to do
the EJB part and also to help with the basic syncing, as I consider that
an important thing. I am aware that the EJB transport is a "custom" wish
of mine; since Jackrabbit comes with RMI access out of the box, the RMI
sync would be the default method.
>
> On Mon, May 2, 2011 at 8:21 AM, Jürgen Baier <[email protected]> wrote:
>
>> Hi,
>>
>> some time ago I tried something similar and used xml-export. This is
>> not an option for non-trivial data, since the export/import is very,
>> very slow (for your 500GB it would be much more than one day to export
>> to xml, if I remember it correctly; was something in the range of
>> hours/GB on my machine).
>>
>> What worked with me was using the filesystem-store and copying the
>> whole repo-dir to the target machine. Still, I am interested in some
>> sync-tool, because the ability to copy just a sub-tree of the whole
>> repo would allow me to copy single users (their "home"-node and all
>> nodes below that) to another machine.
>> Since my jackrabbit-repos run as
>> shared jee-resource I was thinking about a jee-solution, where I read
>> the nodes on the initial machine and copy them to the target machine.
>> But maybe I just miss a cool tool out there that already does this.
>>
>> Regards,
>> Jürgen
>>
>>
>> 2011/5/2 sam lee <[email protected]>:
>> > Hey,
>> >
>> > I have a large repository. And, I have a few empty repositories.
>> > How can I synchronize empty repositories with the content from the large
>> > repository?
>> >
>> > Is there rsync like tool where subsequent synchronization (data migration)
>> > is much quicker than initial pass?
>> >
>> > Is xml export/import the only option? Has anyone tried export/import on a
>> > huge repository (500GB and growing)?
>> >
>> > Or, is there a way to rsync repository filesystem directory (not through JCR
>> > but using the commandline tool)?
>> >
>> >
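
As a rough illustration of the naive approach mentioned above (walk the subtree on the source machine and create each node on the target through some transport), here is a minimal Java sketch. SimpleNode, NodeTransport, InMemoryTransport and SubtreeSync are hypothetical names invented for this example; they are not part of JCR or Jackrabbit. A real version would read javax.jcr nodes and put an RMI or EJB client behind the transport interface, and it would also have to handle versions, binaries and node types, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a repository node; NOT the javax.jcr.Node API. */
final class SimpleNode {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name) {
        this.name = name;
    }

    /** Adds a child node and returns it so calls can be chained. */
    SimpleNode addChild(String childName) {
        SimpleNode child = new SimpleNode(childName);
        children.add(child);
        return child;
    }
}

/** Hypothetical transport abstraction; an RMI or EJB client could implement it. */
interface NodeTransport {
    /** Creates one node with the given properties at targetPath on the remote side. */
    void createNode(String targetPath, Map<String, String> properties);
}

/** In-memory transport, used only to make the sketch runnable without a remote repo. */
final class InMemoryTransport implements NodeTransport {
    final Map<String, Map<String, String>> received = new LinkedHashMap<>();

    @Override
    public void createNode(String targetPath, Map<String, String> properties) {
        received.put(targetPath, new LinkedHashMap<>(properties));
    }
}

final class SubtreeSync {
    /**
     * Walks the source subtree depth-first (parents before children) and
     * recreates every node below targetPath. Returns the number of nodes sent.
     */
    static int copySubtree(SimpleNode source, String targetPath, NodeTransport transport) {
        transport.createNode(targetPath, source.properties);
        int copied = 1;
        for (SimpleNode child : source.children) {
            copied += copySubtree(child, targetPath + "/" + child.name, transport);
        }
        return copied;
    }

    public static void main(String[] args) {
        SimpleNode bar = new SimpleNode("bar");
        bar.properties.put("title", "Bar");
        bar.addChild("a").properties.put("title", "A");
        bar.addChild("b");

        InMemoryTransport transport = new InMemoryTransport();
        int copied = copySubtree(bar, "/content/foo/bar/copy", transport);
        System.out.println(copied + " nodes copied: " + transport.received.keySet());
    }
}
```

Keeping the transport behind a small interface is what would let a factory hand out either an RMI-backed or an EJB-backed implementation without touching the tree walk itself.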
