rsync uses hashes to find differences between local and remote files. I did not see any such content-based hash in Jackrabbit, or is there one? If there were, it would have to be a property that is updated each time the node is changed. Computing it on the fly each time you want to compare nodes would be far too costly.
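To illustrate what I mean, here is a rough Python sketch. It is not Jackrabbit or JCR code: plain dicts stand in for nodes, the "hash" key stands in for a stored hash property, and the names update_hashes and changed_paths are made up for this example. The hash of a node covers its properties and the hashes of its children, so two subtrees with equal hashes can be skipped without looking inside them.

```python
import hashlib


def update_hashes(node):
    """Compute and store a content hash for a node and everything below it.

    In a real repository this would run when a node is saved, not at compare time.
    """
    h = hashlib.sha256()
    props = node.get("props", {})
    for name in sorted(props):
        h.update(repr((name, props[name])).encode("utf-8"))
    children = node.get("children", {})
    for name in sorted(children):
        # A child's hash is folded into its parent, so any change propagates upwards.
        h.update(repr((name, update_hashes(children[name]))).encode("utf-8"))
    node["hash"] = h.hexdigest()
    return node["hash"]


def changed_paths(src, dst, path="/"):
    """Yield paths on the source that are missing or different on the target."""
    if dst is None:
        yield path  # whole subtree is missing on the target
        return
    if src["hash"] == dst["hash"]:
        return  # identical subtree, nothing to transport
    if src.get("props", {}) != dst.get("props", {}):
        yield path  # this node's own properties differ
    src_children = src.get("children", {})
    dst_children = dst.get("children", {})
    for name in sorted(src_children):
        child_path = path.rstrip("/") + "/" + name
        yield from changed_paths(src_children[name], dst_children.get(name), child_path)


# Hypothetical example data: "foo" differs, "new" exists only on the source.
source = {"props": {"title": "root"}, "children": {
    "foo": {"props": {"x": 1}, "children": {}},
    "bar": {"props": {"y": 2}, "children": {}},
    "new": {"props": {}, "children": {}},
}}
target = {"props": {"title": "root"}, "children": {
    "foo": {"props": {"x": 99}, "children": {}},
    "bar": {"props": {"y": 2}, "children": {}},
}}

update_hashes(source)
update_hashes(target)
print(list(changed_paths(source, target)))  # ['/foo', '/new']
```

The sketch only lists what would have to be sent from source to target; nodes that exist only on the target (deletions) are not reported, and versions are ignored entirely.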

2011/5/3 sam lee <[email protected]>:
> rsync does diff and compression.
>
> What is jackrabbit equivalent of diff? Is there an efficient way of getting
> the list of nodes that should be transported over?
>
> On Mon, May 2, 2011 at 7:03 PM, Jürgen Baier <[email protected]> wrote:
>
>> 2011/5/2 sam lee <[email protected]>:
>> > Yah, it looks like the fastest way of migrating data is to transport the
>> > entire repository filesystem.
>> > http://wiki.apache.org/jackrabbit/BackupAndMigration#Low%20Level%20Backup
>> >
>> > But, it'd be nice if there's a way to selectively migrate some path (of
>> > repository).
>> >
>>
>> That is also what I aim for...
>>
>> >
>> > Do you know of data transport API? JCR doesn't seem to define any.
>> > By transport API, I mean something like this:
>> > "transport /content/foo/bar/* from localhost:8080 to
>> > saml.com:3040/content/foo/bar/copy/"
>> >
>> > Would you use RMI for this?
>> >
>>
>> I do not currently know any transport API, I would do that (because of
>> the infrastructure I use jackrabbit in) via EJBs. A naive approach
>> could be iterating through the subtree_to_copy on the source machine
>> and creating (via an EJB on the remote machine) the nodes with
>> properties/versions/... on the target machine. I am sure you could do
>> the same thing by accessing the remote repo via RMI. I used RMI-access
>> some time ago and it was quite nice, but due to security concerns I
>> deactivated the RMI servlet in my setup.
>>
>> Using a SyncFactory that returns either an RMI- or an
>> EJB-transport-wrapper, this could nicely be solved so that once it is
>> done (RMI and EJB) people can use what they want. I would be willing
>> to do the EJB-stuff, and also help/work on the basic syncing as I
>> consider that an important thing.
>>
>> I am aware that the EJB-thing is a "custom" wish by me, since
>> jackrabbit comes with the RMI-access out-of-the-box, so the RMI-sync
>> would be the default method.
>>
>> >
>> > On Mon, May 2, 2011 at 8:21 AM, Jürgen Baier <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> some time ago I tried something similar and used xml-export. This is
>> >> not an option for non-trivial data, since the export/import is very,
>> >> very slow (for your 500GB it would be much more than one day to export
>> >> to xml, if I remember it correctly; was something in the range of
>> >> hours/GB on my machine).
>> >>
>> >> What worked with me was using the filesystem-store and copying the
>> >> whole repo-dir to the target machine. Still, I am interested in some
>> >> sync-tool, because the ability to copy just a sub-tree of the whole
>> >> repo would allow me to copy single users (their "home"-node and all
>> >> nodes below that) to another machine. Since my jackrabbit-repos run as
>> >> shared jee-resource I was thinking about a jee-solution, where I read
>> >> the nodes on the inital machine and copy them to the target machine.
>> >> But maybe I just miss a cool tool out there that already does this.
>> >>
>> >> Regards,
>> >> Jürgen
>> >>
>> >>
>> >> 2011/5/2 sam lee <[email protected]>:
>> >> > Hey,
>> >> >
>> >> > I have a large repository. And, I have a few empty repositories.
>> >> > How can I synchronize empty repositories with the content from the large
>> >> > repository?
>> >> >
>> >> > Is there rsync like tool where subsequent synchronization (data migration)
>> >> > is much quicker than initial pass?
>> >> >
>> >> > Is xml export/import the only option? Has anyone tried export/import on a
>> >> > huge repository (500GB and growing)?
>> >> >
>> >> > Or, is there a way to rsync repository filesystem directory (not through JCR
>> >> > but using the commandline tool)?
