Re: repository synchronization

Jürgen Baier Tue, 03 May 2011 03:48:23 -0700

rsync uses hashes to find differences between local and remote
files. I did not see any such content-based hash in Jackrabbit, or did I
miss it? If there were one, it would have to be a property that is
updated each time the node is changed. Computing it on the fly each time
you want to compare nodes would be much too costly.
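To make the idea concrete, here is a minimal sketch of such a content-based hash. It assumes a Merkle-style scheme, a tree-shaped analogue of rsync's checksum comparison: each node's hash covers its own properties plus its children's hashes, so any two subtrees with equal hashes can be skipped without being visited. TreeNode and SubtreeDiff are hypothetical in-memory stand-ins, not Jackrabbit or javax.jcr classes, and SHA-256 is an arbitrary choice. For brevity the hash is recomputed on every call. As argued above, a real implementation would store it as a property and update it whenever a node is saved.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a repository node (NOT javax.jcr.Node).
// Properties and children are kept sorted so the hash ignores insertion order.
class TreeNode {
    final Map<String, String> properties = new TreeMap<>();
    final Map<String, TreeNode> children = new TreeMap<>();

    TreeNode child(String name) {
        return children.computeIfAbsent(name, n -> new TreeNode());
    }

    // Merkle-style hash over this node's properties and its children's hashes:
    // a change anywhere below a node alters the hash of every ancestor.
    // Recomputed on each call here; a real store would persist and update it.
    byte[] hash() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> p : properties.entrySet()) {
                update(md, "p");
                update(md, p.getKey());
                update(md, p.getValue());
            }
            for (Map.Entry<String, TreeNode> c : children.entrySet()) {
                update(md, "c");
                update(md, c.getKey());
                md.update(c.getValue().hash()); // fixed-length child digest
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Feeds a string plus a 0 separator, so "ab"+"c" hashes differently from "a"+"bc".
    private static void update(MessageDigest md, String s) {
        md.update(s.getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0);
    }
}

class SubtreeDiff {
    // Returns the paths under `path` that must be sent from source to target.
    // Subtrees whose hashes match are skipped without descending into them.
    // Nodes that exist only on the target (deletions) are not reported here.
    static List<String> changedPaths(TreeNode source, TreeNode target, String path) {
        List<String> out = new ArrayList<>();
        collect(source, target, path, out);
        return out;
    }

    private static void collect(TreeNode src, TreeNode dst, String path, List<String> out) {
        if (dst != null && Arrays.equals(src.hash(), dst.hash())) {
            return; // identical subtree: nothing to transfer
        }
        if (dst == null || !src.properties.equals(dst.properties)) {
            out.add(path); // node missing on the target, or its own properties differ
        }
        for (Map.Entry<String, TreeNode> c : src.children.entrySet()) {
            TreeNode other = (dst == null) ? null : dst.children.get(c.getKey());
            collect(c.getValue(), other, path + "/" + c.getKey(), out);
        }
    }
}
```

For example, if the source and target differ only in one property of the child b, changedPaths(source, target, "/content") returns just "/content/b". The unchanged sibling a is skipped after a single hash comparison.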


2011/5/3 sam lee <[email protected]>:
> rsync does diff and compression.
>
> What is the Jackrabbit equivalent of diff? Is there an efficient way of
> getting the list of nodes that need to be transferred?
>
> On Mon, May 2, 2011 at 7:03 PM, Jürgen Baier <[email protected]> wrote:
>
>> 2011/5/2 sam lee <[email protected]>:
>> > Yeah, it looks like the fastest way of migrating data is to transport the
>> > entire repository filesystem.
>> >
>> http://wiki.apache.org/jackrabbit/BackupAndMigration#Low%20Level%20Backup
>> >
>> > But it'd be nice if there were a way to selectively migrate a given path
>> > (of the repository).
>> >
>>
>> That is also what I aim for...
>>
>> >
>> > Do you know of a data transport API? JCR doesn't seem to define one.
>> > By transport API, I mean something like this:
>> > "transport /content/foo/bar/*   from localhost:8080  to
>> > saml.com:3040/content/foo/bar/copy/"
>> >
>> > Would you use RMI for this?
>> >
>>
>> I do not currently know of any transport API. Because of the
>> infrastructure I use Jackrabbit in, I would do it via EJBs. A naive
>> approach would be to iterate through the subtree_to_copy on the source
>> machine and create the nodes, with their properties/versions/..., on
>> the target machine via an EJB on the remote machine. I am sure you
>> could do the same thing by accessing the remote repo via RMI. I used
>> RMI access some time ago and it worked quite nicely, but due to
>> security concerns I deactivated the RMI servlet in my setup.
>>
>> This could be solved nicely with a SyncFactory that returns either an
>> RMI- or an EJB-transport wrapper, so that once both (RMI and EJB) are
>> done, people can use whichever they want. I would be willing to do the
>> EJB part, and also to help with or work on the basic syncing, as I
>> consider it an important feature.
>>
>> I am aware that the EJB approach is a "custom" wish of mine, since
>> Jackrabbit comes with RMI access out of the box, so RMI sync would be
>> the default method.
>>
>> >
>> > On Mon, May 2, 2011 at 8:21 AM, Jürgen Baier <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> Some time ago I tried something similar and used XML export. This is
>> >> not an option for non-trivial data, since export/import is very,
>> >> very slow (for your 500GB it would take much more than one day to
>> >> export to XML, if I remember correctly; it was something in the range
>> >> of hours per GB on my machine).
>> >>
>> >> What worked for me was using the filesystem store and copying the
>> >> whole repo directory to the target machine. Still, I am interested in
>> >> a sync tool, because the ability to copy just a subtree of the whole
>> >> repo would allow me to copy single users (their "home" node and all
>> >> nodes below it) to another machine. Since my Jackrabbit repos run as
>> >> a shared JEE resource, I was thinking about a JEE solution, where I
>> >> read the nodes on the initial machine and copy them to the target
>> >> machine. But maybe I am just missing a cool tool out there that
>> >> already does this.
>> >>
>> >> Regards,
>> >> Jürgen
>> >>
>> >>
>> >> 2011/5/2 sam lee <[email protected]>:
>> >> > Hey,
>> >> >
>> >> > I have a large repository, and I have a few empty repositories.
>> >> > How can I synchronize the empty repositories with the content from
>> >> > the large repository?
>> >> >
>> >> > Is there an rsync-like tool where subsequent synchronization (data
>> >> > migration) is much quicker than the initial pass?
>> >> >
>> >> > Is XML export/import the only option? Has anyone tried export/import
>> >> > on a huge repository (500GB and growing)?
>> >> >
>> >> > Or is there a way to rsync the repository filesystem directory (not
>> >> > through JCR, but using the command-line tool)?
>> >> >
>> >>
>> >
>>
>
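The naive approach from the quoted 2 May message (walk the subtree on the source and recreate each node with its properties on the target, through a transport returned by a SyncFactory) might look roughly like the sketch below. SyncFactory is only the idea proposed in the thread, not an existing Jackrabbit class. NodeTransport, RecordingTransport, SrcNode and SubtreeCopier are hypothetical names. Only string properties are copied (no versions, binaries or node types), and the RMI and EJB transports are left as placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transport abstraction; a real one would call the target
// repository over RMI or through an EJB on the target machine.
interface NodeTransport {
    void createNode(String path, Map<String, String> properties);
}

// Stand-in transport that only records what would be sent (handy for tests).
class RecordingTransport implements NodeTransport {
    final Map<String, Map<String, String>> created = new LinkedHashMap<>();

    @Override
    public void createNode(String path, Map<String, String> properties) {
        created.put(path, new LinkedHashMap<>(properties));
    }
}

// Hypothetical factory in the spirit of the thread's SyncFactory proposal.
class SyncFactory {
    static NodeTransport create(String kind) {
        switch (kind) {
            case "recording":
                return new RecordingTransport();
            case "rmi":
            case "ejb":
                throw new UnsupportedOperationException(kind + " transport not implemented in this sketch");
            default:
                throw new IllegalArgumentException("unknown transport: " + kind);
        }
    }
}

// Minimal in-memory source node (NOT javax.jcr.Node).
class SrcNode {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<SrcNode> children = new ArrayList<>();

    SrcNode(String name) {
        this.name = name;
    }

    SrcNode add(String childName) {
        SrcNode c = new SrcNode(childName);
        children.add(c);
        return c;
    }
}

class SubtreeCopier {
    // Depth-first, pre-order walk: each parent is created on the target before
    // its children, so every createNode call finds its parent already there.
    static void copy(SrcNode node, String targetPath, NodeTransport transport) {
        transport.createNode(targetPath, node.properties);
        for (SrcNode child : node.children) {
            copy(child, targetPath + "/" + child.name, transport);
        }
    }
}
```

For a node bar with one child x, SubtreeCopier.copy(bar, "/content/foo/bar/copy", transport) issues createNode for /content/foo/bar/copy and then for /content/foo/bar/copy/x. Because the walk only depends on NodeTransport, an RMI- or EJB-backed implementation could be swapped in without touching the copier.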
