Re: repository synchronization

Jürgen Baier Mon, 02 May 2011 16:04:15 -0700

2011/5/2 sam lee <[email protected]>:
> Yah, it looks like the fastest way of migrating data is to transport the
> entire repository filesystem.
> http://wiki.apache.org/jackrabbit/BackupAndMigration#Low%20Level%20Backup
>
> But it'd be nice if there were a way to selectively migrate some path of the
> repository.
>

That is also what I aim for...

>
> Do you know of a data transport API? JCR doesn't seem to define any.
> By transport API, I mean something like this:
> "transport /content/foo/bar/*   from localhost:8080  to
> saml.com:3040/content/foo/bar/copy/"
>
> Would you use RMI for this?
>

I do not currently know of any transport API. Because of the
infrastructure I use Jackrabbit in, I would do that via EJBs. A naive
approach could be to iterate through the subtree_to_copy on the source
machine and create the nodes with properties/versions/... on the target
machine, via an EJB running there. I am sure you could do the same
thing by accessing the remote repo via RMI. I used RMI access some time
ago and it was quite nice, but due to security concerns I deactivated
the RMI servlet in my setup.
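
A minimal sketch of that naive approach, written in Python for brevity
rather than against the JCR API. Node, InMemoryTarget and copy_subtree
are hypothetical names and not part of Jackrabbit; a real target would
sit behind an EJB or RMI, and versions are not handled here:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a repository node."""
    name: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class InMemoryTarget:
    """Hypothetical target endpoint; a real one would be reached via EJB or RMI."""

    def __init__(self):
        self.nodes = {}  # absolute path -> copied properties

    def create_node(self, path, properties):
        self.nodes[path] = dict(properties)


def copy_subtree(node, target, parent_path):
    """Depth-first walk: create the node on the target, then recurse into children."""
    path = parent_path.rstrip("/") + "/" + node.name
    target.create_node(path, node.properties)
    for child in node.children:
        copy_subtree(child, target, path)
    return path


# Assumed example data, loosely following the paths quoted above.
bar = Node("bar", {"title": "Bar"}, [Node("a", {"size": 1}), Node("b")])
target = InMemoryTarget()
copy_subtree(bar, target, "/content/foo")
print(sorted(target.nodes))
```

Creating each parent before its children keeps the target consistent at
every step, at the cost of one remote call per node.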

This could be solved nicely with a SyncFactory that returns either an
RMI or an EJB transport wrapper, so that once both are done people can
use whichever they want. I would be willing to do the EJB part, and
also to help with the basic syncing, as I consider that an important
thing.

I am aware that the EJB transport is a "custom" wish of mine. Since
Jackrabbit comes with RMI access out of the box, the RMI sync would be
the default method.
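
Roughly, the factory idea could look like the sketch below (Python for
brevity). Transport, RmiTransport, EjbTransport and the create() method
are all hypothetical; nothing like this exists in Jackrabbit today, and
the transports only record calls instead of talking to a remote repo:

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical common interface for creating nodes on the remote repository."""

    @abstractmethod
    def create_node(self, path, properties):
        ...


class RecordingTransport(Transport):
    """Placeholder base: records calls where a real transport would go remote."""
    kind = "none"

    def __init__(self):
        self.sent = []

    def create_node(self, path, properties):
        self.sent.append((self.kind, path, dict(properties)))


class RmiTransport(RecordingTransport):
    kind = "rmi"  # a real implementation would call the repository via RMI


class EjbTransport(RecordingTransport):
    kind = "ejb"  # a real implementation would call a remote EJB


class SyncFactory:
    """Returns a transport wrapper by name; RMI is the assumed default."""
    _transports = {"rmi": RmiTransport, "ejb": EjbTransport}

    @classmethod
    def create(cls, kind="rmi"):
        try:
            return cls._transports[kind]()
        except KeyError:
            raise ValueError(f"unknown transport: {kind!r}") from None


transport = SyncFactory.create()  # falls back to the RMI wrapper
transport.create_node("/content/foo/bar", {"title": "Bar"})
```

The syncing code would only ever see the Transport interface, so
switching from RMI to EJB would be a one-argument change.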

>
> On Mon, May 2, 2011 at 8:21 AM, Jürgen Baier <[email protected]> wrote:
>
>> Hi,
>>
>> Some time ago I tried something similar and used XML export. This is
>> not an option for non-trivial data, since the export/import is very,
>> very slow (for your 500GB it would take much more than one day to
>> export to XML, if I remember correctly; it was something in the range
>> of hours/GB on my machine).
>>
>> What worked for me was using the filesystem store and copying the
>> whole repo dir to the target machine. Still, I am interested in some
>> sync tool, because the ability to copy just a subtree of the whole
>> repo would allow me to copy single users (their "home" node and all
>> nodes below that) to another machine. Since my Jackrabbit repos run as
>> a shared JEE resource, I was thinking about a JEE solution where I read
>> the nodes on the initial machine and copy them to the target machine.
>> But maybe I am just missing a cool tool out there that already does this.
>>
>> Regards,
>> Jürgen
>>
>>
>> 2011/5/2 sam lee <[email protected]>:
>> > Hey,
>> >
>> > I have a large repository. And, I have a few empty repositories.
>> > How can I synchronize empty repositories with the content from the large
>> > repository?
>> >
>> > Is there an rsync-like tool where subsequent synchronization (data migration)
>> > is much quicker than the initial pass?
>> >
>> > Is xml export/import the only option? Has anyone tried export/import on a
>> > huge repository (500GB and growing)?
>> >
>> > Or, is there a way to rsync the repository filesystem directory (not through
>> > JCR but using the commandline tool)?
>> >
>>
>
