Hi Andrea,
Our backup method has its challenges, but it has been working well for us
so far.

Simply backing up the database and file system felt too low-level for us.
If, God forbid, any corruption occurred, it would be nice to be able to
manipulate the repository content logically to resolve issues and restore
data.
Backing up the file system and database for Jackrabbit is, in my view, a
little like copying the binary files of a MySQL database: it represents a
backup of the data, but it's not the same as seeing the SQL inserts.

For this reason we:
1) back up the repository/* file system,
2) back up the database (in this case using mysqldump), and
3) trigger an exportSysView via an HTTP API we have exposed on the webapp
running Jackrabbit. A simple script iterates over the contents of
repository/workspaces to determine the names of the workspaces to back up.
The export is written straight to the file system by the webapp (not
returned over HTTP). A sketch of the export side is shown below.
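
For anyone wanting to do something similar, here is a rough sketch of what
the export trigger might look like. The class name, credentials and target
directory are placeholders (not our actual code), and it uses
getAccessibleWorkspaceNames() rather than reading repository/workspaces off
disk:

// Sketch only: export the system view of each workspace straight to disk.
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

public class WorkspaceExporter {

    public static void exportAll(Repository repository, File backupDir)
            throws Exception {
        // Log in to the default workspace just to list the workspace names.
        Session admin = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            String[] names = admin.getWorkspace().getAccessibleWorkspaceNames();
            for (String name : names) {
                Session ws = repository.login(
                        new SimpleCredentials("admin", "admin".toCharArray()), name);
                OutputStream out = new BufferedOutputStream(
                        new FileOutputStream(new File(backupDir, name + ".xml")));
                try {
                    // skipBinary=false keeps binary properties in the export,
                    // noRecurse=false exports the whole workspace tree.
                    ws.exportSystemView("/", out, false, false);
                } finally {
                    out.close();
                    ws.logout();
                }
            }
        } finally {
            admin.logout();
        }
    }
}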

Issue 1: The size of the XML backups is growing quickly, especially where
we have binary content in the database.
Issue 2: While the memory usage of the app server does not appear to
increase with the size of the export, in our experience you are limited by
the available memory when doing an import. We currently have to allocate
over 1GB to the command line tool to port backups between environments.
This does not scale, as eventually these backups will exhaust available
memory.
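
One thing that might keep the import memory flat (an assumption on my part,
we have not tried it in anger) is to feed the sys-view XML into the
repository as a stream of SAX events via Workspace.getImportContentHandler()
rather than letting the import tool build a DOM; workspace-level import also
persists changes directly instead of accumulating transient state in the
session. Something along these lines (the target path and UUID behaviour
are placeholders):

// Sketch only: stream a system-view export back in via SAX instead of
// building a DOM in the import tool.
import java.io.File;
import javax.jcr.ImportUUIDBehavior;
import javax.jcr.Session;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.XMLReader;

public class StreamingImport {

    public static void importInto(Session session, String parentAbsPath,
            File sysViewXml) throws Exception {
        // Workspace-level import writes changes directly to the workspace,
        // so no large transient change set accumulates in the session.
        ContentHandler handler = session.getWorkspace().getImportContentHandler(
                parentAbsPath, ImportUUIDBehavior.IMPORT_UUID_COLLISION_REPLACE_EXISTING);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        reader.setContentHandler(handler);

        // The file is parsed as a stream of SAX events; XML-side memory use
        // is bounded by the parser rather than by the size of the export.
        reader.parse(sysViewXml.toURI().toASCIIString());
    }
}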

There have been various threads on hot backups, and it feels like a topic
on which the community needs to define a best practice to ensure Jackrabbit
is considered enterprise-ready.
I've not yet had a chance to review Jacco's solution described in the
thread "Memory usage issues of importxml/exportsysview" (attached).

Regards,
Shaun

-----Original Message-----
From: Andrea K. [mailto:[EMAIL PROTECTED] 
Sent: 13 November 2007 16:02
To: [email protected]
Subject: Cluster & Backup


Hi all,
Can you help me find the right solution to back up a clustered JR?

Details are:
1. JR on Oracle 10g database (repository and cluster tables)
2. Local directories (each server) with configuration files and indexes (+
revision.log)

How can I back it up?

A question (describing a situation):
- Server 1 is updated to revision 120 (for example) and I back it up (the
server is stopped).
- Server 2 is updated in the meanwhile to 124 and I back it up (the server
is stopped).
- Server 1 is restarted and writes some revisions, up to 127.
- Server 1 is restarted again and writes some revisions, up to 135.

If Server 1 crashes and I restore its backed-up indexes from revision 120,
will it work, or will some revisions be skipped, since Server 1 was itself
the generator of those revisions, during its initial re-index?

Thanks a lot for your help.
BR,
Andrea -
--- Begin Message ---
Hello Shaun,

We use our own home-grown backup facility, which also works "hot".
I wrote a mail about it a few days ago (it's part of JeCARS).

The result of the backup is:
- a CND file
- a node structure file in plain ASCII, easily parseable
- the binary information, stored as separate files.

The solution works very well. I use it in another application in which the
repository is replicated at short intervals.

It is especially useful when existing node types are changed... in the
future we will introduce a sort of "evolution scheme".
When, for example, property names are changed, the "restore" operation can
map the property again.

The source (of the first version) is available.
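
To give an impression of the idea (this is not the actual JeCARS code, just
a simplified sketch with made-up names), dumping a subtree as plain text
with binaries written to separate files could look like:

// Illustration only: walk a subtree, write the node structure as plain
// text and each binary property to its own file.
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.PrintWriter;
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import javax.jcr.PropertyType;

public class PlainTextDump {

    public static void dump(Node node, PrintWriter out, File binDir)
            throws Exception {
        out.println("node " + node.getPath()
                + " [" + node.getPrimaryNodeType().getName() + "]");
        for (PropertyIterator props = node.getProperties(); props.hasNext();) {
            Property p = props.nextProperty();
            if (p.getDefinition().isMultiple()) {
                continue; // multi-valued properties omitted in this sketch
            }
            if (p.getType() == PropertyType.BINARY) {
                // Binary values go to separate files; only a reference is
                // kept in the text dump.
                File bin = new File(binDir, p.getPath().replace('/', '_'));
                InputStream in = p.getStream();
                FileOutputStream fos = new FileOutputStream(bin);
                try {
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1;) {
                        fos.write(buf, 0, n);
                    }
                } finally {
                    fos.close();
                    in.close();
                }
                out.println("  prop " + p.getName() + " = <binary:" + bin.getName() + ">");
            } else {
                out.println("  prop " + p.getName() + " = " + p.getString());
            }
        }
        for (NodeIterator children = node.getNodes(); children.hasNext();) {
            dump(children.nextNode(), out, binDir);
        }
    }
}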



Greetings,

  Jacco van Weert



On 10/5/07, sbarriba <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> During a recent thread Hot Backup Tools were discussed - see
> http://www.mail-archive.com/[email protected]/msg04255.html.
>
>
>
> As an outcome of that we're doing 2 things:
>
> 1)      "Low-level" backup
>
> o   Backing up the database
>
> o   Backing up the repository file system
>
> 2)      "High-level" backup
>
> o   Running exportsysview on each workspace
>
>
>
> When migrating between environments or restoring backups, solution 2) is
> very useful, although the XML files are getting very large where the
> content has lots of binaries etc. The main issue is that the memory
> requirements of "importxml" increase linearly with the size of the XML
> file. I presume this is due to either a) the memory required to parse the
> file, and/or b) the memory required to hold the transient state of the
> import.
>
>
>
> We now need to use a 1GB heap size for some imports, and obviously this
> will hit a crunch point.
>
>
>
> Any suggestions on how to resolve this memory issue? For example, could
> "importxml" not use a SAX event model to avoid parsing the XML into a
> complete DOM? (Note I don't know the internals of importxml as it stands.)
>
>
>
> All suggestions welcome.
>
> Regards,
>
> Shaun
>
>
>
>


-- 
-------------------------------------
Jacco van Weert -- [EMAIL PROTECTED]
JCR Controller -- http://www.xs4all.nl/~weertj/jcr

--- End Message ---
