Re: Reindexing a workspace ...
Hi Thomas,

Thanks for your answer. This is also exactly what we already do in most cases ;-) I guess it's currently a trade-off: if you don't want to stop Jackrabbit to make a backup, you need a clustered node that you can dedicate to making the backup.

Regards,
Bart

On Mon, Jun 15, 2009 at 11:49 AM, Thomas Müller wrote:
> Hi,
>
> If you use database persistence managers and a database journal, you
> could use the following procedure:
>
> 1) stop the cluster node
> 2) backup the Lucene index, config files, and revision.log of this
>    cluster node
> 3) later on, backup the persistence manager data, journal data, and
>    data store
>
> This backup should be consistent because the journal includes the
> list of changes, so the Lucene index is updated.
>
> Regards,
> Thomas
>
> On Tue, Jun 9, 2009 at 3:48 PM, KÖLL Claus wrote:
>> hi (thomas),
>>
>> your post was clear, thanks for the info ...
>> ok, the Lucene index is consistent, but you will not get a snapshot
>> of the repository, as Bart wrote.
>>
>> I see some problems with Bart's solution: if you have a large
>> repository, a write lock that runs for hours is not good. But maybe
>> some others have good ideas ...
>>
>> I have tested the environment as you mentioned with the cluster, and
>> at the moment it works fine for us, because we can re-index the
>> backup cluster in the background if we get a crash ... hopefully
>> not :-)
>>
>> greets
>> claus

--
Hippo B.V. - Amsterdam: Oosteinde 11, 1017 WT, Amsterdam, +31 (0)20-5224466
Hippo USA Inc. - San Francisco: 101 H Street, Suite Q, Petaluma CA, 94952-3329, +1 (707) 773-4646
http://www.onehippo.com - [email protected]
Re: Reindexing a workspace ...
Hi,

If you use database persistence managers and a database journal, you could use the following procedure:

1) stop the cluster node
2) backup the Lucene index, config files, and revision.log of this cluster node
3) later on, backup the persistence manager data, journal data, and data store

This backup should be consistent because the journal includes the list of changes, so the Lucene index is updated.

Regards,
Thomas

On Tue, Jun 9, 2009 at 3:48 PM, KÖLL Claus wrote:
> hi (thomas),
>
> your post was clear, thanks for the info ...
> ok, the Lucene index is consistent, but you will not get a snapshot
> of the repository, as Bart wrote.
>
> I see some problems with Bart's solution: if you have a large
> repository, a write lock that runs for hours is not good. But maybe
> some others have good ideas ...
>
> I have tested the environment as you mentioned with the cluster, and
> at the moment it works fine for us, because we can re-index the
> backup cluster in the background if we get a crash ... hopefully
> not :-)
>
> greets
> claus
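The file-copy part of step 2 can be sketched as a small shell function. The layout below (repository/index, workspaces, repository.xml, revision.log under the repository home) follows the default Jackrabbit layout; treat the paths as assumptions and adapt them to your deployment:

```shell
#!/bin/sh
# Sketch of step 2: copy a *stopped* cluster node's local files into a
# backup directory. The paths assume the default Jackrabbit repository
# layout; adjust them to your setup.
backup_node() {
  repo_home=$1
  backup_dir=$2
  for f in repository/index workspaces repository.xml revision.log; do
    if [ -e "$repo_home/$f" ]; then
      # recreate the parent directory so relative paths are preserved
      mkdir -p "$backup_dir/$(dirname "$f")"
      cp -a "$repo_home/$f" "$backup_dir/$f"
    fi
  done
}
```

Step 3 (persistence manager, journal, and data store) then happens on the database side, as described above.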
Re: Reindexing a workspace ...
On Tue, Jun 9, 2009 at 3:48 PM, KÖLL Claus wrote:
> hi (thomas),
>
> your post was clear, thanks for the info ...
> ok, the Lucene index is consistent, but you will not get a snapshot
> of the repository, as Bart wrote.
>
> I see some problems with Bart's solution: if you have a large
> repository, a write lock that runs for hours is not good. But maybe
> some others have good ideas ...

Of course, but it depends on your definition of large. For example, dumping 12 GB of data to disk from MySQL takes something like half an hour. In other terms, that's about 1,000,000 node bundles and about 4,500,000 version bundles. Running in read-only mode for half an hour during low-traffic hours is imo quite acceptable in a lot of environments.

> I have tested the environment as you mentioned with the cluster, and
> at the moment it works fine for us, because we can re-index the
> backup cluster in the background if we get a crash ... hopefully
> not :-)

Keep in mind that re-indexing can take quite a lot of time. IIRC, a full re-index of the repository mentioned above took somewhere between 6 and 12 hours.

Regards,
Bart
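For the database side of such a dump, mysqldump's --single-transaction option takes a consistent InnoDB snapshot without blocking writers for the whole dump (the repository itself would still need to be in read-only mode so the Lucene index stays in sync, as in Bart's scenario). The database name and output path are placeholder assumptions, and the command is echoed as a dry run; drop the echo to execute it:

```shell
#!/bin/sh
# Sketch: consistent dump of the persistence-manager/journal database.
# 'jackrabbit' and the output path are placeholders. The command is only
# echoed here (dry run); remove the echo to actually run it.
db=jackrabbit
out=/backup/$db-$(date +%Y%m%d).sql
dump_cmd="mysqldump --single-transaction --quick $db"
echo "$dump_cmd > $out"
```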
Re: Reindexing a workspace ...
On Tue, Jun 9, 2009 at 11:01 AM, Thomas Müller wrote:
> Hi,
>
>> let's say you have a disk crash and you must restore the index
>> folder, but it was backed up a day before. to get a consistent state
>> with the data, you must re-index the whole workspace.
>
> Probably my original mail was unclear. I repeat: "One solution is to
> use clustering. One cluster node (the 'master') is used for regular
> requests, while the other (the 'backup') is used for backup. The
> master node continuously runs, while the backup node is stopped from
> time to time to create a backup." In that case, the Lucene index in
> the backup is consistent. After a crash, you restore the backup. That
> way, you don't have an inconsistent index.

We use exactly such a solution, with some success. The problem I see with it, apart from the obvious extra resources you need, is that you also have to back up the database at 'about the same time'. It just feels like a big workaround, and you never feel really sure you've got 'everything' ...

If we had some kind of 'flush everything to disk/database/index and hold a write lock until further notice' method and a 'please continue as normal' method that could be called remotely somehow, it would make creating consistent backups much easier:

- issue 'flush and hold'
- use your favorite backup method: rsync, scp, db dumps, etc.
- issue 'continue'

Any thoughts on whether such a thing would be possible? If so, I could help to implement it.

Regards,
Bart
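Bart's proposed sequence could be sketched as a script. To be clear, the repo_admin command and its flush-and-hold/continue verbs do not exist in Jackrabbit; they are purely hypothetical stand-ins, and the function below only prints the call it would make:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed backup sequence. 'repo_admin' and
# its verbs are NOT real Jackrabbit commands; this stand-in only prints
# what a real implementation would do remotely.
repo_admin() {
  echo "repo-admin: $*"
}

repo_admin flush-and-hold   # flush to disk/database/index, take write lock
echo "backup step: rsync/scp/db dump of repository home and database"
repo_admin continue         # resume normal write operations
```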
Re: Reindexing a workspace ...
Hi,

> let's say you have a disk crash and you must restore the index folder,
> but it was backed up a day before. to get a consistent state with the
> data, you must re-index the whole workspace.

Probably my original mail was unclear. I repeat: "One solution is to use clustering. One cluster node (the 'master') is used for regular requests, while the other (the 'backup') is used for backup. The master node continuously runs, while the backup node is stopped from time to time to create a backup." In that case, the Lucene index in the backup is consistent. After a crash, you restore the backup. That way, you don't have an inconsistent index.

Regards,
Thomas
Re: Reindexing a workspace ...
Hi,

> with your solution i will be able to back up the index, but that does
> not solve my problem if you get into the situation that you must
> re-index a workspace from blank.

How would you get into that situation?

> i don't know what happens in a cluster environment if i start one
> member without the index, to start a re-index process, and work with
> the other cluster member ... is that possible?

If you start a cluster node and the Lucene index files are missing, I guess they are created automatically. I didn't test this, however.

Regards,
Thomas

On Wed, May 20, 2009 at 11:40 AM, KÖLL Claus wrote:
> hi thomas,
>
> with your solution i will be able to back up the index, but that does
> not solve my problem if you get into the situation that you must
> re-index a workspace from blank.
>
> i don't know what happens in a cluster environment if i start one
> member without the index, to start a re-index process, and work with
> the other cluster member ... is that possible?
>
> greets
> claus
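If missing index files are indeed rebuilt on startup (Thomas's guess above, untested), forcing a full re-index would amount to deleting the local index directories while the node is stopped. The paths assume the default repository layout, and whether one cluster member may do this while the other keeps serving is exactly the open question in this thread:

```shell
#!/bin/sh
# Sketch: remove a *stopped* node's local search indexes so Jackrabbit
# re-indexes from scratch on the next start. Paths assume the default
# repository layout; verify the rebuild-on-start behaviour before
# relying on this.
clear_indexes() {
  repo_home=$1
  rm -rf "$repo_home/repository/index"
  for d in "$repo_home"/workspaces/*/index; do
    [ -d "$d" ] && rm -rf "$d"
  done
  return 0
}
```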
Re: Reindexing a workspace ...
Hi,

One solution is to use clustering. One cluster node (the 'master') is used for regular requests, while the other (the 'backup') is used for backup. The master node continuously runs, while the backup node is stopped from time to time to create a backup.

[Advertisement] Day CRX (http://www.day.com) supports online backup when using the Tar persistence manager and the Lucene index.

Regards,
Thomas
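As a sketch, a Jackrabbit cluster node is enabled through a Cluster element in repository.xml, with a shared journal (the DatabaseJournal fits the database-journal setup discussed in this thread). The id, syncDelay, and JDBC values below are placeholder assumptions:

```xml
<!-- Sketch of a cluster-node configuration in repository.xml.
     The id, syncDelay and JDBC parameters are placeholders. -->
<Cluster id="backup-node" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://localhost/journal"/>
    <param name="user" value="user"/>
    <param name="password" value="password"/>
  </Journal>
</Cluster>
```

The revision parameter is the local revision counter (revision.log) that step 2 of the backup procedure above copies alongside the index.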
