Hi,

I think issue JCR-1087 is about the janitor feature of the DatabaseJournal class:

# janitorEnabled: specifies whether the clean-up thread for the journal table is enabled (default = false)
# janitorSleep: specifies the sleep time of the clean-up thread in seconds (only relevant when the clean-up thread is enabled; default = 24 * 60 * 60, which equals 24 hours)
# janitorFirstRunHourOfDay: specifies the hour at which the clean-up thread initiates its first run (default = 3, which means 3:00 at night)
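For reference, these parameters go into the Journal element of the Cluster configuration in repository.xml. Everything except the three janitor parameters below is a placeholder for your own setup:

    <Cluster id="node1" syncDelay="2000">
      <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        <!-- revision file and connection settings: placeholders, adjust as needed -->
        <param name="revision" value="${rep.home}/revision.log"/>
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="url" value="jdbc:postgresql:journal"/>
        <param name="schemaObjectPrefix" value="JOURNAL_"/>
        <!-- janitor settings as described above -->
        <param name="janitorEnabled" value="true"/>
        <!-- 24 * 60 * 60 seconds -->
        <param name="janitorSleep" value="86400"/>
        <param name="janitorFirstRunHourOfDay" value="3"/>
      </Journal>
    </Cluster>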
I already "used" it, but it seemed not to work. The second caveat in the following comment is the reason why the janitor appeared not to work in my case:

https://issues.apache.org/jira/browse/JCR-1087?focusedCommentId=12569875&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12569875

I removed the obsolete cluster node ids (which had very small local revision numbers), so the janitor can now do its job. Thanks for the JCR-1087 hint.

Still, the question remains why a new cluster node initializes with revision 0 instead of GLOBAL_REVISION. The code in DatabaseJournal.initInstanceRevisionAndJanitor looks like it was simply easier to implement that way. In "normal/production" cases this is probably sufficient, since new nodes then only need to replay a few journal entries - provided the janitor works.

Regarding the "manual" deletion of permanently obsolete cluster nodes, I am going to implement some sanitation based on the last local revision update time per node id, roughly along the lines of the sketch below.
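For illustration only - this is not Jackrabbit code, and it assumes the default DatabaseJournal DDL with a schemaObjectPrefix of JOURNAL_ (adjust table and column names to your setup):

    -- Inspect the local revision recorded per cluster node id. The janitor
    -- cannot clean journal entries beyond the smallest of these revisions,
    -- so a stale row left behind by a decommissioned node blocks the clean-up.
    SELECT JOURNAL_ID, REVISION_ID FROM JOURNAL_LOCAL_REVISIONS;

    -- Remove the row of a permanently decommissioned cluster node
    -- (only after verifying that the node is really gone for good).
    DELETE FROM JOURNAL_LOCAL_REVISIONS WHERE JOURNAL_ID = 'obsolete-node-id';

The planned sanitation would additionally record an update timestamp per JOURNAL_ID and delete rows whose revision has not moved for some configurable period.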
Thanks for your help.

Christian Wurbs
itemic AG
Am Brauhaus 8a
01099 Dresden
[email protected]
Tel.: +49 (351) 26622-23
Fax: +49 (351) 26622-20

-----Original Message-----
From: Ian Boston [mailto:[email protected]] On behalf of Ian Boston
Sent: Tuesday, 27 October 2009 21:28
To: [email protected]
Subject: Re: Clustering-Issues

On 27 Oct 2009, at 16:38, Thomas Müller wrote:

> Hi,
>
>> two cluster nodes working for a while.
>> 100000 revisions in the datastore.
>> add a third cluster node
>> it's replaying 100000 journal entries
>> Is there a way of having the third (new) cluster node start at the
>> latest Global-Revision immediately?
>
> There seems to be a related feature:
> https://issues.apache.org/jira/browse/JCR-1087 - I'm not sure whether
> this will solve the problem, however (I don't really know this feature).

We have been running in production with a solution similar to JCR-1087. We have a Perl script that creates a consistent snapshot of the local disk state (through repeated rsyncs) and stores that snapshot on a central server. When a new node comes up, it pulls the snapshot from the central server, adjusts some of the settings and starts the JVM. At that point Jackrabbit replays the part of the journal written since the snapshot was taken.

When the snapshots are stored, we look into the local revisions file, extract the revision and store it. A separate process then deletes the journal records from the database prior to the earliest snapshot, which keeps both the size of the journal and the startup time down; see the sketch after this paragraph.
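This is not our actual script, just an illustration of the trimming step; the table name assumes Jackrabbit's default DatabaseJournal DDL with a schemaObjectPrefix of JOURNAL_:

    -- earliest_snapshot_revision is the smallest revision recorded across all
    -- snapshots we still keep; journal entries below it are no longer needed
    -- to bring a node restored from one of those snapshots up to date.
    DELETE FROM JOURNAL_JOURNAL WHERE REVISION_ID < ?;  -- earliest_snapshot_revision

Naturally the live cluster nodes must also have consumed those revisions before they are deleted.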
We run between 3 and 8 JVMs, depending on the load at the time, hosted on Xen-based Linux virtual machines, and over the past 18 months in production I believe we have recreated the JVMs many times with no problems (or at least none that I have been told about). Although the approach is a little agricultural and the repeated rsync can take a while to get a solid snapshot (we do sometimes get a bad one when the indexes are halfway through an optimization), it works with JR 1.4, and we keep at least 3 parallel snapshots of the local node state at any one time (in fact we keep several old versions for each node). The nice part is that the JR startup script always starts from a snapshot, so the startup time is always acceptable.

Looking at the comments on JCR-1087, it does some of the same things.

Ian

>
>> If I temporarily shut down the second cluster node I receive the
>> following error messages during synchronization when restarting this
>> second node:
>
> I am not sure, it sounds like a bug... Could you create a Jira issue
> for this, together with a simple reproducible test case?
>
> Regards,
> Thomas