Hi Christian, I have a question regarding your strategy. I'm sure you mentioned it, but just to clarify:
If you somehow start up a new JR instance initialized with the GLOBAL_REVISION, the only way to keep its indexes up to date is by initializing the new instance with the latest index snapshot, right?

On Wed, Oct 28, 2009 at 5:31 AM, Christian Wurbs <[email protected]> wrote:
> Hi,
>
> I think the issue JCR-1087 is about the janitor feature of the
> DatabaseJournal class:
>
> # janitorEnabled: specifies whether the clean-up thread for the journal
> table is enabled (default = false)
> # janitorSleep: specifies the sleep time of the clean-up thread in seconds
> (only useful when the clean-up thread is enabled, default = 24 * 60 * 60,
> which equals 24 hours)
> # janitorFirstRunHourOfDay: specifies the hour at which the clean-up thread
> initiates its first run (default = 3, which means 3:00 at night)
>
> I already "used" it, but it seemed not to work.
>
> The second caveat in the following comment
>
> https://issues.apache.org/jira/browse/JCR-1087?focusedCommentId=12569875&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12569875
>
> is the reason why the janitor seemed not to work in my case.
>
> I removed obsolete cluster node ids (which had a very small local revision
> number) so the janitor could now do its job. Thanks for the JCR-1087 hint.
>
> But there is still the question of why a new cluster node initializes with
> revision 0 instead of the GLOBAL_REVISION.
> The code in DatabaseJournal.initInstanceRevisionAndJanitor looks like it
> was simply easier to implement that way.
> I think in "normal/production" cases this is sufficient, since new nodes
> only need to replay some journal entries - if the janitor works.
>
> Regarding the "manual" deletion of permanently obsolete cluster nodes, I'm
> going to implement some sanitation based on the last local revision update
> time per node id.
>
> Thanks for your help.
>
>
> Christian Wurbs
>
> itemic AG
> Am Brauhaus 8a
> 01099 Dresden
> [email protected]
> Tel.: +49 (351) 26622-23
> Fax.: +49 (351) 26622-20
>
>
> -----Original Message-----
> From: Ian Boston [mailto:[email protected]] On behalf of Ian Boston
> Sent: Tuesday, 27 October 2009 21:28
> To: [email protected]
> Subject: Re: Clustering-Issues
>
>
> On 27 Oct 2009, at 16:38, Thomas Müller wrote:
>
> > Hi,
> >
> >> two cluster nodes working for a while.
> >> 100000 revisions in the datastore.
> >> add a third cluster node
> >> it's replaying 100000 journal entries
> >> Is there a way of having the third (new) cluster node start at the
> >> latest Global-Revision immediately?
> >
> > There seems to be a related feature:
> > https://issues.apache.org/jira/browse/JCR-1087 - I'm not sure if this
> > will solve the problem, however (I don't really know this feature).
>
> We have been running in production with a solution similar to
> JCR-1087. We have a perl script that creates a consistent snapshot of
> the local disk state (through repeated rsyncs) and stores that
> snapshot on a central server.
>
> When a new node comes up, it pulls the snapshot from the central
> server, adjusts some of the settings and starts the JVM up. At this
> point Jackrabbit replays the part of the journal written since the
> snapshot was taken.
>
> When the snapshots are stored, we look into the local revisions file,
> extract the revision and store it. A separate process then deletes
> journal records from the database prior to the earliest snapshot,
> hence keeping the size of the journal, and the startup time, down.
>
> The JVMs we use (between 3 and 8, depending on the load at the time)
> are hosted on Xen-based Linux virtual machines, and over the past 18
> months in production I believe we have recreated the JVMs many times
> with no problems (or at least none I've been told about).
>
> Although the approach is a little agricultural and the repeated
> rsync can take a while to get a solid snapshot (and we do sometimes
> get a bad one when the indexes are halfway through some optimization),
> it works with JR 1.4, and we get at least 3 parallel snapshots of the
> local node state at any one time (in fact we keep several old versions
> for each node). The nice part is that the JR startup script always
> starts from a snapshot, so the startup time is always acceptable.
>
> Looking at the comments on JCR-1087, it does some of the same things.
>
> Ian
>
>
> >> If I temporarily shut down the second cluster node I receive the
> >> following error messages during synchronization when restarting this
> >> second node:
> >
> > I am not sure, it sounds like a bug... Could you create a Jira issue
> > for this, together with a simple reproducible test case?
> >
> > Regards,
> > Thomas
>

--
Patricio.-
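
For reference, a minimal sketch of how the janitor parameters Christian quotes above might be wired into the Cluster section of repository.xml. The cluster id, syncDelay, revision file path and the JDBC driver/URL/credentials below are placeholders, not values taken from this thread:

  <Cluster id="node3" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <!-- shared journal database; connection values are placeholders -->
      <param name="driver" value="org.postgresql.Driver"/>
      <param name="url" value="jdbc:postgresql://dbhost/journal"/>
      <param name="user" value="jackrabbit"/>
      <param name="password" value="secret"/>
      <!-- local revision counter file of this cluster node -->
      <param name="revision" value="${rep.home}/revision.log"/>
      <!-- enable the clean-up thread for the journal table -->
      <param name="janitorEnabled" value="true"/>
      <!-- sleep time in seconds; 86400 = run once every 24 hours -->
      <param name="janitorSleep" value="86400"/>
      <!-- first run at 3:00 at night -->
      <param name="janitorFirstRunHourOfDay" value="3"/>
    </Journal>
  </Cluster>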
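
Similarly, a hedged sketch of the pruning step Ian describes (the separate process that deletes journal records older than the earliest snapshot). It assumes the default DatabaseJournal journal table name JOURNAL with an empty schemaObjectPrefix; <oldest_snapshot_revision> is a placeholder for the smallest revision recorded across the snapshots still being kept:

  -- delete journal entries already contained in every stored snapshot
  DELETE FROM JOURNAL WHERE REVISION_ID < <oldest_snapshot_revision>;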
