Re: [Sequoia] Backend inconsistency after reboot (restart of controller)

Stephane GIRON Wed, 17 Jan 2007 01:55:08 -0800

Hi Ingo,

Thank you for your answer.

First, I think you should add a "shutdown" command for the controller to your script, after the "shutdown virtualdatabase" is done and before killing it. But anyway, what you describe should not happen. So if you can provide a log which shows the problem, that would be very nice... ;-)
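
A rough sketch of that stop order on the init.d side, assuming the console stop script has already issued the controller "shutdown": wait for the controller process to exit by itself and only fall back to kill if it is still alive after a timeout. The helper names wait_for_exit and stop_controller and the 30 second default are made up for illustration; they are not part of Sequoia.

```sh
#!/bin/sh
# Sketch only: give the controller time to stop by itself, kill as a last resort.
# wait_for_exit and stop_controller are hypothetical helper names.

# Return 0 once process $1 is gone, 1 if it is still alive after $2 seconds.
wait_for_exit() {
    w_pid=$1
    w_timeout=$2
    w_elapsed=0
    # "kill -0" sends no signal, it only checks that the process exists.
    while kill -0 "$w_pid" 2>/dev/null; do
        [ "$w_elapsed" -ge "$w_timeout" ] && return 1
        sleep 1
        w_elapsed=$((w_elapsed + 1))
    done
    return 0
}

# Read the PID file ($1) and send TERM only if the controller did not
# exit within the timeout ($2, default 30 seconds: an assumed value).
stop_controller() {
    pidfile=$1
    timeout=${2:-30}
    pid=$(cat "$pidfile" 2>/dev/null) || return 0   # no PID file: nothing to do
    [ -n "$pid" ] || return 0
    if ! wait_for_exit "$pid" "$timeout"; then
        echo "controller $pid still running after ${timeout}s, sending TERM" >&2
        kill "$pid" 2>/dev/null
    fi
    return 0
}
```

In the stop branch this would be called as `stop_controller "${SEQUOIA_PID}" 30`, after console.sh has finished running the stop script, in place of the unconditional kill.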


Could you also tell me which recovery log you are using with sequoia?

Thank you for your help,

Stephane

Ingo Kampe wrote:

Hi Stephane,

I wrote an init.d script calling hook scripts to load several virtual databases
during start/stop of controller. So the restart case works as follows:

stop
=====
${SEQUOIA_HOME}/bin/console.sh -e -p $JMXPORT -f $stopscript -l
cat stopscript:
 admin @DB_NAME@
 @SEQ_ADMIN_USER@
 @SEQ_ADMIN_PASS@
 disable @VDB_BACKEND1@
 quit
 shutdown virtualdatabase @DB_NAME@ 2
 @SEQ_ADMIN_USER@
 @SEQ_ADMIN_PASS@
 quit

[ some wait for the stopscript finishing stuff ]

PID=`cat ${SEQUOIA_PID} 2>/dev/null`
[ -n "$PID" ] && kill $PID 2>/dev/null

start
=====
$SU_BIN -c "${SEQUOIA_HOME}/bin/controller.sh &" >>${SEQUOIA_LOG} 2>&1

${SEQUOIA_HOME}/bin/console.sh -e -p $JMXPORT -f $startscript -l
cat startscript:
 load virtualdatabase configuration @VDB_CONF@
 admin @DB_NAME@
 @SEQ_ADMIN_USER@
 @SEQ_ADMIN_PASS@
 enable @VDB_BACKEND1@
 quit
 quit

I have to play around a little bit to get a "clean" log for you. This will be
delayed until Wednesday. Tomorrow is my traditional no sequoia day ;-).

Thanx,
)ngo

Stephane GIRON wrote:

Hi Ingo,

Could you please describe the way you are doing the reboot? I mean, from
Sequoia's point of view, which commands are you using when stopping the
vdb / controller? Could you also provide controller logs that show the
issue (both controllers)?
I will try to investigate this. If you can provide more information, it
will be very welcome.

Thanks a lot,

Stephane

Ingo Kampe wrote:

Hi,

I have several times seen a problem with write requests on one side of my db
cluster during a reboot of the other. The result was that in some cases sequoia
comes up, joins the group and starts without any error detected, but the data in
the backends isn't the same. The write was done on the 2nd machine but is not
recovered on the 1st, rebooting machine. I tried hard to find a specific point
that makes this behaviour deterministically reproducible, but so far without
success. This currently happens in around 10% of reboots. The main problem for
us is that sequoia starts up without a problem. If it set the backends to
disabled state, we could do a manual resync.

Maybe you can give me some hints? Any too-early process kill or communication
race condition? ...

My general environment:
debian etch, java 1.5, 1 backend postgresql 8.1.4, sequoia 2.10.4, appia 3.2.4,
hedera 1.5.6-cvs03.01.2007, "base view" setup of appia
Raidb-1 setup with 2 machines, each 1 controller with 1 single backend


Some strange log entries from the rebooting machine:

2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedWrites in AbstractScheduler.resumeWrites()
2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedTransactions in AbstractScheduler.resumeNewTransactions()
2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedPersistentConnections in AbstractScheduler.resumeNewPersistentConnections()

or

2007-01-12 15:43:48,486 ERROR sequoia.controller.loadbalancer Request was not found in total order queue, posting out of order (UPDATE bot_event_cfg_table SET event_name='X509_OCSP_RESPONDER_UNREACHABLE',event_priority='P2',event_is_active='1' where bot_event_cfg_id='X509_OCSP_RESPONDER_UNREACHABLE'/)

Where could these entries come from?


Just FYI:
Sometimes I get the following one, which I can reproduce by using iptables to
close the appia port for a moment. After this, sequoia correctly sets the
backend to disabled state and it will be restored:
2007-01-12 15:45:09,758 ERROR controller.recoverylog.RecoverThread Unable to get checkpoint from recovery log.
java.sql.SQLException: Unable to get checkpoint disable botdb1_00304874867C-10.10.10.1:25322-20070112154347718+0100 from the recovery log (Checkpoint disable botdb1_00304874867C-10.10.10.1:25322-20070112154347718+0100 does not exist in recovery log)
        at org.continuent.sequoia.controller.recoverylog.events.GetCheckpointLogIdEvent.execute(GetCheckpointLogIdEvent.java:96)
        at org.continuent.sequoia.controller.recoverylog.LoggerThread.run(LoggerThread.java:732)



Thanx and Greetz,
)ngo

------------------------------------------------------------------------


_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
