Re: [Sequoia] Backend inconsistency after reboot (restart of controller)

Ingo Kampe Mon, 15 Jan 2007 09:53:13 -0800

Hi Stephane,

I wrote an init.d script that calls hook scripts to load and shut down several
virtual databases when the controller is started or stopped. A restart works as
follows:


stop
=====
${SEQUOIA_HOME}/bin/console.sh -e -p $JMXPORT -f $stopscript -l
cat stopscript:
 admin @DB_NAME@
 @SEQ_ADMIN_USER@
 @SEQ_ADMIN_PASS@
 disable @VDB_BACKEND1@
 quit
 shutdown virtualdatabase @DB_NAME@ 2
 @SEQ_ADMIN_USER@
 @SEQ_ADMIN_PASS@
 quit

[ some logic that waits for the stopscript to finish ]

PID=`cat ${SEQUOIA_PID} 2>/dev/null`
[ -n "$PID" ] && kill $PID 2>/dev/null
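
The kill step above can be sketched roughly as follows. This is only a minimal
sketch: the function name stop_and_wait and the 30 second default timeout are
assumptions for illustration and are not part of the real init.d script. The
idea is to send the signal and then poll until the process has really exited,
so that a following start never runs against a controller that is still
shutting down.

```shell
# Hypothetical helper (not from the real init.d script): send TERM to a PID
# and wait until the process is gone.
# Usage: stop_and_wait PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited, 1 if it is still alive after the timeout.
stop_and_wait() {
    pid=$1
    timeout=${2:-30}    # assumed default, adjust to your environment

    # No PID given: nothing to stop.
    [ -n "$pid" ] || return 0

    # If the signal cannot be delivered, treat the process as already gone.
    kill "$pid" 2>/dev/null || return 0

    waited=0
    # kill -0 sends no signal, it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

It could be called as: stop_and_wait "`cat ${SEQUOIA_PID} 2>/dev/null`" 60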

start
=====
$SU_BIN -c "${SEQUOIA_HOME}/bin/controller.sh &" >>${SEQUOIA_LOG} 2>&1

${SEQUOIA_HOME}/bin/console.sh -e -p $JMXPORT -f $startscript -l
cat startscript:
 load virtualdatabase configuration @VDB_CONF@
 admin @DB_NAME@
 @SEQ_ADMIN_USER@
 @SEQ_ADMIN_PASS@
 enable @VDB_BACKEND1@
 quit
 quit
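
One thing that might be worth ruling out in the start sequence above:
controller.sh is started in the background and console.sh runs right
afterwards, possibly before the JMX port accepts connections. A minimal sketch
of a retry helper follows. The name wait_until and the probe command in the
usage note are assumptions for illustration and are not part of the real
script.

```shell
# Hypothetical helper: retry a command once per second until it succeeds.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout expires first.
wait_until() {
    timeout=$1
    shift
    waited=0
    # Run the probe quietly; only its exit status matters.
    until "$@" >/dev/null 2>&1; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, assuming a netcat that supports -z is installed, the console.sh
call could be guarded with: wait_until 60 nc -z localhost "$JMXPORT"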

I need to experiment a little to get a "clean" log for you, so that will have
to wait until Wednesday. Tomorrow is my traditional no-Sequoia day ;-).

Thanx,
)ngo

Stephane GIRON wrote:
> Hi Ingo,
> 
> Could you please describe how you are doing the reboot? I mean, from
> Sequoia's point of view, which commands do you use when stopping the
> vdb / controller? Could you also provide controller logs that show the
> issue (from both controllers)?
> I will try to investigate this. If you can provide more information, it
> will be very welcome.
> 
> Thanks a lot,
> 
> Stephane
> 
> Ingo Kampe wrote:
>> Hi,
>>
>> Several times I have seen a problem caused by write requests on one side
>> of my DB cluster while the other side is rebooting. In some cases Sequoia
>> comes up, joins the group, and starts without any error being detected,
>> but the data in the backends is not the same. The write was done on the
>> 2nd machine but is not recovered on the 1st (rebooting) machine. I have
>> tried hard to find a specific way to reproduce this behaviour
>> deterministically, but so far without success. It currently happens in
>> around 10% of reboots. The main problem for us is that Sequoia starts up
>> without reporting any problem. If it set the backends to the disabled
>> state, we could do a manual resync.
>>
>> Maybe you can give me some hints? Could it be a process kill that comes
>> too early, or a communication race condition? ...
>>
>> My general environment:
>> Debian etch, Java 1.5, 1 backend PostgreSQL 8.1.4, Sequoia 2.10.4,
>> appia 3.2.4, hedera 1.5.6-cvs03.01.2007, "base view" setup of appia.
>> RAIDb-1 setup with 2 machines, each with 1 controller and 1 single backend.
>>
>>
>> Some strange log entries from the rebooting machine:
>>
>> 2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedWrites in AbstractScheduler.resumeWrites()
>> 2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedTransactions in AbstractScheduler.resumeNewTransactions()
>> 2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedPersistentConnections in AbstractScheduler.resumeNewPersistentConnections()
>>
>> or
>>
>> 2007-01-12 15:43:48,486 ERROR sequoia.controller.loadbalancer Request was not found in total order queue, posting out of order (UPDATE bot_event_cfg_table SET event_name='X509_OCSP_RESPONDER_UNREACHABLE',event_priority='P2',event_is_active='1' where bot_event_cfg_id='X509_OCSP_RESPONDER_UNREACHABLE'/)
>>
>> Where could these entries come from?
>>
>>
>> Just FYI:
>> Sometimes I get the following error, which I can reproduce by using
>> iptables to close the appia port for a moment. After this, Sequoia
>> correctly sets the backend to the disabled state and it gets restored:
>> 2007-01-12 15:45:09,758 ERROR controller.recoverylog.RecoverThread Unable to get checkpoint from recovery log.
>> java.sql.SQLException: Unable to get checkpoint disable botdb1_00304874867C-10.10.10.1:25322-20070112154347718+0100 from the recovery log (Checkpoint disable botdb1_00304874867C-10.10.10.1:25322-20070112154347718+0100 does not exist in recovery log)
>>         at org.continuent.sequoia.controller.recoverylog.events.GetCheckpointLogIdEvent.execute(GetCheckpointLogIdEvent.java:96)
>>         at org.continuent.sequoia.controller.recoverylog.LoggerThread.run(LoggerThread.java:732)
>>
>>
>>
>> Thanx and Greetz,
>> )ngo
>>  
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Sequoia mailing list
>> [email protected]
>> https://forge.continuent.org/mailman/listinfo/sequoia
>>   
> 
> 


