Hi Stephane,

I wrote an init.d script that calls hook scripts to load several virtual databases when the controller starts and stops. The restart case works as follows:
stop
=====
${SEQUOIA_HOME}/bin/console.sh -e -p $JMXPORT -f $stopscript -l
cat stopscript:
admin @DB_NAME@
@SEQ_ADMIN_USER@
@SEQ_ADMIN_PASS@
disable @VDB_BACKEND1@
quit
shutdown virtualdatabase @DB_NAME@ 2
@SEQ_ADMIN_USER@
@SEQ_ADMIN_PASS@
quit
[ some wait for the stopscript finishing stuff ]
PID=`cat ${SEQUOIA_PID} 2>/dev/null`
[ -n "$PID" ] && kill $PID 2>/dev/null
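Since the thread asks whether a process might be killed too early, one way to rule out overlap between the old and the new controller is to make `stop` return only after the controller JVM has actually exited. The following is a minimal sketch under that assumption. `wait_for_exit`, `stop_controller` and the 30-second timeout are hypothetical names and values, not part of Sequoia. It reuses `SEQUOIA_PID` from the script above and relies only on `kill -0`, which sends no signal and just tests whether the PID still exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sequoia): wait up to $2 seconds for the
# process with PID $1 to exit. Returns 0 once it is gone, 1 on timeout.
wait_for_exit() {
    pid=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    # one last check after the final sleep
    ! kill -0 "$pid" 2>/dev/null
}

# Hypothetical replacement for the last two lines of the stop section:
# send SIGTERM, then block until the controller JVM has really exited.
stop_controller() {
    PID=$(cat "${SEQUOIA_PID}" 2>/dev/null)
    [ -n "$PID" ] || return 0
    kill "$PID" 2>/dev/null
    if ! wait_for_exit "$PID" 30; then
        # report instead of escalating to SIGKILL, so no forced kill happens
        echo "controller (PID $PID) still running after 30s" >&2
        return 1
    fi
}
```

The sketch deliberately does not fall back to `kill -9` on timeout. A forced kill is exactly the kind of "too early" termination being suspected here, so it seems safer to fail the stop and let the administrator decide.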
start
=====
$SU_BIN -c "${SEQUOIA_HOME}/bin/controller.sh &" >>${SEQUOIA_LOG} 2>&1
${SEQUOIA_HOME}/bin/console.sh -e -p $JMXPORT -f $startscript -l
cat startscript:
load virtualdatabase configuration @VDB_CONF@
admin @DB_NAME@
@SEQ_ADMIN_USER@
@SEQ_ADMIN_PASS@
enable @VDB_BACKEND1@
quit
quit
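In the start section, `console.sh` runs right after `controller.sh` is put in the background. If the controller's JMX port is not listening yet at that moment, the start script could fail. The thread does not say whether `console.sh` already waits for the controller, so this is only a possible hardening. Here is a minimal sketch of a generic retry loop. `retry_until` is a hypothetical helper, and the readiness check shown after the block is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: run the given command once per second until it
# succeeds or $1 attempts have been made.
# Returns 0 on success, 1 if every attempt failed.
retry_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        # only sleep if another attempt will follow
        [ "$attempts" -gt 0 ] && sleep 1
    done
    return 1
}
```

It would sit between the `controller.sh` line and the `console.sh` line, for example as `retry_until 30 nc -z localhost "$JMXPORT" || exit 1`. That example assumes a netcat that supports `-z`; any other "is the JMX port listening" check works the same way.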
I need to experiment a little to get a "clean" log for you, so that will be delayed
until Wednesday. Tomorrow is my traditional no-Sequoia day ;-).
Thanx,
)ngo
Stephane GIRON wrote:
> Hi Ingo,
>
> Could you please describe how you are doing the reboot? I mean, from the
> Sequoia point of view, which commands are you using when stopping the
> vdb / controller? Could you also provide controller logs that show the
> issue (from both controllers)?
> I will try to investigate this. Any further information you can provide
> will be very welcome.
>
> Thanks a lot,
>
> Stephane
>
> Ingo Kampe wrote:
>> Hi,
>>
>> Several times I have seen a problem with write requests on one side of my
>> DB cluster while the other side was rebooting. In some cases Sequoia comes
>> up, joins the group and starts without detecting any error, but the data
>> in the backends is not the same: the write was done on the 2nd machine but
>> is not recovered on the 1st (rebooting) machine. I have tried hard to find
>> a specific trigger that makes this behaviour deterministically
>> reproducible, but so far without success. It currently happens in around
>> 10% of reboots. The main problem for us is that Sequoia starts up without
>> reporting any problem. If it set the backends to the disabled state, we
>> could do a manual resync.
>>
>> Maybe you can give me some hints? Could a process be killed too early, or
>> is there a communication race condition? ...
>>
>> My general environment:
>> Debian etch, Java 1.5, 1 backend PostgreSQL 8.1.4, Sequoia 2.10.4,
>> Appia 3.2.4, Hedera 1.5.6-cvs03.01.2007, "base view" setup of Appia
>> RAIDb-1 setup with 2 machines, each running 1 controller with a single backend
>>
>>
>> Some strange log entries from the rebooting machine:
>>
>> 2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedWrites in AbstractScheduler.resumeWrites()
>> 2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedTransactions in AbstractScheduler.resumeNewTransactions()
>> 2007-01-12 15:43:20,374 ERROR sequoia.controller.scheduler Unexpected negative suspendedPersistentConnections in AbstractScheduler.resumeNewPersistentConnections()
>>
>> or
>>
>> 2007-01-12 15:43:48,486 ERROR sequoia.controller.loadbalancer Request was not found in total order queue, posting out of order (UPDATE bot_event_cfg_table SET event_name='X509_OCSP_RESPONDER_UNREACHABLE',event_priority='P2',event_is_active='1' where bot_event_cfg_id='X509_OCSP_RESPONDER_UNREACHABLE'/)
>>
>> Where could these entries come from?
>>
>>
>> Just FYI:
>> Sometimes I get the following error, which I can reproduce by closing the
>> Appia port with iptables for a moment. In that case Sequoia correctly sets
>> the backend to the disabled state and it is restored afterwards:
>> 2007-01-12 15:45:09,758 ERROR controller.recoverylog.RecoverThread Unable to get checkpoint from recovery log.
>> java.sql.SQLException: Unable to get checkpoint disable botdb1_00304874867C-10.10.10.1:25322-20070112154347718+0100 from the recovery log (Checkpoint disable botdb1_00304874867C-10.10.10.1:25322-20070112154347718+0100 does not exist in recovery log)
>>     at org.continuent.sequoia.controller.recoverylog.events.GetCheckpointLogIdEvent.execute(GetCheckpointLogIdEvent.java:96)
>>     at org.continuent.sequoia.controller.recoverylog.LoggerThread.run(LoggerThread.java:732)
>>
>> Thanx and Greetz,
>> )ngo
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Sequoia mailing list
>> [email protected]
>> https://forge.continuent.org/mailman/listinfo/sequoia
>>
>
>
