[My mailbox shows this as being sent yesterday (and resent today), but it 
doesn't show up in the archives, etc, so I'm resending again from the same 
address I sent the original message. Apologies for any duplication folks 
receive.]

Thanks, atz.

You are, of course, right on all counts.

I am connecting using 'RAW' socket connection to a SIP server running against 
SVN rel_1_6_0 (~ 1.6.0.2 vintage), on a replicated instance of Evergreen. That 
was my initial thought as to the cause of the problem, but it looked to me like 
all the DB calls were authoritative.

Correction: the code says they should be authoritative, but somewhere along the 
way they get mis-directed and end up hitting the slave DB to which the data 
hasn't replicated yet.

Specifically, the ACS was dying while trying to generate the content for 
message 64, the patron information response. More specifically, 
$e->retrieve_action_circulation($circ_id) in __circ_to_barcode() in 
OpenILS::SIP::Patron (around line 442) returns undef (because the circ hasn't 
replicated to the slave yet), and the ACS doesn't know how to handle that.

That shouldn't be a problem, because, by definition, $e, created by 
editor()/make_editor() in SIP.pm (which is just a singleton method for 
OpenILS::Utils::CStoreEditor) is created with xact => 1, which should wrap all 
these selects in a transaction and force them to the primary database server.

Here's an example call that returned null.

stanns SIPServer.pm: [INFO:19071:CStoreEditor.pm:104:] editor[1|0] request 
en-US open-ils.cstore.direct.action.circulation.retrieve "9704055"

>From reading OpenILS::Utils::CStoreEditor->log(), we can tell that the command 
>knows that it should be transactional (the '1' in the first position after 
>editor indicates that the call is xact => 1; 0 indicates not; and ! indicates 
>that always_xact is set in CStoreEditor.pm). But the command goes to the 
>secondary anyway.

Any thoughts now?

Brandon

Quoting Joe Atzberger <[email protected]>:

>I'm guessing you are using the "RAW" socket connection and not telnet.
>
>I'm a bit confused.  These are your client's logs.  Your should
>reference the logs from your SIP2 process on the server.  The server
>side is the one that seems to be killing the connection, so there is
>probably a fatal error indicated in the logs there.
>
>I think you might have misstated your reproduction of problem.  That's
>because the SC client's valid requests are all ODD numbers.  The
>client can never send a 64 or 12.  Those are for Patron Information
>*RESPONSE* and and Checkout *RESPONSE*, not requests.  The server
>would be justified in dropping a client that sent it 64 or 12, but
>(from your log excerpt) it does not look like that is actually
>happening.
>
>So the drop would be from a 63 sent soon after an 11.  It does not
>look like there are any diacritics or encoding issues in the original
>patron info (that have historically been a problem).
>
>Since timing is apparently important, please identify whether your EG
>is operating in a DB-replicated (i.e. slony) environment or not, and
>don't forget to mention your version of Evergreen!
>
>--Joe Atzberger
>just another SIP hacker
>
>On Tue, Jun 8, 2010 at 6:28 PM, Uhlman, Brandon EDUC:EX
><[email protected]> wrote:
>>Hi, all.
>>
>>I find myself dealing with a particularly perplexing problem.
>>
>>One of our SIP2 vendors is experiencing a problem when connecting to our
>>Evergreen system where the SIP client detects a disconnection from the
>>server, and ends the transaction without warning, spitting out the
>>receipt almost right away.
>>
>>We were able to replicate the disconnection by sending a message 64
>>(patron information request) immediately after sending a message 12, as
>>the particular self-check client does, with no discernable pause. A
>>natural human pause of about a second causes the error not to be issued,
>>which is why we were never able to reproduce the problem in testing
>>before seeing it in the logging.
>>
>>02:56:58 PM  Open(): Opening connection to xxx.yyy.zzz.aa on port abcd
>>02:57:11 PM  WritePacket(): 9300CNusername|COpassword|AY0AZF6B2<0D>
>>02:57:11 PM  ReadPacket(): 941AY0AZFDFD<0D>
>>02:57:19 PM  WritePacket(): 9909992.00AY1AZFC8D<0D>
>>02:57:19 PM  ReadPacket(): 98YYYYNN60000320100608
>>1457342.00AOBPR|BXYYYYYYYYYNYNNNYN|AY1AZEE98<0D>
>>02:57:32 PM  WritePacket(): 63   20100608    145723
>>AOBPR|AA25180123456789|AY2AZF397<0D>
>>02:57:32 PM  ReadPacket(): 64  Y           20100608
>>145747000000020002000000000000AA25180123456789|AEBrandon Wade
>>Uhlman|BHUSD|BV10.00|BD My Home Address Lillooet  BC CA V0K
>>1V0|[email protected]|AQBLP|BLY|PB1981-08-29|PCPL Circ +Full
>>Cat|PIUnfiltered|AFOK|AOBPR|AY2AZB426<0D>
>>02:58:10 PM  WritePacket(): 11NN20100608    14574720100622
>>145747AO|AA25180000031394|AB33294001302611|AC|AY4AZED73<0D>
>>02:58:12 PM  ReadPacket(): 121NYN20100608
>>145827AOBPR|AA25180000031394|AB33294001302611|AJThe Muppet Christmas
>>carol|AH2010-06-15 23:59:59|CK005|AY4AZDF2A<0D>
>>02:58:12 PM  WritePacket(): 63   20100608    145812
>>AOBPR|AA25180123456789|AY5AZF395<0D>
>>02:58:13 PM  ReadPacket(): No response
>>02:58:13 PM  WritePacket(): 63   20100608    145812
>>AOBPR|AA25180123456789|AY5AZF395<0D>
>>02:58:13 PM  Read(): Read error 64, The specified network name is no
>>longer available
>>02:58:13 PM  ReadPacket(): No response
>>02:58:13 PM  Read(): Read error 64, The specified network name is no
>>longer available
>>02:58:13 PM  Write(): Write error 64, The specified network name is no
>>longer available
>>02:58:13 PM  Read(): Read error 64, The specified network name is no
>>longer available
>>02:58:13 PM  Write(): Write error 64, The specified network name is no
>>longer available
>>02:58:13 PM  TSIPCommand::Execute(): Command failed
>>02:58:13 PM  TSIPCommand::Execute(): Communication failure, closing
>>network connection...
>>02:58:13 PM  Close(): Closing connection...
>>02:58:13 PM  SetStatus(): New session status is Busy
>>02:58:13 PM  Socket: Connection dropped
>>02:58:13 PM  SetStatus(): New session status is Inactive
>>
>>I tried the naive solution of forcing the patron information request
>>(message 64) to always pause before executing using sleep, select(undef,
>>undef, undef, time) and Time::HiRes->usleep(); sleep caused a disconnect
>>always, and usleep had no effect.
>>
>>I am uncertain what next steps to try. Anybody out in the big world with
>>any suggestions?
>>
>>Brandon

Reply via email to