Re: TNonblockingServer is dying with message "Totally Fucked"

David Reiss Wed, 20 Jan 2010 18:08:59 -0800

My best guess is that you are not using a consistent version of  
libevent when compiling, linking, and running.
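
A rough sketch of why such a mismatch could produce this symptom (it is not taken from Thrift or libevent): the gdb session below passes &event_, so TConnection appears to embed its event by value. If the library loaded at run time uses a larger struct event than the headers used at compile time, its writes can spill into whatever member follows. The struct sizes, the 0x80 fill byte, and the names EventAsCompiled, EventAsLoaded, Connection, library_init_event and state_after_init are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for two library releases whose event struct
// differs in size. These sizes are NOT taken from libevent.
struct EventAsCompiled { unsigned char bytes[64]; };  // layout in the headers used for the build
struct EventAsLoaded   { unsigned char bytes[68]; };  // layout used by the library found at run time

// Simplified state values; only APP_INIT appears in the thread.
enum AppState { APP_INIT = 0 };

// Simplified connection object: the event is embedded by value and the
// state is stored right after it (kept as int so any bit pattern is valid).
struct Connection {
  EventAsCompiled event_;
  int appState_;
};

static_assert(offsetof(Connection, appState_) == sizeof(EventAsCompiled),
              "state directly follows the event in this sketch");
static_assert(sizeof(Connection) >= sizeof(EventAsLoaded),
              "keeps the demo write inside the Connection object");

// Stand-in for a library call that initializes an event: it touches as
// many bytes as ITS definition of the struct occupies.
inline void library_init_event(void* ev) {
  std::memset(ev, 0x80, sizeof(EventAsLoaded));
}

// Returns appState_ after the "library" has initialized the event.
inline int state_after_init() {
  Connection conn;
  conn.appState_ = APP_INIT;
  library_init_event(&conn);  // same address as &conn.event_ (first member)
  return conn.appState_;
}
```

Calling state_after_init() returns something other than APP_INIT even though the caller never assigned a new state. On a real system, comparing the string returned by event_get_version() at run time with the version of the headers used for the build is one way to confirm or rule this out.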



--David (mobile)

On Jan 20, 2010, at 5:28 PM, "Nathan Marz" wrote:

> OK... jumped into gdb and here's what I found:
>
> (gdb) s
> 483      event_set(&event_, socket_, eventFlags_, TConnection::eventHandler, this);
> (gdb) p appState_
> $8 = apache::thrift::server::APP_INIT
> (gdb) s
> 484      event_base_set(server_->getEventBase(), &event_);
> (gdb) p appState_
> $9 = 128
> (gdb) s
> 487      if (event_add(&event_, 0) == -1) {
> (gdb) p appState_
> $10 = 128
> (gdb) s
> 490    }
> (gdb) p appState_
> $11 = 130
>
> It appears to be getting corrupted twice, once during "event_base_set" and
> once during "event_add". Any ideas?
>
>
>
> On Wed, Jan 20, 2010 at 4:03 PM, David Reiss wrote:
>
>> So you're saying that this happens on the first received message?
>> Should be relatively easy to debug.
>>
>> 1/ Make a debug build of Thrift and Scribe.
>> 2/ Put a breakpoint in the constructor of TConnection.
>> 3/ When the breakpoint hits, get the address of the appState_.
>> 4/ Put a watchpoint on the contents of that address.  If possible,
>>  make it conditional on the new value not being one of the valid
>>  enum values.
>> 5/ Continue.
>> 6/ When the watchpoint triggers (and is not a valid enum), do a backtrace
>>  to find out how it was corrupted.  Usually it is a memory error.
>>
>> If it is a memory error, it might be more efficient to just run it under
>> valgrind.
>>
>> --David
>>
>> Nathan Marz wrote:
>>> Could use some help on this one. I'm running into this error when using
>>> scribe, and I traced the error back to TNonblockingServer. Here's the tail
>>> of the log:
>>>
>>> Thrift: Wed Jan 20 23:11:06 2010 libevent 1.3e method epoll
>>> Thrift: Wed Jan 20 23:14:08 2010 Totally Fucked. Application State 130
>>> scribed: src/server/TNonblockingServer.cpp:430: void apache::thrift::server::TConnection::transition(): Assertion `0' failed.
>>>
>>> In the code, this message is printed whenever a switch statement doesn't
>>> match any of the cases.
>>>
>>> I have scribe set up to have a "master" log server which aggregates all
>>> logs, and the "client" servers simply forward messages to the master.
>>> The clients work fine; it's the master that crashes whenever it receives
>>> a message. In case it's helpful, here are my scribe confs for master/client:
>>>
>>> master:
>>>
>>> port=1464
>>>
>>>
>>> <store>
>>> category=default
>>> type=file
>>> rotate_period=hourly
>>> add_newlines=1
>>> create_symlink=yes
>>> file_path=/vol/scribe
>>> base_filename=thisisoverwritten
>>> fs_type=std
>>> </store>
>>>
>>> client:
>>>
>>> port=1464
>>>
>>>
>>> <store>
>>> category=default
>>> type=buffer
>>>
>>> target_write_size=20480
>>> max_write_interval=1
>>> buffer_send_rate=1
>>> retry_interval=120
>>> retry_interval_range=60
>>>
>>> <primary>
>>> type=network
>>> remote_host=XXX
>>> remote_port=1464
>>> </primary>
>>>
>>> <secondary>
>>> type=file
>>> fs_type=std
>>> file_path=/mnt/scribe
>>> base_filename=thisisoverwritten
>>> max_size=300000000
>>> </secondary>
>>> </store>
>>>
>>>
>>>
>>>
>>
>
>
>
> -- 
> Nathan Marz
> Twitter: @nathanmarz
> http://nathanmarz.com
