Sweet, you were right! libevent 1.4 was already installed on the machine when I installed libevent 1.3e before installing Thrift/Scribe. Removing libevent 1.3e and reinstalling Thrift and Scribe from scratch fixed my issue. Thanks, David.
On Wed, Jan 20, 2010 at 6:07 PM, David Reiss <[email protected]> wrote:
> My best guess is that you are not using a consistent version of
> libevent when compiling, linking, and running.
>
> --David (mobile)
>
> On Jan 20, 2010, at 5:28 PM, "Nathan Marz" <[email protected]> wrote:
>
> > OK... jumped into gdb and here's what I found:
> >
> > (gdb) s
> > 483 event_set(&event_, socket_, eventFlags_, TConnection::eventHandler, this);
> > (gdb) p appState_
> > $8 = apache::thrift::server::APP_INIT
> > (gdb) s
> > 484 event_base_set(server_->getEventBase(), &event_);
> > (gdb) p appState_
> > $9 = 128
> > (gdb) s
> > 487 if (event_add(&event_, 0) == -1) {
> > (gdb) p appState_
> > $10 = 128
> > (gdb) s
> > 490 }
> > (gdb) p appState_
> > $11 = 130
> >
> > It appears to be getting corrupted twice, once during "event_base_set" and
> > once during "event_add". Any ideas?
> >
> > On Wed, Jan 20, 2010 at 4:03 PM, David Reiss <[email protected]> wrote:
> >
> >> So you're saying that this happens on the first received message?
> >> Should be relatively easy to debug.
> >>
> >> 1/ Make a debug build of Thrift and Scribe.
> >> 2/ Put a breakpoint in the constructor of TConnection.
> >> 3/ When the breakpoint hits, get the address of appState_.
> >> 4/ Put a watchpoint on the contents of that address. If possible,
> >> make it conditional on the new value not being one of the valid
> >> enum values.
> >> 5/ Continue.
> >> 6/ When the watchpoint triggers (and is not a valid enum), do a backtrace
> >> to find out how it was corrupted. Usually it is a memory error.
> >>
> >> If it is a memory error, it might be more efficient to just run it under
> >> valgrind.
> >>
> >> --David
> >>
> >> Nathan Marz wrote:
> >>> Could use some help on this one. I'm running into this error when using
> >>> Scribe, and I traced the error back to TNonblockingServer. Here's the tail
> >>> of the log:
> >>>
> >>> Thrift: Wed Jan 20 23:11:06 2010 libevent 1.3e method epoll
> >>> Thrift: Wed Jan 20 23:14:08 2010 Totally Fucked. Application State 130
> >>> scribed: src/server/TNonblockingServer.cpp:430: void apache::thrift::server::TConnection::transition(): Assertion `0' failed.
> >>>
> >>> In the code, this message is printed whenever a switch statement doesn't
> >>> match any of the cases.
> >>>
> >>> I have Scribe set up with a "master" log server that aggregates all
> >>> logs, and the "client" servers simply forward messages to the master.
> >>> The clients work fine; it's the master that crashes whenever it receives
> >>> a message. In case it's helpful, here are my Scribe confs for master/client:
> >>>
> >>> master:
> >>>
> >>> port=1464
> >>>
> >>> <store>
> >>> category=default
> >>> type=file
> >>> rotate_period=hourly
> >>> add_newlines=1
> >>> create_symlink=yes
> >>> file_path=/vol/scribe
> >>> base_filename=thisisoverwritten
> >>> fs_type=std
> >>> </store>
> >>>
> >>> client:
> >>>
> >>> port=1464
> >>>
> >>> <store>
> >>> category=default
> >>> type=buffer
> >>>
> >>> target_write_size=20480
> >>> max_write_interval=1
> >>> buffer_send_rate=1
> >>> retry_interval=120
> >>> retry_interval_range=60
> >>>
> >>> <primary>
> >>> type=network
> >>> remote_host=XXX
> >>> remote_port=1464
> >>> </primary>
> >>>
> >>> <secondary>
> >>> type=file
> >>> fs_type=std
> >>> file_path=/mnt/scribe
> >>> base_filename=thisisoverwritten
> >>> max_size=300000000
> >>> </secondary>
> >>> </store>
> >
> > --
> > Nathan Marz
> > Twitter: @nathanmarz
> > http://nathanmarz.com
>
--
Nathan Marz
Twitter: @nathanmarz
http://nathanmarz.com
