On Nov 1, 2013, at 12:10 AM, Dave Cottlehuber <[email protected]> wrote:
>> On Oct 31, 2013, at 5:13 PM, Nathan Vander Wilt wrote:
>>
>> Aaaand my Couch committed suicide again today. Unless this is
>> something different, I may have finally gotten lucky and had
>> CouchDB leave a note [eerily unfinished!] in the logs this time:
>> https://gist.github.com/natevw/fd509978516499ba128b
>>
>> ```
>> ** Reason == {badarg,
>>                [{io,put_chars,
>>                     [<0.93.0>,unicode,
>>                      <<"[Thu, 31 Oct 2013 19:48:48 GMT] [info] [<0.31789.2>] 66.249.66.216 - - GET /public/_design/glob/_list/posts/by_path?key=%5B%222012%22%2C%2203%22%2C%22metakaolin_geojson_editor%22%5D&include_docs=true&path1=2012&path2=03&path3=metakaolin_geojson_editor 200\n">>],
>>                     []},
>> ```
>>
>> So…now what? I have a rebuilt version of CouchDB I'm going to try
>> [once I figure out why *it* isn't starting], but this is still really
>> upsetting. I'm aware I could add my own cronjob or something to
>> check and restart if needed every minute, but a) the shell script
>> is SUPPOSED to be keeping CouchDB running, b) it's NOT, and c) this is
>> embarrassing and aggravating.
>>
>> thanks,
>> -natevw
>
> So there are two things here:
>
> - why doesn’t the couch get restarted?
>
> Sounds very much like the aforementioned pid race condition. Wendall, do you
> know any more about this? I thought you had some ideas about it, IIRC.
>

I think I figured out the answer to this one, at least in the latest crash.
The Erlang process the shell script watches was still running, just not
accepting connections. I didn't notice this the previous times, though; I only
realized it this time because when I went to restart it, the shell script acted
as if CouchDB was already running. So maybe there are actually two crashes: a
silent heartbeat one and this unicode one?

> - why io:put_chars/2 has trouble writing to a boring log file, which obviously
> works most of the time.
>
> <0.93.0>,unicode, <<"[Thu, 31 Oct 2013 19:48:48 GMT...">>
>
> io:put_chars(Fd, unicode, <<Binary>>) doesn’t look right: there’s no
> io:put_chars/3.
>
> This unicode looks weird, and from a quick look I can’t see where it should
> come from.
>
> Can you get more of the logfile (like hundreds of lines) and stick it
> somewhere? Email is fine.
>
> I’d like to see what happens to <0.93.0> (the process wrapping the log fd),
> and also whether the unicode atom turns up anywhere earlier.

You want more of the log *up to* the crash? Because that's the thing: I have
nothing *beyond* what is in that gist! The end of the log was cut off; I did
not snip it. The log as it sits now has these exact lines in it:

```
{line,173}]},
{gen_event,ser
Apache CouchDB 1.4.0 (LogLevel=info) is starting.
```

(The subsequent "starting" is due to my intervention.)

-nvw
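Dave's request (a few hundred lines leading up to the crash, plus any earlier mention of <0.93.0> or the unicode atom) can be pulled out of a plain-text log with a short script. Below is a minimal Python sketch, assuming one log entry per line and that the crash report contains the `** Reason ==` marker seen in the gist excerpt. `lines_before_crash` and `mentions_before_crash` are hypothetical helpers, not part of CouchDB.

```python
from collections import deque
from typing import Iterable, List, Tuple

# Assumption: the crash report contains this marker, as in the gist excerpt.
CRASH_MARKER = "** Reason =="


def lines_before_crash(lines: Iterable[str], n: int = 300,
                       crash_marker: str = CRASH_MARKER) -> List[str]:
    """Hypothetical helper: the last `n` lines before the first crash marker.

    If no marker is found, this is simply the last `n` lines of the input.
    """
    window: deque = deque(maxlen=n)  # keeps only the most recent n lines
    for line in lines:
        if crash_marker in line:
            break
        window.append(line.rstrip("\n"))
    return list(window)


def mentions_before_crash(lines: Iterable[str],
                          needles: Tuple[str, ...] = ("<0.93.0>", "unicode"),
                          crash_marker: str = CRASH_MARKER) -> List[Tuple[int, str]]:
    """Hypothetical helper: (line_number, line) pairs mentioning any needle,
    stopping at the first crash marker so only *earlier* occurrences are reported."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if crash_marker in line:
            break
        if any(needle in line for needle in needles):
            hits.append((lineno, line.rstrip("\n")))
    return hits
```

Both helpers accept any iterable of lines, so an open file handle (for example one opened with `encoding="utf-8", errors="replace"` on whatever path your install logs to) can be passed straight in without loading the whole log into memory. Note that the pid <0.93.0> is only meaningful within a single VM run; after a restart the log writer will usually have a different pid.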

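Nathan's finding that the watched Erlang process stayed alive while no longer accepting connections suggests that a watchdog which only checks for a live pid would miss this failure. Below is a minimal Python sketch of a probe that instead asks CouchDB over HTTP, assuming the default listener at http://127.0.0.1:5984/ (adjust to your bind address and port). `couch_is_responsive` is a hypothetical name, not something the CouchDB startup script provides.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: CouchDB's default HTTP listener; change to match your local.ini.
DEFAULT_URL = "http://127.0.0.1:5984/"

# Ignore any http_proxy/https_proxy environment settings: this is a local probe.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def couch_is_responsive(url: str = DEFAULT_URL, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True only if a GET to `url` returns 2xx within `timeout` seconds.

    Unlike checking that the Erlang VM process still exists, this also catches the
    case where the process is alive but no longer accepts or answers HTTP requests.
    """
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, non-2xx status (HTTPError), malformed reply, bad URL.
        return False
```

A cron job or the existing watchdog loop could call this every minute and restart CouchDB only after a few consecutive failures, so that one slow request does not trigger a needless restart. It complements the pid check rather than replacing it.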