Re: Random crash when sending USR2 + QUIT signals to Unicorn process
Jeremy Evanswrote: > On 07/24 01:25, Eric Wong wrote: > > Jeremy Evans wrote: > > > Running with GC.stress didn't catch the error for me. But I'm using a > > > fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be > > > something that only shows up on a newer compiler that does more > > > optimizations. > > > > Pere: just curious if you've had a chance to test my patch for > > sequel_pg from Jeremy's latest sequel_pg.git > > > > In any case, I'm certain my patch fixes a bug which manifests > > in a compiler-dependent manner; but here could always be other > > bugs in a similar vein. Thanks. > > I can't get it to crash with sequel_pg 1.7.0 when compiled using clang > 4.0.0 either. I even tried to build a special program designed to > trigger the crash. >From anecdotes on ruby-core, clang still seems less aggressive at optimizations than modern gcc. Fwiw, a few GC bugs in Ruby trunk got fixed recently and the fixes should be in 2.4.2 (soon): https://public-inbox.org/ruby-core/?q=T_NONE+d%3A20161225..20170808 Not identical to T_NODE which Pere got, but if it's a GC bug, but both T_NONE and T_NODE triggers are symptoms of GC bugs. -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
On 07/24 01:25, Eric Wong wrote: > Jeremy Evanswrote: > > Running with GC.stress didn't catch the error for me. But I'm using a > > fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be > > something that only shows up on a newer compiler that does more > > optimizations. > > Pere: just curious if you've had a chance to test my patch for > sequel_pg from Jeremy's latest sequel_pg.git > > In any case, I'm certain my patch fixes a bug which manifests > in a compiler-dependent manner; but here could always be other > bugs in a similar vein. Thanks. I can't get it to crash with sequel_pg 1.7.0 when compiled using clang 4.0.0 either. I even tried to build a special program designed to trigger the crash. Compiler used: $ cc -v OpenBSD clang version 4.0.0 (tags/RELEASE_400/final) (based on LLVM 4.0.0) Target: amd64-unknown-openbsd6.1 Thread model: posix InstalledDir: /usr/bin Program used: require 'sequel' DB = Sequel.postgres(:test=>false) DB.extension :pg_array # pg_array.txt contains ([[1] * 100] * 100) in PostgreSQL array format t = File.read('pg_array.txt') dot = '.' trap(:HUP){} Thread.new do while true sleep 1 Process.kill(:HUP, $$) end end GC.stress = true (0..2).map do Thread.new do i = 0 pr = lambda{|v| print dot if ((i+=1) % 100) == 0; "#{v}#{v}"} while true print 'L' Sequel::Postgres.parse_pg_array(t.dup, pr) end end end.map(&:join) Pere, can you test this program and see if it crashes in your environment? If not, can you put together a reproducible example that does crash in your environment? Thanks, Jeremy -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
Jeremy Evanswrote: > Running with GC.stress didn't catch the error for me. But I'm using a > fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be > something that only shows up on a newer compiler that does more > optimizations. Pere: just curious if you've had a chance to test my patch for sequel_pg from Jeremy's latest sequel_pg.git In any case, I'm certain my patch fixes a bug which manifests in a compiler-dependent manner; but here could always be other bugs in a similar vein. Thanks. -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
On 07/15 12:56, Jeremy Evans wrote: > On 07/15 04:45, Eric Wong wrote: > > Jeremy Evanswrote: > > > All of Sequel's postgres adapter tests still pass with this, so I merged > > > this into the master branch. I'll do some more testing of my apps, but > > > unless I run into problems I plan to release this as sequel_pg 1.7.1 > > > early next week. > > > > Thanks for the update. Btw, did you get a chance to test with > > GC.stress? It's not 100% reliable (and it is slow), but > > probably could've caught problems like this one. > > I hadn't tested with GC.stress before. You weren't kidding about it being > slow. I'll let it run overnight with the previous code (without your > patch), to see if this is something it would have caught. Running with GC.stress didn't catch the error for me. But I'm using a fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be something that only shows up on a newer compiler that does more optimizations. Thanks, Jeremy -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
On 07/15 04:45, Eric Wong wrote: > Jeremy Evanswrote: > > All of Sequel's postgres adapter tests still pass with this, so I merged > > this into the master branch. I'll do some more testing of my apps, but > > unless I run into problems I plan to release this as sequel_pg 1.7.1 > > early next week. > > Thanks for the update. Btw, did you get a chance to test with > GC.stress? It's not 100% reliable (and it is slow), but > probably could've caught problems like this one. I hadn't tested with GC.stress before. You weren't kidding about it being slow. I'll let it run overnight with the previous code (without your patch), to see if this is something it would have caught. Thanks, Jeremy -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
Jeremy Evanswrote: > All of Sequel's postgres adapter tests still pass with this, so I merged > this into the master branch. I'll do some more testing of my apps, but > unless I run into problems I plan to release this as sequel_pg 1.7.1 > early next week. Thanks for the update. Btw, did you get a chance to test with GC.stress? It's not 100% reliable (and it is slow), but probably could've caught problems like this one. -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
On 07/15 12:15, Eric Wong wrote: > Jeremy Evanswrote: > > Thanks for this patch. I'm not an RB_GC_GUARD expert, but the changes > > look fine to me. The existing RB_GC_GUARD calls were added by me in > > 2012 to fix an earlier segfault.[1] This is the first reported > > RB_GC_GUARD-related segfault in sequel_pg since then. > > No worries; I don't consider myself a RB_GC_GUARD expert, either(*). > > > [1] > > https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff > > I suspect your original guards were lucky enough for C compilers > in 2012, but compilers have gotten more clever since then. So > there's a a higher likelyhood of exposing bugs given the > conservative GC in Ruby(**). > > Historical note: > > Back in the day, "volatile" alone was enough to defeat > compiler optimizations in C Ruby. Eventually, compilers got > better, so RB_GC_GUARD was introduced. And in the future, > RB_GC_GUARD may evolve to accomodate even more clever > compilers. > > > Pere, I would appreciate if you could test this patch and see if it > > fixes your issue. I will also test it and will release a new sequel_pg > > version with this patch if it fixes the issue. > > Yes, actually testing the code is important, everything else > I've written here is theory ;) All of Sequel's postgres adapter tests still pass with this, so I merged this into the master branch. I'll do some more testing of my apps, but unless I run into problems I plan to release this as sequel_pg 1.7.1 early next week. Thanks, Jeremy -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
Jeremy Evanswrote: > Thanks for this patch. I'm not an RB_GC_GUARD expert, but the changes > look fine to me. The existing RB_GC_GUARD calls were added by me in > 2012 to fix an earlier segfault.[1] This is the first reported > RB_GC_GUARD-related segfault in sequel_pg since then. No worries; I don't consider myself a RB_GC_GUARD expert, either(*). > [1] > https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff I suspect your original guards were lucky enough for C compilers in 2012, but compilers have gotten more clever since then. So there's a a higher likelyhood of exposing bugs given the conservative GC in Ruby(**). Historical note: Back in the day, "volatile" alone was enough to defeat compiler optimizations in C Ruby. Eventually, compilers got better, so RB_GC_GUARD was introduced. And in the future, RB_GC_GUARD may evolve to accomodate even more clever compilers. > Pere, I would appreciate if you could test this patch and see if it > fixes your issue. I will also test it and will release a new sequel_pg > version with this patch if it fixes the issue. Yes, actually testing the code is important, everything else I've written here is theory ;) (*) Fwiw, I am not fluent in reading asm for systems I run Ruby on, but I know how clever compilers can be, and have written about it some on ruby-core: https://public-inbox.org/ruby-core/?q=RB_GC_GUARD:..20170714 (**) similar story with lock-free multi-threading in other projects -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
On 07/14 09:16, Eric Wong wrote: > Pere Joan Martorellwrote: > > I suspect that the conflicting gem was 'sequel_pg' (sequel_pg > > overwrites the inner loop of the Sequel postgres adapter row fetching > > code with a C version. The C version is significantly faster than the > > pure ruby version that Sequel uses by default), but given I didn't > > remove these gems one by one I can't completely ensure that. > > > > If the problem reemerges I'll keep you informed. > > > > Thanks!! :) > > Thanks for the info. I've added Jeremy Evans, the author of > sequel_pg to the Cc: even though I think he reads this list... > > Anyways, I think I've spotted one potential bug in sequel_pg > w.r.t. RB_GC_GUARD usage, and the fix is below: > > git clone https://github.com/jeremyevans/sequel_pg && cd sequel_pg > curl https://80x24.org/spew/20170714210918.3332-...@80x24.org/raw | git am > > (more in-depth explanation is in the commit message) > > Pere: perhaps you can give it a shot > > Keep in mind I've only compile-tested this. I didn't find > automated tests in the code and I don't have a usable Postgres > instance, at the moment. Eric, Thanks for this patch. I'm not an RB_GC_GUARD expert, but the changes look fine to me. The existing RB_GC_GUARD calls were added by me in 2012 to fix an earlier segfault.[1] This is the first reported RB_GC_GUARD-related segfault in sequel_pg since then. Pere, I would appreciate if you could test this patch and see if it fixes your issue. I will also test it and will release a new sequel_pg version with this patch if it fixes the issue. Thanks, Jeremy [1] https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
Pere Joan Martorellwrote: > I suspect that the conflicting gem was 'sequel_pg' (sequel_pg > overwrites the inner loop of the Sequel postgres adapter row fetching > code with a C version. The C version is significantly faster than the > pure ruby version that Sequel uses by default), but given I didn't > remove these gems one by one I can't completely ensure that. > > If the problem reemerges I'll keep you informed. > > Thanks!! :) Thanks for the info. I've added Jeremy Evans, the author of sequel_pg to the Cc: even though I think he reads this list... Anyways, I think I've spotted one potential bug in sequel_pg w.r.t. RB_GC_GUARD usage, and the fix is below: git clone https://github.com/jeremyevans/sequel_pg && cd sequel_pg curl https://80x24.org/spew/20170714210918.3332-...@80x24.org/raw | git am (more in-depth explanation is in the commit message) Pere: perhaps you can give it a shot Keep in mind I've only compile-tested this. I didn't find automated tests in the code and I don't have a usable Postgres instance, at the moment. -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/
Re: Random crash when sending USR2 + QUIT signals to Unicorn process
+Cc: Philip and Jonathan since they encountered this three years ago, but we never heard back from them: https://bogomips.org/unicorn-public/?q=T_NODE+d:..20170713 Pere Joan Martorellwrote: > > /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in > > `parse': method `hash' called on unexpected T_NODE object > > (0x0055b15b973508 flags=0xaa31b) (NotImplementedError) > Any idea what is happening? This is most likely a bug in a C extension not using write barriers correctly (perhaps via undocumented C-API functions in Ruby). I don't think I've seen this on ruby-core bug reports in a few years: https://public-inbox.org/ruby-core/?q=T_NODE Fwiw, Appendix D on Generational GC in the Ruby source is worth reading to any C extension authors: https://80x24.org/mirrors/ruby.git/plain/doc/extension.rdoc There are probably build warnings when using some dangerous methods/macros, maybe you can check build logs for const warnings. Can you share the list of RubyGems you have loaded and maybe try upgrading/replacing/eliminating the ones with C extensions one-by-one until the error stops? Also, perhaps the output of "pmap $PID_OF_WORKER" if you're on Linux (or equivalent command if you're on another OS). Anyways, I didn't notice anything suspicious in your config. I'll do another self-audit of the unicorn + kgio + raindrops extensions, too, but judging from the lack of reports and how much they get used; I suspect the bug is elsewhere (more eyes welcome, of course). Thanks for the report and any more info you can provide! -- unsubscribe: unicorn-public+unsubscr...@bogomips.org archive: https://bogomips.org/unicorn-public/