Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-08-07 Thread Eric Wong
Jeremy Evans  wrote:
> On 07/24 01:25, Eric Wong wrote:
> > Jeremy Evans  wrote:
> > > Running with GC.stress didn't catch the error for me.  But I'm using a
> > > fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
> > > something that only shows up on a newer compiler that does more
> > > optimizations.
> > 
> > Pere: just curious if you've had a chance to test my patch for
> > sequel_pg from Jeremy's latest sequel_pg.git
> > 
> > In any case, I'm certain my patch fixes a bug which manifests
> > in a compiler-dependent manner; but here could always be other
> > bugs in a similar vein.  Thanks.
> 
> I can't get it to crash with sequel_pg 1.7.0 when compiled using clang
> 4.0.0 either.  I even tried to build a special program designed to
> trigger the crash.

>From anecdotes on ruby-core, clang still seems less aggressive
at optimizations than modern gcc.

Fwiw, a few GC bugs in Ruby trunk got fixed recently and the
fixes should be in 2.4.2 (soon):
https://public-inbox.org/ruby-core/?q=T_NONE+d%3A20161225..20170808

Not identical to T_NODE which Pere got, but if it's a GC bug,
but both T_NONE and T_NODE triggers are symptoms of GC bugs.
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-08-07 Thread Jeremy Evans
On 07/24 01:25, Eric Wong wrote:
> Jeremy Evans  wrote:
> > Running with GC.stress didn't catch the error for me.  But I'm using a
> > fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
> > something that only shows up on a newer compiler that does more
> > optimizations.
> 
> Pere: just curious if you've had a chance to test my patch for
> sequel_pg from Jeremy's latest sequel_pg.git
> 
> In any case, I'm certain my patch fixes a bug which manifests
> in a compiler-dependent manner; but here could always be other
> bugs in a similar vein.  Thanks.

I can't get it to crash with sequel_pg 1.7.0 when compiled using clang
4.0.0 either.  I even tried to build a special program designed to
trigger the crash.

Compiler used:

$ cc -v
OpenBSD clang version 4.0.0 (tags/RELEASE_400/final) (based on LLVM 4.0.0)
Target: amd64-unknown-openbsd6.1
Thread model: posix
InstalledDir: /usr/bin

Program used:

require 'sequel'
DB = Sequel.postgres(:test=>false)
DB.extension :pg_array
# pg_array.txt contains ([[1] * 100] * 100) in PostgreSQL array format
t = File.read('pg_array.txt')
dot = '.'
trap(:HUP){}
Thread.new do
  while true
sleep 1
Process.kill(:HUP, $$)
  end
end
GC.stress = true
(0..2).map do
  Thread.new do
i = 0
pr = lambda{|v| print dot if ((i+=1) % 100) == 0; "#{v}#{v}"}
while true
  print 'L'
  Sequel::Postgres.parse_pg_array(t.dup, pr)
end
  end
end.map(&:join)

Pere, can you test this program and see if it crashes in your
environment?  If not, can you put together a reproducible example that
does crash in your environment?

Thanks,
Jeremy
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-23 Thread Eric Wong
Jeremy Evans  wrote:
> Running with GC.stress didn't catch the error for me.  But I'm using a
> fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
> something that only shows up on a newer compiler that does more
> optimizations.

Pere: just curious if you've had a chance to test my patch for
sequel_pg from Jeremy's latest sequel_pg.git

In any case, I'm certain my patch fixes a bug which manifests
in a compiler-dependent manner; but here could always be other
bugs in a similar vein.  Thanks.
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-17 Thread Jeremy Evans
On 07/15 12:56, Jeremy Evans wrote:
> On 07/15 04:45, Eric Wong wrote:
> > Jeremy Evans  wrote:
> > > All of Sequel's postgres adapter tests still pass with this, so I merged
> > > this into the master branch.  I'll do some more testing of my apps, but
> > > unless I run into problems I plan to release this as sequel_pg 1.7.1
> > > early next week.
> > 
> > Thanks for the update.  Btw, did you get a chance to test with
> > GC.stress?  It's not 100% reliable (and it is slow), but
> > probably could've caught problems like this one.
> 
> I hadn't tested with GC.stress before. You weren't kidding about it being
> slow.  I'll let it run overnight with the previous code (without your
> patch), to see if this is something it would have caught.

Running with GC.stress didn't catch the error for me.  But I'm using a
fairly old compiler (GCC 4.2.1, the OpenBSD default), so this may be
something that only shows up on a newer compiler that does more
optimizations.

Thanks,
Jeremy
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-15 Thread Jeremy Evans
On 07/15 04:45, Eric Wong wrote:
> Jeremy Evans  wrote:
> > All of Sequel's postgres adapter tests still pass with this, so I merged
> > this into the master branch.  I'll do some more testing of my apps, but
> > unless I run into problems I plan to release this as sequel_pg 1.7.1
> > early next week.
> 
> Thanks for the update.  Btw, did you get a chance to test with
> GC.stress?  It's not 100% reliable (and it is slow), but
> probably could've caught problems like this one.

I hadn't tested with GC.stress before. You weren't kidding about it being
slow.  I'll let it run overnight with the previous code (without your
patch), to see if this is something it would have caught.

Thanks,
Jeremy
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-14 Thread Eric Wong
Jeremy Evans  wrote:
> All of Sequel's postgres adapter tests still pass with this, so I merged
> this into the master branch.  I'll do some more testing of my apps, but
> unless I run into problems I plan to release this as sequel_pg 1.7.1
> early next week.

Thanks for the update.  Btw, did you get a chance to test with
GC.stress?  It's not 100% reliable (and it is slow), but
probably could've caught problems like this one.
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-14 Thread Jeremy Evans
On 07/15 12:15, Eric Wong wrote:
> Jeremy Evans  wrote:
> > Thanks for this patch.  I'm not an RB_GC_GUARD expert, but the changes
> > look fine to me. The existing RB_GC_GUARD calls were added by me in
> > 2012 to fix an earlier segfault.[1] This is the first reported
> > RB_GC_GUARD-related segfault in sequel_pg since then.
> 
> No worries; I don't consider myself a RB_GC_GUARD expert, either(*).
> 
> > [1] 
> > https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff
> 
> I suspect your original guards were lucky enough for C compilers
> in 2012, but compilers have gotten more clever since then.  So
> there's a a higher likelyhood of exposing bugs given the
> conservative GC in Ruby(**).
> 
> Historical note:
> 
>   Back in the day, "volatile" alone was enough to defeat
>   compiler optimizations in C Ruby.  Eventually, compilers got
>   better, so RB_GC_GUARD was introduced.  And in the future,
>   RB_GC_GUARD may evolve to accomodate even more clever
>   compilers.
> 
> > Pere, I would appreciate if you could test this patch and see if it
> > fixes your issue.  I will also test it and will release a new sequel_pg
> > version with this patch if it fixes the issue.
> 
> Yes, actually testing the code is important, everything else
> I've written here is theory ;)

All of Sequel's postgres adapter tests still pass with this, so I merged
this into the master branch.  I'll do some more testing of my apps, but
unless I run into problems I plan to release this as sequel_pg 1.7.1
early next week.

Thanks,
Jeremy
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-14 Thread Eric Wong
Jeremy Evans  wrote:
> Thanks for this patch.  I'm not an RB_GC_GUARD expert, but the changes
> look fine to me. The existing RB_GC_GUARD calls were added by me in
> 2012 to fix an earlier segfault.[1] This is the first reported
> RB_GC_GUARD-related segfault in sequel_pg since then.

No worries; I don't consider myself a RB_GC_GUARD expert, either(*).

> [1] 
> https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff

I suspect your original guards were lucky enough for C compilers
in 2012, but compilers have gotten more clever since then.  So
there's a a higher likelyhood of exposing bugs given the
conservative GC in Ruby(**).

Historical note:

  Back in the day, "volatile" alone was enough to defeat
  compiler optimizations in C Ruby.  Eventually, compilers got
  better, so RB_GC_GUARD was introduced.  And in the future,
  RB_GC_GUARD may evolve to accomodate even more clever
  compilers.

> Pere, I would appreciate if you could test this patch and see if it
> fixes your issue.  I will also test it and will release a new sequel_pg
> version with this patch if it fixes the issue.

Yes, actually testing the code is important, everything else
I've written here is theory ;)


(*) Fwiw, I am not fluent in reading asm for systems I run Ruby on,
but I know how clever compilers can be, and have written
about it some on ruby-core:
https://public-inbox.org/ruby-core/?q=RB_GC_GUARD:..20170714

(**) similar story with lock-free multi-threading in other projects
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-14 Thread Jeremy Evans
On 07/14 09:16, Eric Wong wrote:
> Pere Joan Martorell  wrote:
> > I suspect that the conflicting gem was 'sequel_pg' (sequel_pg
> > overwrites the inner loop of the Sequel postgres adapter row fetching
> > code with a C version. The C version is significantly faster than the
> > pure ruby version that Sequel uses by default), but given I didn't
> > remove these gems one by one I can't completely ensure that.
> > 
> > If the problem reemerges I'll keep you informed.
> > 
> > Thanks!! :)
> 
> Thanks for the info.  I've added Jeremy Evans, the author of
> sequel_pg to the Cc: even though I think he reads this list...
> 
> Anyways, I think I've spotted one potential bug in sequel_pg
> w.r.t. RB_GC_GUARD usage, and the fix is below:
> 
>   git clone https://github.com/jeremyevans/sequel_pg && cd sequel_pg
>   curl https://80x24.org/spew/20170714210918.3332-...@80x24.org/raw | git am
> 
> (more in-depth explanation is in the commit message)
> 
> Pere: perhaps you can give it a shot
> 
> Keep in mind I've only compile-tested this.  I didn't find
> automated tests in the code and I don't have a usable Postgres
> instance, at the moment.

Eric,

Thanks for this patch.  I'm not an RB_GC_GUARD expert, but the changes
look fine to me. The existing RB_GC_GUARD calls were added by me in
2012 to fix an earlier segfault.[1] This is the first reported
RB_GC_GUARD-related segfault in sequel_pg since then.

Pere, I would appreciate if you could test this patch and see if it
fixes your issue.  I will also test it and will release a new sequel_pg
version with this patch if it fixes the issue.

Thanks,
Jeremy

[1] 
https://github.com/jeremyevans/sequel_pg/commit/15edb132887d9b5292cad419fc7692ed5cd4b01b.diff
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-14 Thread Eric Wong
Pere Joan Martorell  wrote:
> I suspect that the conflicting gem was 'sequel_pg' (sequel_pg
> overwrites the inner loop of the Sequel postgres adapter row fetching
> code with a C version. The C version is significantly faster than the
> pure ruby version that Sequel uses by default), but given I didn't
> remove these gems one by one I can't completely ensure that.
> 
> If the problem reemerges I'll keep you informed.
> 
> Thanks!! :)

Thanks for the info.  I've added Jeremy Evans, the author of
sequel_pg to the Cc: even though I think he reads this list...

Anyways, I think I've spotted one potential bug in sequel_pg
w.r.t. RB_GC_GUARD usage, and the fix is below:

  git clone https://github.com/jeremyevans/sequel_pg && cd sequel_pg
  curl https://80x24.org/spew/20170714210918.3332-...@80x24.org/raw | git am

(more in-depth explanation is in the commit message)

Pere: perhaps you can give it a shot

Keep in mind I've only compile-tested this.  I didn't find
automated tests in the code and I don't have a usable Postgres
instance, at the moment.
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/



Re: Random crash when sending USR2 + QUIT signals to Unicorn process

2017-07-13 Thread Eric Wong
+Cc: Philip and Jonathan  since they encountered this three years
ago, but we never heard back from them:

https://bogomips.org/unicorn-public/?q=T_NODE+d:..20170713


Pere Joan Martorell  wrote:
> > /home/deployer/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/unicorn-5.3.0/lib/unicorn/http_request.rb:80:in
> >  `parse': method `hash' called on unexpected T_NODE object 
> > (0x0055b15b973508 flags=0xaa31b) (NotImplementedError)

> Any idea what is happening?

This is most likely a bug in a C extension not using write
barriers correctly (perhaps via undocumented C-API functions in
Ruby).

I don't think I've seen this on ruby-core bug reports in a few years:

https://public-inbox.org/ruby-core/?q=T_NODE

Fwiw, Appendix D on Generational GC in the Ruby source is
worth reading to any C extension authors:

https://80x24.org/mirrors/ruby.git/plain/doc/extension.rdoc

There are probably build warnings when using some dangerous methods/macros,
maybe you can check build logs for const warnings.


Can you share the list of RubyGems you have loaded and maybe try
upgrading/replacing/eliminating the ones with C extensions
one-by-one until the error stops?

Also, perhaps the output of "pmap $PID_OF_WORKER" if you're on
Linux (or equivalent command if you're on another OS).

Anyways, I didn't notice anything suspicious in your config.

I'll do another self-audit of the unicorn + kgio + raindrops
extensions, too, but judging from the lack of reports and how
much they get used; I suspect the bug is elsewhere (more eyes
welcome, of course).

Thanks for the report and any more info you can provide!
--
unsubscribe: unicorn-public+unsubscr...@bogomips.org
archive: https://bogomips.org/unicorn-public/