Hi,
> There are some similarities between this and
> https://www.postgresql.org/message-id/20240207184050.rkvpuudq7huijmkb%40awork3.anarazel.de
> as described in that email.
Thanks for this information.
>
>
> Hm, I think this might make this code a bit more expensive. It's cheaper, both
> in
Hi,
There are some similarities between this and
https://www.postgresql.org/message-id/20240207184050.rkvpuudq7huijmkb%40awork3.anarazel.de
as described in that email.
On 2024-01-25 15:24:17 +0800, Andy Fan wrote:
> From 4c8fd0ab71299e57fbeb18dec70051bd1d035c7c Mon Sep 17 00:00:00 2001
> From:
Andres Freund writes:
> On 2024-01-18 14:00:58 -0500, Robert Haas wrote:
>> > The LockBufHdr also used init_local_spin_delay / perform_spin_delay
>> > infrastruce and then it has the same issue like ${subject}, it is pretty
>> > like the code in s_lock; Based on my current knowledge, I think
Hi,
> Hi,
>
> On 2024-01-22 15:18:35 +0800, Andy Fan wrote:
>> I used sigismember(, SIGQUIT) to detect if a process is doing a
>> quickdie, however this is bad not only because it doesn't work on
>> Windows, but also it has too poor performance even it impacts on
>> USE_ASSERT_CHECKING build
Hi,
On 2024-01-22 15:18:35 +0800, Andy Fan wrote:
> I used sigismember(, SIGQUIT) to detect if a process is doing a
> quickdie, however this is bad not only because it doesn't work on
> Windows, but also it has too poor performance even it impacts on
> USE_ASSERT_CHECKING build only. In v8, I
Hi,
On 2024-01-18 14:00:58 -0500, Robert Haas wrote:
> > The LockBufHdr also used init_local_spin_delay / perform_spin_delay
> > infrastruce and then it has the same issue like ${subject}, it is pretty
> > like the code in s_lock; Based on my current knowledge, I think we
> > should add the check
Robert Haas writes:
> On Mon, Jan 22, 2024 at 12:13 PM Andy Fan wrote:
>> > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote:
>> >> I get your point! Acquiring an already held spinlock in quickdie is
>> >> unlikely to happen, but since our existing infrastructure can handle it,
>> >> then
On Mon, Jan 22, 2024 at 12:13 PM Andy Fan wrote:
> > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote:
> >> I get your point! Acquiring an already held spinlock in quickdie is
> >> unlikely to happen, but since our existing infrastructure can handle it,
> >> then there is no reason to bypass it.
Robert Haas writes:
> On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote:
>> I get your point! Acquiring an already held spinlock in quickdie is
>> unlikely to happen, but since our existing infrastructure can handle it,
>> then there is no reason to bypass it.
>
> No, the existing
On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote:
> I get your point! Acquiring an already held spinlock in quickdie is
> unlikely to happen, but since our existing infrastructure can handle it,
> then there is no reason to bypass it.
No, the existing infrastructure cannot handle that at all.
--
Robert Haas writes:
> On Mon, Jan 22, 2024 at 2:22 AM Andy Fan wrote:
>> I used sigismember(, SIGQUIT) to detect if a process is doing a
>> quickdie, however this is bad not only because it doesn't work on
>> Windows, but also it has too poor performance even it impacts on
>>
On Mon, Jan 22, 2024 at 2:22 AM Andy Fan wrote:
> I used sigismember(, SIGQUIT) to detect if a process is doing a
> quickdie, however this is bad not only because it doesn't work on
> Windows, but also it has too poor performance even it impacts on
> USE_ASSERT_CHECKING build only. In v8, I
Andy Fan writes:
> I found a speical case about checking it in errstart. So commit 3 in v7
> is added.
>
> commit 757c67c1d4895ce6a523bcf5217af8eb2351e2a1 (HEAD -> s_stuck_v3)
> Author: yizhi.fzh
> Date: Mon Jan 22 07:14:29 2024 +0800
>
> Bypass SpinLock checking in SIGQUIT signal
Hi,
Andy Fan writes:
> Hi,
>
> v6 attached which addressed all the items Robert suggested except the
> following 2 open items. They are handled differently.
>
>>
>> Here is the summary of the open-items, it would be great that Andres and
>> Matthias have a look at this when they have time.
>>
Hi,
v6 attached which addressed all the items Robert suggested except the
following 2 open items. They are handled differently.
>
> Here is the summary of the open-items, it would be great that Andres and
> Matthias have a look at this when they have time.
>
>>> The LockBufHdr also used
Hi,
Here is the summary of the open-items, it would be great that Andres and
Matthias have a look at this when they have time.
1. Shall we treat the LockBufHdr as a SpinLock usage.
>> The LockBufHdr also used init_local_spin_delay / perform_spin_delay
>> infrastruce and then it has the same
On Thu, Jan 18, 2024 at 1:30 PM Andy Fan wrote:
> > You added an #include to dynahash.c despite making no other changes to
> > the file.
>
> That's mainly because I put the code into miscadmin.h and spin.h depends
> on miscadmin.h with MACROs.
That's not a good reason. Headers need to include
Hi Robert,
Thanks for your attention!
> On Thu, Jan 18, 2024 at 7:56 AM Andy Fan wrote:
>> Do you still have interest with making some progress on this topic?
>
> Some, but it's definitely not my top priority. I wish I could give as
> much attention as everyone would like to everyone's
On Thu, Jan 18, 2024 at 7:56 AM Andy Fan wrote:
> Do you still have interest with making some progress on this topic?
Some, but it's definitely not my top priority. I wish I could give as
much attention as everyone would like to everyone's patches, but I
can't even come close.
I think that the
ssible
to be misused in many cases. e.g. Acquiring another LWLocks or regular
locks, memory allocation. In this patch, all of such cases may increase
the timing of holding the spin lock. In our culture, if any spin lock
can't be acquired for some-time, a crash should happen. See the
s_lock_stuck in perform_sp
cquired for some-time, a crash should happen. See the
s_lock_stuck in perform_spin_delay.
CHECK_FOR_INTERRUPTS should be avoided as well when holding a spin lock.
Depends on what signals are left to handle, PG may raise error/fatal
which would cause the code jump to some other places which is h
On Wed, Jan 10, 2024 at 10:17 PM Andy Fan wrote:
> fixed in v2.
Timing the spinlock wait seems like a separate patch from the new sanity checks.
I suspect that the new sanity checks should only be armed in
assert-enabled builds.
I'm doubtful that this method of tracking the wait time will be
Robert Haas writes:
> Thanks for jumping in with a review, Matthias!
FWIW, Matthias is also the first one for this proposal at this
thread, thanks for that as well!
--
Best Regards
Andy Fan
Hi Matthias,
Thanks for the review!
Matthias van de Meent writes:
> On Wed, 10 Jan 2024 at 02:44, Andy Fan wrote:
>> Hi,
>>
>> I want to know if Andres or you have plan
>> to do some code review. I don't expect this would happen very soon, just
>> want to make sure this will not happen that
Thanks for jumping in with a review, Matthias!
On Wed, Jan 10, 2024 at 8:03 AM Matthias van de Meent
wrote:
> I'm not 100% sure on the policy of this, but theoretically you could
> use LockAquireExtended(dontWait=true) while holding a spin lock, as
> that would not have an unknown duration. Then
On Wed, 10 Jan 2024 at 02:44, Andy Fan wrote:
> Hi,
>
> I want to know if Andres or you have plan
> to do some code review. I don't expect this would happen very soon, just
> want to make sure this will not happen that both of you think the other
> one will do, but actually none of them does it
Hi,
Robert Haas writes:
> On Mon, Jan 8, 2024 at 9:40 PM Andy Fan wrote:
>> The singler handler I was refering to is 'CHECK_FOR_INTERRUPTS', Based
>> on this, spin_lock and lwlock are acted pretty differently.
>
> CHECK_FOR_INTERRUPTS() is not a signal handler,
hmm, I knew this but I
On Mon, Jan 8, 2024 at 9:40 PM Andy Fan wrote:
> The singler handler I was refering to is 'CHECK_FOR_INTERRUPTS', Based
> on this, spin_lock and lwlock are acted pretty differently.
CHECK_FOR_INTERRUPTS() is not a signal handler, and it's OK to acquire
and release spin locks or lwlocks there. We
Hi,
Robert Haas writes:
> On Sun, Jan 7, 2024 at 9:52 PM Andy Fan wrote:
>> > I think we should add cassert-only infrastructure tracking whether we
>> > currently hold spinlocks, are in a signal handler and perhaps a few other
>> > states. That'd allow us to add assertions like:
>> ..
>> > -
On Sun, Jan 7, 2024 at 9:52 PM Andy Fan wrote:
> > I think we should add cassert-only infrastructure tracking whether we
> > currently hold spinlocks, are in a signal handler and perhaps a few other
> > states. That'd allow us to add assertions like:
> ..
> > - no lwlocks or ... while in signal
Hi!
>
> I think we should add cassert-only infrastructure tracking whether we
> currently hold spinlocks, are in a signal handler and perhaps a few other
> states. That'd allow us to add assertions like:
..
> - no lwlocks or ... while in signal handlers
I *wish* lwlocks should *not* be held
Hi,
> On 2024-01-05 14:19:23 -0500, Robert Haas wrote:
>> On Fri, Jan 5, 2024 at 2:11 PM Andres Freund wrote:
>> > I see it fairly regularly. Including finding several related bugs that
>> > lead to
>> > stuck systems last year (signal handlers are a menace).
>>
>> In that case, I think this
Hi,
On 2024-01-05 14:19:23 -0500, Robert Haas wrote:
> On Fri, Jan 5, 2024 at 2:11 PM Andres Freund wrote:
> > I see it fairly regularly. Including finding several related bugs that lead
> > to
> > stuck systems last year (signal handlers are a menace).
>
> In that case, I think this proposal
Hi,
On 2024-01-05 10:20:39 +0800, Andy Fan wrote:
> Andres Freund writes:
> > On 2024-01-04 14:59:06 +0800, Andy Fan wrote:
> >> My question is if someone doesn't obey the rule by mistake (everyone
> >> can make mistake), shall we PANIC on a production environment? IMO I
> >> think it can be a
On Fri, Jan 5, 2024 at 2:11 PM Andres Freund wrote:
> I see it fairly regularly. Including finding several related bugs that lead to
> stuck systems last year (signal handlers are a menace).
In that case, I think this proposal is dead. I can't personally
testify to this code being a force for
Hi,
On 2024-01-05 08:51:53 -0500, Robert Haas wrote:
> On Thu, Jan 4, 2024 at 6:06 PM Andres Freund wrote:
> > I think we should add infrastructure to detect bugs like this during
> > development, but not PANICing when this happens in production seems
> > completely
> > non-viable.
>
> I mean
On Thu, Jan 4, 2024 at 6:06 PM Andres Freund wrote:
> I think we should add infrastructure to detect bugs like this during
> development, but not PANICing when this happens in production seems completely
> non-viable.
I mean +1 for the infrastructure, but "completely non-viable"? Why?
I've only
Hi,
Andres Freund writes:
>
> On 2024-01-04 14:59:06 +0800, Andy Fan wrote:
>> My question is if someone doesn't obey the rule by mistake (everyone
>> can make mistake), shall we PANIC on a production environment? IMO I
>> think it can be a WARNING on a production environment and be a stuck
Hi,
On 2024-01-04 17:03:18 -0600, Jim Nasby wrote:
> On 1/4/24 10:33 AM, Tom Lane wrote:
>
> Robert Haas writes:
>
> On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote:
>
> We should be making an effort to ban coding patterns like
> "return with spinlock still
Hi,
On 2024-01-04 14:59:06 +0800, Andy Fan wrote:
> My question is if someone doesn't obey the rule by mistake (everyone
> can make mistake), shall we PANIC on a production environment? IMO I
> think it can be a WARNING on a production environment and be a stuck
> when 'ifdef
On 1/4/24 10:33 AM, Tom Lane wrote:
Robert Haas writes:
On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote:
We should be making an effort to ban coding patterns like
"return with spinlock still held", because they're just too prone
to errors similar to this one.
I agree
Hi,
On 2024-01-04 12:04:07 -0500, Robert Haas wrote:
> On Thu, Jan 4, 2024 at 11:33 AM Tom Lane wrote:
> > I believe it's (a). No matter what the reason for a stuck spinlock
> > is, the only reliable method of getting out of the problem is to
> > blow things up and start over. The patch
On Thu, Jan 4, 2024 at 11:33 AM Tom Lane wrote:
> I believe it's (a). No matter what the reason for a stuck spinlock
> is, the only reliable method of getting out of the problem is to
> blow things up and start over. The patch proposed at the top of this
> thread would leave the system unable
Robert Haas writes:
> On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote:
>> We should be making an effort to ban coding patterns like
>> "return with spinlock still held", because they're just too prone
>> to errors similar to this one.
> I agree that we don't want to add overhead, and also about
On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote:
> I'm not a fan of adding overhead to such a performance-critical
> thing as spinlocks in order to catch coding errors that are easily
> detectable statically. IMV the correct usage of spinlocks is that
> they should only be held across *short,
Robert Haas writes:
> I'm not sure that the approach this patch takes is correct in detail,
> but I kind of agree with you about the overall point. I mean, the idea
> of the PANIC is to avoid having the system just sit there in a state
> from which it will never recover ... but it can also have
a similar manner as
> START_CRIT_SECTION/END_CRIT_SECTION.
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
--
Best Regards
Andy Fan
>From 7d7fd0f0e9b13a290bfffaec0ad40773191155f2 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh"
Date: Thu, 4 Jan 2024 14:33:37 +0800
Sub
On Thu, Jan 4, 2024 at 2:09 AM Andy Fan wrote:
> My question is if someone doesn't obey the rule by mistake (everyone
> can make mistake), shall we PANIC on a production environment? IMO I
> think it can be a WARNING on a production environment and be a stuck
> when 'ifdef USE_ASSERT_CHECKING'.
>
On Thu, 4 Jan 2024 at 08:09, Andy Fan wrote:
>
> My question is if someone doesn't obey the rule by mistake (everyone
> can make mistake), shall we PANIC on a production environment? IMO I
> think it can be a WARNING on a production environment and be a stuck
> when 'ifdef USE_ASSERT_CHECKING'.
>
Don't call s_lock_stuck in production environment
In the current implementation, if a spin lock is misused, the
s_lock_stuck in perform_spin_delay can cause the entire system to
PANIC. In order to balance fault tolerance and the ability to detect
incorrect usage, we can use WARNING to replace
50 matches
Mail list logo