[HACKERS] Hang issue when COPY to/from an unopened FIFO

2016-07-14 Thread Kenan Yao
Hi devs,

I came across a hang issue when COPY to a FIFO file, because the FIFO is
not opened for read on the other end. The backtrace from master branch is
like:

#0  0x00332ccc6c30 in __open_nocancel () from /lib64/libc.so.6
#1  0x00332cc6b693 in __GI__IO_file_open () from /lib64/libc.so.6
#2  0x00332cc6b7dc in _IO_new_file_fopen () from /lib64/libc.so.6
#3  0x00332cc60a04 in __fopen_internal () from /lib64/libc.so.6
#4  0x007d0536 in AllocateFile ()
#5  0x005f3d8e in BeginCopyFrom ()
#6  0x005ef192 in DoCopy ()
#7  0x0080bb22 in standard_ProcessUtility ()
#8  0x0080b4d6 in ProcessUtility ()
#9  0x0080a5cf in PortalRunUtility ()
#10 0x0080a78c in PortalRunMulti ()
#11 0x00809dff in PortalRun ()
#12 0x00803f85 in exec_simple_query ()
#13 0x0080807f in PostgresMain ()
#14 0x0077f639 in BackendRun ()
#15 0x0077eceb in BackendStartup ()
#16 0x0077b185 in ServerLoop ()
#17 0x0077a7ee in PostmasterMain ()
#18 0x006c93de in main ()

​
Reproduction is simple:

-- mkfifo /tmp/test.dat # bash
copy pg_class to '/tmp/test.dat';
-- try pg_cancel_backend or pg_terminate_backend from other sessions

​
The problem is that, if we are trapped here, we cannot cancel or terminate
this backend process unless we open the FIFO for read.

I am not sure whether this should be categorized as a bug, since it is
caused by wrong usage of FIFO indeed, but the backend cannot be terminated
anyhow.

I see recv and send call in secure_read/secure_write are implemented as
non-blocking style to make them interruptible, is it worthy to turn fopen
into non-blocking style as well?

Same thing would happen for file_fdw on an unopened FIFO.

Cheers,
Kenan


Re: [HACKERS] A question regarding LWLock in ProcSleep

2015-12-21 Thread Kenan Yao
+Heikki,

Hi Heikki,

Could you please help provide a case to elaborate why we need the LWLock
here? or could you please tell me to whom should I ask this question?

Thanks,
Kenan

On Tue, Dec 22, 2015 at 11:41 AM, Kenan Yao  wrote:

> Hi Amit,
>
> Thanks for the reply!
>
> Yes, in LockAcquireExtended, after exiting WaitOnLock, there would be a
> read access to proclock, however, since the lock has either been granted or
> rejected by deadlock check, I can think of no possible concurrent write
> access to the proclock, so is it really necessary to acquire the LWLock?
>
> Thanks,
> Kenan
>
> On Fri, Dec 18, 2015 at 10:25 PM, Amit Kapila 
> wrote:
>
>> On Thu, Dec 17, 2015 at 1:51 PM, Kenan Yao  wrote:
>>
>>> Hi there,
>>>
>>> In function ProcSleep, after the process has been waken up, either with
>>> lock granted or deadlock detected, it would re-acquire the lock table's
>>> partition LWLock.
>>>
>>> The code episode is here:
>>>
>>> /*
>>>  * Re-acquire the lock table's partition lock.  We have to do this to 
>>> hold
>>>  * off cancel/die interrupts before we can mess with lockAwaited (else 
>>> we
>>>  * might have a missed or duplicated locallock update).
>>>  */
>>> LWLockAcquire(partitionLock, LW_EXCLUSIVE);
>>>
>>> /*
>>>  * We no longer want LockErrorCleanup to do anything.
>>>  */
>>> lockAwaited = NULL;
>>>
>>> /*
>>>  * If we got the lock, be sure to remember it in the locallock table.
>>>  */
>>> if (MyProc->waitStatus == STATUS_OK)
>>> GrantAwaitedLock();
>>>
>>> /*
>>>  * We don't have to do anything else, because the awaker did all the
>>>  * necessary update of the lock table and MyProc.
>>>  */
>>> return MyProc->waitStatus;
>>>
>>> ​
>>> Questions are:
>>>
>>> (1) The comment says that "we might have a missed or duplicated
>>> locallock update", in what cases would we hit this if without holding the
>>> LWLock?
>>>
>>> (2) The comment says "we have to do this to hold off cancel/die
>>> interrupts", then:
>>>
>>>- why using LWLockAcquire instead of HOLD_INTERRUPTS directly?
>>>- From the handler of SIGINT and SIGTERM, seems nothing serious
>>>would be processed here, since no CHECK_FOR_INTERRUPS is called before
>>>releasing this LWLock. Why we should hold off cancel/die interrupts here?
>>>
>>> (3) Before releasing this LWLock, the only share memory access is
>>> MyProc->waitStatus; since the process has been granted the lock or removed
>>> from the lock's waiting list because of deadlock, is it possible some other
>>> processes would access this field? if not, then why we need LWLock here?
>>> what does this lock protect?
>>>
>>>
>> I think the other thing which needs protection of LWLock is
>> access to proclock which is done in the caller
>> (LockAcquireExtended).
>>
>>
>>
>>
>> With Regards,
>> Amit Kapila.
>> EnterpriseDB: http://www.enterprisedb.com
>>
>
>


Re: [HACKERS] A question regarding LWLock in ProcSleep

2015-12-21 Thread Kenan Yao
Hi Amit,

Thanks for the reply!

Yes, in LockAcquireExtended, after exiting WaitOnLock, there would be a
read access to proclock, however, since the lock has either been granted or
rejected by deadlock check, I can think of no possible concurrent write
access to the proclock, so is it really necessary to acquire the LWLock?

Thanks,
Kenan

On Fri, Dec 18, 2015 at 10:25 PM, Amit Kapila 
wrote:

> On Thu, Dec 17, 2015 at 1:51 PM, Kenan Yao  wrote:
>
>> Hi there,
>>
>> In function ProcSleep, after the process has been waken up, either with
>> lock granted or deadlock detected, it would re-acquire the lock table's
>> partition LWLock.
>>
>> The code episode is here:
>>
>> /*
>>  * Re-acquire the lock table's partition lock.  We have to do this to 
>> hold
>>  * off cancel/die interrupts before we can mess with lockAwaited (else we
>>  * might have a missed or duplicated locallock update).
>>  */
>> LWLockAcquire(partitionLock, LW_EXCLUSIVE);
>>
>> /*
>>  * We no longer want LockErrorCleanup to do anything.
>>  */
>> lockAwaited = NULL;
>>
>> /*
>>  * If we got the lock, be sure to remember it in the locallock table.
>>  */
>> if (MyProc->waitStatus == STATUS_OK)
>> GrantAwaitedLock();
>>
>> /*
>>  * We don't have to do anything else, because the awaker did all the
>>  * necessary update of the lock table and MyProc.
>>  */
>> return MyProc->waitStatus;
>>
>> ​
>> Questions are:
>>
>> (1) The comment says that "we might have a missed or duplicated locallock
>> update", in what cases would we hit this if without holding the LWLock?
>>
>> (2) The comment says "we have to do this to hold off cancel/die
>> interrupts", then:
>>
>>- why using LWLockAcquire instead of HOLD_INTERRUPTS directly?
>>- From the handler of SIGINT and SIGTERM, seems nothing serious would
>>be processed here, since no CHECK_FOR_INTERRUPS is called before releasing
>>this LWLock. Why we should hold off cancel/die interrupts here?
>>
>> (3) Before releasing this LWLock, the only share memory access is
>> MyProc->waitStatus; since the process has been granted the lock or removed
>> from the lock's waiting list because of deadlock, is it possible some other
>> processes would access this field? if not, then why we need LWLock here?
>> what does this lock protect?
>>
>>
> I think the other thing which needs protection of LWLock is
> access to proclock which is done in the caller
> (LockAcquireExtended).
>
>
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


[HACKERS] A question regarding LWLock in ProcSleep

2015-12-17 Thread Kenan Yao
Hi there,

In function ProcSleep, after the process has been waken up, either with
lock granted or deadlock detected, it would re-acquire the lock table's
partition LWLock.

The code episode is here:

/*
 * Re-acquire the lock table's partition lock.  We have to do this to hold
 * off cancel/die interrupts before we can mess with lockAwaited (else we
 * might have a missed or duplicated locallock update).
 */
LWLockAcquire(partitionLock, LW_EXCLUSIVE);

/*
 * We no longer want LockErrorCleanup to do anything.
 */
lockAwaited = NULL;

/*
 * If we got the lock, be sure to remember it in the locallock table.
 */
if (MyProc->waitStatus == STATUS_OK)
GrantAwaitedLock();

/*
 * We don't have to do anything else, because the awaker did all the
 * necessary update of the lock table and MyProc.
 */
return MyProc->waitStatus;

​
Questions are:

(1) The comment says that "we might have a missed or duplicated locallock
update", in what cases would we hit this if without holding the LWLock?

(2) The comment says "we have to do this to hold off cancel/die
interrupts", then:

   - why using LWLockAcquire instead of HOLD_INTERRUPTS directly?
   - From the handler of SIGINT and SIGTERM, seems nothing serious would be
   processed here, since no CHECK_FOR_INTERRUPS is called before releasing
   this LWLock. Why we should hold off cancel/die interrupts here?

(3) Before releasing this LWLock, the only share memory access is
MyProc->waitStatus; since the process has been granted the lock or removed
from the lock's waiting list because of deadlock, is it possible some other
processes would access this field? if not, then why we need LWLock here?
what does this lock protect?

Apologize if I missed anything here, and appreciate if someone can help me
on this question. Thanks

Cheers,
Kenan