[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-09 Thread amyrazz44
Github user amyrazz44 closed the pull request at:

https://github.com/apache/incubator-hawq/pull/1157


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-06 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1157#discussion_r104572394
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -634,16 +634,16 @@ static int retry_read(int fd, char *buf, int rsize)
 
 read_retry:
sz = read(fd, buf, rsize);
-   if (sz > 0)
+   if (sz >= 0)  
return sz;
-   else if(sz == 0 || errno == EINTR)
+   else if(errno == EINTR)
--- End diff --

The fd is set to blocking I/O in this case, AFAIK.




[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-06 Thread wengyanqing
Github user wengyanqing commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1157#discussion_r104548625
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -634,16 +634,16 @@ static int retry_read(int fd, char *buf, int rsize)
 
 read_retry:
sz = read(fd, buf, rsize);
-   if (sz > 0)
+   if (sz >= 0)  
return sz;
-   else if(sz == 0 || errno == EINTR)
+   else if(errno == EINTR)
--- End diff --

It needs to handle EAGAIN for a nonblocking read.
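
For illustration only, here is a minimal sketch of what handling EAGAIN could look like if the descriptor were in O_NONBLOCK mode. It is not taken from the patch: the names read_nonblock_retry and set_nonblocking, the timeout_ms parameter, and the choice of poll() with ETIMEDOUT on timeout are all assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to nonblocking mode. */
static int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper, not the HAWQ code: read from a descriptor that may be
 * nonblocking. EINTR is retried at once. EAGAIN/EWOULDBLOCK means "no data
 * yet", so wait up to timeout_ms for readability and retry.
 * Returns >0 bytes read, 0 on EOF, -1 on error or timeout (errno is set to
 * ETIMEDOUT on timeout).
 */
static ssize_t
read_nonblock_retry(int fd, char *buf, size_t rsize, int timeout_ms)
{
    assert(rsize > 0);

    for (;;)
    {
        ssize_t sz = read(fd, buf, rsize);

        if (sz >= 0)
            return sz;          /* data or EOF */
        if (errno == EINTR)
            continue;           /* interrupted: retry the read */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
        {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int rc = poll(&pfd, 1, timeout_ms);

            if (rc > 0 || (rc == -1 && errno == EINTR))
                continue;       /* readable, or poll was interrupted */
            if (rc == 0)
                errno = ETIMEDOUT;
            return -1;
        }
        return -1;              /* any other error: errno is set */
    }
}
```

With blocking I/O, as the reply above assumes, the EAGAIN branch is never taken and the function reduces to the plain EINTR retry.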




[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-01 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1157#discussion_r103857662
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -885,6 +906,12 @@ writer_wait_for_acks(ShareInput_Lk_Context *pctxt, int share_id, int xslice)
while(ack_needed > 0)
{
CHECK_FOR_INTERRUPTS();
+   
+   if (IsAbortInProgress())
+   {
+   break;
+   }
+
--- End diff --

Is a comment needed here?




[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-01 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1157#discussion_r103856229
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -627,38 +627,50 @@ static void create_tmp_fifo(const char *fifoname)
 /* 
  * As all other read/write in postgres, we may be interrupted so retry is needed.
  */
-static int retry_read(int fd, char *buf, int rsize)
+static int retry_read(int *fd, char *buf, int rsize)
 {
int sz;
Assert(rsize > 0);
 
 read_retry:
-   sz = read(fd, buf, rsize);
+   sz = read(*fd, buf, rsize);
--- End diff --

Frankly speaking, I'd keep the retry_read() logic simple, like this:

    do
    {
        err = read(fd, buf, rsize);
    } while (err == -1 && errno == EINTR);

and leave close() and the error handling to its callers.

If you insist on this approach, at least rename the function to reflect
the additional close() call and the exit.

I do not see why an fd pointer is needed here, since elog(ERROR, ...) will
quit the process.

The same comment applies to the write change below.
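
As a rough, self-contained sketch of the simpler shape suggested in this comment (not the code that was merged), the retry loop could look like the following. The name read_retry_eintr is hypothetical, and plain assert() stands in for the backend's Assert().

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: retry read() only when it was interrupted by a
 * signal. Otherwise return exactly what read() returned: >0 bytes read,
 * 0 on EOF, -1 on error with errno set. The caller keeps ownership of the
 * descriptor and decides whether to close() it or report the error.
 */
static ssize_t
read_retry_eintr(int fd, char *buf, size_t rsize)
{
    ssize_t sz;

    assert(rsize > 0);
    do
    {
        sz = read(fd, buf, rsize);
    } while (sz == -1 && errno == EINTR);

    return sz;
}
```

Because the helper neither closes the descriptor nor raises an error, it can take the fd by value, and a caller can still inspect errno and clean up before reporting the failure.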




[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-01 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1157#discussion_r103856770
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -627,38 +627,50 @@ static void create_tmp_fifo(const char *fifoname)
 /* 
  * As all other read/write in postgres, we may be interrupted so retry is needed.
  */
-static int retry_read(int fd, char *buf, int rsize)
+static int retry_read(int *fd, char *buf, int rsize)
 {
int sz;
Assert(rsize > 0);
 
 read_retry:
-   sz = read(fd, buf, rsize);
+   sz = read(*fd, buf, rsize);
if (sz > 0)
return sz;
-   else if(sz == 0 || errno == EINTR)
+   else if(sz == 0) // read EOF 
+   return 0;
--- End diff --

Why not if (sz >= 0)?
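
To illustrate why a return value of 0 has to end the loop rather than trigger a retry, here is a small sketch under stated assumptions. The helper drain_until_eof is hypothetical and exists only for this example. On a pipe or FIFO, read() returns 0 once every writer has closed its end, and it keeps returning 0 on each later call, so the removed `sz == 0 || errno == EINTR` retry branch could never make progress after EOF.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration: read from fd until EOF and return
 * the total number of bytes seen, or -1 on a read error. The caller must
 * pass an open descriptor.
 */
static long
drain_until_eof(int fd)
{
    char buf[256];
    long total = 0;

    assert(fd >= 0);

    for (;;)
    {
        ssize_t sz = read(fd, buf, sizeof(buf));

        if (sz > 0)
            total += sz;        /* got data, keep reading */
        else if (sz == 0)
            return total;       /* EOF: stop, do not retry */
        else if (errno != EINTR)
            return -1;          /* real error, errno is set */
        /* EINTR: loop around and retry */
    }
}
```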




[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-01 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1157#discussion_r103857185
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -627,38 +627,50 @@ static void create_tmp_fifo(const char *fifoname)
 /* 
  * As all other read/write in postgres, we may be interrupted so retry is needed.
  */
-static int retry_read(int fd, char *buf, int rsize)
+static int retry_read(int *fd, char *buf, int rsize)
 {
int sz;
Assert(rsize > 0);
 
 read_retry:
-   sz = read(fd, buf, rsize);
+   sz = read(*fd, buf, rsize);
if (sz > 0)
return sz;
-   else if(sz == 0 || errno == EINTR)
+   else if(sz == 0) // read EOF 
+   return 0;
+   else if(errno == EINTR)
goto read_retry;
else
{
+   if(*fd >= 0)
+   {
+   gp_retry_close(fd);
+   *fd = -1;
+   }
elog(ERROR, "could not read from fifo: %m");
}
Assert(!"Never be here");
return 0;
--- End diff --

Although this will never be reached, I'd suggest -1 as the return value.




[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-01 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1157#discussion_r103855671
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -1009,10 +1059,10 @@ shareinput_writer_waitdone(void *ctxt, int share_id, int nsharer_xslice)
{
int save_errno = errno;
elog(LOG, "SISC WRITER (shareid=%d, slice=%d): wait done time out once, errno %d",
-   share_id, currentSliceId, save_errno);
-   if(save_errno == EBADF)
+   share_id, currentSliceId, save_errno);  
+   if(save_errno == EBADF || save_errno == EINVAL)
{
-   /* The file description is invalid, maybe this FD has been already closed by writer in some cases
+   /* The file description is invalid, maybe this FD has been already closed by others in some cases
--- End diff --

The comment does not match the check logic.




[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...

2017-03-01 Thread amyrazz44
GitHub user amyrazz44 opened a pull request:

https://github.com/apache/incubator-hawq/pull/1157

HAWQ-1371. Fix QE process hang in shared input scan node



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amyrazz44/incubator-hawq ShareinputScan

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/1157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1157


commit aa11788b0899bcc7a94dcf4380751e40e546a92e
Author: amyrazz44 
Date:   2017-03-01T08:10:59Z

HAWQ-1371. Fix QE process hang in shared input scan node



