[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user amyrazz44 closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/1157

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user paul-guo- commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1157#discussion_r104572394

    --- Diff: src/backend/executor/nodeShareInputScan.c ---
    @@ -634,16 +634,16 @@ static int retry_read(int fd, char *buf, int rsize)
     read_retry:
         sz = read(fd, buf, rsize);
    -    if (sz > 0)
    +    if (sz >= 0)
             return sz;
    -    else if(sz == 0 || errno == EINTR)
    +    else if(errno == EINTR)
    --- End diff --

    It's set as blocking IO in this case, AFAIK.
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user wengyanqing commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1157#discussion_r104548625

    --- Diff: src/backend/executor/nodeShareInputScan.c ---
    @@ -634,16 +634,16 @@ static int retry_read(int fd, char *buf, int rsize)
     read_retry:
         sz = read(fd, buf, rsize);
    -    if (sz > 0)
    +    if (sz >= 0)
             return sz;
    -    else if(sz == 0 || errno == EINTR)
    +    else if(errno == EINTR)
    --- End diff --

    It needs to handle EAGAIN for nonblocking reads.
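To illustrate the EAGAIN concern: if the descriptor were opened with O_NONBLOCK (another reviewer in this thread believes it is blocking here), a read loop would have to treat EAGAIN as "no data yet" rather than as a failure. The following is only a minimal sketch under that assumption. The names `set_nonblocking` and `read_nonblocking_wait`, the `timeout_ms` parameter, and the use of `poll()` are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to nonblocking mode. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read from a nonblocking descriptor.
 * Returns bytes read (> 0), 0 on EOF, or -1 with errno set.
 * errno is ETIMEDOUT if nothing became readable within timeout_ms.
 */
static ssize_t
read_nonblocking_wait(int fd, char *buf, size_t rsize, int timeout_ms)
{
	assert(rsize > 0);

	for (;;)
	{
		ssize_t sz = read(fd, buf, rsize);

		if (sz >= 0)
			return sz;          /* data, or EOF when sz == 0 */
		if (errno == EINTR)
			continue;           /* interrupted by a signal: retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
		{
			/* No data yet: wait until readable, then retry the read. */
			struct pollfd pfd = { .fd = fd, .events = POLLIN };
			int rc = poll(&pfd, 1, timeout_ms);

			if (rc > 0 || (rc < 0 && errno == EINTR))
				continue;
			if (rc == 0)
				errno = ETIMEDOUT;
			return -1;
		}
		return -1;              /* any other error: let the caller decide */
	}
}
```

With a blocking descriptor none of the EAGAIN handling is reached, which is consistent with the simpler `sz >= 0` / `EINTR` check shown in the diff.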
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user paul-guo- commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1157#discussion_r103857662

    --- Diff: src/backend/executor/nodeShareInputScan.c ---
    @@ -885,6 +906,12 @@ writer_wait_for_acks(ShareInput_Lk_Context *pctxt, int share_id, int xslice)
         while(ack_needed > 0)
         {
             CHECK_FOR_INTERRUPTS();
    +
    +        if (IsAbortInProgress())
    +        {
    +            break;
    +        }
    +
    --- End diff --

    Is a comment needed here?
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user paul-guo- commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1157#discussion_r103856229

    --- Diff: src/backend/executor/nodeShareInputScan.c ---
    @@ -627,38 +627,50 @@ static void create_tmp_fifo(const char *fifoname)
     /*
      * As all other read/write in postgres, we may be interrupted so retry is needed.
      */
    -static int retry_read(int fd, char *buf, int rsize)
    +static int retry_read(int *fd, char *buf, int rsize)
     {
         int sz;
         Assert(rsize > 0);
     read_retry:
    -    sz = read(fd, buf, rsize);
    +    sz = read(*fd, buf, rsize);
    --- End diff --

    Frankly speaking, I'd keep the retry_read() logic simple, like this:

        do {
            err = read(fd, buf, rsize);
        } while (err == -1 && errno == EINTR);

    and leave close() and the error handling code to its callers. If you
    insist on this approach, at least rename the function to reflect the
    additional close() call and the exit. I do not see why an fd pointer is
    needed here, since elog(ERROR, ...) will quit the process. The same
    comment applies to the write change below.
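The simpler shape suggested in this comment could look roughly like the sketch below. It retries only on EINTR and hands EOF and every other error back to the caller, which then decides whether to close the descriptor and report the failure. The name `retry_read_simple` and the `ssize_t`/`size_t` types are assumptions made for illustration; the patch itself uses `int` and a different control flow.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical variant of retry_read(): retry read() only when it was
 * interrupted by a signal.
 *
 * Returns the number of bytes read (> 0), 0 on EOF, or -1 with errno set.
 * Closing fd and raising the error are deliberately left to the caller.
 */
static ssize_t
retry_read_simple(int fd, char *buf, size_t rsize)
{
	ssize_t sz;

	assert(rsize > 0);

	do
	{
		sz = read(fd, buf, rsize);
	} while (sz == -1 && errno == EINTR);

	return sz;
}
```

A caller would check for a negative return, close the descriptor itself, and only then raise the error, so the helper needs neither an `int *fd` parameter nor a name that hints at side effects.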
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user paul-guo- commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1157#discussion_r103856770

    --- Diff: src/backend/executor/nodeShareInputScan.c ---
    @@ -627,38 +627,50 @@ static void create_tmp_fifo(const char *fifoname)
     /*
      * As all other read/write in postgres, we may be interrupted so retry is needed.
      */
    -static int retry_read(int fd, char *buf, int rsize)
    +static int retry_read(int *fd, char *buf, int rsize)
     {
         int sz;
         Assert(rsize > 0);
     read_retry:
    -    sz = read(fd, buf, rsize);
    +    sz = read(*fd, buf, rsize);
         if (sz > 0)
             return sz;
    -    else if(sz == 0 || errno == EINTR)
    +    else if(sz == 0) // read EOF
    +        return 0;
    --- End diff --

    Why not if (sz >= 0)?
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user paul-guo- commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1157#discussion_r103857185

    --- Diff: src/backend/executor/nodeShareInputScan.c ---
    @@ -627,38 +627,50 @@ static void create_tmp_fifo(const char *fifoname)
     /*
      * As all other read/write in postgres, we may be interrupted so retry is needed.
      */
    -static int retry_read(int fd, char *buf, int rsize)
    +static int retry_read(int *fd, char *buf, int rsize)
     {
         int sz;
         Assert(rsize > 0);
     read_retry:
    -    sz = read(fd, buf, rsize);
    +    sz = read(*fd, buf, rsize);
         if (sz > 0)
             return sz;
    -    else if(sz == 0 || errno == EINTR)
    +    else if(sz == 0) // read EOF
    +        return 0;
    +    else if(errno == EINTR)
             goto read_retry;
         else
         {
    +        if(*fd >= 0)
    +        {
    +            gp_retry_close(fd);
    +            *fd = -1;
    +        }
             elog(ERROR, "could not read from fifo: %m");
         }
         Assert(!"Never be here");
         return 0;
    --- End diff --

    Although this will never be reached, I'd suggest -1 as the return value.
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
Github user paul-guo- commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/1157#discussion_r103855671

    --- Diff: src/backend/executor/nodeShareInputScan.c ---
    @@ -1009,10 +1059,10 @@ shareinput_writer_waitdone(void *ctxt, int share_id, int nsharer_xslice)
     {
         int save_errno = errno;
         elog(LOG, "SISC WRITER (shareid=%d, slice=%d): wait done time out once, errno %d",
    -            share_id, currentSliceId, save_errno);
    -    if(save_errno == EBADF)
    +            share_id, currentSliceId, save_errno);
    +    if(save_errno == EBADF || save_errno == EINVAL)
         {
    -        /* The file description is invalid, maybe this FD has been already closed by writer in some cases
    +        /* The file description is invalid, maybe this FD has been already closed by others in some cases
    --- End diff --

    The comment does not match the check logic.
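One way to keep the comment and the check aligned is to classify the wait result explicitly, so that each errno value maps to a documented outcome. The sketch below assumes the wait is a `select()` call with a timeout, which is not visible in the quoted hunk. The `wait_result` enum, the `wait_for_ack` name, and the `timeout_ms` parameter are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical outcome codes for a single wait attempt. */
typedef enum
{
	WAIT_READY,       /* descriptor is readable */
	WAIT_TIMEOUT,     /* nothing arrived within the timeout */
	WAIT_FD_INVALID,  /* descriptor unusable (EBADF/EINVAL), e.g. already closed */
	WAIT_ERROR        /* any other failure; errno is preserved */
} wait_result;

/* Hypothetical helper: wait up to timeout_ms for fd to become readable. */
static wait_result
wait_for_ack(int fd, int timeout_ms)
{
	assert(timeout_ms >= 0);

	/* FD_SET on an out-of-range descriptor is undefined, so reject it early. */
	if (fd < 0 || fd >= FD_SETSIZE)
		return WAIT_FD_INVALID;

	for (;;)
	{
		fd_set rset;
		struct timeval tv;
		int rc;

		FD_ZERO(&rset);
		FD_SET(fd, &rset);
		tv.tv_sec = timeout_ms / 1000;
		tv.tv_usec = (timeout_ms % 1000) * 1000;

		rc = select(fd + 1, &rset, NULL, NULL, &tv);
		if (rc > 0)
			return WAIT_READY;
		if (rc == 0)
			return WAIT_TIMEOUT;
		if (errno == EINTR)
			continue;   /* interrupted by a signal: wait again */
		if (errno == EBADF || errno == EINVAL)
			return WAIT_FD_INVALID;
		return WAIT_ERROR;
	}
}
```

Separating a real timeout from a failed call also avoids logging "time out" for what is actually an invalid descriptor.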
[GitHub] incubator-hawq pull request #1157: HAWQ-1371. Fix QE process hang in shared ...
GitHub user amyrazz44 opened a pull request:

    https://github.com/apache/incubator-hawq/pull/1157

    HAWQ-1371. Fix QE process hang in shared input scan node

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amyrazz44/incubator-hawq ShareinputScan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1157

commit aa11788b0899bcc7a94dcf4380751e40e546a92e
Author: amyrazz44
Date:   2017-03-01T08:10:59Z

    HAWQ-1371. Fix QE process hang in shared input scan node