[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-08-24 Thread amyrazz44
Github user amyrazz44 closed the pull request at:

https://github.com/apache/incubator-hawq/pull/1243


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-30 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r119279208
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -666,6 +717,29 @@ static int retry_write(int fd, char *buf, int wsize)
return 0;
 }
 
+
+
+/*
+ * generate_lock_file_name
+ *
+ * Called by reader or writer to make the unique lock file name.
+ */
+void generate_lock_file_name(char* p, int size, int share_id, const char* 
name)
+{
+   if (strncmp(name , "writer", strlen("writer")) == 0)
+   {
+   sisc_lockname(p, size, share_id, "ready");
+   strncat(p, name, lengthof(p) - strlen(p) - 1);
--- End diff --

Not lengthof(p). Should be size?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-30 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r119279170
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkLockFileForWriter
+ * 
+ * Create a unique lock file for writer, then use flock() to lock/unlock 
the lock file.
+ * We can make sure the lock file will be locked forerver until the writer 
process quits.
+ */
+static void mkLockFileForWriter(int size, int share_id, char * name)
+{
+   char *lock_file;
+   int lock;
+
+   lock_file = (char *)palloc0(sizeof(char) * MAXPGPATH);
--- End diff --

MAXPGPATH -> size?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-30 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r119279222
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -666,6 +717,29 @@ static int retry_write(int fd, char *buf, int wsize)
return 0;
 }
 
+
+
+/*
+ * generate_lock_file_name
+ *
+ * Called by reader or writer to make the unique lock file name.
+ */
+void generate_lock_file_name(char* p, int size, int share_id, const char* 
name)
+{
+   if (strncmp(name , "writer", strlen("writer")) == 0)
+   {
+   sisc_lockname(p, size, share_id, "ready");
+   strncat(p, name, lengthof(p) - strlen(p) - 1);
+   }
+   else
+   {
+   sisc_lockname(p, size, share_id, "done");
+   strncat(p, name, lengthof(p) - strlen(p) - 1);
--- End diff --

ditto.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-30 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r119278591
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkLockFileForWriter
+ * 
+ * Create a unique lock file for writer, then use flock() to lock/unlock 
the lock file.
+ * We can make sure the lock file will be locked forerver until the writer 
process quits.
+ */
+static void mkLockFileForWriter(int size, int share_id, char * name)
+{
+   char *lock_file;
+   int lock;
+
+   lock_file = (char *)palloc0(sizeof(char) * MAXPGPATH);
+   generate_lock_file_name(lock_file, size, share_id, name);
+   elog(DEBUG3, "The lock file for writer in SISC is %s", lock_file);
+   sisc_writer_lock_fd = open(lock_file, O_CREAT, S_IRWXU);
+   if(sisc_writer_lock_fd < 0)
+   {
+   elog(ERROR, "Could not create lock file %s for writer in SISC. 
The error number is %d", lock_file, errno);
+   }
+   lock = flock(sisc_writer_lock_fd, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   elog(DEBUG3, "Could not lock lock file  \"%s\" for writer in 
SISC . The error number is %d", lock_file, errno);
+   else if(lock == 0)
+   elog(DEBUG3, "Successfully locked lock file  \"%s\" for writer 
in SISC.The error number is %d", lock_file, errno);
--- End diff --

For (lock == 0), there is no need of errno in log.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-27 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118815751
  
--- Diff: src/backend/utils/misc/guc.c ---
@@ -6685,6 +6687,15 @@ static struct config_int ConfigureNamesInt[] =
&metadata_cache_max_hdfs_file_num,
524288, 32768, 8388608, NULL, NULL
},
+   {
+   {"share_input_scan_wait_lockfile_timeout", PGC_USERSET, 
DEVELOPER_OPTIONS,
+   gettext_noop("timeout for waiting lock file which 
writer creates."),
+   NULL
+   },
--- End diff --

So this is ms or second?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-27 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118815742
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,78 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
--- End diff --

Why you ignore tv_usec?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-27 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118815361
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,78 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer lock file of 
shareinput_reader_waitready() is %s", writer_lock_file);
+   flag = true;
+   lock_fd = open(writer_lock_file, O_RDONLY);
+   if(lock_fd < 0)
+   {
+   elog(DEBUG3, "Open writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
+   continue;
+   }
+   lock = flock(lock_fd, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   {
+   /*
+* Reader try to lock the lock file 
which writer created until locked the lock file successfully 
+ * which means that writer process quit. If reader 
lock the lock file failed, it means that writer
+* process is healthy.
+*/
+   elog(DEBUG3, "Lock writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
+   }
+   else if(lock == 0)
+   {
+   /*
+* There is one situation to consider 
about.
+* Writer need a time interval to lock 
the lock file after the lock file has been created.
+* So, if reader lock the lock file 
ahead of writer, we should unlock it.
+* If reader lock the lock file after 
writer, it means that writer process has abort.
+* We should break the loop to make 
sure reader no longer wait for writer.
+*/  
+   if(is_lock_firsttime == true)  
+   {
+   lock = flock(lock_fd, LOCK_UN); 
+   is_lock_firsttime = false;
+   elog(DEBUG3, "Lock writer's 
lock file %s first time successfully in SISC! Unlock it.", writer_lock_file);
+   continue;
+   }
+   else
+   {
+   elog(LOG, "Lock writer's lock 
file %s successfully in SISC!", writer_lock_file);
+   /* Retry to close the fd in 
case there is interruption from signal */
+   while ((close(lock_fd) < 0) && 
(errno == EINTR))
--- End diff --

This is a legal condition. Should  not elog(ERROR).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-27 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118815310
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -666,6 +717,29 @@ static int retry_write(int fd, char *buf, int wsize)
return 0;
 }
 
+
+
+/*
+ * generate_lock_file_name
+ *
+ * Called by reader or writer to make the unique lock file name.
+ */
+void generate_lock_file_name(char* p, int size, int share_id, const char* 
name)
+{
+   if (strncmp(name , "writer", strlen("writer")) == 0)
+   {
+   sisc_lockname(p, size, share_id, "ready");
+   strncat(p, name, sizeof(p) - strlen(p) - 1);
+   }
+   else
+   {
+   sisc_lockname(p, size, share_id, "done");
+   strncat(p, name, sizeof(p) - strlen(p) - 1);
--- End diff --

same as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-27 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118815308
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -666,6 +717,29 @@ static int retry_write(int fd, char *buf, int wsize)
return 0;
 }
 
+
+
+/*
+ * generate_lock_file_name
+ *
+ * Called by reader or writer to make the unique lock file name.
+ */
+void generate_lock_file_name(char* p, int size, int share_id, const char* 
name)
+{
+   if (strncmp(name , "writer", strlen("writer")) == 0)
+   {
+   sisc_lockname(p, size, share_id, "ready");
+   strncat(p, name, sizeof(p) - strlen(p) - 1);
--- End diff --

bug here. sizeof(p) = 8 (for x64), and strlen(name)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-26 Thread amyrazz44
Github user amyrazz44 commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118670052
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,72 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer lock file of 
shareinput_reader_waitready() is %s", writer_lock_file);
+   flag = true;
+   fd_lock = open(writer_lock_file, O_RDONLY);
+   if(fd_lock < 0)
+   {
+   elog(DEBUG3, "Open writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
+   }
+   lock = flock(fd_lock, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   {
+   elog(DEBUG3, "Lock writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
+   }
+   else if(lock == 0)
+   {
+   /*
+* There is one situation to consider 
about.
+* Writer need a time interval to lock 
the lock file after the lock file has been created.
+* So, if reader lock the lock file 
ahead of writer, we should unlock it.
+* If reader lock the lock file after 
writer, it means that writer process has abort.
+* We should break the loop to make 
sure reader no longer wait for writer.
+*/  
+   if(lock_count == 0)  
+   {
+   lock = flock(fd_lock, LOCK_UN); 
+   lock_count++;
--- End diff --

Because writer needs a time interval to lock the lock file. During the time 
interval, if reader lock the lock file ahead of writer , reader should unlock 
the lock file. In the code logic, lock_count==0 indicates the above logic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-26 Thread amyrazz44
Github user amyrazz44 commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118668389
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,72 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer lock file of 
shareinput_reader_waitready() is %s", writer_lock_file);
+   flag = true;
+   fd_lock = open(writer_lock_file, O_RDONLY);
+   if(fd_lock < 0)
+   {
+   elog(DEBUG3, "Open writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
--- End diff --

Writer needs time to create the lock file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-26 Thread amyrazz44
Github user amyrazz44 commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118668142
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,72 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer lock file of 
shareinput_reader_waitready() is %s", writer_lock_file);
+   flag = true;
+   fd_lock = open(writer_lock_file, O_RDONLY);
+   if(fd_lock < 0)
+   {
+   elog(DEBUG3, "Open writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
--- End diff --

Reader try to lock the lock file which writer created until locked the lock 
file successfully which means that writer process quits. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118226956
  
--- Diff: src/backend/utils/misc/guc.c ---
@@ -6685,6 +6687,15 @@ static struct config_int ConfigureNamesInt[] =
&metadata_cache_max_hdfs_file_num,
524288, 32768, 8388608, NULL, NULL
},
+   {
+   {"share_input_scan_wait_lockfile_timeout", PGC_USERSET, 
DEVELOPER_OPTIONS,
+   gettext_noop("timeout for wait lock file."),
--- End diff --

The description is too short.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118224289
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -709,6 +783,12 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
struct timeval tval;
int n;
char a;
+   int file_exists = -1;
+   int timeout_interval = 0;
+   bool flag = false; //A tag for file exists or not.
+   int fd_lock = -1;
--- End diff --

fd_lock -> lock_fd?




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118221100
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkLockFileForWriter
+ * 
+ * Create a unique lock file for writer. Then use flock's unblock way to 
lock the lock file.
+ * We can make sure the lock file will be locked forerver until the writer 
process has quit.
+ */
+void mkLockFileForWriter(int size, int share_id, char * name)
--- End diff --

mkLockFileForWriter() seems to be used in this file only. Why not static 
this file and remove the definition in the header file below?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118226809
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,72 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer lock file of 
shareinput_reader_waitready() is %s", writer_lock_file);
+   flag = true;
+   fd_lock = open(writer_lock_file, O_RDONLY);
+   if(fd_lock < 0)
+   {
+   elog(DEBUG3, "Open writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
+   }
+   lock = flock(fd_lock, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   {
+   elog(DEBUG3, "Lock writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
+   }
+   else if(lock == 0)
+   {
+   /*
+* There is one situation to consider 
about.
+* Writer need a time interval to lock 
the lock file after the lock file has been created.
+* So, if reader lock the lock file 
ahead of writer, we should unlock it.
+* If reader lock the lock file after 
writer, it means that writer process has abort.
+* We should break the loop to make 
sure reader no longer wait for writer.
+*/  
+   if(lock_count == 0)  
+   {
+   lock = flock(fd_lock, LOCK_UN); 
+   lock_count++;
--- End diff --

the lock_count logic works like a bool variable. Do you think this works 
for the situation of the comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118222465
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkLockFileForWriter
+ * 
+ * Create a unique lock file for writer. Then use flock's unblock way to 
lock the lock file.
+ * We can make sure the lock file will be locked forerver until the writer 
process has quit.
+ */
+void mkLockFileForWriter(int size, int share_id, char * name)
+{
+   char *lock_file;
+   int lock;
+
+   lock_file = (char *)palloc0(sizeof(char) * MAXPGPATH);
+   generate_lock_file_name(lock_file, size, share_id, name);
+   elog(DEBUG3, "The lock file for writer in SISC is %s", lock_file);
+   sisc_writer_lock_fd = open(lock_file, O_CREAT, S_IRWXU);
+   if(sisc_writer_lock_fd < 0)
+   {
+   elog(ERROR, "Could not create lock file %s for writer in SISC. 
The error number is %d", lock_file, errno);
+   }
+   lock = flock(sisc_writer_lock_fd, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   elog(DEBUG3, "Could not lock lock file  \"%s\" for writer in 
SISC : %m. The error number is %d", lock_file, errno);
+   else if(lock == 0)
+   elog(DEBUG3, "Successfully locked lock file  \"%s\" for writer 
in SISC: %m.The error number is %d", lock_file, errno);
+   pfree(lock_file);
--- End diff --

ditto.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118224476
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -764,7 +848,7 @@ shareinput_reader_waitready(int share_id, PlanGenerator 
planGen)
tval.tv_usec = 0;
 
n = select(pctxt->readyfd+1, (fd_set *) &rset, NULL, NULL, 
&tval);
-
+   
--- End diff --

remove the tailing blank characters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118220731
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkLockFileForWriter
+ * 
+ * Create a unique lock file for writer. Then use flock's unblock way to 
lock the lock file.
+ * We can make sure the lock file will be locked forerver until the writer 
process has quit.
--- End diff --

until the writer process quits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118222840
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -552,11 +551,59 @@ static void sisc_lockname(char* p, int size, int 
share_id, const char* name)
}
 }
 
+
+char *joint_lock_file_name(ShareInput_Lk_Context *lk_ctxt, char *name)
+{
+   char *lock_file = palloc0(sizeof(char) * MAXPGPATH);
+
+   if(strncmp("writer", name, 6) ==0 )
+   {
+   strncat(lock_file, lk_ctxt->lkname_ready, 
strlen(lk_ctxt->lkname_ready));
--- End diff --

The strncat usage is not correct. It should be
strncat(lock_file, lk_ctxt->lkname_ready, MAXPGPATH - 
strlen(lock_file) - 1);

From the manpage.
 strncat(char *restrict s1, const char *restrict s2, size_t n);
 The strncat() function appends not more than n characters from s2, and 
then adds a terminating `\0'.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118224402
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -738,6 +818,10 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
if(pctxt->donefd < 0)
elog(ERROR, "could not open fifo \"%s\": %m", 
pctxt->lkname_done);
 
+   char *writer_lock_file = NULL; //current path for lock file.
--- End diff --

I think moving variable definition earlier in its scope would be better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118226007
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,72 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer lock file of 
shareinput_reader_waitready() is %s", writer_lock_file);
+   flag = true;
+   fd_lock = open(writer_lock_file, O_RDONLY);
+   if(fd_lock < 0)
+   {
+   elog(DEBUG3, "Open writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
+   }
+   lock = flock(fd_lock, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   {
--- End diff --

This is expected if I understand correctly so the log should be friendly 
(e.g. told users that this is expected).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118223887
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -666,6 +717,29 @@ static int retry_write(int fd, char *buf, int wsize)
return 0;
 }
 
+
+
+/*
+ * generate_lock_file_name
+ *
+ * Called by reader or writer to make the unique lock file name.
+ */
+void generate_lock_file_name(char* p, int size, int share_id, const char* 
name)
+{
+   if (strncmp(name , "writer", 6) == 0)
--- End diff --

My habit is:

strncmp(name , "writer", strlen("writer")) == 0

So you do not need to calculate the string length yourself (compiler does 
this for you).

Anyway, up to you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118226308
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,72 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_lock_file, F_OK);   
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait lock file for writer time 
out interval is %d", timeout_interval);
+   if(timeout_interval >= 
share_input_scan_wait_lockfile_timeout || flag == true) //If lock file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_lock_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer lock file of 
shareinput_reader_waitready() is %s", writer_lock_file);
+   flag = true;
+   fd_lock = open(writer_lock_file, O_RDONLY);
+   if(fd_lock < 0)
+   {
+   elog(DEBUG3, "Open writer's lock file 
%s failed!, error number is %d", writer_lock_file, errno);
--- End diff --

why the code still continue if fd_lock < 0? Should not it fail?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118220716
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkLockFileForWriter
+ * 
+ * Create a unique lock file for writer. Then use flock's unblock way to 
lock the lock file.
--- End diff --

use flock() to lock/unlock the lock file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118222360
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkLockFileForWriter
+ * 
+ * Create a unique lock file for writer. Then use flock's unblock way to 
lock the lock file.
+ * We can make sure the lock file will be locked forerver until the writer 
process has quit.
+ */
+void mkLockFileForWriter(int size, int share_id, char * name)
+{
+   char *lock_file;
+   int lock;
+
+   lock_file = (char *)palloc0(sizeof(char) * MAXPGPATH);
+   generate_lock_file_name(lock_file, size, share_id, name);
+   elog(DEBUG3, "The lock file for writer in SISC is %s", lock_file);
+   sisc_writer_lock_fd = open(lock_file, O_CREAT, S_IRWXU);
+   if(sisc_writer_lock_fd < 0)
+   {
+   elog(ERROR, "Could not create lock file %s for writer in SISC. 
The error number is %d", lock_file, errno);
+   }
+   lock = flock(sisc_writer_lock_fd, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   elog(DEBUG3, "Could not lock lock file  \"%s\" for writer in 
SISC : %m. The error number is %d", lock_file, errno);
--- End diff --

If having %m we have the err info.
   m  (Glibc extension.)  Print output of strerror(errno).  No 
argument is required.

Does this code work fine on mac? I think to avoid duplication we should use 
errno or strerror, instead of "%m".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-24 Thread huor
Github user huor commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r118204222
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -41,14 +41,16 @@
 #include "postgres.h"
--- End diff --

lock would be better


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-21 Thread huor
Github user huor commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117661684
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -552,11 +551,59 @@ static void sisc_lockname(char* p, int size, int 
share_id, const char* name)
}
 }
 
+
+char * joint_tmp_file_name(ShareInput_Lk_Context *lk_ctxt, char *name)
+{
+   char *tmp_file = palloc(sizeof(char) * MAXPGPATH);
+
+   if(strncmp("writer", name, 7) ==0 )
--- End diff --

if(strncmp("writer", name, 6) ==0 )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-21 Thread huor
Github user huor commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117663096
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -666,6 +717,29 @@ static int retry_write(int fd, char *buf, int wsize)
return 0;
 }
 
+
+
+/*
+ * mk_tmp_file_name
+ *
+ * Called by reader or writer to make the unique tmp file name.
+ */
+void mk_tmp_file_name(char* p, int size, int share_id, const char* name)
--- End diff --

If the name of the temp file for writer follows the naming convention of 
its ready and done notification fifo, I think we can eliminate this function.

It is good enough to have a single function like joint_tmp_file_name which 
generate name for the temp file. This function can then be used at both the 
creation and drop stage of the temp file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-21 Thread huor
Github user huor commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117661994
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -552,11 +551,59 @@ static void sisc_lockname(char* p, int size, int 
share_id, const char* name)
}
 }
 
+
+char * joint_tmp_file_name(ShareInput_Lk_Context *lk_ctxt, char *name)
+{
+   char *tmp_file = palloc(sizeof(char) * MAXPGPATH);
+
+   if(strncmp("writer", name, 7) ==0 )
+   {
+   strncat(tmp_file, lk_ctxt->lkname_ready, 
strlen(lk_ctxt->lkname_ready)+1);
+   }
+   else
+   {
+   strncat(tmp_file, lk_ctxt->lkname_done, 
strlen(lk_ctxt->lkname_done)+1);
+   }
+   strncat(tmp_file, name, strlen(name)+1);
+   return tmp_file;
+}
+
+void drop_tmp_files(ShareInput_Lk_Context *lk_ctxt)
+{
+   char *writer_tmp_file = NULL;
+   char *reader_tmp_file = NULL;
+
+   writer_tmp_file = joint_tmp_file_name(lk_ctxt, "writer");
+   if(access(writer_tmp_file, F_OK) == 0)
+   {
+   elog(DEBUG3, "Drop writer's tmp files %s in SISC", 
writer_tmp_file);
+   unlink(writer_tmp_file);
+   }
+   else
+   {
+   elog(DEBUG3, "Writer's tmp files %s has been dropped already in 
SISC", writer_tmp_file);
+   }
+   pfree(writer_tmp_file);
+   reader_tmp_file = joint_tmp_file_name(lk_ctxt, "reader");   
--- End diff --

We don't bother to add the handling of reader here as the fix is writer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-21 Thread huor
Github user huor commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117663528
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +877,68 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_tmp_file, F_OK);
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "Wait tmp file for writer time out 
interval is %d", timeout_interval);
+   if(timeout_interval == 
share_input_scan_wait_tmpfile_timeout || flag == true) //If tmp file never 
exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_tmp_file);
+   break;
+   }
+   timeout_interval += tval.tv_sec;
+   }
+   else
+   {
+   elog(LOG, "writer tmp file of 
shareinput_reader_waitready() is %s", writer_tmp_file);
+   flag = true;
+   fd_tmp = open(writer_tmp_file, O_RDONLY);
+   if(fd_tmp < 0)
+   {
+   elog(DEBUG3, "Open writer's tmp file %s 
failed!, error number is %d", writer_tmp_file, errno);
+   }
+   lock = flock(fd_tmp, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   {
+   elog(DEBUG3, "Lock writer's tmp file %s 
failed!, error number is %d", writer_tmp_file, errno);
+   }
+   else if(lock == 0)
+   {
+   /*
+* There is one situation to consider 
about.
+* Writer need a time interval to lock 
the tmp file after the tmp file has been created.
+* So, if reader lock the tmp file 
ahead of writer, we should unlock it.
+* If reader lock the tmp file after 
writer, it means that writer process has abort.
+* We should break the loop to make 
sure reader no longer wait for writer.
+*/  
+   if(lock_count == 0)  
+   {
+   lock = flock(fd_tmp, LOCK_UN); 
+   lock_count++;
+   elog(DEBUG3, "Lock writer's tmp 
file %s first time successfully in SISC!", writer_tmp_file);
+   continue;
+   }
+   else
+   {
+   elog(LOG, "Lock writer's tmp 
file %s successfully in SISC!", writer_tmp_file);
+   close(fd_tmp);
--- End diff --

close here does not always succeed. Need to add check here:
```c
if (close(fd_tmp) < 0)
{
elog(ERROR, "Failed to close SISC temporary file due to 
strerror(errno)");
}
```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-21 Thread huor
Github user huor commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117661841
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -552,11 +551,59 @@ static void sisc_lockname(char* p, int size, int 
share_id, const char* name)
}
 }
 
+
+char * joint_tmp_file_name(ShareInput_Lk_Context *lk_ctxt, char *name)
--- End diff --

generate_tmp_file_name


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117423393
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +887,70 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_tmp_file, F_OK);
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "time out count is %d", 
timeout_count);
+   timeout_count--;
+   if(timeout_count == 0 || flag == true) //If tmp 
file never exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_tmp_file);
+   break;
+   }
+   }
+   else
+   {
+   elog(LOG, "writer tmp file of 
shareinput_reader_waitready() is %s", writer_tmp_file);
+   flag = true;
+   fd_tmp = open(writer_tmp_file, O_RDONLY);
+   if(fd_tmp < 0)
+   {
+   elog(DEBUG3, "Open writer's tmp file %s 
failed!, error number is %d", writer_tmp_file, errno);
+   }
+   lock = flock(fd_tmp, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   {
+   elog(DEBUG3, "Lock writer's tmp file %s 
failed!, error number is %d", writer_tmp_file, errno);
+   }
+   else if(lock == 0)
+   {
+   /*
+* There is one situation to consider 
about.
+* Writer need a time interval to lock 
the tmp file after the tmp file has been created.
+* So, if reader lock the tmp file 
ahead of writer, we should unlock it.
+* If reader lock the tmp file after 
writer, it means that writer process has abort.
+* We should break the loop to make 
sure reader no longer wait for writer.
+*/  
+   if(lock_count == 0)  
+   {
+   lock = flock(fd_tmp, LOCK_UN); 
+   lock_count++;
+   elog(DEBUG3, "Lock writer's tmp 
file %s first time successfully!", writer_tmp_file);
+   continue;
+   }
+   else
+   {
+   elog(LOG, "Lock writer's tmp 
file %s successfully!", writer_tmp_file);
+   close(fd_tmp);
+   pfree(writer_tmp_file);
+   break; 
+   }
+   
--- End diff --

No need of blank lines here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117423311
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -552,11 +551,61 @@ static void sisc_lockname(char* p, int size, int 
share_id, const char* name)
}
 }
 
+
+char * joint_tmp_file_name(ShareInput_Lk_Context *lk_ctxt, char *name)
+{
+   char *tmp_file = palloc(sizeof(char) * MAXPGPATH);
+
+   strcat(tmp_file, getCurrentTempFilePath);
+   strcat(tmp_file, "/");
+   strcat(tmp_file, PG_TEMP_FILES_DIR);
+   strcat(tmp_file, "/");
+   if(strncmp("writer", name, 7) ==0 )
+   {
+   strcat(tmp_file, lk_ctxt->lkname_ready);
+   }
+   else
+   {
+   strcat(tmp_file, lk_ctxt->lkname_done);
+   }
+   strcat(tmp_file, name);
+   return tmp_file;
+}
+
+void drop_tmp_files(ShareInput_Lk_Context *lk_ctxt)
+{
--- End diff --

Need to call unlink() to drop.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117419625
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -552,11 +551,61 @@ static void sisc_lockname(char* p, int size, int 
share_id, const char* name)
}
 }
 
+
+char * joint_tmp_file_name(ShareInput_Lk_Context *lk_ctxt, char *name)
+{
+   char *tmp_file = palloc(sizeof(char) * MAXPGPATH);
+
+   strcat(tmp_file, getCurrentTempFilePath);
+   strcat(tmp_file, "/");
+   strcat(tmp_file, PG_TEMP_FILES_DIR);
+   strcat(tmp_file, "/");
--- End diff --

Please either calculate the size of tmp_file before calling strcat(),  or 
call strncat().
For all strcat() cases, please also modify.

Besides, You do not have the below strings as one parameter: "/" 
PG_TEMP_FILES_DIR "/"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117423486
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +887,70 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_tmp_file, F_OK);
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "time out count is %d", 
timeout_count);
+   timeout_count--;
+   if(timeout_count == 0 || flag == true) //If tmp 
file never exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_tmp_file);
+   break;
+   }
+   }
+   else
+   {
+   elog(LOG, "writer tmp file of 
shareinput_reader_waitready() is %s", writer_tmp_file);
+   flag = true;
+   fd_tmp = open(writer_tmp_file, O_RDONLY);
+   if(fd_tmp < 0)
+   {
+   elog(DEBUG3, "Open writer's tmp file %s 
failed!, error number is %d", writer_tmp_file, errno);
--- End diff --

This seems to be expected result, right? If yes, we do not need code here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117416229
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
--- End diff --

Again, for function that is not ready for other files' use, please add 
"static". Please also double-check other functions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117423597
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -793,15 +887,70 @@ shareinput_reader_waitready(int share_id, 
PlanGenerator planGen)
}
else if(n==0)
{
-   elog(DEBUG1, "SISC READER (shareid=%d, slice=%d): Wait 
ready time out once",
-   share_id, currentSliceId);
+   file_exists = access(writer_tmp_file, F_OK);
+   if(file_exists != 0)
+   {
+   elog(DEBUG3, "time out count is %d", 
timeout_count);
+   timeout_count--;
+   if(timeout_count == 0 || flag == true) //If tmp 
file never exists or disappeared, reader will no longer waiting for writer
+   {
+   elog(LOG, "SISC READER (shareid=%d, 
slice=%d): Wait ready time out and break",
+   share_id, currentSliceId);
+   pfree(writer_tmp_file);
+   break;
+   }
+   }
+   else
+   {
+   elog(LOG, "writer tmp file of 
shareinput_reader_waitready() is %s", writer_tmp_file);
+   flag = true;
+   fd_tmp = open(writer_tmp_file, O_RDONLY);
+   if(fd_tmp < 0)
+   {
+   elog(DEBUG3, "Open writer's tmp file %s 
failed!, error number is %d", writer_tmp_file, errno);
+   }
+   lock = flock(fd_tmp, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   {
+   elog(DEBUG3, "Lock writer's tmp file %s 
failed!, error number is %d", writer_tmp_file, errno);
+   }
+   else if(lock == 0)
+   {
+   /*
+* There is one situation to consider 
about.
+* Writer need a time interval to lock 
the tmp file after the tmp file has been created.
+* So, if reader lock the tmp file 
ahead of writer, we should unlock it.
+* If reader lock the tmp file after 
writer, it means that writer process has abort.
+* We should break the loop to make 
sure reader no longer wait for writer.
--- End diff --

Maybe we should add code here to tell that the tmp file will be dropped.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117423739
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -41,14 +41,16 @@
 #include "postgres.h"
--- End diff --

I've been thinking that we should use a more meaningful name for all "tmp" 
or "temp"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117420258
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -552,11 +551,61 @@ static void sisc_lockname(char* p, int size, int 
share_id, const char* name)
}
 }
 
+
+char * joint_tmp_file_name(ShareInput_Lk_Context *lk_ctxt, char *name)
--- End diff --

Do we need to have the function names more meaningful? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117420455
  
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -666,6 +719,36 @@ static int retry_write(int fd, char *buf, int wsize)
return 0;
 }
 
+
+
+/*
+ * mk_tmp_file_name
+ *
+ * Called by reader or writer to make the unique tmp file name.
+ */
+void mk_tmp_file_name(char* p, int size, int share_id, const char* name)
+{
+time_t t;
+int now, random_num;
+t = time(NULL);
+now = time(&t);
+   srand(time(NULL));
+random_num = rand();
--- End diff --

It seems that there is no need of random so far.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117418080
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkTempFileForWriter
+ * 
+ * Create a unique tmp file for writer. Then use flock's unblock way to 
lock the tmp file.
+ * We can make sure the tmp file will be locked forerver until the writer 
process has quit.
+ */
+void mkTempFileForWriter(int size, int share_id, char * name)
+{
+   char *tmp_file;
+   int lock;
+
+   tmp_file = (char *)palloc(sizeof(char) * MAXPGPATH);
+   mk_tmp_file_name(tmp_file, size, share_id, name);
+   elog(DEBUG3, "tmp file for writer is %s", tmp_file);
--- End diff --

I think "shareinputscan" should be added in the log.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117418664
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkTempFileForWriter
+ * 
+ * Create a unique tmp file for writer. Then use flock's unblock way to 
lock the tmp file.
+ * We can make sure the tmp file will be locked forerver until the writer 
process has quit.
+ */
+void mkTempFileForWriter(int size, int share_id, char * name)
+{
+   char *tmp_file;
+   int lock;
+
+   tmp_file = (char *)palloc(sizeof(char) * MAXPGPATH);
+   mk_tmp_file_name(tmp_file, size, share_id, name);
+   elog(DEBUG3, "tmp file for writer is %s", tmp_file);
+   fd_tmp_writer = open(tmp_file, O_CREAT, S_IRWXU);
--- End diff --

I do not think we need to provide +wx permission.

I think if the file exists we should quit, right? If yes, we need to add 
O_EXCL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117418935
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -759,3 +764,30 @@ ExecEagerFreeMaterial(MaterialState *node)
}
 }
 
+
+/*
+ * mkTempFileForWriter
+ * 
+ * Create a unique tmp file for writer. Then use flock's unblock way to 
lock the tmp file.
+ * We can make sure the tmp file will be locked forerver until the writer 
process has quit.
+ */
+void mkTempFileForWriter(int size, int share_id, char * name)
+{
+   char *tmp_file;
+   int lock;
+
+   tmp_file = (char *)palloc(sizeof(char) * MAXPGPATH);
+   mk_tmp_file_name(tmp_file, size, share_id, name);
+   elog(DEBUG3, "tmp file for writer is %s", tmp_file);
+   fd_tmp_writer = open(tmp_file, O_CREAT, S_IRWXU);
+   if(fd_tmp_writer < 0)
+   {
+   elog(ERROR, "could not create tmp file %s for writer", 
tmp_file);
+   }
+   lock = flock(fd_tmp_writer, LOCK_EX | LOCK_NB);
+   if(lock == -1)
+   elog(DEBUG3, "could not lock tmp file  \"%s\": %m", tmp_file);
+   else if(lock == 0)
+   elog(DEBUG3, "successfully locked tmp file  \"%s\": %m", 
tmp_file);
+   pfree(tmp_file);
--- End diff --

Please provide more detailed info in log (e.g. this is for shareinputscan, 
fail reason, e.g. strerror(errno)) for debug purpose.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #1243: HAWQ-1458. Fix share input scan bug for w...

2017-05-19 Thread paul-guo-
Github user paul-guo- commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1243#discussion_r117413480
  
--- Diff: src/backend/executor/nodeMaterial.c ---
@@ -41,14 +41,16 @@
 #include "postgres.h"
 
 #include "executor/executor.h"
-#include "executor/nodeMaterial.h"
 #include "executor/instrument.h"/* Instrumentation */
 #include "utils/tuplestorenew.h"
-
+#include "executor/nodeMaterial.h"
 #include "miscadmin.h"
 
 #include "cdb/cdbvars.h"
+#include "postmaster/primary_mirror_mode.h"
 
+
+int fd_tmp_writer = -1;
--- End diff --

1. Use "static" if it is used in the file only. 
2. I'd suggest you should a more meaningful name for this variable and add 
comment here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---