Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/02/2012 01:03 PM, Tom Lane wrote:
 When I said list, I meant a List *.  No fixed size.

 Ok, like this?

I think this could use a bit of editorialization (I don't think the
stripe terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed?  In the worst case a transient load peak could result in
a long list that permanently adds search overhead.  Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.

 Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with
release of list entries), just to see if that comes out cleaner
or uglier than this way.  If you don't mind I'll pick this up
and commit whichever way turns out better.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Andrew Dunstan



On 04/04/2012 12:13 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 04/02/2012 01:03 PM, Tom Lane wrote:

When I said list, I meant a List *.  No fixed size.

Ok, like this?

I think this could use a bit of editorialization (I don't think the
stripe terminology is still applicable, in particular), but the
general idea seems OK.

Does anyone feel that it's a bad idea that list entries are never
reclaimed?  In the worst case a transient load peak could result in
a long list that permanently adds search overhead.  Not sure if it's
worth the extra complexity to delete a list cell that's no longer
needed, rather than leaving it present and empty.


Me either. The logic could possibly be something simple when we free a 
node, like "while the list tail is an available node, prune the tail". But 
as you say, it might not be worth it.
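For illustration, the pruning rule sketched above might look roughly like this on a simplified singly linked list (names and types are invented for this sketch; the real syslogger code uses PostgreSQL's List and palloc/pfree):

```c
#include <stdlib.h>

/* A node holding a partial message; pid == 0 marks an available slot. */
typedef struct node
{
	int			pid;
	struct node *next;
} node;

/*
 * Trim any run of available (pid == 0) nodes from the tail of the list.
 * Returns the (possibly unchanged) head; returns NULL if every node was
 * available and the whole list was freed.
 */
static node *
prune_tail(node *head)
{
	node	   *last_used = NULL;
	node	  **start;
	node	   *p;

	/* Find the last node that is still in use. */
	for (p = head; p != NULL; p = p->next)
	{
		if (p->pid != 0)
			last_used = p;
	}

	/* Free everything after it (or the whole list if nothing is in use). */
	start = last_used ? &last_used->next : &head;
	p = *start;
	while (p != NULL)
	{
		node	   *next = p->next;

		free(p);
		p = next;
	}
	*start = NULL;
	return head;
}
```

Note that an available node sitting in the middle of the list stays put; only the trailing run is reclaimed, which matches the "prune the tail" wording above.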





Do we consider this a bug fix, to be backpatched?

Yes, definitely.

I think I'd like to have a go at coding it the other way (with
release of list entries), just to see if that comes out cleaner
or uglier than this way.  If you don't mind I'll pick this up
and commit whichever way turns out better.





Go for it.

cheers

andrew




Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/04/2012 12:13 PM, Tom Lane wrote:
 Does anyone feel that it's a bad idea that list entries are never
 reclaimed?  In the worst case a transient load peak could result in
 a long list that permanently adds search overhead.  Not sure if it's
 worth the extra complexity to delete a list cell that's no longer
 needed, rather than leaving it present and empty.

 Me either. The logic could possibly be something simple when we free a 
 node, like "while the list tail is an available node, prune the tail". But 
 as you say, it might not be worth it.

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry.  If the removal is
conditional then you can't do that and you end up with the complications
of both methods.  Anyway I've not tried to code it yet.
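As a rough standalone sketch of that alternative, removal would be unconditional: when the final chunk for a pid arrives, its entry is always unlinked and freed, so there is no "unused entry" state at all. (The types and names below are invented; the real code would use PostgreSQL's List cells and pfree.)

```c
#include <stdlib.h>

/* An entry holding the accumulated partial message for one backend. */
typedef struct entry
{
	int			pid;
	struct entry *next;
} entry;

/*
 * Unconditionally remove and free the entry for pid, if present.
 * Returns the new list head.  Using a pointer-to-link avoids a
 * separate "previous node" variable.
 */
static entry *
remove_entry(entry *head, int pid)
{
	entry	  **link = &head;

	while (*link != NULL)
	{
		if ((*link)->pid == pid)
		{
			entry	   *victim = *link;

			*link = victim->next;
			free(victim);		/* pfree() in the real thing */
			break;
		}
		link = &(*link)->next;
	}
	return head;
}
```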

regards, tom lane



Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Tom Lane
I wrote:
 The idea I had in mind was to compensate for adding list-removal logic
 by getting rid of the concept of an unused entry.  If the removal is
 conditional then you can't do that and you end up with the complications
 of both methods.  Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not.  So I
committed the patch with just some cosmetic cleanups.

regards, tom lane



Re: [HACKERS] log chunking broken with large queries under load

2012-04-04 Thread Andrew Dunstan



On 04/04/2012 03:09 PM, Tom Lane wrote:

I wrote:

The idea I had in mind was to compensate for adding list-removal logic
by getting rid of the concept of an unused entry.  If the removal is
conditional then you can't do that and you end up with the complications
of both methods.  Anyway I've not tried to code it yet.

I concluded this would probably be a loser performance-wise, because it
would add a couple of palloc/pfree cycles to the processing of each
multi-chunk message, whether there was any contention or not.  So I
committed the patch with just some cosmetic cleanups.




OK, thanks for doing this.

cheers

andrew




Re: [HACKERS] log chunking broken with large queries under load

2012-04-03 Thread Andrew Dunstan



On 04/02/2012 01:03 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net  writes:

On 04/02/2012 12:44 PM, Tom Lane wrote:

You could do something like having a list of pending chunks for each
value of (pid mod 256).  The length of each such list ought to be plenty
short under ordinary circumstances.

Yeah, ok, that should work. How big would we make each list to start
with? Still 20, or smaller?

When I said list, I meant a List *.  No fixed size.




Ok, like this? Do we consider this a bug fix, to be backpatched?

cheers

andrew


diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index a603ab4..2e4d437 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -35,6 +35,7 @@
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgtime.h"
+#include "nodes/pg_list.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
@@ -93,10 +94,11 @@ static char *last_file_name = NULL;
 static char *last_csv_file_name = NULL;
 
 /*
- * Buffers for saving partial messages from different backends. We don't expect
- * that there will be very many outstanding at one time, so 20 seems plenty of
- * leeway. If this array gets full we won't lose messages, but we will lose
- * the protocol protection against them being partially written or interleaved.
+ * Buffers for saving partial messages from different backends.
+ *
+ * Keep 256 lists of these, the list for a given pid being pid mod 256,
+ * so that for any message we don't have to search a long list looking
+ * for a match and/or an open slot.
  *
  * An inactive buffer has pid == 0 and undefined contents of data.
  */
@@ -106,8 +108,8 @@ typedef struct
 	StringInfoData data;		/* accumulated data, as a StringInfo */
 } save_buffer;
 
-#define CHUNK_SLOTS 20
-static save_buffer saved_chunks[CHUNK_SLOTS];
+#define CHUNK_STRIPES 256
+static List *buffer_stripes[CHUNK_STRIPES];
 
 /* These must be exported for EXEC_BACKEND case ... annoying */
 #ifndef WIN32
@@ -725,6 +727,7 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 	{
 		PipeProtoHeader p;
 		int			chunklen;
+		List	   *buffer_stripe;
 
 		/* Do we have a valid header? */
 		memcpy(&p, cursor, sizeof(PipeProtoHeader));
@@ -743,53 +746,55 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 			dest = (p.is_last == 'T' || p.is_last == 'F') ?
 				LOG_DESTINATION_CSVLOG : LOG_DESTINATION_STDERR;
 
+			buffer_stripe = buffer_stripes[p.pid % CHUNK_STRIPES];
+
 			if (p.is_last == 'f' || p.is_last == 'F')
 			{
 				/*
-				 * Save a complete non-final chunk in the per-pid buffer if
-				 * possible - if not just write it out.
+				 * Save a complete non-final chunk in the per-pid buffer
 				 */
-				int			free_slot = -1,
-							existing_slot = -1;
-				int			i;
+				ListCell   *cell;
+				save_buffer *existing_slot = NULL,
+						   *free_slot = NULL;
 				StringInfo	str;
 
-				for (i = 0; i < CHUNK_SLOTS; i++)
+				foreach(cell, buffer_stripe)
 				{
-					if (saved_chunks[i].pid == p.pid)
+					save_buffer *buffnode = cell->data.ptr_value;
+
+					if (buffnode->pid == p.pid)
 					{
-						existing_slot = i;
+						existing_slot = buffnode;
 						break;
 					}
-					if (free_slot < 0 && saved_chunks[i].pid == 0)
-						free_slot = i;
+					if (free_slot == NULL && buffnode->pid == 0)
+						free_slot = buffnode;
 				}
-				if (existing_slot >= 0)
+				if (existing_slot != NULL)
 				{
-					str = &(saved_chunks[existing_slot].data);
+					str = &(existing_slot->data);
 					appendBinaryStringInfo(str,
 										   cursor + PIPE_HEADER_SIZE,
 										   p.len);
 				}
-				else if (free_slot >= 0)
+				else
 				{
-					saved_chunks[free_slot].pid = p.pid;
-					str = &(saved_chunks[free_slot].data);
+					if (free_slot == NULL)
+					{
+						/*
+						 * Need a free slot, but there isn't one in the list,
+						 * so create a new one and extend the list with it.
+						 */
+						free_slot = palloc(sizeof(save_buffer));
+						buffer_stripe = lappend(buffer_stripe, free_slot);
+						buffer_stripes[p.pid % CHUNK_STRIPES] = buffer_stripe;
+					}
+					free_slot->pid = p.pid;
+					str = &(free_slot->data);
 					initStringInfo(str);
 					appendBinaryStringInfo(str,
 										   cursor + PIPE_HEADER_SIZE,
 										   p.len);
 				}
-				else
-				{
-					/*
-					 * If there is no free slot we'll just have to take our
-					 * chances and write out a partial message and hope that
-					 * it's not followed by something from another pid.
-					 */
-					write_syslogger_file(cursor + PIPE_HEADER_SIZE, p.len,
-										 dest);
-				}
 			}
 			else
 			{
@@ -797,26 +802,28 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
 				 * Final chunk --- add it to anything saved for that pid, and
 				 * either way write the whole thing out.
 				 */
-				int			existing_slot = -1;
-				int			i;
+				ListCell   *cell;
+				save_buffer 

Re: [HACKERS] log chunking broken with large queries under load

2012-04-02 Thread Andrew Dunstan



On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
Some of my PostgreSQL Experts colleagues have been complaining to me 
that servers under load with very large queries cause CSV log files 
that are corrupted, because lines are apparently multiplexed. The log 
chunking protocol between the errlog routines and the syslogger is 
supposed to prevent that, so I did a little work to try to reproduce 
it in a controlled way.



Well, a little further digging jogged my memory a bit. It looks like we 
fairly badly underestimated the number of messages we might arrive as more 
than one chunk.


We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead 
to stripe the slots with a two dimensional array, so we didn't have to 
search a larger number of slots for any given message. See the attached 
patch.


I'm not sure how much we want to scale this up. I set CHUNK_STRIPES to 
20 to start with, and I've asked some colleagues with very heavy log 
loads with very large queries to test it out if possible. If anyone else 
has a similar load I'd appreciate similar testing.


cheers

andrew
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***
*** 93,101  static char *last_file_name = NULL;
  static char *last_csv_file_name = NULL;
  
  /*
!  * Buffers for saving partial messages from different backends. We don't expect
!  * that there will be very many outstanding at one time, so 20 seems plenty of
!  * leeway. If this array gets full we won't lose messages, but we will lose
   * the protocol protection against them being partially written or interleaved.
   *
   * An inactive buffer has pid == 0 and undefined contents of data.
--- 93,108 
  static char *last_csv_file_name = NULL;
  
  /*
!  * Buffers for saving partial messages from different backends.
!  *
!  * Under heavy load we can get quite a few of them, so we stripe them across
!  * an array based on the mod of the pid, which seems an adequate hash function
!  * for pids. We do this rather than just keeping a one-dimensional array so
!  * we don't have to probe too many slots for any given pid.
!  * 20 stripes of 20 slots each seems likely to be adequate leeway, but if there
!  * are lots of overruns we might need to increase CHUNK_STRIPES a bit, or look
!  * at some alternative scheme.
!  * If a stripe gets full we won't lose messages, but we will lose
   * the protocol protection against them being partially written or interleaved.
   *
   * An inactive buffer has pid == 0 and undefined contents of data.
***
*** 107,113  typedef struct
  } save_buffer;
  
  #define CHUNK_SLOTS 20
! static save_buffer saved_chunks[CHUNK_SLOTS];
  
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
--- 114,121 
  } save_buffer;
  
  #define CHUNK_SLOTS 20
! #define CHUNK_STRIPES 20
! static save_buffer saved_chunks[CHUNK_STRIPES][CHUNK_SLOTS];
  
  /* These must be exported for EXEC_BACKEND case ... annoying */
  #ifndef WIN32
***
*** 725,730  process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
--- 733,739 
  	{
  		PipeProtoHeader p;
  		int			chunklen;
+ 		int stripe;
  
  		/* Do we have a valid header? */
  		memcpy(&p, cursor, sizeof(PipeProtoHeader));
***
*** 743,748  process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
--- 752,759 
  			dest = (p.is_last == 'T' || p.is_last == 'F') ?
  LOG_DESTINATION_CSVLOG : LOG_DESTINATION_STDERR;
  
+ 			stripe = p.pid % CHUNK_STRIPES;
+ 
  			if (p.is_last == 'f' || p.is_last == 'F')
  			{
  /*
***
*** 756,780  process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
  
  				for (i = 0; i < CHUNK_SLOTS; i++)
  				{
! 					if (saved_chunks[i].pid == p.pid)
  					{
  						existing_slot = i;
  						break;
  					}
! 					if (free_slot < 0 && saved_chunks[i].pid == 0)
  						free_slot = i;
  				}
  				if (existing_slot >= 0)
  				{
! 					str = &(saved_chunks[existing_slot].data);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
  						   p.len);
  				}
  				else if (free_slot >= 0)
  				{
! 					saved_chunks[free_slot].pid = p.pid;
! 					str = &(saved_chunks[free_slot].data);
  					initStringInfo(str);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
--- 767,791 
  
  				for (i = 0; i < CHUNK_SLOTS; i++)
  				{
! 					if (saved_chunks[stripe][i].pid == p.pid)
  					{
  						existing_slot = i;
  						break;
  					}
! 					if (free_slot < 0 && saved_chunks[stripe][i].pid == 0)
  						free_slot = i;
  				}
  				if (existing_slot >= 0)
  				{
! 					str = &(saved_chunks[stripe][existing_slot].data);
  					appendBinaryStringInfo(str,
  										   cursor + PIPE_HEADER_SIZE,
  						   p.len);
  				}
  				else if (free_slot >= 0)
  				{
! 					saved_chunks[stripe][free_slot].pid = p.pid;
! 					str = &(saved_chunks[stripe][free_slot].data);
  					

Re: [HACKERS] log chunking broken with large queries under load

2012-04-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/01/2012 06:34 PM, Andrew Dunstan wrote:
 Some of my PostgreSQL Experts colleagues have been complaining to me 
 that servers under load with very large queries cause CSV log files 
 that are corrupted,

 We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead 
 to stripe the slots with a two dimensional array, so we didn't have to 
 search a larger number of slots for any given message. See the attached 
 patch.

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit.  Should we not replace the fixed-size array
with a dynamic data structure?

regards, tom lane



Re: [HACKERS] log chunking broken with large queries under load

2012-04-02 Thread Andrew Dunstan



On 04/02/2012 12:00 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net  writes:

On 04/01/2012 06:34 PM, Andrew Dunstan wrote:

Some of my PostgreSQL Experts colleagues have been complaining to me
that servers under load with very large queries cause CSV log files
that are corrupted,

We could just increase CHUNK_SLOTS in syslogger.c, but I opted instead
to stripe the slots with a two dimensional array, so we didn't have to
search a larger number of slots for any given message. See the attached
patch.

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit.  Should we not replace the fixed-size array
with a dynamic data structure?





"A bit" = 10 to 20 times - more if we set CHUNK_STRIPES higher. :-)

But maybe you're right. If we do that and stick with my two-dimensional 
scheme to keep the number of probes per chunk down, we'd need to reorg 
the array every time we increased it. That might be a bit messy, but 
might be ok. Or maybe linearly searching an array of several hundred 
slots for our pid for every log chunk that comes in would be fast enough.


cheers

andrew





Re: [HACKERS] log chunking broken with large queries under load

2012-04-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/02/2012 12:00 PM, Tom Lane wrote:
 This seems like it isn't actually fixing the problem, only pushing out
 the onset of trouble a bit.  Should we not replace the fixed-size array
 with a dynamic data structure?

 But maybe you're right. If we do that and stick with my two-dimensional 
 scheme to keep the number of probes per chunk down, we'd need to reorg 
 the array every time we increased it. That might be a bit messy, but 
 might be ok. Or maybe linearly searching an array of several hundred 
 slots for our pid for every log chunk that comes in would be fast enough.

You could do something like having a list of pending chunks for each
value of (pid mod 256).  The length of each such list ought to be plenty
short under ordinary circumstances.
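For illustration, the per-stripe lookup described above could look roughly like this standalone sketch (all names here are invented for the example; the real syslogger code uses PostgreSQL's List and save_buffer types):

```c
#include <stddef.h>

#define N_STRIPES 256			/* one pending-chunk list per (pid mod 256) */

/* A pending partial message from one backend. */
typedef struct pending_chunk
{
	int			pid;			/* sending backend's pid */
	struct pending_chunk *next;
} pending_chunk;

static pending_chunk *stripes[N_STRIPES];

/* The stripe a given pid hashes to. */
static int
stripe_for_pid(int pid)
{
	return pid % N_STRIPES;
}

/*
 * Look up the pending entry for a pid, scanning only its own stripe,
 * which ought to stay short under ordinary load.
 */
static pending_chunk *
find_pending(int pid)
{
	pending_chunk *p;

	for (p = stripes[stripe_for_pid(pid)]; p != NULL; p = p->next)
	{
		if (p->pid == pid)
			return p;
	}
	return NULL;
}
```

The point of the scheme is that a lookup never walks the full set of pending messages, only the (usually tiny) fraction whose pids collide modulo 256.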

regards, tom lane



Re: [HACKERS] log chunking broken with large queries under load

2012-04-02 Thread Andrew Dunstan



On 04/02/2012 12:44 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net  writes:

On 04/02/2012 12:00 PM, Tom Lane wrote:

This seems like it isn't actually fixing the problem, only pushing out
the onset of trouble a bit.  Should we not replace the fixed-size array
with a dynamic data structure?

But maybe you're right. If we do that and stick with my two-dimensional
scheme to keep the number of probes per chunk down, we'd need to reorg
the array every time we increased it. That might be a bit messy, but
might be ok. Or maybe linearly searching an array of several hundred
slots for our pid for every log chunk that comes in would be fast enough.

You could do something like having a list of pending chunks for each
value of (pid mod 256).  The length of each such list ought to be plenty
short under ordinary circumstances.





Yeah, ok, that should work. How big would we make each list to start 
with? Still 20, or smaller?


cheers

andrew




Re: [HACKERS] log chunking broken with large queries under load

2012-04-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/02/2012 12:44 PM, Tom Lane wrote:
 You could do something like having a list of pending chunks for each
 value of (pid mod 256).  The length of each such list ought to be plenty
 short under ordinary circumstances.

 Yeah, ok, that should work. How big would we make each list to start 
 with? Still 20, or smaller?

When I said list, I meant a List *.  No fixed size.

regards, tom lane



[HACKERS] log chunking broken with large queries under load

2012-04-01 Thread Andrew Dunstan
Some of my PostgreSQL Experts colleagues have been complaining to me 
that servers under load with very large queries cause CSV log files that 
are corrupted, because lines are apparently multiplexed. The log 
chunking protocol between the errlog routines and the syslogger is 
supposed to prevent that, so I did a little work to try to reproduce it 
in a controlled way. On my dual quad xeon setup, this script:


   #!/bin/sh
   par=$1
   seq=$2

   sed 2000q /usr/share/dict/words > words

   psql -q -c 'drop table if exists foo'
   psql -q -c 'create table foo (t text)'

   echo '\set words `cat words`' > wordsin.sql
   echo 'prepare fooplan (text) as insert into foo values ($1);' >> wordsin.sql

   for i in `seq 1 $seq`; do
      echo "execute fooplan(:'words');" >> wordsin.sql
   done

   for i in `seq 1 $par`; do
      psql -q -t -f wordsin.sql &
   done
   wait

called with parameters of 100 and 50 (i.e. 100 simultaneous clients each 
doing 50 very large inserts) is enough to cause CSV log corruption quite 
reliably on PostgreSQL 9.1.


This is a serious bug. I'm going to investigate, but it's causing major 
pain, so anyone else who has any ideas is welcome to chime in.


cheers

andrew
