Re: [HACKERS] Standalone synchronous master

2012-08-25 Thread Bruce Momjian
On Tue, Jan  3, 2012 at 09:22:22PM -0500, Robert Haas wrote:
> On Tue, Dec 27, 2011 at 6:39 AM, Alexander Björnhagen
>  wrote:
> > And so we get back to the three likelihoods in our two-node setup :
> >
> > 1.The master fails
> >  - Okay, promote the standby
> >
> > 2.The standby fails
> >  - Okay, the system still works but you no longer have data
> > redundancy. Deal with it.
> >
> > 3.Both fail, together or one after the other.
> 
> It seems to me that if you are happy with #2, you don't really need to
> enable sync rep in the first place.
> 
> At any rate, even without multiple component failures, this
> configuration makes it pretty easy to lose durability (which is the
> only point of having sync rep in the first place).  Suppose the NIC
> card on the master is the failing component.  If it happens to drop
> the TCP connection to the clients just before it drops the connection
> to the standby, the standby will have all the transactions, and you
> can fail over just fine.  If it happens to drop the TCP connection to
> the standby just before it drops the connection to the clients, the standby
> will not have all the transactions, and failover will lose some
> transactions - and presumably you enabled this feature in the first
> place precisely to prevent that sort of occurrence.
> 
> I do think that it might be useful to have this if there's a
> configurable timeout involved - that way, people could say, well, I'm
> OK with maybe losing transactions if the standby has been gone for X
> seconds.  But if the only possible behavior is equivalent to a
> zero-second timeout I don't think it's too useful.  It's basically
> just going to lead people to believe that their data is more secure
> than it really is, which IMHO is not helpful.

Added to TODO:

Add a new "eager" synchronous mode that starts out synchronous but
reverts to asynchronous after a failure timeout period

This would require some type of command to be executed to alert
administrators of this change.

http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
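
As a rough illustration only --- every GUC name below except
synchronous_standby_names is invented here to make the TODO concrete, and
none of them exist in PostgreSQL --- such an "eager" mode might look like:

```ini
# Hypothetical sketch; these settings do not exist and merely
# illustrate the TODO item above.
synchronous_standby_names = 'standby1'
synchronous_replication_mode = eager       # invented: start out synchronous
synchronous_standby_timeout = 30s          # invented: revert to async after 30s
synchronous_mode_change_command = '/usr/local/bin/notify-dba.sh'  # invented
```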


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication timeout units

2012-08-25 Thread Bruce Momjian
On Tue, Dec 20, 2011 at 11:34:25AM -0600, Kevin Grittner wrote:
> Peter Geoghegan  wrote:
> > Magnus Hagander  wrote:
> >> from postgresql.conf.sample:
> >>
> >> #replication_timeout = 60s  # in milliseconds; 0 disables
> >>
> >> Seconds or milliseconds? I would suggest we just remove the "in
> >> milliseconds", and instead say "timeout for replication
> >> connections; 0 disables".
> > 
> > +1 from me. That's very confusing.
>  
> Isn't it providing information on both the granularity and the
> default unit if none is specified?  Why is it more confusing here
> than statement_timeout or any of the other places this pattern is
> followed?
>  
> -1 from me on removing it *only* here.

FYI, I looked into this and can't see a way to improve it.
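
For reference, the comment reflects standard GUC unit handling: the
setting's base unit is milliseconds, so a bare number is taken as
milliseconds, while a unit suffix such as "s" is also accepted:

```ini
# Equivalent settings: the base unit (used when no suffix is given)
# is milliseconds, which is what the sample-file comment refers to.
replication_timeout = 60s
#replication_timeout = 60000    # same timeout, expressed in the base unit
```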



Re: [HACKERS] psql output locations

2012-08-25 Thread Bruce Momjian
On Fri, Aug 17, 2012 at 12:28:58PM -0400, Bruce Momjian wrote:
> On Fri, Aug 17, 2012 at 12:22:38PM -0400, Alvaro Herrera wrote:
> > Excerpts from Bruce Momjian's message of vie ago 17 11:17:58 -0400 2012:
> > > On Wed, Dec 14, 2011 at 10:57:25AM -0500, Robert Haas wrote:
> > > > On Wed, Dec 14, 2011 at 4:45 AM, Magnus Hagander  
> > > > wrote:
> > > > >>> * There are a number of things that are always written to stdout,
> > > > >>> that there is no way to redirect. In some cases it's interactive
> > > > >>> prompts - makes sense - but also for example the output of \timing
> > > > >>> goes to stdout always. Is there some specific logic behind
> > > > >>> what/when this should be done?
> > > > >>
> > > > >> Everything that is not an error goes to stdout, no?  Except the query
> > > > >> output, if you change it.
> > > > >>
> > > > >> Maybe the way to do what you want is to invent a new setting that
> > > > >> temporarily changes stdout.
> > > > >
> > > > > Yeah, that might be it. Or I need separate settings for "put errors in
> > > > > the query output stream" and "put non-query-output-but-also-non-errors
> > > > > in the query output stream". The effect would be the same, I guess...
> > > > 
> > > > That seems an awful lot harder (and messier) than just changing all
> > > > the call sites to use the same error-reporting function.
> > > 
> > > I have done as you suggested with the attached patch.
> > 
> > The very first hunk in your patch changes code that seems to be
> > explicitely checking the "interactive" flag.  Is the change really
> > wanted there?  Note Magnus explicitely commented about those in his
> > original post.
> 
> I noticed that but the output would be the same because there is no
> input file location to trigger.  I thought the interactive flag was
> there just to provide more customized text.

Applied.



Re: [HACKERS] Timing overhead and Linux clock sources

2012-08-25 Thread Bruce Momjian
On Mon, Aug 20, 2012 at 03:11:51PM -0400, Robert Haas wrote:
> On Thu, Aug 16, 2012 at 10:28 PM, Bruce Momjian  wrote:
> > FYI, I am planning to go ahead and package this tool in /contrib for PG
> > 9.3.
> 
> Isn't this exactly what we already did, in 9.2, in the form of
> contrib/pg_test_timing?

Sorry, not sure how I missed that commit.  Anyway, I am attaching a
patch for 9.3 that I think improves the output of the tool, plus adds
some C comments.

The new output has the lowest duration times first:

Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.31 nsec
Histogram of timing durations:
< usec   % of total      count
     1     95.87135   69627856
     2      4.12759    2997719
     4      0.00086        628
     8      0.00018        133
    16      0.00001          5
    32      0.00000          1

This should make the output easier to eyeball for problems --- a good
timing source has a high percentage on the first line, rather than on the
last line.

diff --git a/contrib/pg_test_timing/pg_test_timing.c b/contrib/pg_test_timing/pg_test_timing.c
new file mode 100644
index b3f98ab..8d79c7b
*** a/contrib/pg_test_timing/pg_test_timing.c
--- b/contrib/pg_test_timing/pg_test_timing.c
*** static const char *progname;
*** 14,29 
  static int32 test_duration = 3;
  
  static void handle_args(int argc, char *argv[]);
! static void test_timing(int32);
  
  int
  main(int argc, char *argv[])
  {
  	progname = get_progname(argv[0]);
  
  	handle_args(argc, argv);
  
! 	test_timing(test_duration);
  
  	return 0;
  }
--- 14,37 
  static int32 test_duration = 3;
  
  static void handle_args(int argc, char *argv[]);
! static uint64 test_timing(int32);
! static void output(uint64 loop_count);
! 
! /* record duration in powers of 2 microseconds */
! int64 histogram[32];
  
  int
  main(int argc, char *argv[])
  {
+ 	uint64		loop_count;
+ 
  	progname = get_progname(argv[0]);
  
  	handle_args(argc, argv);
  
! 	loop_count = test_timing(test_duration);
! 
! 	output(loop_count);
  
  	return 0;
  }
*** handle_args(int argc, char *argv[])
*** 95,119 
  	}
  }
  
! static void
  test_timing(int32 duration)
  {
  	uint64		total_time;
  	int64		time_elapsed = 0;
  	uint64		loop_count = 0;
! 	uint64		prev,
! cur;
! 	int32		diff,
! i,
! bits,
! found;
! 
! 	instr_time	start_time,
! end_time,
! temp;
! 
! 	static int64 histogram[32];
! 	char		buf[100];
  
  	total_time = duration > 0 ? duration * 100 : 0;
  
--- 103,116 
  	}
  }
  
! static uint64
  test_timing(int32 duration)
  {
  	uint64		total_time;
  	int64		time_elapsed = 0;
  	uint64		loop_count = 0;
! 	uint64		prev, cur;
! 	instr_time	start_time, end_time, temp;
  
  	total_time = duration > 0 ? duration * 100 : 0;
  
*** test_timing(int32 duration)
*** 122,132 
--- 119,132 
  
  	while (time_elapsed < total_time)
  	{
+ 		int32		diff, bits = 0;
+ 
  		prev = cur;
  		INSTR_TIME_SET_CURRENT(temp);
  		cur = INSTR_TIME_GET_MICROSEC(temp);
  		diff = cur - prev;
  
+ 		/* Did time go backwards? */
  		if (diff < 0)
  		{
  			printf("Detected clock going backwards in time.\n");
*** test_timing(int32 duration)
*** 134,145 
  			exit(1);
  		}
  
! 		bits = 0;
  		while (diff)
  		{
  			diff >>= 1;
  			bits++;
  		}
  		histogram[bits]++;
  
  		loop_count++;
--- 134,147 
  			exit(1);
  		}
  
! 		/* What is the highest bit in the time diff? */
  		while (diff)
  		{
  			diff >>= 1;
  			bits++;
  		}
+ 
+ 		/* Update appropriate duration bucket */
  		histogram[bits]++;
  
  		loop_count++;
*** test_timing(int32 duration)
*** 153,171 
  
  	printf("Per loop time including overhead: %0.2f nsec\n",
  		   INSTR_TIME_GET_DOUBLE(end_time) * 1e9 / loop_count);
  	printf("Histogram of timing durations:\n");
! 	printf("%9s: %10s %9s\n", "< usec", "count", "percent");
  
! 	found = 0;
! 	for (i = 31; i >= 0; i--)
  	{
! 		if (found || histogram[i])
! 		{
! 			found = 1;
! 			/* lame hack to work around INT64_FORMAT deficiencies */
! 			snprintf(buf, sizeof(buf), INT64_FORMAT, histogram[i]);
! 			printf("%9ld: %10s %8.5f%%\n", 1l << i, buf,
!    (double) histogram[i] * 100 / loop_count);
! 		}
  	}
  }
--- 155,183 
  
  	printf("Per loop time including overhead: %0.2f nsec\n",
  		   INSTR_TIME_GET_DOUBLE(end_time) * 1e9 / loop_count);
+ 
+ 	return loop_count;
+ }
+ 
+ static void
+ output(uint64 loop_count)
+ {
+ 	int64		max_bit = 31, i;
+ 
+ 	/* find highest bit value */
+ 	while (max_bit > 0 && histogram[max_bit] == 0)
+ 		max_bit--;
+ 		
  	printf("Histogram of timing durations:\n");
! 	printf("%6s   %10s %10s\n", "< usec", "% of total", "count");
  
! 	for (i = 0; i <= max_bit; i

Re: [HACKERS] PATCH: pgbench - random sampling of transaction written into log

2012-08-25 Thread Tomas Vondra
On 26.8.2012 00:19, Jeff Janes wrote:
> On Fri, Aug 24, 2012 at 2:16 PM, Tomas Vondra  wrote:
>> Hi,
>>
>> attached is a patch that adds support for random sampling in pgbench, when
>> it's executed with "-l" flag. You can do for example this:
>>
>>   $ pgbench -l -T 120 -R 1 db
>>
>> and then only 1% of transactions will be written into the log file. If you
>> omit the tag, all the transactions are written (i.e. it's backward
>> compatible).
> 
> Hi Tomas,
> 
> You use the rand() function.  Isn't that function not thread-safe?
> Or, if it is thread-safe, does it accomplish that with a mutex?  That
> was a problem with a different rand function used in pgbench that
> Robert Haas fixed a while ago, 4af43ee3f165c8e4b332a7e680.

Hi Jeff,

Aha! Good catch. I've used rand(), which seems to be neither reentrant nor
thread-safe (unless the man page is incorrect). Anyway, using pg_erand48
or getrand seems like an appropriate fix.

> Also, what benefit is had by using modulus on rand(), rather than just
> modulus on an incrementing counter?

Hmm, I was thinking about that too, but I wasn't really sure how that
would behave with multiple SQL files etc. But now I see the files are
actually chosen randomly, so using a counter seems like a good idea.

> Could you explain the value of this patch, given your other one that
> does aggregation?  If both were accepted, I think I would always use
> the aggregation one in preference to this one.

The difference is that the sample contains information that is simply
unavailable in the aggregated output. For example when using multiple
files, you can compute per-file averages from the sample, but the
aggregated output contains just a single line for all files combined.


Tomas




Re: [HACKERS] psql \set vs \copy - bug or expected behaviour?

2012-08-25 Thread Bruce Momjian
On Fri, Aug 17, 2012 at 02:01:25PM -0400, Bruce Momjian wrote:
> On Fri, Aug 17, 2012 at 06:55:14PM +0100, Richard Huxton wrote:
> > >Well, it'd be nice to allow substitution there ...
> > >
> > >>What we can't easily do is to allow quotes to prevent variable
> > >>substitution in these whole-line commands because we can't process the
> > >>quotes because that will remove them.
> > >
> > >... but if there is then no way to prevent it, that's absolutely
> > >unacceptable.
> > 
> > If I'm understanding this correctly, \copy parsing just passes the
> > query part unaltered as part of a COPY statement back into the
> > top-level parser. Likewise with the \!shell stuff (but presumably to
> > execve).
> > 
> > To handle variable-substitution correctly for \copy we'd need to
> > duplicate the full parsing for COPY. For \! we'd need something
> > which understood shell-syntax (for the various shells out there).
> > Ick.
> > 
> > Or you'd need a separate variable-bracketing {{:x}} syntax that
> > could work like reverse dollar-quoting. Also Ick.
> > 
> > As far as we know this has only inconvenienced one person (me) badly
> > enough to report a maybe-bug. Thanks for trying Bruce, but I fear
> > this is one itch that'll go unscratched.
> > 
> > Rest assured I'm not about to storm off and replace all my
> > installations with MySQL :-)
> 
> Good analysis.  Basically we can't hope to fully understand COPY or
> shell quoting syntax well enough to properly replace only unquoted psql
> variable references.
> 
> Therefore, unless I hear otherwise, I will just document the limitation
> and withdraw the patch.

Patch withdrawn.  It seems the documentation was already in place --- I
clarified that the \! limitations match those of \copy.



Re: [HACKERS] PATCH: pgbench - random sampling of transaction written into log

2012-08-25 Thread Jeff Janes
On Fri, Aug 24, 2012 at 2:16 PM, Tomas Vondra  wrote:
> Hi,
>
> attached is a patch that adds support for random sampling in pgbench, when
> it's executed with "-l" flag. You can do for example this:
>
>   $ pgbench -l -T 120 -R 1 db
>
> and then only 1% of transactions will be written into the log file. If you
> omit the tag, all the transactions are written (i.e. it's backward
> compatible).

Hi Tomas,

You use the rand() function.  Isn't that function not thread-safe?
Or, if it is thread-safe, does it accomplish that with a mutex?  That
was a problem with a different rand function used in pgbench that
Robert Haas fixed a while ago, 4af43ee3f165c8e4b332a7e680.

Also, what benefit is had by using modulus on rand(), rather than just
modulus on an incrementing counter?

Could you explain the value of this patch, given your other one that
does aggregation?  If both were accepted, I think I would always use
the aggregation one in preference to this one.

Thanks,

Jeff




Re: [HACKERS] timestamptz parsing bug?

2012-08-25 Thread Bruce Momjian
On Wed, Aug 15, 2012 at 05:29:26PM -0400, Bruce Momjian wrote:
> 
> I assume we want to apply this patch based on discussion that we should
> allow a wider range of date/time formats.

Applied, thanks.

---

> 
> On Mon, Aug 29, 2011 at 06:40:07PM +0100, Dean Rasheed wrote:
> > On 29 August 2011 15:40, Andrew Dunstan  wrote:
> > >
> > > Why do we parse this as a correct timestamptz literal:
> > >
> > >    2011-08-29T09:11:14.123 CDT
> > >
> > > but not this:
> > >
> > >    2011-08-29T09:11:14.123 America/Chicago
> > >
> > > Replace the ISO-8601 style T between the date and time parts of the latter
> > > with a space and the parser is happy again.
> > >
> > >
> > > cheers
> > >
> > > andrew
> > >
> > 
> > Funny, I've just recently been looking at this code.
> > 
> > I think that the issue is in the DTK_TIME handling code in DecodeDateTime().
> > 
> > For this input string the "T" is recognised as the start of an ISO
> > time, and the ptype variable is set to DTK_TIME. The next field is a
> > DTK_TIME, however, when it is handled it doesn't reset the ptype
> > variable.
> > 
> > When it gets to the timezone "America/Chicago" at the end, this is
> > handled in the DTK_DATE case, because of the "/". But because ptype is
> > still set, it is expecting this to be an ISO time, so it errors out.
> > 
> > The attached patch seems to fix it. Could probably use a new
> > regression test though.
> > 
> > Regards,
> > Dean
> 
> > diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
> > new file mode 100644
> > index 3d320cc..a935d98
> > *** a/src/backend/utils/adt/datetime.c
> > --- b/src/backend/utils/adt/datetime.c
> > *** DecodeDateTime(char **field, int *ftype,
> > *** 942,947 
> > --- 942,957 
> > break;
> >   
> > case DTK_TIME:
> > +   /*
> > +    * This might be an ISO time following a "t" field.
> > +    */
> > +   if (ptype != 0)
> > +   {
> > +   /* Sanity check; should not fail this test */
> > +   if (ptype != DTK_TIME)
> > +   return DTERR_BAD_FORMAT;
> > +   ptype = 0;
> > +   }
> > dterr = DecodeTime(field[i], fmask, INTERVAL_FULL_RANGE,
> >&tmask, tm, fsec);
> > if (dterr)
> 



Re: [HACKERS] pg_stat_replication vs StandbyReplyMessage

2012-08-25 Thread Bruce Momjian
On Thu, Aug 23, 2012 at 12:40:33PM +0200, Magnus Hagander wrote:
> On Wed, Aug 15, 2012 at 5:39 PM, Bruce Momjian  wrote:
> > On Mon, Aug 15, 2011 at 01:03:35PM +0200, Magnus Hagander wrote:
> >> The pg_stat_replication view exposes all the fields in
> >> StandbyReplyMessage *except* for the timestamp when the message was
> >> generated. On an active system this is not all that interesting, but
> >> on a mostly idle system that allows the monitoring to react faster
> >> than the timeout that actually kicks the other end off - and could be
> >> useful in manual debugging scenarios. Any particular reason why this
> >> was not exposed as it's own column?
> >
> > Did this ever get done?  I don't think so, though everyone wanted it.
> 
> Nope, it wasn't done. Should probably do that for 9.3 (since adding a
> field to pg_stat_replication will cause initdb, so we can't really do
> it for 9.2 unless it was really critical - and it's not).

OK, TODO added:

Add entry creation timestamp column to pg_stat_replication

http://archives.postgresql.org/pgsql-hackers/2011-08/msg00694.php 



Re: [HACKERS] -Wformat-zero-length

2012-08-25 Thread Bruce Momjian
On Tue, Aug 14, 2012 at 11:54:53PM -0400, Peter Eisentraut wrote:
> On Tue, 2012-08-14 at 17:56 -0400, Tom Lane wrote:
> > Peter Eisentraut  writes:
> > > On 8/10/12 7:48 PM, Dimitri Fontaine wrote:
> > >> What about having single user mode talk fe/be protocol, and talk to it 
> > >> via a UNIX pipe, with pg_upgrade starting the single user backend as a 
> > >> subprocess?
> > 
> > > I think that's essentially equivalent to starting the server on a 
> > > Unix-domain socket in a private directory.  But that has been rejected 
> > > because it doesn't work on Windows.
> > 
> > > The question in my mind is, is there some other usable way on Windows 
> > > for two unrelated processes to communicate over file descriptors in a 
> > > private and secure way?
> > 
> > You're making this unnecessarily hard, because there is no need for the
> > two processes to be unrelated.
> > 
> > The implementation I'm visualizing is that a would-be client (think psql
> > or pg_dump, though the code would actually be in libpq) forks off a
> > process that becomes a standalone backend, and then they communicate
> > over a pair of pipes that were created before forking.  This is
> > implementable on any platform that supports Postgres, because initdb
> > already relies on equivalent capabilities.
> 
> Well, that would be an interesting feature, but it's debatable which
> among this or just adding a new socket type (if available) is harder.

TODO added:

Find cleaner way to start/stop dedicated servers for upgrades

http://archives.postgresql.org/pgsql-hackers/2012-08/msg00275.php 



Re: [HACKERS] [v9.3] writable foreign tables

2012-08-25 Thread Kohei KaiGai
2012/8/25 Robert Haas :
> On Thu, Aug 23, 2012 at 1:10 AM, Kohei KaiGai  wrote:
>> It is the responsibility of the FDW extension (and DBA) to ensure that
>> each foreign row has a unique identifier that fits in at most a 48-bit
>> integer.
>
> It strikes me as incredibly short-sighted to decide that the row
> identifier has to have the same format as what our existing heap AM
> happens to have.  I think we need to allow the row identifier to be of
> any data type, and even compound.  For example, the foreign side might
> have no equivalent of CTID, and thus use primary key.  And the primary
> key might consist of an integer and a string, or some such.
>
I assume it is the task of the FDW extension to translate between the
pseudo ctid and the primary key on the remote side.

For example, if the primary key of the remote table is of text type, one
idea is to use a hash table that associates each text-form primary key
with a particular 48-bit integer. The pseudo ctid would be used to track
the tuple to be modified at the scan stage; the FDW can then consult the
hash table to pull out the primary key to supply to the prepared
statement.

Do we have some other reasonable ideas?

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] temporal support patch

2012-08-25 Thread David Johnston
> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Saturday, August 25, 2012 12:46 PM
> To: David Johnston
> Cc: Jeff Davis; Vlad Arkhipov; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] temporal support patch
> 
> On Mon, Aug 20, 2012 at 7:17 PM, David Johnston 
> wrote:
> > Ideally the decision of whether to do so could be a client decision.
> > Not storing intra-transaction changes is easier than storing all
changes.
> 
> Not really.  If you don't care about suppressing intra-transaction
changes, you
> can essentially just have a trigger that fires on every update and adds
> information to the side table.  If you do care about suppressing them, you
> have to do something more complicated.  Or so it seems to me.
> 

My internals knowledge is basically zero, but it would seem that if you
simply wanted the end-of-transaction result you could record nothing
during the transaction and then copy whatever values are present at
commit to whatever logging mechanism you need.  If you are recording
intra-transaction values you could write them to a temporary storage area
and then, at commit, decide whether only the most recent value for a
given relation/attribute is retained in the final log or whether all of
the intermediate values are persisted as well.


> > "You cannot allow the application to choose what is stored to identify
> > itself (client)" - i.e., its credentials identify who it is and those
> > are stored without consulting the application
> 
> I don't think we can violate the general principle that the database
super-
> user or table owner can do whatever they want.  If one of those folks
wants
> to falsify their history, are we really going to tell them "no"?  To me
that has
> "I'm sorry, Dave, I can't do that" written all over it, and I think we'll
get about
> the same reaction that Hal did.
> Now, if user A is inserting into user B's table, and is not the
super-user, then,
> of course, we can and should ensure that no falsification is possible.
> 

With respect to the physical log file there is currently no way for the
super-user to falsify (at the time of statement execution) the user/role
that they are using.  Even a "SET ROLE" doesn't change the session user
(I forget the exact mechanics, but I'm pretty sure of the general point).
I do not see how this is that much different.

I agree that it is pointless to even try to maintain true in-database
auditing in the presence of god-like super-users so most of what I envision
relates to limited permissioned users that are forced to rely upon the
standard mechanisms provided by the database.  As a matter of principle
those wanting a secure and auditable environment should not be using
ownership level roles.

Since these temporal/audit tables are intended to be maintained by the
system, if you do not ask the users to identify themselves but instead
take the information directly from the environment, you never have to
give an "I'm sorry, Dave" response, because Dave is never given the
chance to submit a proposed value.

David J.






Re: [HACKERS] temporal support patch

2012-08-25 Thread Robert Haas
On Mon, Aug 20, 2012 at 7:17 PM, David Johnston  wrote:
> Ideally the decision of whether to do so could be a client decision.  Not
> storing intra-transaction changes is easier than storing all changes.

Not really.  If you don't care about suppressing intra-transaction
changes, you can essentially just have a trigger that fires on every
update and adds information to the side table.  If you do care about
suppressing them, you have to do something more complicated.  Or so it
seems to me.

> I see the "user" element as having two components:
>
> "Client" - what device/channel/"user" was used to connect to the database -
> PostgreSQL Role
> "User" - relative to that "client" which actual "user" performed the action
> - Application Specified
>
> A PostgreSQL role would correspond to "client" whereas the application would
> be allowed to have full control of what "User" value is stored.
>
> This gets a little complicated with respect to "SET ROLE" but gets close to
> the truth.  The idea is that you look at the "client" to determine the
> "namespace" over which the "user" is defined and identified.
>
> So, a better way to phrase the position is that:
>
> "You cannot allow the application to choose what is stored to identify
> itself (client)" - i.e., its credentials identify who it is and those are
> stored without consulting the application

I don't think we can violate the general principle that the database
super-user or table owner can do whatever they want.  If one of those
folks wants to falsify their history, are we really going to tell them
"no"?  To me that has "I'm sorry, Dave, I can't do that" written all
over it, and I think we'll get about the same reaction that Hal did.
Now, if user A is inserting into user B's table, and is not the
super-user, then, of course, we can and should ensure that no
falsification is possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [v9.3] writable foreign tables

2012-08-25 Thread Robert Haas
On Thu, Aug 23, 2012 at 1:10 AM, Kohei KaiGai  wrote:
> It is the responsibility of the FDW extension (and DBA) to ensure that
> each foreign row has a unique identifier that fits in at most a 48-bit
> integer.

It strikes me as incredibly short-sighted to decide that the row
identifier has to have the same format as what our existing heap AM
happens to have.  I think we need to allow the row identifier to be of
any data type, and even compound.  For example, the foreign side might
have no equivalent of CTID, and thus use primary key.  And the primary
key might consist of an integer and a string, or some such.



Re: [HACKERS] TRUE/FALSE vs true/false

2012-08-25 Thread Robert Haas
On Thu, Aug 23, 2012 at 11:01 AM, Tom Lane  wrote:
> Bruce Momjian  writes:
>> On Mon, Aug 20, 2012 at 03:09:08PM -0400, Robert Haas wrote:
>>> I think the thing we need to look at is what percentage of our code
>>> churn is coming from stuff like this, versus what percentage of it is
>>> coming from other factors.  If we change 250,000 lines of code per
>>> release cycle and of that this kind of thing accounts for 5,000 lines
>>> of deltas, then IMHO it's not really material.  If it accounts for
>>> 50,000 lines of deltas out of the same base, that's probably more than
>>> can really be justified by the benefit we're going to get out of it.
>
>> The true/false capitalization patch changes 1.2k lines.
>
> I did a quick look at git diff --stat between recent branches:
>
> $ git diff --shortstat REL9_0_9 REL9_1_5
>  3186 files changed, 314847 insertions(+), 210452 deletions(-)
> $ git diff --shortstat REL9_1_5 REL9_2_BETA4
>  2037 files changed, 290919 insertions(+), 189487 deletions(-)
>
> However, when you look at things a bit closer, these numbers are
> misleading because they include the .po files, which seem to have huge
> inter-branch churn --- well in excess of 100000 lines changed per
> release, at least in git's simpleminded view.  Excluding those, as well
> as src/test/isolation/expected/prepared-transactions.out which added
> 34843 lines all by itself, I get
> 173080 insertions, 70300 deletions for 9.0.9 -> 9.1.5
> 130706 insertions, 55714 deletions for 9.1.5 -> 9.2beta4.
> So it looks like we touch order-of-magnitude of 100K lines per release;
> which still seems astonishingly high, but then this includes docs and
> regression tests not just code.  If I restrict the stat to *.[chyl]
> files it's about half that:
>
> $ git diff --numstat REL9_0_9 REL9_1_5 | grep '\.[chyl]$' | \
>   awk '{a += $1; b += $2} END{print a,b}'
> 90234 33902
> $ git diff --numstat REL9_1_5 REL9_2_BETA4 | grep '\.[chyl]$' | \
>   awk '{a += $1; b += $2} END{print a,b}'
> 90200 42218
>
> So a patch of 1K lines would by itself represent about 2% of the typical
> inter-branch delta.  Maybe that's below our threshold of pain, or maybe
> it isn't.  I'd be happier about it if there were a more compelling
> argument for it, but to me it looks like extremely trivial neatnik-ism.

I wouldn't mind a bit if we devoted 2% of our inter-branch deltas to
this sort of thing, but I've got to admit that 2% for one patch seems
a little on the high side to me.  It might not be a bad idea to
establish one formulation or the other as the one to be used in all
new code, though, to avoid making the problem worse.



Re: [HACKERS] Covering Indexes

2012-08-25 Thread Florian Weimer
* Jeff Janes:

> I don't see the virtue of this in this case.  Since the index is not
> unique, why not just put the index on (a,b,c,d) and be done with it?

AFAICT, SQLite 4 encodes keys in a way that is not easily reversed
(although the encoding is injective, so it's reversible in principle).
Therefore, covered columns need to be listed separately, so they are
not subject to the key encoding.

