Re: [HACKERS] drop duplicate buffers in OS

2014-01-16 Thread KONDO Mitsumasa

(2014/01/16 3:34), Robert Haas wrote:

On Wed, Jan 15, 2014 at 1:53 AM, KONDO Mitsumasa wrote:

I have created a patch that can drop duplicate buffers in the OS using a
usage_count algorithm. I have been developing this patch since last summer.
This feature seems to be a hot topic of discussion, so I am submitting it
sooner than I had planned.

When usage_count is high in shared_buffers, buffers are hard to drop from
shared_buffers. However, these buffers aren't needed in the OS file cache,
because they aren't accessed there by postgres (postgres accesses
shared_buffers). So I created an algorithm that drops file-cache pages that
have a high usage_count in shared_buffers and are in a clean state in the
OS. If a page is clean in the OS file cache, executing posix_fadvise
DONTNEED frees it from the file cache without writing to physical disk.
This algorithm solves the double-buffering problem and uses memory more
efficiently.

I am testing DBT-2 benchmark now...


The thing about this is that our usage counts for shared_buffers don't
really work right now; it's common for everything, or nearly
everything, to have a usage count of 5.  So I'm reluctant to rely on
that for much of anything.
This patch is aimed at large shared_buffers situations, so it might not be
effective when shared_buffers is only 10% of memory. The patch is
experimental and shows one example of how to solve the double-buffering
problem.
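
For reference, the core OS-level mechanism is just posix_fadvise() with
POSIX_FADV_DONTNEED on a range that is clean in the OS. A minimal
standalone sketch (the file name and the 8kB block size are illustrative
only, not taken from the patch):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    int     fd = open("base/16384/16385", O_RDONLY);    /* hypothetical file */
    int     rc;

    if (fd < 0)
        return 1;

    /*
     * Ask the kernel to drop the cached pages of the first 8kB block.
     * For clean pages this just releases the cache entries; a dirty page
     * would have to be written back first, which is exactly what the
     * patch avoids by only targeting pages known to be clean in the OS.
     */
    rc = posix_fadvise(fd, 0, 8192, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    return 0;
}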


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center




Re: [HACKERS] drop duplicate buffers in OS

2014-01-16 Thread KONDO Mitsumasa

(2014/01/16 21:38), Aidan Van Dyk wrote:

Can we just get the backend that dirties the page to do the posix_fadvise DONTNEED?
No. It can only remove clean pages from the OS file cache, because if a page is 
dirty it causes a physical disk write. However, this is an experimental patch, so 
the behaviour might be changed by future benchmark testing.



Or have another helper that sweeps the shared buffers and does this 
post-first-dirty?
We can add a DropDuplicateOSCache() function to the checkpointer process or 
another process. And we could change posix_fadvise() DONTNEED to 
sync_file_range(), which causes a physical disk write of the target buffer 
rather than freeing the OS file cache.


I'm considering calling sync_file_range() with SYNC_FILE_RANGE_WAIT_BEFORE | 
SYNC_FILE_RANGE_WRITE while executing a checkpoint. It can avoid the fsync 
freeze situation in the final part of the checkpoint.
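
A minimal sketch of what such a call could look like (Linux-only; the 8kB
block size is illustrative, not from the patch):

#define _GNU_SOURCE             /* sync_file_range() is Linux-specific */
#include <fcntl.h>

/*
 * Wait for any writeback already in flight on this 8kB block, then start
 * writeback of it, without waiting for completion.  Spreading such calls
 * over the checkpoint keeps the final fsync() from finding a large
 * backlog of dirty pages.
 */
static void
flush_block(int fd, off_t offset)
{
    (void) sync_file_range(fd, offset, 8192,
                           SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE);
}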


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center





Re: [HACKERS] Backup throttling

2014-01-16 Thread Michael Paquier
On Fri, Jan 17, 2014 at 5:03 AM, Andres Freund  wrote:
> slightly related: we should start to reuse procLatch for walsenders
> instead of having a separate latch someday.
+1 on that.
-- 
Michael




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Amit Kapila
On Fri, Jan 17, 2014 at 12:16 AM, Tom Lane  wrote:
> PS: off topic, but isn't ParseConfigDirectory leaking the result
> of AbsoluteConfigLocation?  In both normal and error paths?

   Yes, I also think it leaks in both cases, and a similar leak is
   present in ParseConfigFile(). I have tried to fix both of these
   leaks with the attached patch.
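
   The shape of the fix is roughly as below (a sketch, not the attached
   patch itself): AbsoluteConfigLocation() returns palloc'd memory, so
   both the error path and the normal path need to free it.

static bool
ParseConfigDirectorySketch(const char *includedir, const char *calling_file)
{
    char       *directory = AbsoluteConfigLocation(includedir, calling_file);
    DIR        *d = AllocateDir(directory);

    if (d == NULL)
    {
        pfree(directory);           /* was leaked on the error path */
        return false;
    }

    /* ... read and parse the directory entries here ... */

    FreeDir(d);
    pfree(directory);               /* was leaked on the normal path */
    return true;
}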

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_mem_leak_parse_config.patch
Description: Binary data



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2014-01-16 Thread Amit Kapila
On Thu, Jan 16, 2014 at 10:24 PM, tirtho  wrote:
> Is this now fixed? If so, where do I find the patch for postgres 9.2.2.

   This is still not fixed; the patch is in the Ready For Committer stage.

   You can confirm the status here:
   https://commitfest.postgresql.org/action/patch_view?id=1258

   I had verified that it still applies on HEAD, so moved it to current
   Commit Fest.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeff Janes
On Thursday, January 16, 2014, Dave Chinner wrote:

> On Thu, Jan 16, 2014 at 03:58:56PM -0800, Jeff Janes wrote:
> > On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner wrote:
> >
> > > On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> > > > On 1/15/14, 12:00 AM, Claudio Freire wrote:
> > > > >My completely unproven theory is that swapping is overwhelmed by
> > > > >near-misses. Ie: a process touches a page, and before it's
> > > > >actually swapped in, another process touches it too, blocking on
> > > > >the other process' read. But the second process doesn't account
> > > > >for that page when evaluating predictive models (ie: read-ahead),
> > > > >so the next I/O by process 2 is unexpected to the kernel. Then
> > > > >the same with 1. Etc... In essence, swap, by a fluke of its
> > > > >implementation, fails utterly to predict the I/O pattern, and
> > > > >results in far sub-optimal reads.
> > > > >
> > > > >Explicit I/O is free from that effect, all read calls are
> > > > >accountable, and that makes a difference.
> > > > >
> > > > >Maybe, if the kernel could be fixed in that respect, you could
> > > > >consider mmap'd files as a suitable form of temporary storage.
> > > > >But that would depend on the success and availability of such a
> > > > >fix/patch.
> > > >
> > > > Another option is to consider some of the more "radical" ideas in
> > > > this thread, but only for temporary data. Our write sequencing and
> > > > other needs are far less stringent for this stuff.  -- Jim C.
> > >
> > > I suspect that a lot of the temporary data issues can be solved by
> > > using tmpfs for temporary files
> > >
> >
> > Temp files can collectively reach hundreds of gigs.
>
> So unless you have terabytes of RAM you're going to have to write
> them back to disk.
>

If they turn out to be hundreds of gigs, then yes they have to hit disk (at
least on my hardware).  But if they are 10 gig, then maybe not (depending
on whether other people decide to do similar things at the same time I'm
going to be doing it--something which is often hard to predict).   But now
for every action I take, I have to decide, is this going to take 10 gig, or
14 gig, and how absolutely certain am I?  And is someone else going to try
something similar at the same time?  What a hassle.  It would be so much
nicer to say "This is accessed sequentially, and will never be fsynced.
 Maybe it will fit entirely in memory, maybe it won't, either way, you know
what to do."

If I start out writing to tmpfs, I can't very easily change my mind 94% of
the way through and decide to go somewhere else.  But the kernel,
effectively, can.


> But there's something here that I'm not getting - you're talking
> about a data set that you want to keep cache resident that is at
> least an order of magnitude larger than the cyclic 5-15 minute WAL
> dataset that ongoing operations need to manage to avoid IO storms.
>

Those are mostly orthogonal issues.  The permanent files need to be fsynced
on a regular basis, and might have gigabytes of data dirtied at random from
within terabytes of underlying storage.  We better start writing that
pretty quickly, or when we do issue the fsyncs, the world will fall apart.

The temporary files will never need to be fsynced, and can be written out
sequentially if they do ever need to be written out.  Better to delay this
as much as feasible.


> Where do these temporary files fit into this picture, how fast do
> they grow and why do they need to be so large in comparison to
> the ongoing modifications being made to the database?
>

The permanent files tend to be things like "Jane Doe just bought a pair of
green shoes from Hendrick Green Shoes Limited--record that, charge her
credit card, and schedule delivery".  The temp files are more like "It is
the end of the year, how many shoes have been purchased in each color from
each manufacturer for each quarter over the last 6 years?"  So the temp
files quickly manipulate data that has slowly been accumulating over very
long times, while the permanent files represent the processes of those
accumulations.

If you are Amazon, of course, you have thousands of people who can keep two
sets of records, one organized for fast update and one slightly delayed
copy reorganized for fast analysis, and also do partial analysis on an
ongoing basis and roll them up in ways that can be incrementally updated.
 If you are not Amazon, it would be nice if one system did a better job of
doing both things with the trade off between the two being dynamic and
automatic.

Cheers,

Jeff


Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Craig Ringer
On 01/16/2014 11:52 PM, Tom Lane wrote:
> Heikki Linnakangas  writes:
>> On 01/16/2014 05:39 PM, Andres Freund wrote:
>>> Do you see a reasonable way to implement this generically for all
>>> commands? I don't.
> 
>> In suitable safe places, check if you've written too much WAL, and sleep 
>> if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of 
>> CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe 
>> vacuum_delay_point() is a better analogy. Hopefully you don't need to 
>> sprinkle them in too many places to be useful.
> 
> We probably don't really need to implement it for "all" commands; I think
> everyone can agree that, say, ALTER TABLE RENAME COLUMN isn't going to
> emit enough WAL to really matter.  My point was that we should try to hit
> everything that potentially *could* generate lots of WAL, rather than
> arbitrarily deciding that some are utility commands and some are not.

Then you land up needing a configuration mechanism to control *which*
commands get affected, too, to handle the original use case of "I want
to rate limit vacuum and index rebuilds, while everything else runs
full tilt".

Or do you propose to just do this per-session? If so, what about autovacuum?
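
For reference, a CHECK_FOR_WAL_BUDGET() of the kind Heikki describes might
look roughly like this (a sketch only; the variable names and numbers are
illustrative, not from any posted patch):

/*
 * Sketch only: wal_rate_limit (bytes/sec, 0 = disabled) and
 * wal_written_since_sleep are hypothetical variables, not from any
 * posted patch; pg_usleep() is PostgreSQL's existing sleep primitive.
 */
#define CHECK_FOR_WAL_BUDGET() \
    do { \
        if (wal_rate_limit > 0 && \
            wal_written_since_sleep >= wal_rate_limit / 10) \
        { \
            pg_usleep(100000L);             /* sleep 100ms */ \
            wal_written_since_sleep = 0; \
        } \
    } while (0)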


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)

2014-01-16 Thread Craig Ringer
On 01/16/2014 02:28 AM, Robert Haas wrote:
> - If you address /* FIXME: apply sanity checking to slot name */, then
> I think that also addresses /* XXX: do we want to use truncate
> identifier instead? */. In other words, let's just error out if the
> name is too long.  I'm not sure what other sanity checking is needed
> here; maybe just restrict it to 7-bit characters to avoid problems
> with encodings for individual databases varying.

It's a common misunderstanding that restricting to 7-bit solves encoding
issues.

Thanks to the joy that is SHIFT_JIS, we must also disallow the backslash
and tilde characters.

Anybody who actually uses SHIFT_JIS as an operational encoding, rather
than as an input/output encoding, is into pain and suffering. Personally
I'd be quite happy to see it supported as client_encoding, but forbidden
as a server-side encoding. That's not the case right now - so since we
support it, we'd better guard against its quirks.

Slot names can't be regular identifiers, because they might contain chars
not valid in another DB's encoding. So let's just restrict them to
[a-zA-Z0-9_ -] and be done with it.
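
A sketch of that check (the 63-character length limit here is an
assumption for illustration, not part of any patch):

#include <stdbool.h>
#include <string.h>

/*
 * Accept only [a-zA-Z0-9_ -] so a slot name stays valid in every
 * server-side encoding.
 */
static bool
slot_name_is_valid(const char *name)
{
    const char *p;

    if (name == NULL || *name == '\0' || strlen(name) > 63)
        return false;

    for (p = name; *p; p++)
    {
        if (!((*p >= 'a' && *p <= 'z') ||
              (*p >= 'A' && *p <= 'Z') ||
              (*p >= '0' && *p <= '9') ||
              *p == '_' || *p == ' ' || *p == '-'))
            return false;
    }
    return true;
}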

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-01-16 Thread Peter Geoghegan
On Wed, Jan 15, 2014 at 11:02 PM, Peter Geoghegan  wrote:
> It might just be a matter of:
>
> @@ -186,6 +186,13 @@ ExecLockHeapTupleForUpdateSpec(EState *estate,
>     switch (test)
>     {
>         case HeapTupleInvisible:
> +           /*
> +            * Tuple may have originated from this command, in which
> +            * case it's already locked
> +            */
> +           if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple.t_data)) &&
> +               HeapTupleHeaderGetCmin(tuple.t_data) == estate->es_output_cid)
> +               return true;
>             /* Tuple became invisible;  try again */
>             if (IsolationUsesXactSnapshot())
>                 ereport(ERROR,

I think we need to give this some more thought. I have not addressed
the implications for MVCC snapshots here. I think that I'll need to
raise a WARNING along the lines of "your snapshot isn't going to
consider the locked tuple visible because the same command inserted
it", or perhaps even raise an ERROR regardless of isolation level
(although note that I'm not suggesting that we raise an ERROR in the
event of receiving HeapTupleInvisible from heap_lock_tuple()/HTSU()
for other reasons, which *is* possible, nor am I suggesting that later
commands of the same xact would ever see this ERROR). I'm comfortable
with the idea of what you might loosely describe as a "READ COMMITTED
mode serialization failure" here, because this case is so much more
narrow than the other case I've proposed making a special exception to
the general semantics of MVCC snapshots to accommodate (i.e. the case
where a tuple is locked from an xact logically still-in-progress to
our snapshot in RC mode).

I think I'll be happy to declare that usage of the feature that hits
this issue is somewhere between questionable and wrong. It probably
isn't worth making another, similar HTSMVCC exception for this case.
But ISTM that we still have to do *something* other than simply credit
users with taking care to avoid tripping up on this.

-- 
Peter Geoghegan




Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Greg Stark
On Wed, Jan 15, 2014 at 7:53 AM, Mel Gorman  wrote:
> The second is to have
> pages that are strictly kept dirty until the application syncs them. An
> unbounded number of these pages would blow up but maybe bounds could be
> placed on it. There are no solid conclusions on that part yet.

I think the interface would be subtler than that. The current
architecture is that if an individual process decides to evict one of
these pages it knows how much of the log needs to be flushed and
fsynced before it can do so and proceeds to do it itself. This is a
situation to be avoided as much as possible but there are workloads
where it's inevitable (the typical example is mass data loads).

There would need to be a similar interface giving the kernel some way
to force log pages to be written so that it can advance the epoch.
Either some way to wake Postgres up and
inform it of the urgency or better yet Postgres would just always be
writing out pages without fsyncing them and instead be issuing some
other syscall to mark the points in the log file that correspond to
the write barriers that would unpin these buffers.
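
Purely to make the shape of that concrete, the primitives might look
something like this (entirely hypothetical -- no such syscalls exist):

#include <stddef.h>
#include <sys/types.h>

/*
 * Entirely hypothetical.  Data writes are tagged with an ordering
 * "epoch", and the kernel may not write them back until the WAL range
 * that the epoch depends on has been marked durable.
 */
int wal_epoch_mark(int wal_fd, off_t wal_offset, unsigned long epoch);
int pwrite_with_epoch(int data_fd, const void *buf, size_t len,
                      off_t offset, unsigned long epoch);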

Ted Ts'o was concerned this would all be a massive layering violation
and I have to admit that's a huge risk. It would take some clever API
engineering to come up with a clean set of primitives to express the kind
of ordering guarantees we need without being too tied to Postgres's
specific implementation. The reason I think it's more interesting
though is that Postgres's journalling and checkpointing architecture
is pretty bog-standard CS stuff and there are hundreds or thousands of
pieces of software out there that do pretty much the same work and
trying to do it efficiently with fsync or O_DIRECT is like working
with both hands tied to your feet.

-- 
greg




Re: [HACKERS] new json funcs

2014-01-16 Thread Andrew Dunstan


On 01/16/2014 07:39 PM, Andrew Dunstan wrote:


On 01/16/2014 01:57 PM, Peter Eisentraut wrote:

On 1/3/14, 9:00 PM, Andrew Dunstan wrote:

Here is a patch for the new json functions I mentioned a couple of
months ago. These are:

 json_to_record
 json_to_recordset
 json_object
 json_build_array
 json_build_object
 json_object_agg

So far there are no docs, but the way these work is illustrated in the
regression tests - I hope to have docs within a few days.

Compiler warnings:

json.c: In function ‘json_object_two_arg’:
json.c:2210:5: warning: unused variable ‘count’ [-Wunused-variable]

jsonfuncs.c: In function ‘json_to_record’:
jsonfuncs.c:1955:16: warning: unused variable ‘tuple’ 
[-Wunused-variable]
jsonfuncs.c:1953:18: warning: variable ‘rec’ set but not used 
[-Wunused-but-set-variable]


Also, please run your patch through git diff --check.  I have noticed
that several of your patches have hilarious whitespace, maybe
something with your editor.




I'm happy to keep you amused. Some of this was probably due to cutting 
and pasting.


all these issues are fixed in the attached patch.




In case anyone feels like really nitpicking, this version fixes some 
pgindent weirdness due to an outdated typedef list in the previous version.


cheers

andrew
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 21a2336..ef5e125 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -68,6 +68,10 @@ static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
   bool use_line_feeds);
 static void array_to_json_internal(Datum array, StringInfo result,
 	   bool use_line_feeds);
+static void datum_to_json(Datum val, bool is_null, StringInfo result,
+			  TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar);
+static void add_json(Datum orig_val, bool is_null, StringInfo result,
+		 Oid val_type, bool key_scalar);
 
 /* the null action object used for pure validation */
 static JsonSemAction nullSemAction =
@@ -751,11 +755,12 @@ json_lex_string(JsonLexContext *lex)
  report_json_context(lex)));
 
 	/*
-	 * For UTF8, replace the escape sequence by the actual utf8
-	 * character in lex->strval. Do this also for other encodings
-	 * if the escape designates an ASCII character, otherwise
-	 * raise an error. We don't ever unescape a \u, since that
-	 * would result in an impermissible nul byte.
+	 * For UTF8, replace the escape sequence by the actual
+	 * utf8 character in lex->strval. Do this also for other
+	 * encodings if the escape designates an ASCII character,
+	 * otherwise raise an error. We don't ever unescape a
+	 * \u, since that would result in an impermissible nul
+	 * byte.
 	 */
 
 	if (ch == 0)
@@ -771,8 +776,9 @@ json_lex_string(JsonLexContext *lex)
 	else if (ch <= 0x007f)
 	{
 		/*
-		 * This is the only way to designate things like a form feed
-		 * character in JSON, so it's useful in all encodings.
+		 * This is the only way to designate things like a
+		 * form feed character in JSON, so it's useful in all
+		 * encodings.
 		 */
 		appendStringInfoChar(lex->strval, (char) ch);
 	}
@@ -866,7 +872,7 @@ json_lex_string(JsonLexContext *lex)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for type json"),
-		errdetail("Unicode low surrogate must follow a high surrogate."),
+			errdetail("Unicode low surrogate must follow a high surrogate."),
  report_json_context(lex)));
 
 	/* Hooray, we found the end of the string! */
@@ -1217,11 +1223,11 @@ extract_mb_char(char *s)
  */
 static void
 datum_to_json(Datum val, bool is_null, StringInfo result,
-			  TYPCATEGORY tcategory, Oid typoutputfunc)
+			  TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar)
 {
 	char	   *outputstr;
 	text	   *jsontext;
-	bool   numeric_error;
+	bool		numeric_error;
 	JsonLexContext dummy_lex;
 
 	if (is_null)
@@ -1239,23 +1245,32 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 			composite_to_json(val, result, false);
 			break;
 		case TYPCATEGORY_BOOLEAN:
-			if (DatumGetBool(val))
-appendStringInfoString(result, "true");
+			if (!key_scalar)
+appendStringInfoString(result, DatumGetBool(val) ? "true" : "false");
 			else
-appendStringInfoString(result, "false");
+escape_json(result, DatumGetBool(val) ? "true" : "false");
 			break;
 		case TYPCATEGORY_NUMERIC:
 			outputstr = OidOutputFunctionCall(typoutputfunc, val);
-			/*
-			 * Don't call escape_json here if it's a valid JSON number.
-			 */
-			dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
-			dummy_lex.input_length = strlen(dummy_lex.input);
-			json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);
-			if (! numeric_error)
-appendStringInfoString(result, outputstr);
-			else
+			if (key_scalar)
+			{
+/* always qu

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Florian Pflug
I had some more fun with this, the result is v2.5 of the patch (attached). 
Changes are explained below.

On Jan16, 2014, at 19:10 , Florian Pflug  wrote:
> On Jan16, 2014, at 09:07 , David Rowley  wrote:
>> On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug  wrote:
>>> The notnullcount machinery seems to apply to both STRICT and non-STRICT 
>>> transfer function pairs. Shouldn't that be constrained to STRICT transfer 
>>> function pairs? For non-STRICT pairs, it's up to the transfer functions to 
>>> deal with NULL inputs however they please, no?
>> 
>> The reason I had to track the notnullcount for non-strict functions was 
>> because:
>> 
>> select sum(n) over (order by i rows between current row and unbounded 
>> following) from (values(1,1),(2,NULL)) v(i,n);
>> 
>> would otherwise return
>> 1
>> 0
>> 
>> sum is not strict, so I guess we need to track notnullcount for non-strict 
>> functions.
>> See the code around if (peraggstate->notnullcount == 0) in 
>> retreat_windowaggregate(). 
> 
> Yeah, I figured that was the reason, but you can't fix it that way. See 
> http://www.postgresql.org/message-id/8e857d95-cba4-4974-a238-9dd7f61be...@phlo.org
>  for a detailed explanation why. My 2.2 patch allows pairs of non-strict 
> forward and strict inverse transfer functions exactly because of this - i.e., 
> basically it decrees that if there's an inverse transfer function, the strict 
> setting of the *inverse* function determines the aggregates NULL behaviour. 
> The forward transfer function is then never called
> for NULL inputs, but it *is* called with the NULL state for the first 
> non-NULL input, and *must* then return a non-NULL state (hence it's 
> technically not strict, it's strict only in the inputs, not in the state).
> 
> BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did 
> that in 2.2. That doesn't matter for the built-in aggregates since these 
> aren't created with CREATE AGGREGATE, but it's obviously wrong. I'll post a 
> fixed patched.

Fixed now.

>>> The logic around movedaggbase in eval_windowaggregates() seems a bit 
>>> convoluted. Couldn't the if be moved before the code that pulls 
>>> aggregatedbase up to frameheadpos using the inverse aggregation function?
> 
>> I had a look at this and even tried moving the code to before the inverse 
>> transitions, but it looks like that would only work if I added more tests to 
>> the if condition to ensure that we're not about to perform inverse 
>> transitions. To me it just seemed more bullet proof the way it is rather 
>> than duplicating logic and having to ensure that it stays duplicated. But 
>> saying that I don't think I've fully got my head around why the original 
>> code is valid in the first place. I would have imagined that it should 
>> contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that 
>> sounds like complete rubbish then I'll put it down to my head still being 
>> fried from work.
> 
> Ok, I get this now. That code really just is asking "is the previous row's 
> frame the same as the current row's frame" in that "if" where you added 
> movedagg. What confused me was the rather weird way that condition is 
> expressed, which turns out is due to the fact that at the point of the if, we 
> don't know the new frame's end. Now, we could use update_frametailpos() to 
> find that, but that's potentially costly, especially if the tuple store was 
> spilled to disk. So instead, the code relies on the fact that only if the 
> frame end is "n FOLLOWING/PRECEDING" can the current row lie within the 
> previous row's frame without the two frames' ends necessarily being the same.
> 
> For added confusion, that "if" never explicitly checks whether the frame's 
> *heads* coincide - the previous code seems to have relied on the 
> impossibility of having "aggregatedbase <= current < aggregatedupto" after 
> re-initializing the aggregation, because then aggregatedbase = aggregatedupto 
> = 0. That's why you can't just move the "if" before the "frameheadpos == 
> aggregatedbase" check. But you can *if* you also check whether 
> "aggregatedbase == frameheadpos" in the if - which is clearer than relying on 
> that rather subtle  assumption anyway.
> 
> BTW, your patch will also fail badly if the frame head ever moves 
> backwards - the invariant that "aggregatedbase == frameheadpos" after the 
> inverse transition loop will then be violated. Now, this should never happen, 
> because we require that the offset in "n PRECEDING/FOLLOWING" is constant, 
> but we should probably still check for this and elog().
> 
> That check was implicit in old code, because it advanced the tuplestore mark, 
> so if the frame head moved backwards, re-scanning the frame would have 
> failed. That brings me to another todo - as it stands, that mark never gets 
> advanced if we're doing inverse aggregation. To fix that, we need a call to 
> WinSetMarkPosition() somewhere in eval_windowaggregates().
> 
> After doing this t

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Robert Haas
On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner  wrote:
> But there's something here that I'm not getting - you're talking
> about a data set that you want to keep cache resident that is at
> least an order of magnitude larger than the cyclic 5-15 minute WAL
> dataset that ongoing operations need to manage to avoid IO storms.
> Where do these temporary files fit into this picture, how fast do
> they grow and why do they need to be so large in comparison to
> the ongoing modifications being made to the database?

I'm not sure you've got that quite right.  WAL is fsync'd very
frequently - on every commit, at the very least, and multiple times
per second even there are no commits going on just to make sure we get
it all down to the platter as fast as possible.  The thing that causes
the I/O storm is the data file writes, which are performed either when
we need to free up space in PostgreSQL's internal buffer pool (aka
shared_buffers) or once per checkpoint interval (5-60 minutes) in any
event.  The point of this system is that if we crash, we're going to
need to replay all of the WAL to recover the data files to the proper
state; but we don't want to keep WAL around forever, so we checkpoint
periodically.  By writing all the data back to the underlying data
files, checkpoints render older WAL segments irrelevant, at which
point we can recycle those files before the disk fills up.

Temp files are something else again.  If PostgreSQL needs to sort a
small amount of data, like a kilobyte, it'll use quicksort.  But if it
needs to sort a large amount of data, like a terabyte, it'll use a
merge sort.[1]  The reason is of course that quicksort requires random
access to work well; if parts of quicksort's working memory get paged
out during the sort, your life sucks.  Merge sort (or at least our
implementation of it) is slower overall, but it only accesses the data
sequentially.  When we do a merge sort, we use files to simulate the
tapes that Knuth had in mind when he wrote down the algorithm.  If the
OS runs short of memory - because the sort is really big or just
because of other memory pressure - it can page out the parts of the
file we're not actively using without totally destroying performance.
It'll be slow, of course, because disks always are, but not like
quicksort would be if it started swapping.
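
To illustrate the access-pattern point, merging sorted runs reads each
input element exactly once, strictly in order, as in this toy example:

#include <stdio.h>

/*
 * Toy illustration of the sequential-access property described above:
 * merging two sorted runs touches each element once, in order, which is
 * why spilled merge-sort runs page in and out gracefully.
 */
int
main(void)
{
    int     a[] = {1, 4, 7};
    int     b[] = {2, 3, 9};
    int     i = 0, j = 0;

    while (i < 3 && j < 3)
        printf("%d ", a[i] <= b[j] ? a[i++] : b[j++]);
    while (i < 3)
        printf("%d ", a[i++]);
    while (j < 3)
        printf("%d ", b[j++]);
    putchar('\n');
    return 0;
}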

I haven't actually experienced (or heard mentioned) the problem Jeff
Janes is mentioning where temp files get written out to disk too
aggressively; as mentioned before, the problems I've seen are usually
the other way - stuff not getting written out aggressively enough.
But it sounds plausible.  The OS only lets you set one policy, and if
you make that policy right for permanent data files that get
checkpointed it could well be wrong for temp files that get thrown
out.  Just stuffing the data on RAMFS will work for some
installations, but might not be good if you actually do want to
perform sorts whose size exceeds RAM.

BTW, I haven't heard anyone on pgsql-hackers say they'd be interested
in attending Collab on behalf of the PostgreSQL community.  Although
the prospect of a cross-country flight is a somewhat depressing
thought, it does sound pretty cool, so I'm potentially interested.  I
have no idea what the procedure is here for moving forward though,
especially since it sounds like there might be only one seat available
and I don't know who else may wish to sit in it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] The threshold where we switch from quicksort to merge sort is a
configurable parameter.




Re: [HACKERS] new json funcs

2014-01-16 Thread Andrew Dunstan


On 01/16/2014 01:57 PM, Peter Eisentraut wrote:

On 1/3/14, 9:00 PM, Andrew Dunstan wrote:

Here is a patch for the new json functions I mentioned a couple of
months ago. These are:

 json_to_record
 json_to_recordset
 json_object
 json_build_array
 json_build_object
 json_object_agg

So far there are no docs, but the way these work is illustrated in the
regression tests - I hope to have docs within a few days.

Compiler warnings:

json.c: In function ‘json_object_two_arg’:
json.c:2210:5: warning: unused variable ‘count’ [-Wunused-variable]

jsonfuncs.c: In function ‘json_to_record’:
jsonfuncs.c:1955:16: warning: unused variable ‘tuple’ [-Wunused-variable]
jsonfuncs.c:1953:18: warning: variable ‘rec’ set but not used 
[-Wunused-but-set-variable]

Also, please run your patch through git diff --check.  I have noticed
that several of your patches have hilarious whitespace, maybe
something with your editor.




I'm happy to keep you amused. Some of this was probably due to cutting 
and pasting.


all these issues are fixed in the attached patch.

cheers

andrew

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 21a2336..7ed771d 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -46,16 +46,16 @@ typedef enum	/* contexts of JSON parser */
 	JSON_PARSE_OBJECT_NEXT,		/* saw object value, expecting ',' or '}' */
 	JSON_PARSE_OBJECT_COMMA,	/* saw object ',', expecting next label */
 	JSON_PARSE_END/* saw the end of a document, expect nothing */
-} JsonParseContext;
+}	JsonParseContext;
 
 static inline void json_lex(JsonLexContext *lex);
 static inline void json_lex_string(JsonLexContext *lex);
 static inline void json_lex_number(JsonLexContext *lex, char *s, bool *num_err);
-static inline void parse_scalar(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object_field(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_object(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array_element(JsonLexContext *lex, JsonSemAction *sem);
-static void parse_array(JsonLexContext *lex, JsonSemAction *sem);
+static inline void parse_scalar(JsonLexContext *lex, JsonSemAction * sem);
+static void parse_object_field(JsonLexContext *lex, JsonSemAction * sem);
+static void parse_object(JsonLexContext *lex, JsonSemAction * sem);
+static void parse_array_element(JsonLexContext *lex, JsonSemAction * sem);
+static void parse_array(JsonLexContext *lex, JsonSemAction * sem);
 static void report_parse_error(JsonParseContext ctx, JsonLexContext *lex);
 static void report_invalid_token(JsonLexContext *lex);
 static int	report_json_context(JsonLexContext *lex);
@@ -68,6 +68,10 @@ static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
   bool use_line_feeds);
 static void array_to_json_internal(Datum array, StringInfo result,
 	   bool use_line_feeds);
+static void datum_to_json(Datum val, bool is_null, StringInfo result,
+			  TYPCATEGORY tcategory, Oid typoutputfunc, bool key_scalar);
+static void add_json(Datum orig_val, bool is_null, StringInfo result,
+		 Oid val_type, bool key_scalar);
 
 /* the null action object used for pure validation */
 static JsonSemAction nullSemAction =
@@ -257,7 +261,7 @@ makeJsonLexContext(text *json, bool need_escapes)
  * pointer to a state object to be passed to those routines.
  */
 void
-pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
+pg_parse_json(JsonLexContext *lex, JsonSemAction * sem)
 {
 	JsonTokenType tok;
 
@@ -293,7 +297,7 @@ pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)
  *	  - object field
  */
 static inline void
-parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
+parse_scalar(JsonLexContext *lex, JsonSemAction * sem)
 {
 	char	   *val = NULL;
 	json_scalar_action sfunc = sem->scalar;
@@ -329,7 +333,7 @@ parse_scalar(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
+parse_object_field(JsonLexContext *lex, JsonSemAction * sem)
 {
 	/*
 	 * an object field is "fieldname" : value where value can be a scalar,
@@ -377,7 +381,7 @@ parse_object_field(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_object(JsonLexContext *lex, JsonSemAction *sem)
+parse_object(JsonLexContext *lex, JsonSemAction * sem)
 {
 	/*
 	 * an object is a possibly empty sequence of object fields, separated by
@@ -425,7 +429,7 @@ parse_object(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
+parse_array_element(JsonLexContext *lex, JsonSemAction * sem)
 {
 	json_aelem_action astart = sem->array_element_start;
 	json_aelem_action aend = sem->array_element_end;
@@ -456,7 +460,7 @@ parse_array_element(JsonLexContext *lex, JsonSemAction *sem)
 }
 
 static void
-parse_array(JsonLexContext *lex, JsonSemAction *sem)
+parse_array(JsonLexContext *lex, JsonSemAction * sem)
 {
 	/*
 	 * an array is a possibl

Re: [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeff Janes
On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman  wrote:

> On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
> > >
> > > That could be something we look at. There are cases buried deep in the
> > > VM where pages get shuffled to the end of the LRU and get tagged for
> > > reclaim as soon as possible. Maybe you need access to something like
> > > that via posix_fadvise to say "reclaim this page if you need memory but
> > > leave it resident if there is no memory pressure" or something similar.
> > > Not exactly sure what that interface would look like or offhand how it
> > > could be reliably implemented.
> > >
> >
> > I think the "reclaim this page if you need memory but leave it resident
> if
> > there is no memory pressure" hint would be more useful for temporary
> > working files than for what was being discussed above (shared buffers).
> >  When I do work that needs large temporary files, I often see physical
> > write IO spike but physical read IO does not.  I interpret that to mean
> > that the temporary data is being written to disk to satisfy either
> > dirty_expire_centisecs or dirty_*bytes, but the data remains in the FS
> > cache and so disk reads are not needed to satisfy it.  So a hint that
> > says "this file will never be fsynced so please ignore dirty_*bytes and
> > dirty_expire_centisecs" would be helpful.
>
> It would be good to know if dirty_expire_centisecs or dirty ratio|bytes
> were the problem here.


Is there an easy way to tell?  I would guess it has to be at least
dirty_expire_centisecs, if not both, as a very large sort operation takes a
lot more than 30 seconds to complete.


> An interface that forces a dirty page to stay dirty
> regardless of the global system would be a major hazard. It potentially
> allows the creator of the temporary file to stall all other processes
> dirtying pages for an unbounded period of time.


Are the dirty ratio/bytes limits the mechanisms by which adequate clean
memory is maintained?  I thought those were there just to put a limit on
how long it would take to execute a sync call should one be issued, and
there were other settings which said how much clean memory to maintain.  It
should definitely write out the pages if it needs the memory for other
things, just not write them out due to fear of how long it would take to
sync them if a sync was called.  (And if it needs the memory, it should be
able to write them out quickly as the writes would be mostly sequential, not
random--although how the kernel can believe me that that will always be the
case could be a problem.)



> I proposed in another part
> of the thread a hint for open inodes to have the background writer thread
> ignore dirty pages belonging to that inode. Dirty limits and fsync would
> still be obeyed. It might also be workable for temporary files but the
> proposal could be full of holes.
>

If calling fsync would fail with an error, would that lower the risk of DoS?


>
> Your alternative here is to create a private anonymous mapping as they
> are not subject to dirty limits. This is only a sensible option if the
> temporarily data is guaranteeed to be relatively small. If the shared
> buffers, page cache and your temporary data exceed the size of RAM then
> data will get discarded or your temporary data will get pushed to swap
> and performance will hit the floor.
>

PostgreSQL mainly uses temp files precisely when that guarantee is hard to
make.  There is a pretty big margin where it is too big to be certain it
will fit in memory, so we have to switch to a disk-friendly
mostly-sequential algorithm.  Yet it would still be nice to avoid the
actual disk writes until we have observed that it actually is growing too
big.


Cheers,

Jeff


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeff Janes
On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner  wrote:

> On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> > On 1/15/14, 12:00 AM, Claudio Freire wrote:
> > >My completely unproven theory is that swapping is overwhelmed by
> > >near-misses. Ie: a process touches a page, and before it's
> > >actually swapped in, another process touches it too, blocking on
> > >the other process' read. But the second process doesn't account
> > >for that page when evaluating predictive models (ie: read-ahead),
> > >so the next I/O by process 2 is unexpected to the kernel. Then
> > >the same with 1. Etc... In essence, swap, by a fluke of its
> > >implementation, fails utterly to predict the I/O pattern, and
> > >results in far sub-optimal reads.
> > >
> > >Explicit I/O is free from that effect, all read calls are
> > >accountable, and that makes a difference.
> > >
> > >Maybe, if the kernel could be fixed in that respect, you could
> > >consider mmap'd files as a suitable form of temporary storage.
> > >But that would depend on the success and availability of such a
> > >fix/patch.
> >
> > Another option is to consider some of the more "radical" ideas in
> > this thread, but only for temporary data. Our write sequencing and
> > other needs are far less stringent for this stuff.  -- Jim C.
>
> I suspect that a lot of the temporary data issues can be solved by
> using tmpfs for temporary files
>

Temp files can collectively reach hundreds of gigs.  So I would have to set
up two temporary tablespaces, one in tmpfs and one in regular storage, and
then remember to choose between them based on my estimate of how much temp
space is going to be used in each connection (and hope I don't mess up the
estimation and so either get errors, or render the server unresponsive).

So I just use regular storage, and pay the "insurance premium" of having
some extraneous write IO.  It would be nice if the insurance premium were
cheaper, though.  I think the IO storms during checkpoint syncs are
definitely the more critical issue, this is just something nice to have
which seemed to align with one the comments.

Cheers,

Jeff


Re: [HACKERS] proposal, patch: allow multiple plpgsql plugins

2014-01-16 Thread Marko Tiikkaja

Hi Pavel,

First of all, thanks for working on this!

On 1/12/14, 8:58 PM, Pavel Stehule wrote:

I am still not happy with plugin_info - it is only per plugin now and should
be per plugin and per function.


I'm not sure I understand the point of plugin_info in the first place, 
but what would having a separate info per (plugin, function) achieve 
that can't be done otherwise?
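
For concreteness, I imagine something like the following hypothetical
shape (the names are mine, not from your patch):

#include <stddef.h>

/*
 * Hypothetical sketch of "per plugin and per function" info: each
 * registered plugin gets its own private slot per function execution,
 * instead of the single shared plugin_info pointer.
 */
#define MAX_PLUGINS 8

typedef struct PluginFuncInfo
{
    void       *info[MAX_PLUGINS];      /* one slot per registered plugin */
} PluginFuncInfo;

static void *
get_plugin_info(PluginFuncInfo *pfi, int plugin_index)
{
    if (plugin_index < 0 || plugin_index >= MAX_PLUGINS)
        return NULL;
    return pfi->info[plugin_index];
}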



As for the current patch, I'd like to see improvements on a few things:

  1) It doesn't currently compile because of extra semicolons in the
 PLpgSQL_plugin struct.

  2) The previous comment above the same struct still talk about the
 rendezvous variable which is now gone.  The comment should be
 updated to reflect the new API.

  3) The same comment talks about how important it is to unregister a
 plugin if its _PG_fini() is ever called, but the current API does
 not support unregistering.  That should probably be added?  I'm not
 sure when _PG_fini() would be called.

  4) The comment  /* reserved for use by optional plugin */  seems a bit
 weird in its new context.


Regards,
Marko Tiikkaja




Re: [HACKERS] [PATCH] Filter error log statements by sqlstate

2014-01-16 Thread Oskari Saarenmaa

17.01.2014 00:13, Tom Lane wrote:

Oskari Saarenmaa  writes:

[ 0001-Filter-error-log-statements-by-sqlstate.patch ]


I looked at this patch.  It took me some time to understand that what
it actually does has got approximately nothing to do with what one might
first expect: rather than suppressing the entire log message about some
error, it only suppresses printing of the triggering SQL statement
(if that's available to begin with).  The proposed documentation is
certainly not clear enough on that point, and the API which appears to
allow changing the error severity level associated with a SQLSTATE isn't
exactly helping to clarify things either.

Also, the patch claims it allows logging of statements that otherwise
wouldn't get logged, but AFAICS that's wrong, because we'll never get to
this code at all if errstart decides we're not going to log the message.


I agree the documentation (and perhaps the feature itself) are a bit 
confusing, but the idea is to control SQL statement logging when errors 
occur.  This patch doesn't do anything about normal error logging; it 
only controls when the statements are printed.


Running:

  set log_min_messages = 'warning';

  set log_min_error_statement = 'panic';
  set log_sqlstate_error_statement = '';
  do 'begin raise exception ''foobar 1''; end';

  set log_sqlstate_error_statement = 'P0001:error';
  do 'begin raise exception ''foobar 2''; end';

  set log_min_error_statement = 'error';
  set log_sqlstate_error_statement = 'P0001:panic';
  do 'begin raise exception ''foobar 3''; end';

logs

  2014-01-17 00:37:12 EET ERROR:  foobar 1
  2014-01-17 00:37:20 EET ERROR:  foobar 2
  2014-01-17 00:37:20 EET STATEMENT:  do 'begin raise exception ''foobar 2''; end';

  2014-01-17 00:38:34 EET ERROR:  foobar 3


I find it hard to follow exactly what the use-case for this is; could
you make a defense of why we should carry a feature like this?


I usually run PG with log_min_messages = warning and 
log_min_error_statement = error to log any unexpected errors.  But as I 
have a lot of check constraints in my database as well as a lot of 
plpgsql and plpython code which raises exceptions on invalid user input 
I also get tons of logs for statements causing "expected" errors which I 
have to try to filter out with other tools.



In any case, I'm finding it hard to believe that filtering by individual
SQLSTATEs is a practical API.  When we've discussed providing better log
filtering in the past, that approach was invariably dismissed on the
grounds that it would be far too unwieldy to use --- any DBA attempting to
use it in anger would end up with a huge and ever-incomplete list of
SQLSTATEs he'd need to filter.  A long time ago I suggested that filtering
by SQLSTATE class (the first two characters of SQLSTATE) might be useful,
but I'm not sure I still believe that, and anyway it's not what's
implemented here.


I don't know about others, but filtering by individual SQLSTATE is 
exactly what I need - I don't want to suppress all plpgsql errors or 
constraint violations, but I may want to suppress plpgsql RAISE 
EXCEPTION and CHECK violations.



I'm concerned also about the potential performance impact of this patch,
if used with a SQLSTATE list as long as I think they're likely to get in
practice.  Have you done any performance testing?


Not yet.  As this only applies to statement logging (for now at least) I 
doubt it's a big deal; formatting and writing the statement somewhere is 
probably at least as expensive as looking up the configuration.



As far as the code goes, bit manipulations on uint64s are a pretty crummy
substitute for defining a struct with a couple of fields; and the way
you're choosing to sort the array seems mighty inefficient, as well as
probably wrong in detail (why is the loop ignoring the first array
element?); and rather than make fragile assumptions about the maximum
length of an elevel name, why not just leave the user's string as-is?
But I wouldn't advise worrying about code style issues until we decide if
we're accepting the feature.  Right now my inclination is to reject it.


I agree, I should've just defined a struct and used the original string 
length when rewriting the user string (it's rewritten to drop any 
duplicates).  I don't think the sort is a big deal, as it's only done when 
the value is defined; the first array element is the length of the 
array.  I can address these points in a new version of this patch if the 
feature looks useful.
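
Something like this struct-based lookup is what I have in mind (a sketch,
not the posted patch):

#include <stdlib.h>
#include <string.h>

/*
 * Map a five-character SQLSTATE to a minimum statement-logging level
 * via a sorted array and bsearch().
 */
typedef struct StateFilter
{
    char        sqlstate[6];    /* e.g. "P0001" */
    int         min_level;      /* e.g. the numeric value of ERROR */
} StateFilter;

static int
filter_cmp(const void *a, const void *b)
{
    return strcmp(((const StateFilter *) a)->sqlstate,
                  ((const StateFilter *) b)->sqlstate);
}

static const StateFilter *
lookup_filter(const StateFilter *filters, size_t n, const char *sqlstate)
{
    StateFilter key;

    strncpy(key.sqlstate, sqlstate, 5);
    key.sqlstate[5] = '\0';
    return bsearch(&key, filters, n, sizeof(StateFilter), filter_cmp);
}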


Thanks for the review!
 Oskari





Re: [HACKERS] ECPG FETCH readahead, was: Re: ECPG fixes

2014-01-16 Thread Alvaro Herrera
Boszormenyi Zoltan wrote:
> Rebased patches after the regression test and other details were fixed
> in the infrastructure part.

This thread started in 2010, and various pieces have been applied
already and some others have changed in nature.  Would you please post a
new patchset, containing rebased patches that still need application, in
a new email thread to be linked in the commitfest entry?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Filter error log statements by sqlstate

2014-01-16 Thread Peter Eisentraut
Please fix the compiler warning:

elog.c: In function ‘check_log_sqlstate_error’:
elog.c:3789:41: warning: ‘new_newval_end’ may be used uninitialized in
this function [-Wuninitialized]




Re: [HACKERS] [PATCH] Filter error log statements by sqlstate

2014-01-16 Thread Tom Lane
Oskari Saarenmaa  writes:
> [ 0001-Filter-error-log-statements-by-sqlstate.patch ]

I looked at this patch.  It took me some time to understand that what
it actually does has got approximately nothing to do with what one might
first expect: rather than suppressing the entire log message about some
error, it only suppresses printing of the triggering SQL statement
(if that's available to begin with).  The proposed documentation is
certainly not clear enough on that point, and the API which appears to
allow changing the error severity level associated with a SQLSTATE isn't
exactly helping to clarify things either.

Also, the patch claims it allows logging of statements that otherwise
wouldn't get logged, but AFAICS that's wrong, because we'll never get to
this code at all if errstart decides we're not going to log the message.

I find it hard to follow exactly what the use-case for this is; could
you make a defense of why we should carry a feature like this?

In any case, I'm finding it hard to believe that filtering by individual
SQLSTATEs is a practical API.  When we've discussed providing better log
filtering in the past, that approach was invariably dismissed on the
grounds that it would be far too unwieldy to use --- any DBA attempting to
use it in anger would end up with a huge and ever-incomplete list of
SQLSTATEs he'd need to filter.  A long time ago I suggested that filtering
by SQLSTATE class (the first two characters of SQLSTATE) might be useful,
but I'm not sure I still believe that, and anyway it's not what's
implemented here.

I'm concerned also about the potential performance impact of this patch,
if used with a SQLSTATE list as long as I think they're likely to get in
practice.  Have you done any performance testing?

As far as the code goes, bit manipulations on uint64s are a pretty crummy
substitute for defining a struct with a couple of fields; and the way
you're choosing to sort the array seems mighty inefficient, as well as
probably wrong in detail (why is the loop ignoring the first array
element?); and rather than make fragile assumptions about the maximum
length of an elevel name, why not just leave the user's string as-is?
But I wouldn't advise worrying about code style issues until we decide if
we're accepting the feature.  Right now my inclination is to reject it.

regards, tom lane




[HACKERS] 9.3.2 Client-only installation per docs fails creating directory.

2014-01-16 Thread Mike Blackwell
Per the docs (15.4 Installation Procedure, Client-only installation), after
running make, the following should work:

gmake -C src/bin install
gmake -C src/include install
gmake -C src/interfaces install
gmake -C doc install


However, it fails on the first command:

[postgresql-9.3.2]$ sudo gmake -C src/bin install
gmake: Entering directory `/opt/postgresql-9.3.2/src/bin'
gmake -C initdb install
gmake[1]: Entering directory `/opt/postgresql-9.3.2/src/bin/initdb'
gmake -C ../../../src/port all
[successful stuff cut here]
/bin/mkdir -p '/usr/local/pgsql-9.3.2/bin'
/usr/bin/install -c  psql '/usr/local/pgsql-9.3.2/bin/psql'
/usr/bin/install -c -m 644 ./psqlrc.sample
'/usr/local/pgsql-9.3.2/share/psqlrc.sample'
/usr/bin/install: cannot create regular file
`/usr/local/pgsql-9.3.2/share/psqlrc.sample': No such file or directory
gmake[1]: *** [install] Error 1
gmake[1]: Leaving directory `/opt/postgresql-9.3.2/src/bin/psql'
gmake: *** [install-psql-recurse] Error 2
gmake: Leaving directory `/opt/postgresql-9.3.2/src/bin'

The directory 'share' does not exist, which seems to be the issue.  Is there
a missing dependency somewhere?

It appears the doc install correctly creates 'share', so installing src/bin
last works.

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com





Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Theodore Ts'o
On Wed, Jan 15, 2014 at 10:35:44AM +0100, Jan Kara wrote:
> Filesystems could in theory provide facility like atomic write (at least up
> to a certain size say in MB range) but it's not so easy and when there are
> no strong usecases fs people are reluctant to make their code more complex
> unnecessarily. OTOH without widespread atomic write support I understand
> application developers have similar stance. So it's kind of chicken and egg
> problem. BTW, e.g. ext3/4 has quite a bit of the infrastructure in place
> due to its data=journal mode so if someone on the PostgreSQL side wanted to
> research on this, knitting some experimental ext4 patches should be doable.

For the record, a researcher (plus his PhD student) at HP Labs actually
implemented a prototype based on ext3 which created an atomic write
facility.  It was good up to about 25% of the ext4 journal size (so, a
couple of MB), and it was used to research the use of persistent memory by
creating a persistent heap using standard in-memory data structures as
a replacement for using a database.

The results of their research showed that ext3 plus atomic write plus
standard Java associative arrays beat using SQLite.

It was a research prototype, so they didn't handle OOM kill
conditions, and they also didn't try benchmarking against a real
database instead of a toy database such as SQLite, but if someone
wants to experiment with Atomic write, there are patches against ext3
that we can probably get from HP Labs.

  - Ted




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeff Layton
On Wed, 15 Jan 2014 21:37:16 -0500
Robert Haas  wrote:

> On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara  wrote:
> > On Wed 15-01-14 10:12:38, Robert Haas wrote:
> >> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara  wrote:
> >> > Filesystems could in theory provide facility like atomic write (at least 
> >> > up
> >> > to a certain size say in MB range) but it's not so easy and when there 
> >> > are
> >> > no strong usecases fs people are reluctant to make their code more 
> >> > complex
> >> > unnecessarily. OTOH without widespread atomic write support I understand
> >> > application developers have similar stance. So it's kind of chicken and 
> >> > egg
> >> > problem. BTW, e.g. ext3/4 has quite a bit of the infrastructure in place
> >> > due to its data=journal mode so if someone on the PostgreSQL side wanted 
> >> > to
> >> > research on this, knitting some experimental ext4 patches should be 
> >> > doable.
> >>
> >> Atomic 8kB writes would improve performance for us quite a lot.  Full
> >> page writes to WAL are very expensive.  I don't remember what
> >> percentage of write-ahead log traffic that accounts for, but it's not
> >> small.
> >   OK, and do you need atomic writes on per-IO basis or per-file is enough?
> > It basically boils down to - is all or most of IO to a file going to be
> > atomic or it's a smaller fraction?
> 
> The write-ahead log wouldn't need it, but data files writes would.  So
> we'd need it a lot, but not for absolutely everything.
> 
> For any given file, we'd either care about writes being atomic, or we 
> wouldn't.
> 

Just getting caught up on this thread. One thing that you're just now
getting to here is that the different types of files in the DB have
different needs.

It might be good to outline each type of file (WAL, data files, tmp
files), what sort of I/O patterns are typically done to them, and what
sort of "special needs" they have (atomicity or whatever). Then we
could treat each file type as a separate problem, which may make some
of these problems easier to solve.

For instance, typically a WAL would be fairly sequential I/O, whereas
the data files are almost certainly random. It may make sense to
consider DIO for some of these use-cases, even if it's not suitable
everywhere.

For tempfiles, it may make sense to consider housing those on tmpfs.
They wouldn't go to disk at all that way, but if there is mem pressure
they could get swapped out (maybe this is standard practice already --
I don't know).
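
For what it's worth, a minimal sketch of that idea from the database side
(mount point, size and tablespace name are all made up; note that a
tmpfs-backed tablespace has to be recreated after every reboot):

    mount -t tmpfs -o size=2g tmpfs /mnt/pgtmp
    chown postgres:postgres /mnt/pgtmp
    psql -c "CREATE TABLESPACE scratch LOCATION '/mnt/pgtmp'"
    # then in postgresql.conf:  temp_tablespaces = 'scratch'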

> > As Dave notes, unless there is HW support (which is coming with newest
> > solid state drives), ext4/xfs will have to implement this by writing data
> > to a filesystem journal and after transaction commit checkpointing them to
> > a final location. Which is exactly what you do with your WAL logs so
> > it's not clear it will be a performance win. But it is easy enough to code
> > for ext4 that I'm willing to try...
> 
> Yeah, hardware support would be great.
> 


-- 
Jeff Layton 




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Dave Chinner
On Wed, Jan 15, 2014 at 07:31:15PM -0500, Tom Lane wrote:
> Dave Chinner writes:
> > On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
> >> No, we'd be happy to re-request it during each checkpoint cycle, as
> >> long as that wasn't an unduly expensive call to make.  I'm not quite
> >> sure where such requests ought to "live" though.  One idea is to tie
> >> them to file descriptors; but the data to be written might be spread
> >> across more files than we really want to keep open at one time.
> 
> > It would be a property of the inode, as that is how writeback is
> > tracked and timed. Set and queried through a file descriptor,
> > though - it's basically the same context that fadvise works
> > through.
> 
> Ah, got it.  That would be fine on our end, I think.
> 
> >> We could probably live with serially checkpointing data
> >> in sets of however-many-files-we-can-have-open, if file descriptors are
> >> the place to keep the requests.
> 
> > Inodes live longer than file descriptors, but there's no guarantee
> > that they live from one fd context to another. Hence my question
> > about persistence ;)
> 
> I plead ignorance about what an "fd context" is.

open-to-close lifetime.

fd = open("some/file", O_RDONLY);	/* any flags; they were lost in the archive */
/* ... reads, writes, fadvise calls ... */
close(fd);

is a single context. If multiple fd contexts of the same file
overlap in lifetime, then the inode is constantly referenced and the
inode won't get reclaimed, so the value won't get lost. However, if
there is no open fd context, there are no external references to the
inode so it can get reclaimed. Hence there's no guarantee that the
inode is present and the writeback property maintained across
close-to-open timeframes.

> We're ahead of the game as long as it usually works.

*nod*

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jan Kara
On Wed 15-01-14 10:12:38, Robert Haas wrote:
> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
> > Filesystems could in theory provide facility like atomic write (at least up
> > to a certain size say in MB range) but it's not so easy and when there are
> > no strong usecases fs people are reluctant to make their code more complex
> > unnecessarily. OTOH without widespread atomic write support I understand
> > application developers have similar stance. So it's kind of chicken and egg
> > problem. BTW, e.g. ext3/4 has quite a bit of the infrastructure in place
> > due to its data=journal mode so if someone on the PostgreSQL side wanted to
> > research on this, knitting some experimental ext4 patches should be doable.
> 
> Atomic 8kB writes would improve performance for us quite a lot.  Full
> page writes to WAL are very expensive.  I don't remember what
> percentage of write-ahead log traffic that accounts for, but it's not
> small.
  OK, and do you need atomic writes on a per-IO basis, or is per-file enough?
It basically boils down to: is all or most of the IO to a file going to be
atomic, or is it a smaller fraction?

As Dave notes, unless there is HW support (which is coming with newest
solid state drives), ext4/xfs will have to implement this by writing data
to a filesystem journal and after transaction commit checkpointing them to
a final location. Which is exactly what you do with your WAL logs so
it's not clear it will be a performance win. But it is easy enough to code
for ext4 that I'm willing to try...

Honza
-- 
Jan Kara 
SUSE Labs, CR




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jan Kara
On Wed 15-01-14 21:37:16, Robert Haas wrote:
> On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara wrote:
> > On Wed 15-01-14 10:12:38, Robert Haas wrote:
> >> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara wrote:
> >> > Filesystems could in theory provide facility like atomic write (at least up
> >> > to a certain size say in MB range) but it's not so easy and when there are
> >> > no strong usecases fs people are reluctant to make their code more complex
> >> > unnecessarily. OTOH without widespread atomic write support I understand
> >> > application developers have similar stance. So it's kind of chicken and egg
> >> > problem. BTW, e.g. ext3/4 has quite a bit of the infrastructure in place
> >> > due to its data=journal mode so if someone on the PostgreSQL side wanted to
> >> > research on this, knitting some experimental ext4 patches should be doable.
> >>
> >> Atomic 8kB writes would improve performance for us quite a lot.  Full
> >> page writes to WAL are very expensive.  I don't remember what
> >> percentage of write-ahead log traffic that accounts for, but it's not
> >> small.
> >   OK, and do you need atomic writes on a per-IO basis, or is per-file enough?
> > It basically boils down to: is all or most of the IO to a file going to be
> > atomic, or is it a smaller fraction?
> 
> The write-ahead log wouldn't need it, but data files writes would.  So
> we'd need it a lot, but not for absolutely everything.
> 
> For any given file, we'd either care about writes being atomic, or we
> wouldn't.
  OK, when you say that either all writes to a file should be atomic or
none of them should be, then can you try the following:
	chattr +j <file>

  will turn on data journalling for <file> on an ext3/ext4 filesystem.
Currently it *won't* guarantee atomicity in all cases, but the performance
will be very similar to what it would be if it did. You might also want to
increase the filesystem journal size with 'tune2fs -J size=XXX /dev/yyy',
where XXX is the desired journal size in MB. The default is 128 MB I think,
but with intensive data journalling you might want to have that in the GB
range. I'd be interested in hearing what impact turning on 'atomic write'
support in PostgreSQL and using data journalling on ext4 has.
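
For instance (file path, device name and journal size picked arbitrarily;
growing the journal generally requires the filesystem to be unmounted):

	chattr +j /srv/pgdata/base/16384/16385
	tune2fs -J size=1024 /dev/yyy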

Honza
-- 
Jan Kara 
SUSE Labs, CR




[HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

2014-01-16 Thread Stephen Frost
Greetings,

  It's a day late and I'm a dollar short, but attached is a (very) minor
  patch to allow users to more easily move their various objects from
  one tablespace to another.  Included are docs and a regression test;
  I'm happy to improve on both should folks send me suggestions.

  As we use tablespaces quite a bit, this can be extremely handy for us
  and I expect others will find it useful too.
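
  For example (tablespace names invented), to evacuate everything the
  current user owns from one tablespace in a single statement:

    ALTER TABLESPACE old_sata MOVE ALL TO new_ssd NOWAIT;

  where NOWAIT makes the command error out immediately rather than block
  if any affected object can't be locked right away.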

  Thoughts?

Thanks,

Stephen
diff --git a/doc/src/sgml/ref/alter_tablespace.sgml b/doc/src/sgml/ref/alter_tablespace.sgml
new file mode 100644
index 7d3ee2c..fcf4155 100644
*** a/doc/src/sgml/ref/alter_tablespace.sgml
--- b/doc/src/sgml/ref/alter_tablespace.sgml
*** PostgreSQL documentation
*** 12,18 
  
   
ALTER TABLESPACE
!   change the definition of a tablespace
   
  
   
--- 12,18 
  
   
ALTER TABLESPACE
!   change the definition of a tablespace or affect objects of a tablespace
   
  
   
*** ALTER TABLESPACE namename OWNER TO new_owner
  ALTER TABLESPACE name SET ( tablespace_option = value [, ... ] )
  ALTER TABLESPACE name RESET ( tablespace_option [, ... ] )
+ ALTER TABLESPACE name MOVE ALL TO new_tablespace [ NOWAIT ]
  
   
  
*** ALTER TABLESPACE nameDescription
  

!ALTER TABLESPACE changes the definition of
!a tablespace.

  

!You must own the tablespace to use ALTER TABLESPACE.
 To alter the owner, you must also be a direct or indirect member of the new
 owning role.
 (Note that superusers have these privileges automatically.)

   
  
--- 33,60 
Description
  

!ALTER TABLESPACE can be used to change the definition of
!a tablespace or to migrate all of the objects in the current database which
!are owned by the user out of a given tablespace.

  

!You must own the tablespace to change the definition of a tablespace.
 To alter the owner, you must also be a direct or indirect member of the new
 owning role.
 (Note that superusers have these privileges automatically.)
+ 
+Users may use ALTER TABLESPACE ... MOVE ALL, but they must have CREATE
+rights on the new tablespace and only objects, directly or indirectly, owned
+by the user will be moved.  Note that the superuser is considered an owner
+of all objects and therefore an ALTER TABLESPACE ... MOVE ALL issued by the
+superuser will move all objects in the current database which are in the
+tablespace.
+ 
+System catalogs will not be moved by this command; individuals wishing to
+move a whole database should use ALTER DATABASE, or call ALTER TABLE on the
+individual system catalogs.  Note that relations in information_schema
+will be moved, just as any other normal database objects.

   
  
*** ALTER TABLESPACE name
  
 
+ 
+
+ new_tablespace
+ 
+  
+   The name of the tablespace to move objects into.  The user must have
+   CREATE rights on the new tablespace to move objects into that
+   tablespace, unless the tablespace being moved into is the default
+   tablespace for the database connected to.
+  
+ 
+
+ 
+
+ NOWAIT
+ 
+  
+   The NOWAIT option causes the ALTER TABLESPACE command to fail immediately
+   if it is unable to acquire the necessary lock on all of the objects being
+   moved.
+  
+ 
+
+ 

   
  
*** ALTER TABLESPACE index_space RENAME TO f
*** 112,117 
--- 150,162 
  
  ALTER TABLESPACE index_space OWNER TO mary;
  
+ 
+   
+Move all of the objects which I own from the default tablespace to
+the fast_raid tablespace:
+ 
+ ALTER TABLESPACE pg_default MOVE ALL TO fast_raid;
+ 
   
  
   
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
new file mode 100644
index 07f5221..c47d13c 100644
*** a/src/backend/commands/tablespace.c
--- b/src/backend/commands/tablespace.c
***
*** 59,78 
--- 59,83 
  #include "catalog/catalog.h"
  #include "catalog/dependency.h"
  #include "catalog/indexing.h"
+ #include "catalog/namespace.h"
  #include "catalog/objectaccess.h"
+ #include "catalog/pg_namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "commands/comment.h"
  #include "commands/seclabel.h"
+ #include "commands/tablecmds.h"
  #include "commands/tablespace.h"
  #include "common/relpath.h"
  #include "miscadmin.h"
  #include "postmaster/bgwriter.h"
  #include "storage/fd.h"
+ #include "storage/lmgr.h"
  #include "storage/standby.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
+ #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
  #include "utils/tqual.h"
*** AlterTableSpaceOptions(AlterTableSpaceOp
*** 956,961 
--- 961,1101 
  }
  
  /*
+  * Alter table space move all
+  */
+ Oid
+ AlterTableSpaceMove(AlterTableSpaceMo

[HACKERS] ALTER TABLE ... SET TABLESPACE pg_default

2014-01-16 Thread Stephen Frost
Greetings,

  Harking back to 10 years ago when tablespaces were added, it looks
  like we originally figured that users didn't need permissions to
  create tables in the database default, per 2467394e.  That strikes me
  as perfectly fair.  Unfortunately, the later addition of
  ALTER TABLE ... SET TABLESPACE (af4de814) didn't get the memo about
  the default tablespace being special in this regard and refuses to let
  a user move their tables into the default tablespace, even though they
  can do so via 'CREATE TABLE ... AS SELECT * FROM ...'.

  Barring objections, I'll add the same conditional around the AclCheck
  in ATPrepSetTableSpace() as exists in DefineRelation() to allow users
  to ALTER TABLE ... SET TABLESPACE into the database's default
  tablespace and backpatch accordingly.
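
  For reference, the DefineRelation() check this would mirror looks roughly
  like the following (reproduced from memory, so treat the details as
  approximate rather than as the exact source):

    /* Check permissions, except when using database's default tablespace */
    if (OidIsValid(tablespaceId) && tablespaceId != MyDatabaseTableSpace)
    {
        AclResult   aclresult;

        aclresult = pg_tablespace_aclcheck(tablespaceId, GetUserId(),
                                           ACL_CREATE);
        if (aclresult != ACLCHECK_OK)
            aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
                           get_tablespace_name(tablespaceId));
    }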

Thanks,

Stephen




Re: [HACKERS] Backup throttling

2014-01-16 Thread Peter Geoghegan
On Thu, Jan 16, 2014 at 12:03 PM, Andres Freund wrote:
> slightly related: we should start to reuse procLatch for walsenders
> instead of having a separate latch someday.

+1. The potential for bugs from failing to account for this within
signal handlers seems like a concern. I think that every process
should use the process latch, except for the archiver which uses a
local latch because it pointedly does not touch shared memory. I think
I wrote a comment that made it into the latch header comments
encouraging this, but never saw to it that it was universally adhered
to.


-- 
Peter Geoghegan




Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes

2014-01-16 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Boszormenyi Zoltan wrote:
> 
> > All patches are attached again for completeness.
> 
> Thanks.  I pushed a commit comprising patches 09 through 14.

Now also pushed 15 to 17.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] dblink performance regression

2014-01-16 Thread Joe Conway

yes

Joe

Sent with AquaMail for Android
http://www.aqua-mail.com


On January 16, 2014 2:32:55 PM Josh Berkus wrote:


On 12/07/2013 05:50 PM, Joe Conway wrote:
> On 12/07/2013 05:41 PM, Fabrízio de Royes Mello wrote:
>> On Sat, Dec 7, 2013 at 11:20 PM, Michael Paquier wrote:
>>>> IMHO is more elegant create a procedure to encapsulate the code
>>>> to avoid redundancy.
>>> Yep, perhaps something like PQsetClientEncodingIfDifferent or
>>> similar would make sense.
>> Well I think at this first moment we can just create a procedure
>> inside the dblink contrib and not touch in libpq.
> Maybe a libpq function could be done for 9.4, but not for back branches.
> I don't think it makes sense to create a new function in dblink either
> -- we're only talking about two lines of added redundancy which is
> less lines of code than a new function would add. But if we create
> PQsetClientEncodingIfDifferent() (or whatever) we can remove those
> extra lines in 9.4 ;-)

Hey, since we're about to do 9.3.3: was this patch ever committed?


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com







Re: [HACKERS] Display oprcode and its volatility in \do+

2014-01-16 Thread Marko Tiikkaja

On 1/16/14, 9:32 PM, Tom Lane wrote:

Given the lack of other votes, I pushed this without a volatility column,
and with some other changes --- mostly, I kept the Description column
last, because that's how all the other \d commands do it.


Thanks!  And looks like I missed the documentation as well, sorry about 
that. :-(



Regards,
Marko Tiikkaja




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Tom Lane
Andres Freund writes:
> On 2014-01-16 14:23:47 -0500, Tom Lane wrote:
>> Dunno, I think that a transition state containing both an int64 and
>> a (presumably separately palloc'd) numeric will be a real PITA.

> Yea, not sure myself. I just dislike the idea of having a good part of a
> 128bit math implementation for a single transition function.

Not sure how you figure that we need very much new code beyond the
overflow test.

> Well, you don't need to check the second variable for lots of
> operations. Say, the current sum is 0 and you add a -1. With the two
> variables scheme that requires checking the second variable,
> manipulating it etc.

I'm envisioning just

state->lowhalf += input;
if (overflowed_up)
state->highhalf++;
else if (overflowed_down)
state->highhalf--;

The only thing that might take a moment's thought, or extra cycles in the
normal case, is extending the overflow test so that it can tell whether
we need to increment or decrement the upper half.

[ thinks a bit... ]  Note that I'm supposing that the state is defined
as (highhalf * 2^64) + lowhalf, not that we need the two variables to
be exactly a 128-bit twos-complement value, which is what I think
you're talking about.
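
For concreteness, here is a hedged sketch of such a state and transition
step (struct and function names invented, not from any patch; the low half
is kept unsigned so the wraparound test is well-defined C):

typedef struct Int128AggState
{
	uint64		lowhalf;	/* low 64 bits of the sum (unsigned) */
	int64		highhalf;	/* state value = highhalf * 2^64 + lowhalf */
} Int128AggState;

static void
int128_accum(Int128AggState *state, int64 newval)
{
	uint64		oldlow = state->lowhalf;

	state->lowhalf += (uint64) newval;	/* wraps modulo 2^64 */

	if (newval >= 0)
	{
		if (state->lowhalf < oldlow)	/* carried out of the low word */
			state->highhalf++;
	}
	else
	{
		if (state->lowhalf > oldlow)	/* borrowed from the high word */
			state->highhalf--;
	}
}

Detecting the carry/borrow this way costs one extra compare per input,
which fits the point above that the overflow test is the principal drag
either way.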

regards, tom lane




Re: [HACKERS] Display oprcode and its volatility in \do+

2014-01-16 Thread Tom Lane
I wrote:
> Anybody else have an opinion?

Given the lack of other votes, I pushed this without a volatility column,
and with some other changes --- mostly, I kept the Description column
last, because that's how all the other \d commands do it.

regards, tom lane




Re: [HACKERS] Backup throttling

2014-01-16 Thread Andres Freund
Hi,

On 2014-01-15 18:52:32 -0300, Alvaro Herrera wrote:
> Another thing I found a bit strange was the use of the latch.  What this
> patch does is create a separate latch which is used for the throttling.
> This means that if the walsender process receives a signal, it will not
> wake up if it's sleeping in throttling.  Perhaps this is okay: as Andres
> was quoted upthread as saying, maybe this is not a problem because the
> sleep times are typically short anyway.  But we're pretty much used to
> the idea that whenever a signal is sent, processes act on it
> *immediately*.  Maybe some admin will not feel comfortable about waiting
> some extra 20ms when they cancel their base backups.  In any case,
> having a secondary latch to sleep on in a process seems weird.  Maybe
> this should be using MyWalSnd->latch somehow.

Yes, this definitely should reuse MyWalSnd->latch.

slightly related: we should start to reuse procLatch for walsenders
instead of having a separate latch someday.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes

2014-01-16 Thread Alvaro Herrera
Boszormenyi Zoltan wrote:

> All patches are attached again for completeness.

Thanks.  I pushed a commit comprising patches 09 through 14.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Andres Freund
On 2014-01-16 14:23:47 -0500, Tom Lane wrote:
> > Maybe it's instead sufficient to just have a flag indicating that you're
> > working with a state that hasn't overflowed so far and just plain int8
> > math as long as that's the case, and entirely fall back to the current
> > path once overflowed. That will probably be slightly faster and easily
> > handle the majority of cases since overflowing int8 ought to be pretty
> > rare in the real world.
> 
> Dunno, I think that a transition state containing both an int64 and
> a (presumably separately palloc'd) numeric will be a real PITA.

Yea, not sure myself. I just dislike the idea of having a good part of a
128bit math implementation for a single transition function.

Another alternative would be a configure check for compiler/native
128bit math and fall back to the current implementation if none is
provided... That should give decent performance with a pretty low amount
of code for most platforms.
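
For what it's worth, instead of a full autoconf test, the probe can be as
simple as checking a predefined macro (sketch; GCC and clang define
__SIZEOF_INT128__ when the native type is available):

#ifdef __SIZEOF_INT128__
typedef __int128 SumInt128;		/* native 128-bit transition state */
#else
/* fall back to the existing numeric-based transition functions */
#endif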

> And it will not be faster, because the principal drag on performance
> will just be the overflow test, which you have to do either way.

Well, you don't need to check the second variable for lots of
operations. Say, the current sum is 0 and you add a -1. With the two
variables scheme that requires checking the second variable,
manipulating it etc.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-16 Thread Peter Eisentraut
You appear to be generating your patches with git diff --no-prefix.
Don't do that, leave the a/ and b/ in.




Re: [HACKERS] dblink performance regression

2014-01-16 Thread Josh Berkus
On 12/07/2013 05:50 PM, Joe Conway wrote:
> On 12/07/2013 05:41 PM, Fabrízio de Royes Mello wrote:
> 
>> On Sat, Dec 7, 2013 at 11:20 PM, Michael Paquier wrote:
>>>> IMHO is more elegant create a procedure to encapsulate the code
>>>> to avoid redundancy.
>>> Yep, perhaps something like PQsetClientEncodingIfDifferent or
>>> similar would make sense.
> 
>> Well I think at this first moment we can just create a procedure
>> inside the dblink contrib and not touch in libpq.
> 
> Maybe a libpq function could be done for 9.4, but not for back branches.
> 
> I don't think it makes sense to create a new function in dblink either
> -- we're only talking about two lines of added redundancy which is
> less lines of code than a new function would add. But if we create
> PQsetClientEncodingIfDifferent() (or whatever) we can remove those
> extra lines in 9.4 ;-)

Hey, since we're about to do 9.3.3: was this patch ever committed?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Tom Lane
Andres Freund writes:
> You'll have to handle adding negative values and underflow as
> well.

Right.

> Maybe it's instead sufficient to just have a flag indicating that you're
> working with a state that hasn't overflowed so far and just plain int8
> math as long as that's the case, and entirely fall back to the current
> path once overflowed. That will probably be slightly faster and easily
> handle the majority of cases since overflowing int8 ought to be pretty
> rare in the real world.

Dunno, I think that a transition state containing both an int64 and
a (presumably separately palloc'd) numeric will be a real PITA.
And it will not be faster, because the principal drag on performance
will just be the overflow test, which you have to do either way.

regards, tom lane




Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Tom Lane
Jeff Janes writes:
> I think the argument is that drawing the next value from a sequence can
> generate xlog that needs to be flushed, but doesn't assign an xid.

> I would think the sequence should flush that record before it hands out the
> value, not before the commit, but...

IIRC the argument was that we'd flush WAL before any use of the value
could make it to disk.  Which is true if you're just inserting it into
a table; perhaps less so if the client is doing something external to
the database with it.  (But it'd be reasonable to say that clients
who want a guaranteed-good serial for such purposes should have to
commit the transaction that created the value.)

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Andres Freund
On 2014-01-16 21:07:33 +0200, Heikki Linnakangas wrote:
> On 01/16/2014 08:59 PM, Tom Lane wrote:
> >Heikki Linnakangas writes:
> >>I propose that we reimplement sum(bigint) in a more efficient way: For
> >>the internal state, let's use an int8 and a numeric overflow field. The
> >>transition function adds to the int8 variable, and checks for overflow.
> >>On overflow, increment the numeric field by one. In the final function,
> >>multiply the numeric by 2^64, and add the residual int8 value.
> >
> >It'd probably be sufficient to handle it as two int64 fields (handmade
> >128-bit arithmetic, or maybe even not so handmade if that ever gets
> >reasonably common among C compilers).
> 
> True. That would be sufficient for summing 2^64 int8s of INT64_MAX. That
> sounds like enough, especially considering that count() will overflow
> after that too.

You'll have to handle adding negative values and underflow as
well.
Maybe it's instead sufficient to just have a flag indicating that you're
working with a state that hasn't overflowed so far and just plain int8
math as long as that's the case, and entirely fall back to the current
path once overflowed. That will probably be slightly faster and easily
handle the majority of cases since overflowing int8 ought to be pretty
rare in the real world.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Heikki Linnakangas

On 01/16/2014 08:59 PM, Tom Lane wrote:

Heikki Linnakangas writes:

I propose that we reimplement sum(bigint) in a more efficient way: For
the internal state, let's use an int8 and a numeric overflow field. The
transition function adds to the int8 variable, and checks for overflow.
On overflow, increment the numeric field by one. In the final function,
multiply the numeric by 2^64, and add the residual int8 value.


It'd probably be sufficient to handle it as two int64 fields (handmade
128-bit arithmetic, or maybe even not so handmade if that ever gets
reasonably common among C compilers).


True. That would be sufficient for summing 2^64 int8s of INT64_MAX. That
sounds like enough, especially considering that count() will
overflow after that too.



 You're assuming the final output is still numeric, right?


Yep.

- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Andres Freund
On 2014-01-16 11:01:29 -0800, Jeff Janes wrote:
> I think the argument is that drawing the next value from a sequence can
> generate xlog that needs to be flushed, but doesn't assign an xid.

Then that should assign an xid. Which would yield correct behaviour with
async commit where it's currently *not* causing a WAL flush at all
unless a page boundary is crossed.

I've tried arguing that way before...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Peter Eisentraut
The documentation doesn't build:

openjade:config.sgml:1416:11:E: end tag for "VARIABLELIST" omitted, but OMITTAG 
NO was specified
openjade:config.sgml:1390:5: start tag was here





Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Jeff Janes
On Thu, Jan 16, 2014 at 9:37 AM, Andres Freund wrote:

> On 2014-01-16 09:25:51 -0800, Jeff Janes wrote:
> > On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund wrote:
> >
> > > On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
> > > > But if the transaction would not have otherwise generated WAL (i.e. a
> > > > select that did not have to do any HOT pruning, or an update with zero rows
> > > > matching the where condition), doesn't it now have to flush and wait when
> > > > it would otherwise not?
> > >
> > > We short circuit that if there's no xid assigned. Check
> > > RecordTransactionCommit().
> > >
> >
> > It looks like that only short-circuits the flush if there is both no xid
> > assigned and !wrote_xlog.  (line 1054 of xact.c)
>
> Hm. Indeed. Why don't we just always use the async commit behaviour for
> that? I don't really see any significant dangers from doing so?
>

I think the argument is that drawing the next value from a sequence can
generate xlog that needs to be flushed, but doesn't assign an xid.

I would think the sequence should flush that record before it hands out the
value, not before the commit, but...




>
> It's also rather odd to use the sync rep mechanisms in such
> scenarios... The if() really should test markXidCommitted instead of
> wrote_xlog.
>
> > I do see stalls on fdatasync on flush from select statements which had no
> > xid, but did generate xlog due to HOT pruning; I don't see why WAL logging
> > hint bits would be different.
>
> Are the stalls at commit or while the select is running? If wal_buffers
> is filled too fast, which can easily happen if loads of pages are hinted
> and wal logged, that will happen independently from
> RecordTransactionCommit().
>

In the real world, I'm not sure what the distribution is.

But in my present test case, they are coming almost exclusively from
RecordTransactionCommit.

I use "pgbench -T10" in a loop to generate dirty data and checkpoints (with
synchronous_commit on but with a BBU), and then to probe the consequences I
use:

pgbench -T10 -S -n --startup='set synchronous_commit='$f

(where --startup is an extension to pgbench proposed a few months ago)
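
In full, the harness looks roughly like this (database name and the
background load's details are incidental):

while true; do pgbench -T10 bench; done &   # dirty data + periodic checkpoints

for f in on off; do
    pgbench -T10 -S -n --startup='set synchronous_commit='$f bench
done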

Running the select-only query with synchronous_commit off almost completely
isolates it from the checkpoint drama that otherwise has a massive effect
on it.  With synchronous_commit=on, it goes from 6000 tps normally to 30
tps during the checkpoint sync; with synchronous_commit=off it might dip to
4000 or so during the worst of it.

(To be clear, this is about the pruning, not the logging of the hint bits)

Cheers,

Jeff


Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Josh Berkus
On 01/16/2014 07:32 AM, Christian Kruse wrote:
> Hi Alvaro,
> 
> On 16/01/14 10:21, Alvaro Herrera wrote:
>> 1. it is to be read automatically by the server without need for an
>>"include_dir conf.d" option in the main postgresql.conf.
> 
> +1
> 
>> 4. there is no point in "disabling" it, and thus we offer no mechanism
>>to do that.
> 
> Not only there is „no point“ in disabling it, it makes this feature
> nearly useless. One can't rely on it if the distro may disable
> it. There are so many out there, it will never be a reliable feature
> if it can be disabled.

It would make *my* life vastly easier if we could mandate things like
the presence and relative directory of a conf.d.  However, if Apache
can't do it, we certainly can't.  Ultimately, we cannot impose things on
distributions which they are unwilling to support; Debian, for one, will
happily fork PostgreSQL rather than accept directory assignments which
don't meet their standards.

Also, enough people install PostgreSQL from source or using custom
packages to make for a high degree of variation anyway.

That's why I was just advocating changing the *defaults*, not mandating
anything.  Actual directory locations and usage should be configurable
by distros, packagers and users.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Tom Lane
Heikki Linnakangas writes:
> I propose that we reimplement sum(bigint) in a more efficient way: For 
> the internal state, let's use an int8 and a numeric overflow field. The 
> transition function adds to the int8 variable, and checks for overflow. 
> On overflow, increment the numeric field by one. In the final function, 
> multiply the numeric by 2^64, and add the residual int8 value.

It'd probably be sufficient to handle it as two int64 fields (handmade
128-bit arithmetic, or maybe even not so handmade if that ever gets
reasonably common among C compilers).  You're assuming the final output
is still numeric, right?

regards, tom lane




Re: [HACKERS] new json funcs

2014-01-16 Thread Peter Eisentraut
On 1/3/14, 9:00 PM, Andrew Dunstan wrote:
> 
> Here is a patch for the new json functions I mentioned a couple of
> months ago. These are:
> 
> json_to_record
> json_to_recordset
> json_object
> json_build_array
> json_build_object
> json_object_agg
> 
> So far there are no docs, but the way these work is illustrated in the
> regression tests - I hope to have docs within a few days.

Compiler warnings:

json.c: In function ‘json_object_two_arg’:
json.c:2210:5: warning: unused variable ‘count’ [-Wunused-variable]

jsonfuncs.c: In function ‘json_to_record’:
jsonfuncs.c:1955:16: warning: unused variable ‘tuple’ [-Wunused-variable]
jsonfuncs.c:1953:18: warning: variable ‘rec’ set but not used 
[-Wunused-but-set-variable]

Also, please run your patch through git diff --check.  I have noticed
that several of your patches have hilarious whitespace, maybe
something with your editor.




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
> On 01/16/2014 10:46 AM, Tom Lane wrote:
> > I'm fine if the proposal is that postgresql.conf include "include_dir
> > conf.d" by default (where that's read as relative to postgresql.conf's own
> > directory).  Even better if it's not terribly difficult for a packager to
> > change that, because I think some will want to.  We could possibly reduce
> > the need for packagers to change it if we made it be
> > "include_dir postgresql.d", because conf.d is a damn generic name for
> > something that might be in the same /etc directory as configs for other
> > packages.
> 
> FWIW, this is what I was proposing.  We have an "include_dir
> postgresql.conf.d" currently in the stock postgresql.conf, but it's
> disabled (commented out) by default.  I'd just like it enabled by
> default, and to pass a suggestion to the packagers that they pick an
> appropriate directory and enable it by default.

To this point- I've asked the Debian packager Martin Pitt to review and
comment on this thread when he gets a chance.

Thanks,

Stephen




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Josh Berkus
On 01/16/2014 10:46 AM, Tom Lane wrote:
> I'm fine if the proposal is that postgresql.conf include "include_dir
> conf.d" by default (where that's read as relative to postgresql.conf's own
> directory).  Even better if it's not terribly difficult for a packager to
> change that, because I think some will want to.  We could possibly reduce
> the need for packagers to change it if we made it be
> "include_dir postgresql.d", because conf.d is a damn generic name for
> something that might be in the same /etc directory as configs for other
> packages.

FWIW, this is what I was proposing.  We have an "include_dir
postgresql.conf.d" currently in the stock postgresql.conf, but it's
disabled (commented out) by default.  I'd just like it enabled by
default, and to pass a suggestion to the packagers that they pick an
appropriate directory and enable it by default.
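
Concretely, the change would just be flipping the shipped default from
something like

    #include_dir = 'conf.d'    # include files ending in '.conf' from directory 'conf.d'

to

    include_dir = 'conf.d'     # read relative to the directory containing postgresql.conf

with the exact directory name being the open question.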

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Tom Lane
Alvaro Herrera writes:
> Tom Lane wrote:
>> Another point here is that hard-wiring a config directory location
>> into the executables completely breaks many scenarios for running
>> multiple clusters with the same executables.

> Therefore the proposal is not to hardwire the location in the
> executables.

At least some people seem to be advocating that, even if I misunderstood
whether you were.

I'm fine if the proposal is that postgresql.conf include "include_dir
conf.d" by default (where that's read as relative to postgresql.conf's own
directory).  Even better if it's not terribly difficult for a packager to
change that, because I think some will want to.  We could possibly reduce
the need for packagers to change it if we made it be
"include_dir postgresql.d", because conf.d is a damn generic name for
something that might be in the same /etc directory as configs for other
packages.

regards, tom lane

PS: off topic, but isn't ParseConfigDirectory leaking the result
of AbsoluteConfigLocation?  In both normal and error paths?




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Heikki Linnakangas

On 01/16/2014 01:02 PM, David Rowley wrote:

sum(bigint) became a bit weird as it uses numeric types internally, so I
had to keep the do_numeric_discard() function to support it.


It's pretty weird that we have implemented sum(bigint) that way. I 
understand that the result is a numeric so that it won't overflow, but 
implementing it by converting every value to numeric is naive.


I propose that we reimplement sum(bigint) in a more efficient way: For 
the internal state, let's use an int8 and a numeric overflow field. The 
transition function adds to the int8 variable, and checks for overflow. 
On overflow, increment the numeric field by one. In the final function, 
multiply the numeric by 2^64, and add the residual int8 value.
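
A rough sketch of what that final step could look like in C, with the
overflow count kept in an int64 as also floated in the thread (hypothetical
state field names; fmgr boilerplate omitted; 18446744073709551616 is 2^64):

	Datum		result;
	Datum		hi = DirectFunctionCall1(int8_numeric,
										 Int64GetDatum(state->overflows));
	Datum		lo = DirectFunctionCall1(int8_numeric,
										 Int64GetDatum(state->residual));
	Datum		two64 = DirectFunctionCall3(numeric_in,
											CStringGetDatum("18446744073709551616"),
											ObjectIdGetDatum(InvalidOid),
											Int32GetDatum(-1));

	/* result = overflows * 2^64 + residual */
	result = DirectFunctionCall2(numeric_add,
								 DirectFunctionCall2(numeric_mul, hi, two64),
								 lo);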


- Heikki




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > I missed the part of this where you point out that Apache on Debian has
> > some kind of problem because it's possible for an admin to remove the
> > 'IncludeDir sites-enabled' line from apache2.conf.
> 
> I don't see that this is parallel.  For one thing, I don't think you
> would normally have multiple apache2.conf files.  In the other hand,
> even if you had more than one, surely the one that's managed by the
> packaging system, if there is one, is not going to want to have that
> line changed, because that would immediately break all modules installed
> by packages.

My point above was that having it configurable does *not* make it a
burden on packagers or tool authors.  If it did, the Debian folks would
be complaining to ASF to have the option removed and instead required.
Indeed, the fact that it's configurable, in my view, is how we ended up
with sites-available and sites-enabled in the first place.  Perhaps we'd
have gotten there without it, but it sure feels like having the option
was a help and not a hindrance.

> I just checked my system, and it seems to me that most cases that have
> .conf files and something.conf.d directly in /etc are more because of
> historical reasons than because they have made a conscious, contemporary
> decision to do that.  Do we want to replicate choices made in the
> 1990's?

There are people who are still running OS's from the 1990's and forcing
what we feel is the 'right' answer on users is helping ourselves more
than helping our users, imv.

> Yeah, so instead of making a rational choice for people using all
> platforms, mirroring the rational choice made by Linux/Debian/GNU, lets
> make an irrational choice so that everybody will be constantly bitten by
> it, regardless of platform.

This argument that it only works if it's hard-coded simply doesn't fly
when you look at the existing examples in Debian.

> > To allow admins to configure it?  Perhaps their policy is that it must
> > be postgresql.conf.d
> 
> This example sounds rather weak.  I mean, if they *must* have
> /etc/postgresql/9.2/main/postgresql.conf.d instead of
> /etc/postgresql/9.2/main/conf.d, why aren't they forced to have 

You're being completely Debian-centric here.  It's possible they've got
their own /etc directory structure which doesn't match your
expectations- and that's *fine* until we start forcing the issue on
them.  Perhaps they prefer to have /etc/postgresql.conf and
/etc/postgresql.conf.d, why should we be preventing them from doing so
just because the OS's and distro's that we run don't work that way?

> > or that they want it served off of an NFS mount instead of being
> > pulled from /etc,
> 
> So conf.d would be in an NFS mount but postgresql.conf would not?

That's correct- and it wouldn't be mounted at
/etc/postgresql/9.2/main/conf.d, and please don't start in with the
symlinks argument- *that* would be a mess.

> > I don't want PG randomly guessing at directories to go poking around in
> > for config files.  It's also documentation.
> 
> Documentation is fine.  We can have a comment at the top of
> postgresql.conf saying "also see files in postgresql.conf.d/" or some
> such.  I can hardly thing that opening and reading a subdirectory named
> conf.d or postgresql.conf.d, *in the same directory as postgresql.conf*
> is "randomly poking".

It's completely random when we have no idea where the postgresql.conf is
nor what other files or directories exist there today.  Perhaps it's
100% true that, nowhere on this planet, a postgresql.conf file and a
conf.d directory/file/socket/whatever share a directory, but I wouldn't
bet on it.  Hell, I'm half tempted to go make sure just to be sure to
prove anyone wrong who argues that case. ;)

Remember: these would not just be new installs; someone simply running
pg_upgrade to move to 9.4 would suddenly find their database server
trying to open a non-existent directory wherever their postgresql.conf
file resides.

Another issue is how does this scale up to directories for pg_hba.conf
or pg_ident.conf or friends- are we going to hard-code directories for
those too?

> If you want to argue that multiple lines setting the same variable
> should raise an error instead of "last definition wins", please do
> propose that separately.

I've brought it up a number of times before- having the lines be in
different files just makes it that much worse.

> > Which cluster directory is the packager of PostGIS going to put his
> > config snippets into, exactly?  Even if you force the hard-code conf.d
> > idea, the packager will still need to figure out what clusters exist.
> 
> This was a thinko on my part, I admit, which doesn't invalidate the rest
> of my argumentation.  I think the way to think about this would be
> something like having mode in pg_ctlcluster (for example) that enables
> postgis for a certain cluster, which means it drops a certain file in
> the cluster's conf.d di

Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Alvaro Herrera
Tom Lane wrote:
> Stephen Frost writes:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> >> 1. it is to be read automatically by the server without need for an
> >> "include_dir conf.d" option in the main postgresql.conf.
> 
> > I am not thrilled with the idea that we're claiming ownership of the
> > directory which postgresql.conf happens to reside in and you had better
> > not have any 'conf.d' directory in there unless it's the one for PG.
> 
> If we were to do this, I'd argue that the location of this hard-wired
> config directory ought to be selected by a configure option.   And
> in fact, I'd argue that the default behavior with no such option be
> that there's no such hard-wired directory.  That puts it entirely
> on the packager to decide what makes sense as a location.

I fail to see the sense in this.  Surely the only useful option is to
use a relative path: an absolute path is completely useless because then
there's no way to have multiple clusters using the same executables, as
you say.  Since it has to be relative to *something*, there aren't that
many options: we could make it relative to the directory where the
executables reside (which would be absolutely pointless), or relative to
the data directory (which we already ruled out for reasons well
understood), or relative to the directory containing the other
configuration files.

> On the whole though, I find the argument that we can't configure this
> in postgresql.conf to be exceedingly weak.  In particular, the idea
> that you can puppet-ify a configuration without any knowledge of the
> distro you're targeting is ridiculous on its face,

Surely puppetification is not the only use case for conf.d.  For
example, suppose for a minute that a module providing an extension wants
to offer a program to initialize its configuration file into some
configuration directory (say, an hypothetical PostGIS package drops a
"postgis.conf" into conf.d with initial values for all the config
options it provides).  If you're not administering postgresql.conf with
puppet, you'd like to be able to do this to the clusters you have
without having to figure out *where* to put the files (that is, without
having to parse and possibly modify postgresql.conf).  Just dump it in
the cluster's conf.d.

> as is the idea that we/postgres can dictate configuration file
> location conventions to packagers who have distro rules to follow.

I don't see that this is ridiculous at all.  Configuration files always
go together; a distro rule that some config files go in some place and
other configuration files go somewhere else, sounds more ridiculous to
me, actually.

> There isn't going to be a "one location to rule them all", and that
> means the argument that a location determined by postgresql.conf is
> too unstable does not really hold water.

postgresql.conf's location already rules them all.  Remember, my
proposal upthread was to have conf.d be located in the same directory
that holds postgresql.conf.


> Another point here is that hard-wiring a config directory location
> into the executables completely breaks many scenarios for running
> multiple clusters with the same executables.

Therefore the proposal is not to hardwire the location in the
executables.

A server with multiple clusters using the same executables needs a way
to specify different ports for each (at the very least); therefore they
already have separate postgresql.conf.  Each directory containing such
file can contain a conf.d subdirectory with extra stuff perfectly well.


Stephen Frost wrote:

> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:

> > In /etc you have /etc/postfix/ where you have Postfix files, and
> > /etc/exim4 where you have Exim files, and /etc/apache2 or whatever.  And
> > if each of those software packages uses a "conf.d" or "conf-enabled"
> > directory, they use a hardcoded name for a place inside their own
> > subdirectory, not one directly in /etc.
> 
> I missed the part of this where you point out that Apache on Debian has
> some kind of problem because it's possible for an admin to remove the
> 'IncludeDir sites-enabled' line from apache2.conf.

I don't see that this is parallel.  For one thing, I don't think you
would normally have multiple apache2.conf files.  On the other hand,
even if you had more than one, surely the one that's managed by the
packaging system, if there is one, is not going to want to have that
line changed, because that would immediately break all modules installed
by packages.


> > What you don't have is /etc/exim4.conf and /etc/postfix.conf and
> > /etc/apache.conf, where each of those files specifies that I would
> > please like to have extra files loaded from /etc/exim.conf.d and
> > /etc/apache.conf.d, and that packages please figure out by parsing my
> > config file in whatever random format I have invented to figure out
> > where snippets are to be stored.
> 
> Right, there's absolutely zero cases of /etc/*.conf files these days.
> Nor are

Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Jeff Janes
On Thu, Jan 16, 2014 at 8:19 AM, Tom Lane wrote:

>
> > I think the usecases that would want this for DML probably also wan this
> > to work for unlogged, temp tables.
>
> Huh?  Unlogged tables generate *zero* WAL, by definition.
>

Transactions that only change unlogged tables still generate commit records
to WAL.

I don't think that amount of WAL is particularly relevant to this
discussion, but I was recently surprised by it, so wanted to publicize it.
 (It was causing a lot of churn in my WAL due to interaction with
archive_timeout)

Cheers,

Jeff


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-16 Thread Florian Pflug
On Jan 16, 2014, at 09:07, David Rowley wrote:
> On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug wrote:
>> The notnullcount machinery seems to apply to both STRICT and non-STRICT 
>> transfer function pairs. Shouldn't that be constrained to STRICT transfer 
>> function pairs? For non-STRICT pairs, it's up to the transfer functions to 
>> deal with NULL inputs however they please, no?
> 
> The reason I had to track the notnullcount for non-strict was because:
> 
> select sum(n) over (order by i rows between current row and unbounded 
> following) from (values(1,1),(2,NULL)) v(i,n);
> 
> would otherwise return
> 1
> 0
> 
> sum is not strict, so I guess we need to track notnullcount for non-strict 
> functions.
> See the code around if (peraggstate->notnullcount == 0) in 
> retreat_windowaggregate(). 

Yeah, I figured that was the reason, but you can't fix it that way. See 
http://www.postgresql.org/message-id/8e857d95-cba4-4974-a238-9dd7f61be...@phlo.org
 for a detailed explanation why. My 2.2 patch allows pairs of non-strict 
forward and strict inverse transfer functions exactly because of this - i.e., 
basically it decrees that if there's an inverse transfer function, the strict 
setting of the *inverse* function determines the aggregates NULL behaviour. The 
forward transfer function is then never called
for NULL inputs, but it *is* called with the NULL state for the first non-NULL 
input, and *must* then return a non-NULL state (hence it's technically not 
strict, it's strict only in the inputs, not in the state).

BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did 
that in 2.2. That doesn't matter for the built-in aggregates since these aren't 
created with CREATE AGGREGATE, but it's obviously wrong. I'll post a fixed 
patch.

>> The logic around movedaggbase in eval_windowaggregates() seems a bit 
>> convoluted. Couldn't the if be moved before the code that pulls 
>> aggregatedbase up to frameheadpos using the inverse aggregation function?

> I had a look at this and even tried moving the code to before the inverse 
> transitions, but it looks like that would only work if I added more tests to 
> the if condition to ensure that we're not about to perform inverse 
> transitions. To me it just seemed more bullet proof the way it is rather than 
> duplicating logic and having to ensure that it stays duplicated. But saying 
> that I don't think I've fully got my head around why the original code is 
> valid in the first place. I would have imagined that it should contain a 
> check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that sounds like 
> complete rubbish then I'll put it down to my head still being fried from work.

Ok, I get this now. That code really just is asking "is the previous row's 
frame the same as the current row's frame" in that "if" where you added 
movedagg. What confused me was the rather weird way that condition is 
expressed, which turns out is due to the fact that at the point of the if, we 
don't know the new frame's end. Now, we could use update_frametailpos() to find 
that, but that's potentially costly, especially if the tuple store was spilled 
to disk. So instead, the code relies on the fact that only if the frame end is 
"n FOLLOWING/PRECEDING" can the current row lie within the previous row's frame 
without the two frame's ends being necessarily the same.

For added confusion, that "if" never explicitly checks whether the frame's 
*heads* coincide - the previous code seems to have relied on the impossibility 
of having "aggregatedbase <= current < aggregatedupto" after re-initializing 
the aggregation, because then aggregatedbase = aggregatedupto = 0. That's why 
you can't just move the "if" before the "frameheadpos == aggregatedbase" check. 
But you can *if* you also check whether "aggregatedbase == frameheadpos" in the 
if - which is clearer than relying on that rather subtle assumption anyway.

BTW, your patch will also fail badly if the frame head ever moves backwards
- the invariant that "aggregatedbase == frameheadpos" after the inverse 
transition loop will then be violated. Now, this should never happen, because 
we require that the offset in "n PRECEDING/FOLLOWING" is constant, but we 
should probably still check for this and elog().

That check was implicit in old code, because it advanced the tuplestore mark, 
so if the frame head moved backwards, re-scanning the frame would have failed. 
That brings me to another todo - as it stands, that mark gets never advanced if 
we're doing inverse aggregation. To fix that, we need a call to 
WinSetMarkPosition() somewhere in eval_windowaggregates().

After doing these things, eval_windowaggregates ended up calling 
initalize_windowframeaggregates at a single place again, so I inlined it back 
into eval_windowaggregates().  I also made it use temp_slot_1 in the inverse 
aggregation loop, since that seemed safe - the code assumes some invariants 
about aggregatedbase and 

Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Andres Freund
On 2014-01-16 09:25:51 -0800, Jeff Janes wrote:
> On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund wrote:
> 
> > On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
> > > But if the transaction would not have otherwise generated WAL (i.e. a
> > > select that did not have to do any HOT pruning, or an update with zero
> > rows
> > > matching the where condition), doesn't it now have to flush and wait when
> > > it would otherwise not?
> >
> > We short circuit that if there's no xid assigned. Check
> > RecordTransactionCommit().
> >
> 
> It looks like that only short-circuits the flush if both there is no xid
> assigned, and !wrote_xlog.  (line 1054 of xact.c)

Hm. Indeed. Why don't we just always use the async commit behaviour for
that? I don't really see any significant dangers from doing so?

It's also rather odd to use the sync rep mechanisms in such
scenarios... The if() really should test markXidCommitted instead of
wrote_xlog.
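
A minimal sketch of that idea, using the names from this thread (not a
tested patch):

    /*
     * In RecordTransactionCommit(): only a transaction that actually wrote
     * a commit record needs a synchronous flush; incidental WAL from an
     * xid-less transaction (HOT pruning, hint bits) could be handed to the
     * walwriter instead.
     */
    if (markXidCommitted &&
        synchronous_commit > SYNCHRONOUS_COMMIT_OFF)
        XLogFlush(XactLastRecEnd);
    else if (XactLastRecEnd != 0)
        XLogSetAsyncXactLSN(XactLastRecEnd);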

> I do see stalls on fdatasync on flush from select statements which had no
> xid, but did generate xlog due to HOT pruning; I don't see why WAL logging
> hint bits would be different.

Are the stalls at commit or while the select is running? If wal_buffers
is filled too fast, which can easily happen if loads of pages are hinted
and WAL-logged, that will happen independently of
RecordTransactionCommit().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Jeff Janes
On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund wrote:

> On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
> > But if the transaction would not have otherwise generated WAL (i.e. a
> > select that did not have to do any HOT pruning, or an update with zero
> rows
> > matching the where condition), doesn't it now have to flush and wait when
> > it would otherwise not?
>
> We short circuit that if there's no xid assigned. Check
> RecordTransactionCommit().
>

It looks like that only short-circuits the flush if both there is no xid
assigned, and !wrote_xlog.  (line 1054 of xact.c)

I do see stalls on fdatasync on flush from select statements which had no
xid, but did generate xlog due to HOT pruning; I don't see why WAL logging
hint bits would be different.

Cheers,

Jeff


Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Heikki Linnakangas

On 01/16/2014 05:39 PM, Andres Freund wrote:

On 2014-01-16 10:35:20 -0500, Tom Lane wrote:

I think possibly a more productive approach to this would be to treat
it as a session-level GUC setting, rather than hard-wiring it to affect
certain commands and not others.


Do you see a reasonable way to implement this generically for all
commands? I don't.


In suitable safe places, check if you've written too much WAL, and sleep 
if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of 
CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe 
vacuum_delay_point() is a better analogy. Hopefully you don't need to 
sprinkle them in too many places to be useful.


- Heikki




Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-16 Thread Steeve Lennmark
Alvaro,
On Thu, Jan 16, 2014 at 3:25 PM, Alvaro Herrera wrote:

> Eyeballing this patch, three thoughts:
>
> 1. I wonder whether ilist.c/h should be moved to src/common so that
> frontend code could use it.
>

That would be nice; I guess the lack of such helpers is why a lot of stuff is
using dumputils.h from src/bin/pg_dump.

$ git grep -l dumputils.h src/bin/{psql,scripts}
src/bin/psql/command.c
src/bin/psql/copy.c
src/bin/psql/describe.c
src/bin/scripts/clusterdb.c
src/bin/scripts/createdb.c
src/bin/scripts/createuser.c
src/bin/scripts/dropdb.c
src/bin/scripts/dropuser.c
src/bin/scripts/reindexdb.c
src/bin/scripts/vacuumdb.c


> 3. How many definitions of atooid() do we have now?
>

$ git grep '#define atooid' |wc -l
  11

I found no obvious .h to include though.
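
For reference, every one of those duplicates is the same one-liner, which is
what would make a shared header (in src/common or elsewhere) a natural home:

    #define atooid(x)  ((Oid) strtoul((x), NULL, 10))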

--
Steeve Lennmark


Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Simon Riggs
On 16 January 2014 17:49, Tom Lane  wrote:
> Simon Riggs  writes:
>> Agreed, but it won't happen in this release. I/O resource control to
>> follow in later releases.
>
> "This release"?  TBH, I think this patch has arrived too late to be
> considered for 9.4.  It's a fairly major feature and clearly not
> without controversy, so I don't think that posting it for the first
> time the day before CF4 starts meets the guidelines we all agreed
> to back in Ottawa.

I think it's majorly useful, but there really isn't much to the
implementation, and I expect it's beneath the level of being defined as
a major feature. (I think it may have taken less time to write than
this current conversation has lasted.)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2014-01-16 Thread tirtho
Is this now fixed? If so, where do I find the patch for postgres 9.2.2?

Thanks

- Tirthankar







Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-16 Thread Steeve Lennmark
Alvaro,
On Thu, Jan 16, 2014 at 3:20 PM, Alvaro Herrera wrote:

> Please keep the --help and the options in the SGML table in alphabetical
> order within their respective sections.


Ah, I failed to see that there were subgroups and thought the options
were not alphabetically ordered. This patch fixes that.

--
Steeve Lennmark


0005-pg_basebackup-relocate-tablespace.patch
Description: Binary data



Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Tom Lane
Simon Riggs  writes:
> Agreed, but it won't happen in this release. I/O resource control to
> follow in later releases.

"This release"?  TBH, I think this patch has arrived too late to be
considered for 9.4.  It's a fairly major feature and clearly not
without controversy, so I don't think that posting it for the first
time the day before CF4 starts meets the guidelines we all agreed
to back in Ottawa.

regards, tom lane




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Simon Riggs
On 16 January 2014 17:29, Andres Freund  wrote:

>> Please let me know if I missed something (rather than debating what is
>> or is not a "maintenance" command).
>
> If we're going to do this for DML - which I am far from convinced of -
> we also should do it for SELECT, since that can generate significant
> amounts of WAL with checksums turned on.
> Otherwise stuff like INSERT ... SELECT, UPDATE FROM et al. will behave
> very confusingly since suddenly thez will not only block the WAL
> generated by the INSERT but also the SELECT.

Good point, but rather happily I can say "thought of that" and refer
you to the other patch which limits SELECT's ability to dirty pages,
and thus, with checksums enabled will limit the generation of WAL.
(And no, they aren't the same thing).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-01-16 11:35:00 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@2ndquadrant.com) wrote:
> > > So, if we want to support anything but a) relative to pgdata b)
> > > relative to postgresql.conf it needs to be configurable at startup
> > > time. Possibly with an added initdb switch to set the location.
> > 
> > How would an initdb switch be better than an option in postgresql.conf?
> > That sounds very backwards to me.  If you meant a *postmaster*/pg_ctl
> > option, that would make some sense to me, for folks who would like to
> > avoid having a single postgresql.conf at all...  Don't know if that
> > would work today or not tho.
> 
> Oh, I was thinking of initdb option being complementary to a
> postgresql.conf option, just setting the path in the generated
> postgresql.conf. That way distributions wouldn't have to muck around in
> the generated config anymore.

Oh, sure, I can see that being useful.

Thanks,

Stephen




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Andres Freund
On 2014-01-16 11:35:00 -0500, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > So, if we want to support anything but a) relative to pgdata b)
> > relative to postgresql.conf it needs to be configurable at startup
> > time. Possibly with an added initdb switch to set the location.
> 
> How would an initdb switch be better than an option in postgresql.conf?
> That sounds very backwards to me.  If you meant a *postmaster*/pg_ctl
> option, that would make some sense to me, for folks who would like to
> avoid having a single postgresql.conf at all...  Don't know if that
> would work today or not tho.

Oh, I was thinking of initdb option being complementary to a
postgresql.conf option, just setting the path in the generated
postgresql.conf. That way distributions wouldn't have to muck around in
the generated config anymore.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> So, if we want to support anything but a) relative to pgdata b)
> relative to postgresql.conf it needs to be configurable at startup
> time. Possibly with an added initdb switch to set the location.

How would an initdb switch be better than an option in postgresql.conf?
That sounds very backwards to me.  If you meant a *postmaster*/pg_ctl
option, that would make some sense to me, for folks who would like to
avoid having a single postgresql.conf at all...  Don't know if that
would work today or not tho.

Thanks,

Stephen




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Andres Freund
On 2014-01-16 17:20:19 +0100, Simon Riggs wrote:
> I'm comfortable with a session level parameter. I was mulling over a
> wal_rate_limit_scope parameter but I think that is too much.
> I will follow Craig's proposal to define this in terms of MB/s, adding
> that I would calculate from the beginning of the statement to now, so it is an
> averaged rate. A setting of zero means unlimited. The rate would be
> per-session (in this release) rather than a globally calculated
> setting.
> 
> I would like to call it CHECK_FOR_WAL_RATE_LIMIT()
> 
> That seems like a cheap enough call to allow it to be added in more
> places, so my expanded list of commands touched are
>  CLUSTER/VACUUM FULL
>  ALTER TABLE (only in phase 3 table rewrite)
>  ALTER TABLE SET TABLESPACE
>  CREATE INDEX
> which are already there, plus new ones suggested/implied
>  COPY
>  CREATE TABLE AS SELECT
>  INSERT/UPDATE/DELETE
> 
> Please let me know if I missed something (rather than debating what is
> or is not a "maintenance" command).

If we're going to do this for DML - which I am far from convinced of -
we also should do it for SELECT, since that can generate significant
amounts of WAL with checksums turned on.
Otherwise stuff like INSERT ... SELECT, UPDATE FROM et al. will behave
very confusingly since suddenly they will not only block the WAL
generated by the INSERT but also the SELECT.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Please explain.  I don't see a reason not to.  Are you saying you expect
> to put, say, Apache configuration files in the same directory as
> PostgreSQL's?  That doesn't sound really tenable to me, and it doesn't
> sound like what any sane admin would do.

That doesn't mean we get to dictate it.

> In /etc you have /etc/postfix/ where you have Postfix files, and
> /etc/exim4 where you have Exim files, and /etc/apache2 or whatever.  And
> if each of those software packages uses a "conf.d" or "conf-enabled"
> directory, they use a hardcoded name for a place inside their own
> subdirectory, not one directly in /etc.

I missed the part of this where you point out that Apache on Debian has
some kind of problem because it's possible for an admin to remove the
'IncludeDir sites-enabled' line from apache2.conf.

> What you don't have is /etc/exim4.conf and /etc/postfix.conf and
> /etc/apache.conf, where each of those files specifies that I would
> please like to have extra files loaded from /etc/exim.conf.d and
> /etc/apache.conf.d, and that packages please figure out by parsing my
> config file in whatever random format I have invented to figure out
> where snippets are to be stored.

Right, there are absolutely zero cases of /etc/*.conf files these days.
Nor are there any /etc/*.conf.d directories.



Sure, maybe we all feel that there should be an /etc/pam directory
instead of /etc/pam.conf and /etc/pam.d, but that doesn't make it so,
and we're only talking about a subset (perhaps a majority, but it's
certainly not a totality) of PG users- not everyone is using Debian or
RH or even *Linux* or GNU tools to run PG with.

> > > 2. it is to be placed alongside postgresql.conf, not necessarily in
> > >PGDATA.
> > 
> > I certainly agree with this idea- but I feel it should be configurable.
> > The path which is passed in should be relative to the location of
> > postgresql.conf.
> 
> What would be the point of having it be configurable?

To allow admins to configure it?  Perhaps their policy is that it must
be postgresql.conf.d, or that they want it served off of an NFS mount
instead of being pulled from /etc, or maybe they only back up /srv but
they don't want to change where the package manager dumps the
postgresql.conf that's used for starting up the cluster?

> > > 4. there is no point in "disabling" it, and thus we offer no mechanism
> > >to do that.
> > 
> > No.  There should be a way to disable it.
> 
What's your rationale for this?

I don't want PG randomly guessing at directories to go poking around in
for config files.  It's also documentation.

> > > 5. its entries override what is set in postgresql.conf, and are in turn
> > >overridden by what is set in postgresql.auto.conf.
> > 
> > For my part, I'd rather my configurations be deterministic and would
> > prefer that we error on obvious misconfigurations where values are set
> > and then reset.
> 
> Any convention we establish will make configurations deterministic.
> Random variations such as making the conf.d directory be configurable
> would make them non-deterministic.  ISTM you're contradicting yourself
> here.

Perhaps you see it that way, but I certainly don't.  Having conf.d be
configurable doesn't mean that we have one file with "work_mem = 1GB"
and another with "work_mem = 5GB" where it's not at all clear which one
is actually what gets used.

> > It will be a useful mechanism for puppet even with the ability to
> > disable it in postgresql.conf (you're probably managing that with puppet
> > also, or keeping it at the system default which will likely include the
> > include_dir option by default- but, yes, you'd need to check that) and
> > without the "override previous definition" option.
> 
> Meh.  External packages (say, postgis) will be able to add their own
> files into conf.d without having to check what its actual location is,
> and without having to check whether the thing is enabled in the first
> place.  If we make it disable-able, packages will not be able to do
> that; or rather, they will be able to add files, but they will have no
> effect.

Which cluster directory is the packager of PostGIS going to put his
config snippets into, exactly?  Even if you force the hard-coded conf.d
idea, the packager will still need to figure out what clusters exist.

Although, if you could make the conf.d directory configurable, and have
more than one, then we actually could have a "system-wide" directory and
then a per-cluster directory, which *would* make things easier for
packagers.  But, oh wait, don't forget that the "system-wide" one would
still need to be major-version-specific...

Thanks,

Stephen




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Simon Riggs
On 16 January 2014 17:22, Andres Freund  wrote:
> On 2014-01-16 11:19:28 -0500, Tom Lane wrote:
>> > I think the use cases that would want this for DML probably also want this
>> > to work for unlogged, temp tables.
>>
>> Huh?  Unlogged tables generate *zero* WAL, by definition.
>
> Yes. That's my point. If we provide it as a generic resource control -
> which what's being discussed here sounds to me - it should be generic.
>
> If we provide as a measure to prevent standbys from getting out of date
> due to maintenance commands, then it only needs to cover those.

Agreed, but it won't happen in this release. I/O resource control to
follow in later releases.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Andres Freund
On 2014-01-16 11:12:55 -0500, Tom Lane wrote:
> If we were to do this, I'd argue that the location of this hard-wired
> config directory ought to be selected by a configure option.
> in fact, I'd argue that the default behavior with no such option be
> that there's no such hard-wired directory.  That puts it entirely
> on the packager to decide what makes sense as a location.  In the end
> it's going to be the packager's decision anyway --- it's just a matter
> of how painful we make it for him to configure it to his distro's
> conventions.

I don't think a configure option is going to help much to reach that
goal - unless the distribution only allows a single cluster and a single
version to be installed, anything but a path relative to $PGDATA won't
work well if configured statically.

So, if we want to support anything but a) relative to pgdata b)
relative to postgresql.conf it needs to be configurable at startup
time. Possibly with an added initdb switch to set the location.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Tom Lane
Andres Freund  writes:
> On 2014-01-16 17:47:51 +0200, Heikki Linnakangas wrote:
>> In suitable safe places, check if you've written too much WAL, and sleep if
>> so.

> That'd be most places doing XLogInsert() if you want to be
> consistent.

See my other response.  There's no need for "consistency", only to be sure
that code paths that might generate a *lot* of WAL include such checks.
We've gotten along fine without any formal methodology for where to put
CHECK_FOR_INTERRUPTS() or vacuum_delay_point() calls, and this seems like
just more of the same.

> I think the use cases that would want this for DML probably also want this
> to work for unlogged, temp tables.

Huh?  Unlogged tables generate *zero* WAL, by definition.

If your point is that WAL creation isn't a particularly good resource
consumption metric, that's an argument worth considering, but it seems
quite orthogonal to whether we can implement such throttling reasonably.
And in any case, you didn't provide such an argument.

regards, tom lane




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Andres Freund
On 2014-01-16 11:19:28 -0500, Tom Lane wrote:
> > I think the use cases that would want this for DML probably also want this
> > to work for unlogged, temp tables.
> 
> Huh?  Unlogged tables generate *zero* WAL, by definition.

Yes. That's my point. If we provide it as a generic resource control -
which what's being discussed here sounds to me - it should be generic.

If we provide as a measure to prevent standbys from getting out of date
due to maintenance commands, then it only needs to cover those.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Simon Riggs
On 16 January 2014 16:56, Andres Freund  wrote:
> On 2014-01-16 17:47:51 +0200, Heikki Linnakangas wrote:
>> On 01/16/2014 05:39 PM, Andres Freund wrote:
>> >On 2014-01-16 10:35:20 -0500, Tom Lane wrote:
>> >>I think possibly a more productive approach to this would be to treat
>> >>it as a session-level GUC setting, rather than hard-wiring it to affect
>> >>certain commands and not others.
>> >
>> >Do you see a reasonable way to implement this generically for all
>> >commands? I don't.
>>
>> In suitable safe places, check if you've written too much WAL, and sleep if
>> so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of
>> CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe
>> vacuum_delay_point() is a better analogy. Hopefully you don't need to
>> sprinkle them in too many places to be useful.
>
> That'd be most places doing XLogInsert() if you want to be
> consistent. Each needing to be analyzed whether we're blocking important
> resources, determining where in the callstack we can do the
> CHECK_FOR_WAL_BUDGET().
>
> I don't think the added power of allowing bulk DML to be metered is
> worth the increased implementation cost.
>
> I think the use cases that would want this for DML probably also want this
> to work for unlogged, temp tables. That'd require a much more generic
> resource control framework.

Thank you everyone for responding so positively to this proposal. It
is something many users have complained about.

In time, I think we might want both WAL Rate Limiting and I/O Rate
Limiting. IMHO I/O rate limiting is harder and so this proposal is
restricted solely to WAL rate limiting.

I'm comfortable with a session level parameter. I was mulling over a
wal_rate_limit_scope parameter but I think that is too much.
I will follow Craig's proposal to define this in terms of MB/s, adding
that I would calculate from the beginning of the statement to now, so it is an
averaged rate. A setting of zero means unlimited. The rate would be
per-session (in this release) rather than a globally calculated
setting.

I would like to call it CHECK_FOR_WAL_RATE_LIMIT()

That seems like a cheap enough call to allow it to be added in more
places, so my expanded list of commands touched are
 CLUSTER/VACUUM FULL
 ALTER TABLE (only in phase 3 table rewrite)
 ALTER TABLE SET TABLESPACE
 CREATE INDEX
which are already there, plus new ones suggested/implied
 COPY
 CREATE TABLE AS SELECT
 INSERT/UPDATE/DELETE

Please let me know if I missed something (rather than debating what is
or is not a "maintenance" command).

Any other thoughts before I cut v2?
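
To make the proposed semantics concrete, a self-contained sketch of the
averaged-rate computation (illustrative names and values only, not the
actual patch):

    #include <stdint.h>
    #include <time.h>

    static struct timespec stmt_start;        /* set at statement start */
    static uint64_t wal_bytes;                /* WAL written since then,
                                               * maintained by callers */
    static uint64_t wal_rate_limit = 8 << 20; /* bytes/sec; 0 = unlimited */

    static double
    secs_since_start(void)
    {
        struct timespec now;

        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - stmt_start.tv_sec) +
               (now.tv_nsec - stmt_start.tv_nsec) / 1e9;
    }

    /* cheap enough to call from many places, a la vacuum_delay_point() */
    static void
    check_for_wal_rate_limit(void)
    {
        double      target;

        if (wal_rate_limit == 0)
            return;
        /* how long the WAL written so far should have taken at the limit */
        target = (double) wal_bytes / (double) wal_rate_limit;
        if (target > secs_since_start())
        {
            double          pause = target - secs_since_start();
            struct timespec ts = {(time_t) pause,
                                  (long) ((pause - (time_t) pause) * 1e9)};

            nanosleep(&ts, NULL);
        }
    }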

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Andres Freund
On 2014-01-16 10:35:20 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I don't really see much difficulty in determining what's a utility
> > command and what not for the purpose of this? All utility commands which
> > create WAL in O(table_size) or worse.
> 
> By that definition, INSERT, UPDATE, and DELETE can all be "utility
> commands".  So would a full-table-scan SELECT, if it's unfortunate
> enough to run into a lot of hint-setting or HOT-pruning work.

Well, I said *utility* command. So: everything processed by
ProcessUtility() that generates WAL like that.

> I think possibly a more productive approach to this would be to treat
> it as a session-level GUC setting, rather than hard-wiring it to affect
> certain commands and not others.

Do you see a reasonable way to implement this generically for all
commands? I don't.
We shouldn't let the need for generic resource control stop us from
providing some form of resource control for a consistent subset of
commands.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Alvaro Herrera
Stephen Frost wrote:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> > Peter Eisentraut wrote:
> > > In my mind, a conf.d directory is an extension of a single-file
> > > configuration, and so it should be handled that way.
> > 
> > +1 on this.  This means
> > 
> > 1. it is to be read automatically by the server without need for an
> >"include_dir conf.d" option in the main postgresql.conf.
> 
> I am not thrilled with the idea that we're claiming ownership of the
> directory which postgresql.conf happens to reside in, and you had better
> not have any 'conf.d' directory in there unless it's the one for PG.

Please explain.  I don't see a reason not to.  Are you saying you expect
to put, say, Apache configuration files in the same directory as
PostgreSQL's?  That doesn't sound really tenable to me, and it doesn't
sound like what any sane admin would do.

In /etc you have /etc/postfix/ where you have Postfix files, and
/etc/exim4 where you have Exim files, and /etc/apache2 or whatever.  And
if each of those software packages uses a "conf.d" or "conf-enabled"
directory, they use a hardcoded name for a place inside their own
subdirectory, not one directly in /etc.

What you don't have is /etc/exim4.conf and /etc/postfix.conf and
/etc/apache.conf, where each of those files specifies that I would
please like to have extra files loaded from /etc/exim.conf.d and
/etc/apache.conf.d, and that packages please figure out by parsing my
config file in whatever random format I have invented to figure out
where snippets are to be stored.
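
For the record, the opt-in mechanism under discussion is a single line in
postgresql.conf (available since 9.3), with a relative path resolved against
the directory of the referencing file:

    include_dir 'conf.d'    # read every *.conf file in that directory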

> > 2. it is to be placed alongside postgresql.conf, not necessarily in
> >PGDATA.
> 
> I certainly agree with this idea- but I feel it should be configurable.
> The path which is passed in should be relative to the location of
> postgresql.conf.

What would be the point of having it be configurable?

> > 4. there is no point in "disabling" it, and thus we offer no mechanism
> >to do that.
> 
> No.  There should be a way to disable it.

What's your rationale for this?

> > 5. its entries override what is set in postgresql.conf, and are in turn
> >overridden by what is set in postgresql.auto.conf.
> 
> For my part, I'd rather my configurations be deterministic and would
> prefer that we error on obvious misconfigurations where values are set
> and then reset.

Any convention we establish will make configurations deterministic.
Random variations such as making the conf.d directory be configurable
would make them non-deterministic.  ISTM you're contradicting yourself
here.

> > Taken together, these properties guarantee that it's a useful mechanism
> > to be used by system-level deployment/configuration tools such as Puppet
> > et al.
> 
> It will be a useful mechanism for puppet even with the ability to
> disable it in postgresql.conf (you're probably managing that with puppet
> also, or keeping it at the system default which will likely include the
> include_dir option by default- but, yes, you'd need to check that) and
> without the "override previous definition" option.

Meh.  External packages (say, postgis) will be able to add their own
files into conf.d without having to check what its actual location is,
and without having to check whether the thing is enabled in the first
place.  If we make it disable-able, packages will not be able to do
that; or rather, they will be able to add files, but they will have no
effect.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeremy Harris

On 14/01/14 22:23, Dave Chinner wrote:

On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote:

To quantify that, in a production setting we were seeing pauses of
up to two minutes with shared_buffers set to 8GB and default dirty
                                                     ^^^^^^^^^^^^^
page settings for Linux, on a machine with 256GB RAM and 512MB
                                           ^^^^^^^^^
There's your problem.

By default, background writeback doesn't start until 10% of memory
is dirtied, and on your machine that's 25GB of RAM. That's way too
high for your workload.

It appears to me that we are seeing large memory machines much more
commonly in data centers - a couple of years ago 256GB RAM was only
seen in supercomputers. Hence machines of this size are moving from the
"tweaking settings for supercomputers is OK" class to the "tweaking
settings for enterprise servers is not OK" class.

Perhaps what we need to do is deprecate dirty_ratio and
dirty_background_ratio as the default values, move to the byte-based
values as the defaults, and cap them appropriately, e.g.
10/20% of RAM for small machines down to a couple of GB for large
machines.


  Perhaps the kernel needs a dirty-amount control measured
in time units rather than pages (it being up to the kernel to
measure the achievable write rate)...
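
The byte-based knobs already exist, for what it's worth, so a large-memory
host can be capped today along these lines (illustrative values; setting the
_bytes forms overrides the corresponding _ratio defaults):

    # /etc/sysctl.d/90-writeback.conf
    vm.dirty_background_bytes = 268435456    # start writeback at 256MB
    vm.dirty_bytes = 1073741824              # throttle writers at 1GB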
--
Cheers,
   Jeremy




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Tom Lane
Stephen Frost  writes:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
>> 1. it is to be read automatically by the server without need for an
>> "include_dir conf.d" option in the main postgresql.conf.

> I am not thrilled with the idea that we're claiming ownership of the
> directory which postgresql.conf happens to reside in and you had better
> not have any 'conf.d' directory in there unless it's the one for PG.

If we were to do this, I'd argue that the location of this hard-wired
config directory ought to be selected by a configure option.  And
in fact, I'd argue that the default behavior with no such option be
that there's no such hard-wired directory.  That puts it entirely
on the packager to decide what makes sense as a location.  In the end
it's going to be the packager's decision anyway --- it's just a matter
of how painful we make it for him to configure it to his distro's
conventions.

On the whole though, I find the argument that we can't configure this
in postgresql.conf to be exceedingly weak.  In particular, the idea
that you can puppet-ify a configuration without any knowledge of the
distro you're targeting is ridiculous on its face, as is the idea
that we/postgres can dictate configuration file location conventions
to packagers who have distro rules to follow.  There isn't going to
be a "one location to rule them all", and that means the argument
that a location determined by postgresql.conf is too unstable does
not really hold water.

Another point here is that hard-wiring a config directory location
into the executables completely breaks many scenarios for running
multiple clusters with the same executables.  Yeah, maybe you'd
like to share most of the same config across all your clusters.
But then again, maybe you wouldn't.  The proposed behavior would
make it practically impossible for a test cluster to not pick up
random subsets of the system primary cluster's configuration.

regards, tom lane




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Andres Freund
On 2014-01-16 17:47:51 +0200, Heikki Linnakangas wrote:
> On 01/16/2014 05:39 PM, Andres Freund wrote:
> >On 2014-01-16 10:35:20 -0500, Tom Lane wrote:
> >>I think possibly a more productive approach to this would be to treat
> >>it as a session-level GUC setting, rather than hard-wiring it to affect
> >>certain commands and not others.
> >
> >Do you see a reasonable way to implement this generically for all
> >commands? I don't.
> 
> In suitable safe places, check if you've written too much WAL, and sleep if
> so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of
> CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe
> vacuum_delay_point() is a better analogy. Hopefully you don't need to
> sprinkle them in too many places to be useful.

That'd be most places doing XLogInsert() if you want to be
consistent. Each needing to be analyzed whether we're blocking important
resources, determining where in the callstack we can do the
CHECK_FOR_WAL_BUDGET().

I don't think the added power of allowing bulk DML to be metered is
worth the increased implementation cost.

I think the use cases that would want this for DML probably also want this
to work for unlogged, temp tables. That'd require a much more generic
resource control framework.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add force option to dropdb

2014-01-16 Thread salah jubeh

>If the user owns objects, that will prevent this from working also.  I
>have the feeling that adding DROP OWNED BY and/or REASSIGN OWNED BY
>calls to this utility would be a bit excessive, but who knows.

Please find attached the first attempt to drop the client connections.

I have added an option -k, --kill instead of force, since killing client
connections does not guarantee a forced drop.
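
With the patch applied, the tool roughly issues the following (database name
illustrative; --echo shows the real statements):

    $ dropdb --kill --echo mydb
    UPDATE pg_database SET datconnlimit = 0 WHERE datname= 'mydb';
    SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname ='mydb';
    DROP DATABASE mydb;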

Regards




On Tuesday, January 14, 2014 8:06 PM, Alvaro Herrera  
wrote:
 
salah jubeh wrote:

> For the sake of completeness:
> 1. I think I also need to temporarily disallow connecting to the database, 
> is that right?
> 2. Are there other factors that can hinder dropping a database?

If the user owns objects, that will prevent this from working also.  I
have the feeling that adding DROP OWNED BY and/or REASSIGN OWNED BY
calls to this utility would be a bit excessive, but who knows.


> 3. Should I write two patches, one for pg_version>=9.2 and one for 
> pg_version<9.2  

No point -- nothing gets applied to branches older than current
development anyway.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services--- dropdb_orig.c	2014-01-16 15:21:51.059243617 +0100
+++ dropdb.c	2014-01-16 16:38:57.419414200 +0100
@@ -32,6 +32,7 @@
 		{"echo", no_argument, NULL, 'e'},
 		{"interactive", no_argument, NULL, 'i'},
 		{"if-exists", no_argument, &if_exists, 1},
+		{"kill", no_argument, NULL, 'k'},
 		{"maintenance-db", required_argument, NULL, 2},
 		{NULL, 0, NULL, 0}
 	};
@@ -48,6 +49,8 @@
 	enum trivalue prompt_password = TRI_DEFAULT;
 	bool		echo = false;
 	bool		interactive = false;
+	bool		kill = false;
+	char *database_conn_limit = "-1";
 
 	PQExpBufferData sql;
 
@@ -59,7 +62,7 @@
 
 	handle_help_version_opts(argc, argv, "dropdb", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:wWei", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "h:p:U:wWeik", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -84,6 +87,9 @@
 			case 'i':
 				interactive = true;
 				break;
+			case 'k':
+				kill = true;
+				break;
 			case 0:
 				/* this covers the long options */
 				break;
@@ -121,8 +127,6 @@
 
 	initPQExpBuffer(&sql);
 
-	appendPQExpBuffer(&sql, "DROP DATABASE %s%s;\n",
-	  (if_exists ? "IF EXISTS " : ""), fmtId(dbname));
 
 	/* Avoid trying to drop postgres db while we are connected to it. */
 	if (maintenance_db == NULL && strcmp(dbname, "postgres") == 0)
@@ -131,11 +135,64 @@
 	conn = connectMaintenanceDatabase(maintenance_db,
 			host, port, username, prompt_password, progname);
 
+	/* disallow database connections and terminate client connections */
+	if (kill) 
+	{
+		appendPQExpBuffer(&sql, "SELECT datconnlimit FROM pg_database WHERE datname= '%s';",fmtId(dbname));
+		result = executeQuery(conn, sql.data, progname, echo);
+		// get the datconnlimit to do cleanup in case the drop fails
+		if (PQntuples(result) == 1)
+		{
+			database_conn_limit = PQgetvalue(result, 0, 0);
+		} else 
+		{
+			fprintf(stderr, _("%s: database removal failed: %s\n"),
+			progname, dbname);
+			PQclear(result);
+			PQfinish(conn);
+			exit(1);
+		}
+		PQclear(result);
+		resetPQExpBuffer(&sql);
+		// new connections are not allowed
+		appendPQExpBuffer(&sql, "UPDATE pg_database SET datconnlimit = 0 WHERE datname= '%s';\n",fmtId(dbname));
+		if (echo)
+			printf("%s", sql.data);
+		result = PQexec(conn, sql.data);
+		if (PQresultStatus(result) != PGRES_COMMAND_OK)
+		{
+			fprintf(stderr, _("%s: database removal failed: %s\n"),
+			progname, dbname);
+			PQclear(result);
+			PQfinish(conn);
+			exit(1);
+		}
+		PQclear(result);
+		resetPQExpBuffer(&sql);
+		// terminate client connections
+		appendPQExpBuffer(&sql, "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname ='%s';",fmtId(dbname));
+		result = executeQuery(conn, sql.data, progname, echo);
+		PQclear(result);
+		resetPQExpBuffer(&sql);
+	}
+
+	appendPQExpBuffer(&sql, "DROP DATABASE %s%s;\n",
+	  (if_exists ? "IF EXISTS " : ""), fmtId(dbname));
+
 	if (echo)
 		printf("%s", sql.data);
 	result = PQexec(conn, sql.data);
 	if (PQresultStatus(result) != PGRES_COMMAND_OK)
 	{
+		// cleanup: reset datconnlimit
+		if (kill)
+		{
+			resetPQExpBuffer(&sql);
+			appendPQExpBuffer(&sql, "UPDATE pg_database SET datconnlimit = %s WHERE datname= '%s';",database_conn_limit,fmtId(dbname));
+			if (echo)
+				printf("%s", sql.data);
+			result = PQexec(conn, sql.data);
+		}
 		fprintf(stderr, _("%s: database removal failed: %s"),
 progname, PQerrorMessage(conn));
 		PQfinish(conn);
@@ -159,6 +216,7 @@
 	printf(_("  -i, --interactive        prompt before deleting anything\n"));
 	printf(_("  -V, --version            output version information, then exit\n"));
 	printf(_("  --if-exists              don't report error if database doesn't exist\n"));
+	printf(_("  -k, --kill               kill client connections\n"));
 	printf(_("  -?, --help   

Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> Given that the majority didn't seem to be convinced by this and that the
> feature was written differently, this isn't a convincing argument for the
> location of the file given the current feature, no?

I'll start by pointing out that the only reason this ALTER SYSTEM thing
exists is because there are options that we can't set in the catalogs.

Now, I can't claim to speak for a majority opinion that disagreed with
my own.  I remain of the opinion that having settings
which admins are familiar with being overridden by some file in the data
directory is going to cause confusion and problems.  Therefore, I feel
that the 'auto' file should be where postgresql.conf lives because
admins will at least have some hope of finding it there (and then
realizing they need to disable that functionality and lock down the
postgresql.conf permissions to be root-owned...).

Having the log files be sufficiently detailed to help an admin is better
than nothing, but it certainly won't be as good as having the config
files in the same directory when the admin is trying to figure out why
the cluster won't start any more.

Thanks,

Stephen




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Peter Eisentraut wrote:
> > In my mind, a conf.d directory is an extension of a single-file
> > configuration, and so it should be handled that way.
> 
> +1 on this.  This means
> 
> 1. it is to be read automatically by the server without need for an
>"include_dir conf.d" option in the main postgresql.conf.

I am not thrilled with the idea that we're claiming ownership of the
directory which postgresql.conf happens to reside in, and you had better
not have any 'conf.d' directory in there unless it's the one for PG.

> 2. it is to be placed alongside postgresql.conf, not necessarily in
>PGDATA.

I certainly agree with this idea- but I feel it should be configurable.
The path which is passed in should be relative to the location of
postgresql.conf.

> 3. it might or might not be writable by the running server; it's an
>operating-system-owned configuration location.

Works well enough, I suppose.

> 4. there is no point in "disabling" it, and thus we offer no mechanism
>to do that.

No.  There should be a way to disable it.

> 5. its entries override what is set in postgresql.conf, and are in turn
>overridden by what is set in postgresql.auto.conf.

For my part, I'd rather my configurations be deterministic and would
prefer that we error on obvious misconfigurations where values are set
and then reset.

> Taken together, these properties guarantee that it's a useful mechanism
> to be used by system-level deployment/configuration tools such as Puppet
> et al.

It will be a useful mechanism for puppet even with the ability to
disable it in postgresql.conf (you're probably managing that with puppet
also, or keeping it at the system default which will likely include the
include_dir option by default- but, yes, you'd need to check that) and
without the "override previous definition" option.

> I also think, but this is a secondary point, that initdb should write
> its modified settings into a file in conf.d instead of generating a
> custom postgresql.conf.

This makes sense.

Thanks,

Stephen




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Tom Lane
Heikki Linnakangas  writes:
> On 01/16/2014 05:39 PM, Andres Freund wrote:
>> Do you see a reasonable way to implement this generically for all
>> commands? I don't.

> In suitable safe places, check if you've written too much WAL, and sleep 
> if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of 
> CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe 
> vacuum_delay_point() is a better analogy. Hopefully you don't need to 
> sprinkle them in too many places to be useful.

We probably don't really need to implement it for "all" commands; I think
everyone can agree that, say, ALTER TABLE RENAME COLUMN isn't going to
emit enough WAL to really matter.  My point was that we should try to hit
everything that potentially *could* generate lots of WAL, rather than
arbitrarily deciding that some are utility commands and some are not.

For INSERT/UPDATE/DELETE, likely you could do this with a single 
CHECK_FOR_WAL_BUDGET() added at a strategic spot in nodeModifyTable.c.

regards, tom lane




Re: [HACKERS] Display oprcode and its volatility in \do+

2014-01-16 Thread Tom Lane
Marko Tiikkaja  writes:
> On 1/16/14 4:22 PM, Tom Lane wrote:
>> FWIW, I'm on board with the idea of printing the oprcode, but adding
>> volatility here seems like probably a waste of valuable terminal width.
>> I think that the vast majority of operators have immutable or at worst
>> stable underlying functions, so this doesn't seem like the first bit
>> of information I'd need about the underlying function.

> Completely unscientifically, 50% of the time I've wanted to know the 
> oprcode has been because I wanted to know its volatility (exactly 
> because of the stable oprcodes we have).  It seemed like a useful 
> addition, but I don't feel that strongly about it.

Hm.  Personally, I've lost count of the number of times I've had to
resort to "select ... from pg_operator" because \do lacked an oprcode
column, but I don't remember that many or indeed any were because
I wanted to check the volatility.

Anybody else have an opinion?

regards, tom lane




Re: [HACKERS] Why conf.d should be default, and auto.conf and recovery.conf should be in it

2014-01-16 Thread Christian Kruse
Hi Alvaro,

On 16/01/14 10:21, Alvaro Herrera wrote:
> 1. it is to be read automatically by the server without need for an
>"include_dir conf.d" option in the main postgresql.conf.

+1

> 4. there is no point in "disabling" it, and thus we offer no mechanism
>to do that.

Not only is there „no point“ in disabling it; allowing that makes the
feature nearly useless. One can't rely on it if a distro may disable
it. There are so many distros out there that it will never be a reliable
feature if it can be disabled.

> 5. its entries override what is set in postgresql.conf, and are in turn
>overridden by what is set in postgresql.auto.conf.

+1

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Tom Lane
Andres Freund  writes:
> I don't really see much difficulty in determining what's a utility
> command and what not for the purpose of this? All utility commands which
> create WAL in O(table_size) or worse.

By that definition, INSERT, UPDATE, and DELETE can all be "utility
commands".  So would a full-table-scan SELECT, if it's unfortunate
enough to run into a lot of hint-setting or HOT-pruning work.

I think possibly a more productive approach to this would be to treat
it as a session-level GUC setting, rather than hard-wiring it to affect
certain commands and not others.

regards, tom lane




Re: [HACKERS] Display oprcode and its volatility in \do+

2014-01-16 Thread Marko Tiikkaja

On 1/16/14 4:22 PM, Tom Lane wrote:

Marko Tiikkaja  writes:

On 1/16/14 9:53 AM, Rushabh Lathia wrote:

Even I personally felt the Function and Volatility is nice to have info
into \do+.


FWIW, I'm on board with the idea of printing the oprcode, but adding
volatility here seems like probably a waste of valuable terminal width.
I think that the vast majority of operators have immutable or at worst
stable underlying functions, so this doesn't seem like the first bit
of information I'd need about the underlying function.


Completely unscientifically, 50% of the time I've wanted to know the 
oprcode has been because I wanted to know its volatility (exactly 
because of the stable oprcodes we have).  It seemed like a useful 
addition, but I don't feel that strongly about it.



Regards,
Marko Tiikkaja




Re: [HACKERS] Display oprcode and its volatility in \do+

2014-01-16 Thread Tom Lane
Marko Tiikkaja  writes:
> On 1/16/14 9:53 AM, Rushabh Lathia wrote:
>> Even I personally felt the Function and Volatility is nice to have info
>> into \do+.

FWIW, I'm on board with the idea of printing the oprcode, but adding
volatility here seems like probably a waste of valuable terminal width.
I think that the vast majority of operators have immutable or at worst
stable underlying functions, so this doesn't seem like the first bit
of information I'd need about the underlying function.  And why print
this but not, say, security, owner, source code, or other columns
shown in \df?  ISTM the value of this addition is to give you what
you need to go look in \df, not to try to substitute for that.

regards, tom lane




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-01-16 Thread Alvaro Herrera
Euler Taveira wrote:
> On 08-11-2013 06:20, Dilip kumar wrote:
> > On 08 November 2013 13:38, Jan Lentfer
> > 
> > 
> >> For this use case, would it make sense to queue work (tables) in order of 
> >> their size, starting with the largest one?
> > 
> >> For the case where you have tables of varying size, this would lead to a 
> >> reduced overall processing time, as it prevents large (read: long 
> >> processing time) tables from being processed in the last step. Processing 
> >> large tables first, and filling up "processing slots/jobs" with smaller 
> >> tables one after the other as they come free, would save overall execution 
> >> time.
> > Good point, I have made the change and attached the modified patch.
> > 
> You didn't submit it for a CF, did you? Is it too late for this CF?

Not too late.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)

2014-01-16 Thread Andres Freund
Hi,

On 2014-01-16 09:34:51 -0500, Robert Haas wrote:
> >> - ReplicationSlotAcquire probably needs to ignore slots that are not 
> >> active.
> > Not sure what you mean? If the slot isn't in_use we'll skip it in the loop.
> 
> active != in_use.
> 
> I suppose your point is that the slot can't be in_use if it's not also
> active.

Yes. There are asserts to that end...

> Maybe it would be better to get rid of active/in_use and have
> three states: REPLSLOT_CONNECTED, REPLSLOT_NOT_CONNECTED,
> REPLSLOT_FREE.  Or something like that.

Hm. Color me unenthusiastic. If you feel strongly I can change it, but
otherwise not.
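
For reference, the tri-state Robert sketches would be something like this
(names taken from the email, not from committed code):

    typedef enum ReplicationSlotState
    {
        REPLSLOT_FREE,           /* slot struct unused */
        REPLSLOT_NOT_CONNECTED,  /* persistent slot, no backend attached */
        REPLSLOT_CONNECTED       /* a backend currently owns the slot */
    } ReplicationSlotState;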

> >> - If there's a coding rule that slot->database can't be changed while
> >> the slot is active, then the check to make sure that the user isn't
> >> trying to bind to a slot with a mis-matching database could be done
> >> before the code described in the previous point, avoiding the need to
> >> go back and release the resource.
> >
> > I don't think slot->database should be allowed to change at all...
> 
> Well, it can if the slot is dropped and a new one created.

Well. That obviously requires the lwlock to be acquired...

> >> - I think the critical section in ReplicationSlotDrop is bogus.  If
> >> DeleteSlot() fails, we scarcely need to PANIC.  The slot just isn't
> >> gone.
> >
> > Well, if delete slot fails, we don't really know at which point it
> > failed which means that the on-disk state might not correspond to the
> > in-memory state. I don't see a point in adding code trying to handle
> > that case correctly...
> 
> Deleting the slot should be an atomic operation.  There's some
> critical point before which the slot will be picked up by recovery and
> after which it won't.  You either did that operation, or not, and can
> adjust the in-memory state accordingly.

I am not sure I understand that point. We can either update the
in-memory bit before performing the on-disk operations or
afterwards. Either way, there's a way to be inconsistent if the disk
operation fails somewhere in between (it might fail but still have
deleted the file/directory!). The normal way to handle that in other
places is to PANIC when we don't know, so we recover from the on-disk
state.
I really don't see the problem here? Code doesn't get more robust by
doing s/PANIC/ERROR/, rather the contrary. It takes extra smarts to only
ERROR, and often that's not warranted.

> >> - A comment in KillSlot wonders whether locking is required.  I say
> >> yes.  It's safe to take lwlocks and spinlocks during shmem exit, and
> >> failing to do so seems like a recipe for subtle corner-case bugs.
> >
> > I agree that it's safe to use spinlocks, but lwlocks? What if we are
> > erroring out while holding an lwlock? Code that runs inside a
> > TransactionCommand is protected against that, but if not, ProcKill(),
> > which invokes LWLockReleaseAll(), runs pretty late in the teardown
> > process...
> 
> Hmm.  I guess it'd be fine to decide that a connected slot can be
> marked not-connected without the lock.

I now acquire the spinlock since that has to work, or we have much worse
problems... That guarantees that other backends see the value as well.

> I think you'd want a rule that
> a slot can't be freed except when it's not-connected; otherwise, you
> might end up marking the slot not-connected after someone else had
> already recycled it for an unrelated purpose (drop slot, create new
> slot).

Yea, that rule is there. Otherwise we'd get in great trouble.

> >> - There seems to be no interface to acquire or release slots from
> >> either SQL or the replication protocol, nor any way for a client of
> >> this code to update its slot details.
> >
> > I don't think either ever needs to do that - START_TRANSACTION SLOT slot
> > ...; and decoding_slot_*changes will acquire/release for them while
> > active. What would the usecase be to allow them to be acquired from SQL?
> 
> My point isn't so much about SQL as that with just this patch I don't
> see any way for anyone to ever acquire a slot for anything, ever.  So
> I think there's a piece missing, or three.

The slot is acquired by code using the slot. So when START_TRANSACTION
SLOT ... (in contrast to a START_TRANSACTION without SLOT) is sent,
walsender.c does a ReplicationSlotAcquire(cmd->slotname) in
StartReplication() and releases it after it has finished.

> > The slot details get updates by the respective replication code. For
> > streaming rep, that should happen via reply and feedback
> > messages. For changeset extraction it happens when
> > LogicalConfirmReceivedLocation() is called; the walsender interface
> > does that using reply messages, the SQL interface calls it when
> > finished (unless you use the _peek_ functions).
> 
> Right, but where is this code?  I don't see this updating the reply
> and feedback message processing code to touch slots.  Did I miss that?

It's in "wal_decoding: logical changeset extraction walsender interface"
currently :(. Splitting the streaming

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-01-16 Thread Robert Haas
On Thu, Jan 16, 2014 at 12:07 AM, Amit Kapila  wrote:
> On Thu, Jan 16, 2014 at 12:49 AM, Robert Haas  wrote:
>> On Wed, Jan 15, 2014 at 7:28 AM, Amit Kapila  wrote:
>>> Unpatched
>>> ---------
>>>                  testname                 | wal_generated |     duration
>>> ------------------------------------------+---------------+------------------
>>>  one short and one long field, no change  |    1054923224 |  33.101135969162
>>>
>>> After pgrb_delta_encoding_v4
>>> ----------------------------
>>>                  testname                 | wal_generated |     duration
>>> ------------------------------------------+---------------+------------------
>>>  one short and one long field, no change  |     877859144 |  30.6749138832092
>>>
>>> Temporary Changes
>>> (Revert Max Chunksize = 4 and logic of finding longer match)
>>> ------------------------------------------------------------
>>>                  testname                 | wal_generated |     duration
>>> ------------------------------------------+---------------+------------------
>>>  one short and one long field, no change  |     677337304 |  25.4048750400543
>>
>> Sure, but watch me not care.
>>
>> If we're interested in taking advantage of the internal
>> compressibility of tuples, we can do a lot better than this patch.  We
>> can compress the old tuple and the new tuple.  We can compress
>> full-page images.  We can compress inserted tuples.  But that's not
>> the point of this patch.
>>
>> The point of *this* patch is to exploit the fact that the old and new
>> tuples are likely to be very similar, NOT to squeeze out every ounce
>> of compression from other sources.
>
> Okay, got your point.
> Another minor thing: in the latest patch, which I sent yesterday, I
> modified chunk formation so that if the data at the end of the string
> contains no special pattern and is shorter than the max chunk size, we
> still treat it as a chunk. The reason is that, for example, a 104-byte
> string containing no special bit pattern would otherwise produce just
> one 64-byte chunk and leave the remaining bytes out, missing the chance
> to compress that data.

Yeah, that sounds right.
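For illustration, a condensed standalone sketch of that chunking rule;
the pattern test and the 64-byte limit are stand-ins, not the patch's
actual definitions:

#include <stdbool.h>

#define SKETCH_MAX_CHUNK_SIZE 64

static inline bool
is_special_pattern(unsigned char c)
{
    return (c & 0x07) == 0;     /* arbitrary stand-in for the real test */
}

/*
 * Split len bytes into chunks at special-pattern boundaries.  A trailing
 * run shorter than the max chunk size still becomes a chunk, so a
 * 104-byte patternless string yields chunks of 64 and 40 bytes instead
 * of leaving 40 bytes uncovered by compression.
 */
static int
form_chunks_sketch(const unsigned char *data, int len,
                   int *chunk_len, int max_chunks)
{
    int     nchunks = 0;
    int     start = 0;

    while (start < len && nchunks < max_chunks)
    {
        int     sz = 0;

        while (sz < SKETCH_MAX_CHUNK_SIZE && start + sz < len)
        {
            sz++;
            if (is_special_pattern(data[start + sz - 1]))
                break;
        }
        chunk_len[nchunks++] = sz;      /* includes any short tail */
        start += sz;
    }
    return nchunks;
}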

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)

2014-01-16 Thread Robert Haas
On Wed, Jan 15, 2014 at 3:39 PM, Andres Freund  wrote:
> On 2014-01-15 13:28:25 -0500, Robert Haas wrote:
>> - I think you should just regard ReplicationSlotCtlLock as protecting
>> the "name" and "active" flags of every slot.  ReplicationSlotCreate()
>> would then not need to mess with the spinlocks at all, and
>> ReplicationSlotAcquire and ReplicationSlotDrop get a bit simpler too I
>> think.  Functions like ReplicationSlotsComputeRequiredXmin() can
>> acquire this lock in shared mode and only lock the slots that are
>> actually active, instead of all of them.
>
> I first thought you meant that we should get rid of the spinlock, but
> after rereading I think you just mean that ->name, ->active, ->in_use
> are only allowed to change while holding the lwlock exclusively so we
> don't need to spinlock in those cases? If so, yes, that works for me.

Yeah, that's about what I had in mind.

>> - If you address /* FIXME: apply sanity checking to slot name */, then
>> I think that also addresses /* XXX: do we want to use truncate
>> identifier instead? */. In other words, let's just error out if the
>> name is too long.  I'm not sure what other sanity checking is needed
>> here; maybe just restrict it to 7-bit characters to avoid problems
>> with encodings for individual databases varying.
>
> Yea, erroring out seems like a good idea. But I think we need to
> restrict slot names a bit more than that, given they are used as
> filenames... We could instead name the files using the slot's offset,
> but I'd prefer to not go that route.

OK.  Well, add some code, then.  :-)
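Something along these lines, perhaps; the length limit and the allowed
character set are assumptions, but erroring out rather than truncating
matches the discussion above:

#include "postgres.h"

/* Accept only characters that are filename-safe and encoding-neutral. */
static void
CheckSlotNameSketch(const char *name)
{
    const char *cp;

    if (strlen(name) == 0 || strlen(name) >= NAMEDATALEN)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_NAME),
                 errmsg("replication slot name \"%s\" is empty or too long",
                        name)));

    for (cp = name; *cp; cp++)
    {
        if (!((*cp >= 'a' && *cp <= 'z') ||
              (*cp >= '0' && *cp <= '9') ||
              *cp == '_'))
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_NAME),
                     errmsg("replication slot name \"%s\" contains an invalid character",
                            name)));
    }
}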

>> - ReplicationSlotAcquire probably needs to ignore slots that are not active.
> Not sure what you mean? If the slot isn't in_use we'll skip it in the loop.

active != in_use.

I suppose your point is that the slot can't be active if it's not also
in_use.  Maybe it would be better to get rid of active/in_use and have
three states: REPLSLOT_CONNECTED, REPLSLOT_NOT_CONNECTED,
REPLSLOT_FREE.  Or something like that.
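For concreteness, that might be no more than this (names as floated
above, nothing settled):

typedef enum ReplSlotState
{
    REPLSLOT_FREE,              /* array entry unused; may be recycled */
    REPLSLOT_NOT_CONNECTED,     /* slot persists on disk, but has no owner */
    REPLSLOT_CONNECTED          /* currently owned by a backend/walsender */
} ReplSlotState;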

>> - If there's a coding rule that slot->database can't be changed while
>> the slot is active, then the check to make sure that the user isn't
>> trying to bind to a slot with a mis-matching database could be done
>> before the code described in the previous point, avoiding the need to
>> go back and release the resource.
>
> I don't think slot->database should be allowed to change at all...

Well, it can if the slot is dropped and a new one created.

>> - I think the critical section in ReplicationSlotDrop is bogus.  If
>> DeleteSlot() fails, we scarcely need to PANIC.  The slot just isn't
>> gone.
>
> Well, if delete slot fails, we don't really know at which point it
> failed which means that the on-disk state might not correspond to the
> in-memory state. I don't see a point in adding code trying to handle
> that case correctly...

Deleting the slot should be an atomic operation.  There's some
critical point before which the slot will be picked up by recovery and
after which it won't.  You either did that operation, or not, and can
adjust the in-memory state accordingly.

>> - cancel_before_shmem_exit is only guaranteed to remove the
>> most-recently-added callback.
>
> Yea :(. I think that's safe for the current usages but seems mighty
> fragile... Not sure what to do about it. Just register KillSlot once and
> keep it registered?

Yep.  Use a module-private flag to decide whether it needs to do anything.
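That is, something like the following sketch; before_shmem_exit() is
the real API, while the flag and function names are placeholders:

#include "postgres.h"
#include "storage/ipc.h"        /* before_shmem_exit() */

static bool killslot_registered = false;

static void KillSlot(int code, Datum arg);  /* no-op when no slot is held */

static void
ReplicationSlotAcquireSketch(const char *name)
{
    /*
     * cancel_before_shmem_exit() only reliably removes the most recently
     * added callback, so never unregister: register once and have the
     * callback itself check whether there is anything to release.
     */
    if (!killslot_registered)
    {
        before_shmem_exit(KillSlot, (Datum) 0);
        killslot_registered = true;
    }

    /* ... find the slot by name and mark it owned ... */
}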

>> - A comment in KillSlot wonders whether locking is required.  I say
>> yes.  It's safe to take lwlocks and spinlocks during shmem exit, and
>> failing to do so seems like a recipe for subtle corner-case bugs.
>
> I agree that it's safe to use spinlocks, but lwlocks? What if we are
> erroring out while holding an lwlock? Code that runs inside a
> TransactionCommand is protected against that, but if not, ProcKill()
> (which invokes LWLockReleaseAll()) runs pretty late in the teardown
> process...

Hmm.  I guess it'd be fine to decide that a connected slot can be
marked not-connected without the lock.  I think you'd want a rule that
a slot can't be freed except when it's not-connected; otherwise, you
might end up marking the slot not-connected after someone else had
already recycled it for an unrelated purpose (drop slot, create new
slot).

>> - There seems to be no interface to acquire or release slots from
>> either SQL or the replication protocol, nor any way for a client of
>> this code to update its slot details.
>
> I don't think either ever needs to do that - START_REPLICATION SLOT slot
> ...; and decoding_slot_*changes will acquire/release for them while
> active. What would the usecase be to allow them to be acquired from SQL?

My point isn't so much about SQL as that with just this patch I don't
see any way for anyone to ever acquire a slot for anything, ever.  So
I think there's a piece missing, or three.

> The slot details get updates by the respective replication code. For
> streaming rep, that should happen via reply

Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-16 Thread Andres Freund
Hi,

On 2014-01-16 11:25:59 -0300, Alvaro Herrera wrote:
> Eyeballing this patch, three thoughts:
> 
> 1. I wonder whether ilist.c/h should be moved to src/common so that
> frontend code could use it.

Sounds like a good idea. There are some debugging checks that elog(),
but that should be easily fixable.

> 2. I wonder whether ilist.c should gain the ability to have
> singly-linked lists with a pointer to the tail node for appending to the
> end.  This code would use it, and also the code doing postgresql.conf
> parsing which has head/tail pointers all over the place.  I'm sure there
> are other uses.

I am not generally averse to it, but neither of those use cases seems
to warrant it. They should just use a doubly linked list; the
memory/runtime overhead isn't significant, and the implementation
overhead doesn't count either if they use ilist.h.
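For comparison, an O(1) append with ilist.h's dlist is about as terse
as a hand-rolled head/tail pointer pair; the mapping struct below is
invented purely for illustration:

#include "lib/ilist.h"

typedef struct TablespaceMapping    /* invented payload */
{
    char       *old_dir;
    char       *new_dir;
    dlist_node  node;
} TablespaceMapping;

static dlist_head mappings = DLIST_STATIC_INIT(mappings);

static void
add_mapping_sketch(TablespaceMapping *m)
{
    dlist_push_tail(&mappings, &m->node);   /* O(1), no tail pointer to track */
}

static void
walk_mappings_sketch(void)
{
    dlist_iter  iter;

    dlist_foreach(iter, &mappings)
    {
        TablespaceMapping *m = dlist_container(TablespaceMapping, node,
                                               iter.cur);

        /* ... use m->old_dir / m->new_dir ... */
        (void) m;
    }
}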

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-16 Thread Alvaro Herrera
Eyeballing this patch, three thoughts:

1. I wonder whether ilist.c/h should be moved to src/common so that
frontend code could use it.

2. I wonder whether ilist.c should gain the ability to have
singly-linked lists with a pointer to the tail node for appending to the
end.  This code would use it, and also the code doing postgresql.conf
parsing which has head/tail pointers all over the place.  I'm sure there
are other uses.

3. How many definitions of atooid() do we have now?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-01-16 Thread Euler Taveira
On 08-11-2013 06:20, Dilip kumar wrote:
> On 08 November 2013 13:38, Jan Lentfer
> 
> 
>> For this use case, would it make sense to queue work (tables) in order of
>> their size, starting with the largest one?
> 
>> Where tables vary in size, this reduces overall processing time by
>> preventing large (read: long-processing-time) tables from being handled
>> in the last step. Processing the large tables first, and backfilling
>> "processing slots/jobs" with smaller tables as they free up, saves
>> overall execution time.
> Good point, I have made the change and attached the modified patch.
> 
You didn't submit it for a CF, did you? Is it too late for this one?
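For reference, the largest-first ordering the modified patch relies on
can come straight from the catalog; a sketch using the existing
executeQuery() helper from src/bin/scripts (the exact query text is an
assumption):

#include "postgres_fe.h"
#include "common.h"             /* executeQuery() */

static PGresult *
fetch_tables_largest_first(PGconn *conn, const char *progname, bool echo)
{
    /* Plain tables, biggest first, so long vacuums are dispatched early. */
    return executeQuery(conn,
                        "SELECT n.nspname, c.relname "
                        "FROM pg_class c JOIN pg_namespace n "
                        "  ON n.oid = c.relnamespace "
                        "WHERE c.relkind = 'r' "
                        "ORDER BY pg_total_relation_size(c.oid) DESC;",
                        progname, echo);
}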


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] WAL Rate Limiting

2014-01-16 Thread Andres Freund
On 2014-01-16 09:06:30 -0500, Robert Haas wrote:
> > Seems like a really bad name if we are only slowing down some commands -
> > that seems to indicate we're slowing down all of them. I think it should be
> > something that indicates that it only affects the maintenance commands.
> 
> And why should it only affect the maintenance commands anyway, and who
> decides what's a maintenance command?

I think implementing it for everything might have some use, but it's a
much, much more complex task. You can't simply do rate limiting in
XLogInsert() or somesuch - we're holding page locks, buffer pins, other
locks... I don't see why that needs to be done in the same feature.

I don't see much difficulty in determining what is and isn't a utility
command for this purpose: all utility commands that create WAL in
O(table_size) or worse.
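As a sketch of the shape that implies, with the maintenance command
calling a throttle at safe points between WAL-producing steps, never
while holding buffer locks or pins (every name here, including the
rate variable standing in for a GUC, is a placeholder):

#include "postgres.h"
#include "miscadmin.h"          /* pg_usleep() */

static int64 wal_since_sleep = 0;
static int   maintenance_wal_rate_kbps = 8192;  /* placeholder GUC */

static void
WalRateThrottleSketch(Size bytes_just_written)
{
    wal_since_sleep += bytes_just_written;

    /* nap 100ms whenever one tick's worth of budget is used up */
    if (wal_since_sleep >= (int64) maintenance_wal_rate_kbps * 1024 / 10)
    {
        pg_usleep(100000L);     /* 100ms */
        wal_since_sleep = 0;
    }
}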

> I thought Heroku suggested something like this previously, and their
> use case was something along the lines of "we need to slow the system
> down enough to do a backup so we can delete some stuff before the disk
> fills".  For that, it seems likely to me that you would just want to
> slow everything down.

I think the use case for this is more along the lines of not unduly
slowing normal operations or causing replication delays to standbys
while performing maintenance operations on relations.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-16 Thread Alvaro Herrera
Please keep the --help and the options in the SGML table in alphabetical
order within their respective sections.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

