Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 18:41, Peter Geoghegan  wrote:

> On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs 
> wrote:
> > On 20 August 2015 at 03:24, Peter Geoghegan  wrote:
> >>
> >>
> >> The patch is ~3.25x faster than master
> >
> >
> > I've tried to read this post twice and both times my work_mem overflowed.
> > ;-)
> >
> > Can you summarize what this patch does? I understand clearly what it
> doesn't
> > do...
>
> The most important thing that it does is always quicksort runs, which
> are formed by simply filling work_mem with tuples in no particular
> order, rather than trying to make runs that are twice as large as
> work_mem on average. That's what the ~3.25x improvement concerned.
> That's actually a significantly simpler algorithm than replacement
> selection, and appears to be much faster.


Then I think this is fine, not least because it seems like a first step
towards parallel sort.

This will give more runs, so merging those needs some thought. It will also
give a more predictable number of runs, so we'll be able to predict any
merging issues ahead of time. We can more easily find out the min/max tuple
in each run, so we only merge overlapping runs.


> You might even say that it's
> a dumb algorithm, because it is less sophisticated than replacement
> selection. However, replacement selection tends to use CPU caches very
> poorly, while its traditional advantages have become dramatically less
> important due to large main memory sizes in particular. Also, it hurts
> that we don't currently dump tuples in batches, for several reasons.
> Better to do memory intense operations in batch, rather than having a
> huge inner loop, in order to minimize or prevent instruction cache
> misses. And we can better take advantage of asynchronous I/O.
>
> The complicated aspect of considering the patch is whether or not it's
> okay to not use replacement selection anymore -- is that an
> appropriate trade-off?
>

Using a heapsort is known to be poor for large heaps. We previously
discussed the idea of quicksorting the first chunk of memory, then
reallocating the heap as a smaller chunk for the rest of the sort. That
would solve the cache miss problem.

I'd like to see some discussion of how we might integrate aggregation and
sorting. A heap might work quite well for that, whereas quicksort doesn't
sound like it would work as well.


> The reason that the code has not actually been simplified by this
> patch is that I still want to use replacement selection for one
> specific case: when it is anticipated that a "quicksort with
> spillover" can occur, which is only possible with incremental
> spilling. That may avoid most I/O, by spilling just a few tuples using
> a heap/priority queue, and quicksorting everything else. That's
> compelling when you can manage it, but no reason to always use
> replacement selection for the first run in the common case where there
> will be several runs in total.
>

I think it's premature to retire that algorithm - I think we should keep it
for a while yet. I suspect it may serve well in cases where we have low
memory, though I accept that is no longer the case for larger servers that
we would now call typical.

This could cause particular issues in optimization, since heap sort is
wonderfully predictable. We'd need a cost_sort() that was slightly
pessimistic to cover the risk that a quicksort might not be as fast as we
hope.


> Is that any clearer?


Yes, thank you.

I'd like to see a more general and concise plan for how sorting evolves. We
are close to having the infrastructure to perform intermediate aggregation,
which would allow that to happen during sorting when required (aggregation,
sort distinct). We also agreed some time back that parallel sorting would
be the first incarnation of parallel operations, so we need to consider
that also.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Pavan Deolasee
On Fri, Aug 21, 2015 at 11:22 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2015-08-21 AM 06:27, David Fetter wrote:
>
> >> By the last sentence, do you mean only UPDATEs to the partition key that
> >> cause rows to jump partitions or simply any UPDATEs to the partition
> key?
> >
> > I don't know what Simon had in mind, but it seems to me that we have
> > the following in descending order of convenience to users, and I
> > presume, descending order of difficulty of implementation:
> >
> > 1.  Updates propagate transparently, moving rows between partitions if
> needed.
> >
> > 2.  Updates fail only if the change to a partition key would cause the
> > row to have to move to another partition.
> >
> > 3.  All updates to the partition key fail.
> >
>
> I was thinking I'd implement 2.
>
>
+1. IMHO that's a good starting point.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-21 AM 06:27, David Fetter wrote:

>> By the last sentence, do you mean only UPDATEs to the partition key that
>> cause rows to jump partitions or simply any UPDATEs to the partition key?
> 
> I don't know what Simon had in mind, but it seems to me that we have
> the following in descending order of convenience to users, and I
> presume, descending order of difficulty of implementation:
> 
> 1.  Updates propagate transparently, moving rows between partitions if needed.
> 
> 2.  Updates fail only if the change to a partition key would cause the
> row to have to move to another partition.
> 
> 3.  All updates to the partition key fail.
> 

I was thinking I'd implement 2.

There was some discussion last year[1] about how 1 could be realized.

Thanks,
Amit

[1]
http://www.postgresql.org/message-id/20140829172216.gf10...@awork2.anarazel.de





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 PM 10:19, David Fetter wrote:
> On Thu, Aug 20, 2015 at 06:58:24PM +0900, Amit Langote wrote:
>>
>> Ah, I understand the point of parameterization (TRUST). Seems like it
>> would be good to have, with appropriate documentation. Perhaps it might
>> as well be a parameter to step 1 itself.
> 
> So TRUST would obviate step 2?  Better still!
> 

I really wish we could do this in one step, but...

As you know, primary intention behind keeping step 2 separate is to do the
heavy validation work with access exclusive lock on the other table
instead of on the master table.

Assuming we add TRUST to step 1 (with its implications clarified in the docs),
and the user does not use TRUST, the default behavior would be to perform the
validation. Would that be OK, given that it would have to take the expensive
lock on the master table?

The motivation for keeping step 2 (and the TRUST parameter with it) is really
to avoid this last concern.

All that said, we'd still need some way to identify rows that came from a
TRUSTed table but do not really belong to the partition, because we skipped
checking them entirely. Obviously one can always write a query on the
partition to find them, but some form of DDL is what people might expect.

>>
>> Do you mean ATTACH and DETACH, if they require access exclusive lock on
>> the parent, should not be in the first cut? Or am I misreading?
> 
> Sorry I was unclear.
> 
> ATTACH and DETACH should be in the first cut even if they require an
> access exclusive lock.
> 

Ah, got it.

Thanks,
Amit





Re: [HACKERS] proposal: function parse_ident

2015-08-20 Thread Pavel Stehule
2015-08-20 21:16 GMT+02:00 Jim Nasby :

> On 8/19/15 7:22 PM, Tom Lane wrote:
>
>> Jim Nasby  writes:
>>
>>>> Don't say "parse names for things other than tables".  Only a minority
>>>> of the types of objects used in the database have names that meet this
>>>> specification.
>>>
>>> Really? My impression is that almost everything that's not a shared
>>> object allows for a schema...
>>
>> Tables meet this naming spec.  Columns, functions, operators, operator
>> classes/families, collations, constraints, and conversions do not (you
>> need more data to name them).  Schemas, databases, languages, extensions,
>> and some other things also do not, because you need *less* data to name
>> them.  Types also don't really meet this naming spec, because you need to
>> contend with special cases like "int[]" or "timestamp with time zone".
>> So this proposal doesn't seem very carefully thought-through to me,
>> or at least the use case is much narrower than it could be.
>>
>> Also, if "object does not exist" isn't supposed to be an error case,
>> what of "name is not correctly formatted"?  It seems a bit arbitrary
>> to me to throw an error in one case but not the other.
>>
>
> I think the important point here is this is *parse*_ident(). It's not
> meant to guarantee an object actually exists, what kind of object it is, or
> whether it's syntactically correct. It's meant only to separate an
> identifier into its 3 (or in some cases 2) components. If this was as
> simple as string_to_array(foo, '.') then it'd be a bit pointless, but it's
> obviously a lot more complex than that.


Parsing a composite identifier is pretty complex, and almost all of the work
is already done; it just needs an SQL wrapper.
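
For illustration, a hedged sketch of how such an SQL wrapper might be used
(the exact name, signature, and return type were still open at this point;
nothing below exists yet):

-- hypothetical usage only
SELECT parse_ident('myschema."My Table"');
-- a plausible result: a text array of the parsed identifiers,
-- e.g. {myschema,"My Table"}, with quoting and case folding already handled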

Pavel


>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-08-20 Thread Pavel Stehule
Hi

2015-08-20 16:42 GMT+02:00 Alvaro Herrera :

> Pavel Stehule wrote:
>
> Hi,
>
> > Job schedulers are an important and sometimes very complex part of any
> > software package. PostgreSQL lacks one. I propose a new contrib module that
> > can be used simply for some tasks, and that can serve as a base for other,
> > richer schedulers. I prefer a minimalist design - but one strong enough to
> > be enhanced when necessary. Some complex logic can be implemented better
> > in PL than in C. Motto: simple to learn, simple to use, simple to customize.
>
> Have you made any progress on this?
>

I am working on a second iteration of the prototype; more precisely, I last
worked on it a month ago. I finished the basic design and the basic
infrastructure. The architecture is based on one coordinator and dynamically
started workers.

I found that some fair scheduling policy will probably need to be implemented
in the future.

I have to finish other requests now, so I am planning to continue at the end
of autumn, but the sources are public:

https://github.com/okbob/generic-scheduler. I am not sure about the code
quality - I have not had time to debug it. But the mental model (UI) is
almost complete:
https://github.com/okbob/generic-scheduler/blob/master/schedulerx--1.0.sql

I find it an interesting idea to handle not only time events but also our
notifications. That could be a perfect base for building complex workflow
systems, but I have done zero work on this topic so far.

Regards

Pavel






>
> --
> Álvaro Herrera    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] 9.5 release notes

2015-08-20 Thread Tom Lane
Arthur Silva  writes:
> Are we landing pg_trgm 1.2 in PG 9.5?

No, we aren't.

And please don't quote 70 lines of unrelated stuff before making
your point.

regards, tom lane




Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 5:02 PM, Greg Stark  wrote:
> I haven't thought through the exponential
> growth carefully enough to tell if doubling the run size should
> decrease the number of passes linearly or by a constant number.

It seems that with 5 times the data that previously required ~30MB to
avoid a multi-pass sort (where ~2300MB is required for an internal
sort -- the benchmark query), it took ~60MB to avoid a multi-pass
sort. I didn't determine either threshold exactly, because that took
too long to do, but as predicted, every time the input size quadruples,
the required amount of work_mem to avoid multiple passes only doubles.
That will need to be verified more rigorously, but it looks that way.
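
For intuition (a textbook back-of-the-envelope argument, not a measurement
from this benchmark): with memory M the runs are about M in size, so there
are roughly N/M of them for input size N, and a single merge pass can combine
on the order of M/B runs, where B is the per-tape buffer size. Staying
single-pass therefore needs approximately

    N/M <= M/B,  i.e.  M >= sqrt(N * B)

so the work_mem needed to avoid multiple passes grows only with the square
root of the input: quadruple the input, double the memory, which is
consistent with the 30MB vs ~60MB observation above (5x the data, roughly 2x
the memory).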

> But you're right that seems to be less and less a realistic scenario.
> Times when users are really processing data sets that large nowadays
> they'll just throw it into Hadoop or BigQuery or whatever to get the
> parallelism of many cpus. Or maybe Citus and the like.

I'm not sure that even that's generally true, simply because sorting a
huge amount of data is very expensive -- it's not really a "big data"
thing, so to speak. Look at recent results on this site:

http://sortbenchmark.org

Last year's winning "Gray" entrant, TritonSort, uses a huge parallel
cluster of 186 machines, but only sorts 100TB. That's just over 500GB
per node. Each node is a 32 core Intel Xeon EC2 instance with 244GB
memory, and lots of SSDs. It seems like the point of the 100TB minimum
rule in the "Gray" contest category is that that's practically
impossible to fit entirely in memory (to avoid merging).

Eventually, linearithmic growth becomes extremely painful, no matter
how much processing power you have. It takes a while, though.

-- 
Peter Geoghegan




Re: [HACKERS] Autonomous Transaction is back

2015-08-20 Thread Rajeev rastogi
On 18 August 2015 21:18, Robert Haas Wrote:

>This footnote goes to my point.
>
>It seems clear to me that having the autonomous transaction "see" the
>effects of the outer uncommitted transaction is a recipe for trouble.
>If the autonomous transaction updates a row and commits, and the outer
>transaction later aborts, the resulting state is inconsistent with any
>serial history.  I'm fairly certain that's going to leave us in an
>unhappy place.
>
>Even more obviously, ending up with two committed row versions that are
>both updates of a single ancestor version is no good.
>
>So, I agree that this scenario should be an error.  What I don't agree
>with is the idea that it should be the deadlock detector's job to throw
>that error.  Rather, I think that when we examine the xmax of the tuple
>we can see - which is the original one, not the one updated by the outer
>transaction - we should check whether that XID belongs to an outer
>transaction.  If it does, we should throw an error instead of trying to
>lock it.  That way (1) the error message will be clear and specific to
>the situation and (2) we don't need a separate PGPROC for each
>autonomous transaction.  The first of those benefits is agreeable; the
>second one is, in my opinion, a key design goal for this feature.

Yes, I agree with this. I was in favor of throwing an error all along, without
involving the deadlock detector.

Thanks and Regards,
Kumar Rajeev Rastogi




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-08-20 Thread Michael Paquier
On Thu, Jul 23, 2015 at 11:48 PM, Paul Ramsey 
wrote:
> On Wed, Jul 22, 2015 at 12:19 PM, Paul Ramsey 
wrote:
>>
>> I’ll have a look at doing invalidation for the case of changes to the FDW
>> wrappers and servers.
>
> Here's an updated patch that clears the cache on changes to foreign
> wrappers and servers.

Thanks. So I have finally looked at it and this is quite cool. Using for
example this setup:
CREATE EXTENSION seg;
CREATE EXTENSION postgres_fdw;
CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(host 'localhost', port '5432', dbname 'postgres', extensions 'seg');
CREATE USER MAPPING FOR PUBLIC SERVER postgres_server OPTIONS (password '');
CREATE FOREIGN TABLE aa_foreign (a seg) SERVER postgres_server OPTIONS
(table_name 'aa');
CREATE SERVER postgres_server_2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(host 'localhost', port '5432', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER postgres_server_2 OPTIONS (password
'');
CREATE FOREIGN TABLE aa_foreign_2 (a seg) SERVER postgres_server_2 OPTIONS
(table_name 'aa');

I can get the following differences by shipping an expression or not:
=# explain verbose select * from aa_foreign where a = '1 ... 2'::seg;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.aa_foreign  (cost=100.00..138.66 rows=11 width=12)
   Output: a
   Remote SQL: SELECT a FROM public.aa WHERE ((a OPERATOR(public.=) '1 .. 2'::public.seg))
(3 rows)
=# explain verbose select * from aa_foreign_2 where a = '1 ... 2'::seg;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.aa_foreign_2  (cost=100.00..182.44 rows=11 width=12)
   Output: a
   Filter: (aa_foreign_2.a = '1 .. 2'::seg)
   Remote SQL: SELECT a FROM public.aa
(4 rows)
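
As an aside, the per-server extensions option exercised here is an ordinary
FDW server option, so it can be checked with the standard catalog,
independently of the patch:

SELECT srvname, srvoptions FROM pg_foreign_server;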

     if (needlabel)
         appendStringInfo(buf, "::%s",
-                         format_type_with_typemod(node->consttype,
-                                                  node->consttypmod));
+                         format_type_be_qualified(node->consttype));
Pondering more about this one, I think that we are going to need a new
routine in format_type.c to be able to call format_type_internal as
format_type_internal(type_oid, typemod, true/false, false, true). If
typemod is -1, then typemod_given (the third argument) is false, otherwise
typemod_given is true. That's close to what the SQL-level format_type()
function can do, except that we want the result to be qualified. Regression
tests will need an update as well.

+   /* Option validation calls this function with NULL in the */
+   /* extensionOids parameter, to just do existence/syntax */
+   /* checking of the option */
Comment format is incorrect, this should be written like that:
+   /*
+* Option validation calls this function with NULL in the
+* extensionOids parameter, to just do existence/syntax
+* checking of the option.
+*/

+   if (!extension_list)
+   return false;
I would rather mark that as == NIL instead, NIL is what an empty list is.

+extern bool is_shippable(Oid procnumber, PgFdwRelationInfo *fpinfo);
There is no need to pass PgFdwRelationInfo to is_shippable. Only the list
of extension OIDs is fine.

+   bool shippable;/* extension the object appears within, or
InvalidOid if none */
That's nitpicking, but normally we avoid more than 80 characters per line
of code.

When attempting to create a server when an extension is not installed we
get an appropriate error:
=# CREATE SERVER postgres_server
 FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '5432', dbname 'postgres', extensions
'seg');
ERROR:  42601: the "seg" extension must be installed locally before it can
be used on a remote server
LOCATION:  extractExtensionList, shippable.c:245
However, it is possible to drop an extension listed in a postgres_fdw
server. Shouldn't we invalidate cache as well when pg_extension is updated?
Thoughts?

+   if (!SplitIdentifierString(extensionString, ',', &extlist))
+   {
+   list_free(extlist);
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("unable to parse extension list \"%s\"",
+   extensionString)));
+   }
It does not matter much to call list_free here. The list is going to be
freed in the error context with ERROR.

+   foreach(ext, extension_list)
+   {
+   Oid extension_oid = (Oid) lfirst(ext);
+   if (foundDep->refobjid == extension_oid)
+   {
+   nresults++;
+   }
+   }
You could just use list_member_oid here. And why not just break out of the loop
once the

Re: [HACKERS] statistics for array types

2015-08-20 Thread Tomas Vondra

Hi,

On 08/11/2015 04:38 PM, Jeff Janes wrote:

When reviewing some recent patches, I decided the statistics gathered
 for arrays had some pre-existing shortcomings.

The main one is that when the arrays contain rare elements there is
no histogram to fall back upon when the MCE array is empty, the way
there is for scalar stats.  So it has to punt completely and resort
to saying that it is 0.5% selectivity without recourse to any data at
all.

The rationale for applying the threshold before things are eligible
for inclusion in the MCE array seems to be that this puts some
theoretical bound on the amount of error we are likely to have in
that element.  But I think it is better to exceed that theoretical
bound than it is to have no data at all.

The attached patch forces there to be at least one element in MCE,
keeping the one element with the highest predicted frequency if the
MCE would otherwise be empty.  Then any other element queried for is
assumed to be no more common than this most common element.


We only really need the frequency, right? So do we really need to keep
the actual MCV element? I.e. most_common_elem_freqs does not have the
same number of values as most_common_elems anyway:

  A list of the frequencies of the most common element values, i.e., the
  fraction of rows containing at least one instance of the given value.
  Two or three additional values follow the per-element frequencies;
  these are the minimum and maximum of the preceding per-element
  frequencies, and optionally the frequency of null elements.
  (Null when most_common_elems is.)

So we might modify it so that it's always defined - either it tracks the
same values as today (when most_common_elems is defined), or the
frequency of the most common element (when most_common_elems is NULL).

This way we can keep the current theoretical error-bound on the MCE
frequencies, and if that's not possible we can have at least the new
value without confusing existing code.
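
For reference, the statistics being discussed can be inspected directly; a
hedged example (table and column names are placeholders):

SELECT most_common_elems, most_common_elem_freqs, elem_count_histogram
FROM pg_stats
WHERE tablename = 'some_table' AND attname = 'some_array_column';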


I'd also briefly considered just having the part of the code that
pulls the stats out of pg_stats interpret an empty MCE array as meaning
that nothing is more frequent than the threshold, but that would mean
that that part of the code needs to know about how the threshold is
chosen, which just seems wrong. And it would need to know the
difference between NULL MCE because no stats were gathered, versus
because stats were gathered but nothing met the threshold.


I'm not sure whether this is the same thing I just proposed ...

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-08-20 Thread Kouhei Kaigai
> On 08/19/2015 03:53 PM, Tom Lane wrote:
> >
> > I don't see anything very wrong with constraining the initial
> > allocation to 1GB, or even less. That will prevent consuming insane
> > amounts of work_mem when the planner's rows estimate is too high
> > rather than too low. And we do have the ability to increase the hash
> > table size on the fly.
> 
> Perhaps. Limiting the initial allocation to 1GB might help with lowering
> memory consumed in case of over-estimation, and it's high enough so that
> regular queries don't run into that.
> 
> My initial thought was "if the memory consumption is a problem, lower
> the work_mem" but after thinking about it for a while I don't see a
> reason to limit the memory consumption this way, if possible.
> 
> But what is the impact on queries that actually need more than 1GB of
> buckets? I assume we'd only limit the initial allocation and still allow
> the resize based on the actual data (i.e. the 9.5 improvement), so the
> queries would start with 1GB and then resize once finding out the
> optimal size (as done in 9.5). The resize is not very expensive, but
> it's not free either, and with so many tuples (requiring more than 1GB
> of buckets, i.e. ~130M tuples) it's probably just a noise in the total
> query runtime. But it'd be nice to see some proof of that ...
>
The problem here is that we cannot know the exact size unless the Hash node
reads the entire inner relation. All we can do is rely on the planner's
estimation; however, it often computes a crazy number of rows.
I think resizing of hash buckets is a reasonable compromise.
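
For scale, the arithmetic behind the numbers quoted above (simple
back-of-the-envelope, not from the thread): each bucket is an 8-byte pointer,
so a 1GB bucket array holds

    2^30 / 8 = 2^27, i.e. roughly 134 million buckets

which, at NTUP_PER_BUCKET = 1, corresponds to the ~130M tuples mentioned as
the point where the allocation limit starts to matter.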

> Also, why 1GB and not 512MB (which is effectively the current limit on
> 9.4, because we can't allocate 1GB there so we end up with 1/2 of that)?
> Why not some small percentage of work_mem, e.g. 5%?
>
No clear reason at this moment.

> Most importantly, this is mostly orthogonal to the original bug report.
> Even if we do this, we still have to fix the palloc after the resize.
> 
> So I'd say let's do a minimal bugfix of the actual issue, and then
> perhaps improve the behavior in case of significant overestimates. The
> bugfix should happen ASAP so that it gets into 9.5.0 (and backported).
> We already have patches for that.
> 
> Limiting the initial allocation deserves more thorough discussion and
> testing, and I'd argue it's 9.6 material at this point.
>
Indeed, the original bug was caused by a normal MemoryContextAlloc().
The issue around memory over-consumption is a new problem raised by
this discussion; I didn't notice it two months ago.

> > The real problem here is the potential integer overflow in
> > ExecChooseHashTableSize. Now that we know there is one, that should
> > be fixed (and not only in HEAD/9.5).
> 
> I don't think there's any integer overflow. The problem is that we end
> up with nbuckets so high that (nbuckets*8) overflows 1GB-1.
> 
> There's a protection against integer overflow in place (it was there for
> a long time), but there never was any protection against the 1GB limit.
> Because we've been using much smaller work_mem values and
> NTUP_PER_BUCKET=10, so we could not actually reach it.
> 
> > But I believe Kaigai-san withdrew his initial proposed patch, and we
> > don't have a replacementas far as I saw.
> 
> I believe the patch proposed by KaiGai-san is the right one to fix the
> bug discussed in this thread. My understanding is KaiGai-san withdrew
> the patch as he wants to extend it to address the over-estimation issue.
>
> I don't think we should do that - IMHO that's an unrelated improvement
> and should be addressed in a separate patch.
>
OK, it might not be a problem we should conclude within a few days, just
before the beta release.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 11:16 PM, Peter Geoghegan  wrote:
> It could reduce seek time, which might be the dominant cost (but not
> I/O as such).

No, I didn't quite follow the argument to completion. Increasing the
run size is a win if it reduces the number of passes. In the
single-pass case it has to read all the data once, write it all out to
tapes, then read it all back in again, so 3x the data. If it's still
not sorted it needs to write it all back out yet again and read it all
back in again, so 5x the data. If the tapes are larger it can avoid
that 66% increase in total I/O. In large data sets it can need 3, 4,
or maybe more passes through the data, and saving one pass would be a
smaller incremental difference. I haven't thought through the
exponential growth carefully enough to tell if doubling the run size
should decrease the number of passes linearly or by a constant number.

But you're right that seems to be less and less a realistic scenario.
Times when users are really processing data sets that large nowadays
they'll just throw it into Hadoop or BigQuery or whatever to get the
parallelism of many cpus. Or maybe Citus and the like.

The main case where I expect people actually run into this is in
building indexes, especially for larger data types (which come to
think of it might be exactly where the comparison is expensive enough
that quicksort's cache efficiency isn't helpful).

But to do fair tests I would suggest you configure work_mem smaller
(since running tests on multi-terabyte data sets is a pain) and sort
some slower data types that don't fit in memory. Maybe arrays of text
or json?
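
A minimal sketch of that kind of test (names and sizes are arbitrary, and the
thresholds will differ per machine):

SET work_mem = '4MB';

CREATE TABLE sort_test AS
SELECT md5(i::text) || repeat('x', 100) AS payload
FROM generate_series(1, 2000000) AS i;

-- "Sort Method" in the output shows whether the sort stayed in memory
-- (quicksort) or spilled to disk (external sort / external merge);
-- trace_sort = on, where available, gives per-run detail in the server log.
EXPLAIN (ANALYZE, BUFFERS)
SELECT payload FROM sort_test ORDER BY payload;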

-- 
greg




Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Jim Nasby

On 8/20/15 3:44 PM, Josh Berkus wrote:

What could be added as an extension?


A method for preventing duplicate object keys.

Since I'm in the minority here, let's just drop it. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] 9.5 release notes

2015-08-20 Thread Peter Geoghegan
On Sat, Jun 13, 2015 at 3:53 PM, Peter Geoghegan  wrote:
> I think we should really address this. Attached patch adds a new
> release note item for it. It also adds to the documentation that
> explains why users should prefer varchar(n)/text to character(n); the
> lack of abbreviated key support now becomes a huge disadvantage for
> character(n), whereas in previous versions the disadvantages were
> fairly minor.
>
> In passing, I updated the existing sort item to reflect that only
> varchar(n), text, and numeric benefit from the abbreviation
> optimization (not character types more generally + numeric), and added
> a note on the effectiveness of the abbreviation optimization alone.

A recent e-mail from Kaigai-san [1] reminded me of this item. I really
think this limitation of char(n) needs to be documented along the
lines I proposed here back in June. Benchmarks like TPC-H use char(n)
extensively, since it's faster in other systems. However, PostgreSQL
now has hugely inferior sort performance for that type as compared to
text/varchar(n). This needs to be highlighted.
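
A quick, hedged way to see the gap for yourself (table names and sizes are
arbitrary; exact timings will of course vary):

CREATE TABLE t_char AS
SELECT md5(i::text)::char(32) AS c FROM generate_series(1, 1000000) i;
CREATE TABLE t_text AS
SELECT md5(i::text)::text     AS c FROM generate_series(1, 1000000) i;

-- abbreviated keys apply to the text column but not to char(32),
-- so the second sort should be substantially faster on 9.5
EXPLAIN ANALYZE SELECT c FROM t_char ORDER BY c;
EXPLAIN ANALYZE SELECT c FROM t_text ORDER BY c;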

[1] 
http://www.postgresql.org/message-id/flat/CAM3SWZRRCs6KAyN-bDsh0_pG=8xm3fvcf1x9dlsvd3wvbt1...@mail.gmail.com#CAM3SWZRRCs6KAyN-bDsh0_pG=8xm3fvcf1x9dlsvd3wvbt1...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] 9.5 release notes

2015-08-20 Thread Arthur Silva
On Sat, Aug 8, 2015 at 11:04 PM, Bruce Momjian  wrote:

> On Sun, Aug  9, 2015 at 01:24:33AM +1200, David Rowley wrote:
> >
> > On 7 August 2015 at 14:24, Bruce Momjian  wrote:
> >
> > On Tue, Jun 30, 2015 at 09:00:44PM +0200, Andres Freund wrote:
> > > * 2014-12-08 [519b075] Simon ..: Use GetSystemTimeAsFileTime directly in win32
> > >   2014-12-08 [8001fe6] Simon ..: Windows: use GetSystemTimePreciseAsFileTime if ..
> > >   Timer resolution isn't an unimportant thing for people using explain?
> >
> > This all seemed very internals-only, e.g.:
> >
> > On most Windows systems this change will actually have no significant
> > effect on timestamp resolution, as the system timer tick is typically
> > between 1ms and 15ms depending on what timer resolution currently running
> > applications have requested. You can check this with clockres.exe from
> > sysinternals. Despite the platform limitation, this change still permits
> > capture of finer timestamps where the system is capable of producing them,
> > and it gets rid of an unnecessary syscall.
> >
> > Was I wrong?
> >
> >
> >
> > This does have a user visible change. Timestamps are now likely to have 6
> > digits after the decimal point, if they're on a version of windows which
> > supports GetSystemTimePreciseAsFileTime();
> >
> > Master:
> >
> > postgres=# select now();
> >   now
> > ---
> >  2015-08-09 01:14:01.959645+12
> > (1 row)
> >
> > 9.4.4
> > postgres=# select now();
> > now
> > 
> >  2015-08-09 01:15:09.783+12
> > (1 row)
>
> Yes, this was already in the release notes:
>
> Allow higher-precision timestamp resolution on Windows 8 or Windows
> Server 2012 and later Windows systems (Craig Ringer)
>
> I am not sure why people were saying it was missing.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
>   + Everyone has their own god. +
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Are we landing pg_trgm 1.2 in PG 9.5?

If yes (we should), an improvement of up to an order of magnitude is a worthy
inclusion in the release notes.

--
Arthur Silva


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 6:05 AM, Greg Stark  wrote:
> On Thu, Aug 20, 2015 at 3:24 AM, Peter Geoghegan  wrote:
>> I believe, in general, that we should consider a multi-pass sort to be
>> a kind of inherently suspect thing these days, in the same way that
>> checkpoints occurring 5 seconds apart are: not actually abnormal, but
>> something that we should regard suspiciously. Can you really not
>> afford enough work_mem to only do one pass? Does it really make sense
>> to add far more I/O and CPU costs to avoid that other tiny memory
>> capacity cost?
>
> I think this is the crux of the argument. And I think you're
> basically, but not entirely, right.

I agree that that's the crux of my argument. I disagree about my not
being entirely right.  :-)

> The key metric there is not how cheap memory has gotten but rather
> what the ratio is between the system's memory and disk storage. The
> use case I think you're leaving out is the classic "data warehouse"
> with huge disk arrays attached to a single host running massive
> queries for hours. In that case reducing run size will reduce I/O
> requirements directly and halving the amount of I/O sort takes will
> halve the time it takes regardless of cpu efficiency. And I have a
> suspicion typical data distributions get much better than a 2x
> speedup.

It could reduce seek time, which might be the dominant cost (but not
I/O as such). I do accept that my argument did not really apply to
this case, but you seem to be making an additional non-conflicting
argument that certain data warehousing cases would be helped in
another way by my patch. My argument was only about multi-gigabyte
cases that I tested that were significantly improved, primarily due to
CPU caching effects. If this helps with extremely large sorts that do
require multiple passes by reducing seek time -- I think that they'd
have to be multi-terabyte sorts, which I am ill-equipped to test --
then so much the better, I suppose.

In any case, as I've said the way we allow run size to be dictated
only by available memory (plus whatever replacement selection can do
to make on-tape runs longer) is bogus. In the future there should be a
cost model for an optimal run size, too.

> But I think you're basically right that this is the wrong use case to
> worry about for most users. Even those users that do have large batch
> queries are probably not processing so much that they should be doing
> multiple passes. The ones that do are are probably more interested in
> parallel query, federated databases, column stores, and so on rather
> than worrying about just how many hours it takes to sort their
> multiple terabytes on a single processor.

I suppose so. If you can afford multiple terabytes of storage, you can
probably still afford gigabytes of memory to do a single pass. My
laptop is almost 3 years old, weighs about 1.5 Kg, and has 16 GiB of
memory. It's usually that simple, and not really because we
assume that Postgres doesn't have to deal with multi-terabyte sorts.
Maybe I lack perspective, having never really dealt with a real data
warehouse. I didn't mean to imply that in no circumstances could
anyone profit from a multi-pass sort. If you're using Hadoop or
something, I imagine that it still makes sense.

In general, I think you'll agree that we should strongly leverage the
fact that a multi-pass sort just isn't going to be needed when things
are set up correctly under standard operating conditions nowadays.

> I am quite suspicious of quicksort though. It has O(n^2) worst case
> and I think it's only a matter of time before people start worrying
> about DOS attacks from users able to influence the data ordering. It's
> also not very suitable for GPU processing. Quicksort gets most of its
> advantage from cache efficiency, it isn't a super efficient algorithm
> otherwise, are there not other cache efficient algorithms to consider?

I think that high quality quicksort implementations [1] will continue
to be the way to go for sorting integers internally at the very least.
Practically speaking, problems with the worst case performance have
been completely ironed out since the early 1990s. I think it's
possible to DOS Postgres by artificially introducing a worst-case, but
it's very unlikely to be the easiest way of doing that in practice. I
admit that it's probably the coolest way, though.

I think that the benefits of offloading sorting to the GPU are not in
evidence today. This may be especially true of a "street legal"
implementation that takes into account all of the edge cases, as
opposed to a hand customized thing for sorting uniformly distributed
random integers. GPU sorts tend to use radix sort, and I just can't
see that catching on.

[1] https://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf
-- 
Peter Geoghegan




Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Tomas Vondra

Hi,

On 08/20/2015 11:04 PM, Stefan Kaltenbrunner wrote:

On 08/20/2015 06:09 PM, Tom Lane wrote:

Stefan Kaltenbrunner  writes:

I wonder if we should have a default of capping the dump to say 1k lines
or such and only optionally do a full one.


-1. It's worked like this for the last fifteen years or
thereabouts, and you're the first one to complain. I suspect some
weirdness in your logging setup, rather than any systemic problem
that we need to lobotomize our diagnostic output in order to
prevent.


not sure what you consider weird in the logging setup here - the
context dump is imho borderline on internal diagnostic output at a
debug level (rather than making sense to an average sysadmin) already
(and no way to control it). But having (like in our case) the backend
dumping 2 million basically identical lines into a general logfile
per event seems excessive and rather abusive towards the rest of the
system (just from an IO perspective for example or from a log file
post processing tool perspective)


I've seen similar issues too, on systems with perfectly common logging 
setups. So even if your logging setup would be strange, it's not the 
main factor here.



(The reason I say "lobotomize" is that there's no particularly
good reason to assume that the first N lines will tell you what you
need to know. And the filter rule would have to be *very* stupid,
because we can't risk trying to allocate any additional memory to
track what we're doing here.)


I do understand that there might be challenges there, but in the last
15 years machines got way faster and pg got way more capable, and some of
those capabilities might need to get revisited in that regard - and
while it is very nice that pg survives multiple oom cases pretty
nicely, I don't think it is entitled to put an imho unreasonable burden
on the rest of the system by writing "insane" amounts of data.

Just from a sysadmin perspective it also means that it is trivial for
a misbehaving app to fill up the logfile on a system because unlike
almost all other actual logging output there seems to be no way
to control/disable it on a per role/database level.


IMHO the first thing we might do is provide a log_memory_stats GUC 
controlling that. I'm not a big fan of adding a GUC for everything, but in 
this case it seems appropriate, just like the other log_* options.


I also don't think logging just a subset of the stats is a lost cause. 
Sure, we can't know which of the lines are important, but for example 
logging just the top-level contexts with a summary of the child contexts 
would be OK. So something like this:


TopMemoryContext: 582728880 total in 71126 blocks; 6168 free  ...
  TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 ...
3 child contexts, 10 blocks, 34873 free
  PL/pgSQL function context: 8192 total in 1 blocks; 1152 fre ...
  PL/pgSQL function context: 24576 total in 2 blocks; 11400 f ...
  Type information cache: 24576 total in 2 blocks; 11888 free ...
  PL/pgSQL function context: 8192 total in 1 blocks; 1120 fre ...
  PL/pgSQL function context: 24576 total in 2 blocks; 10984 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 29928 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 28808 f ...
  PL/pgSQL function context: 24576 total in 2 blocks; 5944 fr ...
  RI compare cache: 24576 total in 2 blocks; 15984 free (5 ch ...
  RI query cache: 24576 total in 2 blocks; 11888 free (5 chun ...
  PL/pgSQL function context: 57344 total in 3 blocks; 31832 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 29600 f ...
  PL/pgSQL function context: 57344 total in 3 blocks; 39688 f ...
  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 65 ...
  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free ...
  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free  ...
  Prepared Queries: 125886512 total in 25 blocks; 4764208 fre ...
  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chun ...
  Operator lookup cache: 24576 total in 2 blocks; 11888 free  ...
  MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks ...
  Operator class cache: 8192 total in 1 blocks; 1680 free (0  ...
  smgr relation table: 24576 total in 2 blocks; 5696 free (4  ...
  TransactionAbortContext: 32768 total in 1 blocks; 32736 fre ...
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks);  ...
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); ...
4 child contexts, 34 blocks, 87283 free
  Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chu ...
  CacheMemoryContext: 42236592 total in 28 blocks; 7160904 fr ...
1487438 child contexts, 14874382 blocks, 3498349 free

instead of the millions of lines with all the details.

So either

log_memory_stats = (on|off)

or

log_memory_stats = (full|short|off)

might work.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Declarative partitioning

2015-08-20 Thread David Fetter
On Thu, Aug 20, 2015 at 11:16:37AM +0900, Amit Langote wrote:
> On 2015-08-19 PM 09:23, Simon Riggs wrote:
> > On 18 August 2015 at 11:30, Amit Langote 
> > wrote:

> > You haven't specified what would happen if an UPDATE would change a row's
> > partition. I'm happy to add this to the list of restrictions by saying that
> > the partition key cannot be updated.
> 
> UPDATEs that change a row's partition would cause error. I haven't
> implemented that yet but will that way in the next patch.
> 
> By the last sentence, do you mean only UPDATEs to the partition key that
> cause rows to jump partitions or simply any UPDATEs to the partition key?

I don't know what Simon had in mind, but it seems to me that we have
the following in descending order of convenience to users, and I
presume, descending order of difficulty of implementation:

1.  Updates propagate transparently, moving rows between partitions if needed.

2.  Updates fail only if the change to a partition key would cause the
row to have to move to another partition.

3.  All updates to the partition key fail.

Whichever of these we choose, we should document it with great
clarity.
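
To make the distinction concrete, a hedged sketch of how option 2 might look
to users (the DDL syntax and the error text are purely illustrative; nothing
about them was settled in this thread):

CREATE TABLE measurement (logdate date, peaktemp int)
  PARTITION BY RANGE (logdate);
-- ... partitions for 2015 and 2016 created here ...

UPDATE measurement SET logdate = logdate + 365
 WHERE logdate = date '2015-12-31';
-- ERROR: new row falls outside the row's current partition
-- (option 1 would instead silently move the row; option 3 would reject
--  any update of logdate at all)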

> > We'll need regression tests that cover each restriction and docs
> > that match. This is not something we should leave until last.
> > People read the docs to understand the feature, helping them to
> > reach consensus. So it is for you to provide the docs before, not
> > wait until later. I will begin a code review once you tell me docs
> > and tests are present. We all want the feature, so its all about
> > the details now.
> 
> Sorry, should have added tests and docs already. I will add them in
> the next version of the patch. Thanks for willing to review.

Docs and tests are crucial, and thanks again for taking this on.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




[HACKERS] remove unused ExecGetScanType

2015-08-20 Thread Alvaro Herrera
I don't see any reason not to remove this.  It's been unused since
a191a169d6d0b9558da4519e66510c4540204a51, dated Jan 10 2007.

-- 
Álvaro Herrera   Peñalolén, Chile
"La realidad se compone de muchos sueños, todos ellos diferentes,
pero en cierto aspecto, parecidos..." (Yo, hablando de sueños eróticos)
commit 603323089ec6d86dbc9a504c8772b8a8699c00e8
Author: Alvaro Herrera 
AuthorDate: Thu Aug 20 18:06:33 2015 -0300
CommitDate: Thu Aug 20 18:06:33 2015 -0300

Remove unused function ExecGetScanType

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 3c611b9..a735815 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -713,18 +713,6 @@ ExecFreeExprContext(PlanState *planstate)
  */
 
 /* ----------------------------------------------------------------
- *		ExecGetScanType
- * ----------------------------------------------------------------
- */
-TupleDesc
-ExecGetScanType(ScanState *scanstate)
-{
-	TupleTableSlot *slot = scanstate->ss_ScanTupleSlot;
-
-	return slot->tts_tupleDescriptor;
-}
-
-/* ----------------------------------------------------------------
  *		ExecAssignScanType
  * ----------------------------------------------------------------
  */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 193a654..226f905 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -344,7 +344,6 @@ extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 extern void ExecAssignProjectionInfo(PlanState *planstate,
 		 TupleDesc inputDesc);
 extern void ExecFreeExprContext(PlanState *planstate);
-extern TupleDesc ExecGetScanType(ScanState *scanstate);
 extern void ExecAssignScanType(ScanState *scanstate, TupleDesc tupDesc);
 extern void ExecAssignScanTypeFromOuterPlan(ScanState *scanstate);
 



Re: [HACKERS] Annotating pg_dump/pg_restore

2015-08-20 Thread Guillaume Lelarge
2015-08-20 18:43 GMT+02:00 Kevin Burke :

> Hi,
> Normally I try to annotate incoming queries, to make it easier to diagnose
> slow ones. For example:
>
> -- Users.findByPhoneNumber
> SELECT * FROM 
>
> The pg_dump and pg_restore commands issue a COPY with no possibility of
> adding a comment. It would be useful to know who or what exactly is
> performing a COPY against a database - maybe a nightly backup script, maybe
> a developer copying a table.
>
> I was wondering if you could have a command line flag that let you attach
> a comment to the query?
>
>
You already have the application name. You just need to log it.
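
To spell that out (standard facilities, nothing new): pg_dump and pg_restore
already identify themselves via application_name unless it is overridden,
which can be written to the log with log_line_prefix (e.g. including %a) or
watched live:

-- sessions currently connected, including any running pg_dump/pg_restore
SELECT application_name, state, query
FROM pg_stat_activity;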


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Stefan Kaltenbrunner
On 08/20/2015 06:09 PM, Tom Lane wrote:
> Stefan Kaltenbrunner  writes:
>> I wonder if we should have a default of capping the dump to say 1k lines 
>> or such and only optionally do a full one.
> 
> -1.  It's worked like this for the last fifteen years or thereabouts,
> and you're the first one to complain.  I suspect some weirdness in
> your logging setup, rather than any systemic problem that we
> need to lobotomize our diagnostic output in order to prevent.

not sure what you consider weird in the logging setup here - the context
dump is imho borderline on internal diagnostic output at a debug level
(rather than making sense to an average sysadmin) already (and no way to
control it). But having (like in our case) the backend dumping 2 million
basically identical lines into a general logfile per event seems
excessive and rather abusive towards the rest of the system (just from
an IO perspective for example or from a log file post processing tool
perspective)

> 
> (The reason I say "lobotomize" is that there's no particularly good
> reason to assume that the first N lines will tell you what you need
> to know.  And the filter rule would have to be *very* stupid, because
> we can't risk trying to allocate any additional memory to track what
> we're doing here.)

I do understand that there might be challenges there, but in the last 15
years machines got way faster and pg got way more capable, and some of those
capabilities might need to get revisited in that regard - and while it
is very nice that pg survives multiple oom cases pretty nicely, I don't
think it is entitled to put an imho unreasonable burden on the rest of
the system by writing "insane" amounts of data.

Just from a sysadmin perspective it also means that it is trivial for a
misbehaving app to fill up the logfile on a system because unlike almost
all other actual logging output there seems to be no way to
control/disable it on a per role/database level.


Stefan




Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Josh Berkus
On 08/20/2015 12:24 PM, Jim Nasby wrote:
> On 8/17/15 4:25 PM, Josh Berkus wrote:
>> On 08/17/2015 02:18 PM, Jim Nasby wrote:
>>> >On 8/17/15 3:33 PM, Josh Berkus wrote:
 >>Again, how do we handle missing keys?  Just return NULL?  or
 ERROR?  I'd
 >>prefer the former, but there will be arguments the other way.
>>> >
>>> >I've been wondering if we should add some kind of "strict" JSON. My big
>>> >concern is throwing an error if you try to provide duplicate keys, but
>>> >it seems reasonable that json_strict would throw an error if you try to
>>> >reference something that doesn't exist.
>> Only if there's demand for it.  Is there?
> 
> I'm certainly worried (paranoid?) about it. Postgres is very good about
> not silently dropping data and this seems a glaring departure from that.
> I haven't looked yet but I'm hoping this could at least be added as an
> extension without duplicating a bunch of the existing JSON stuff.

There's a big difference between silently dropping data and implicitly
creating it. As I said, the only reason I could see wanting a strict
mode is because AppDev users expect it to be consistent with their
programming languages.  Otherwise, from a user perspective, being able
to create a whole nested chain in one statement is a big win.
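
For concreteness, a hedged sketch of the kind of statement under discussion
(the subscripting syntax is the proposal in this thread, not something that
exists yet; whether missing keys are created or raise an error is exactly the
open question):

-- proposed/illustrative syntax only
UPDATE docs SET body['a']['b']['c'] = '"x"' WHERE id = 1;
-- with implicit creation, this would build {"a": {"b": {"c": "x"}}} even if
-- "a" and "b" did not exist before; a "strict" mode would error instead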

What could be added as an extension?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Andrew Dunstan



On 08/20/2015 03:24 PM, Jim Nasby wrote:
> On 8/17/15 4:25 PM, Josh Berkus wrote:
>> On 08/17/2015 02:18 PM, Jim Nasby wrote:
>>> On 8/17/15 3:33 PM, Josh Berkus wrote:
>>>> Again, how do we handle missing keys?  Just return NULL?  or ERROR?  I'd
>>>> prefer the former, but there will be arguments the other way.
>>>
>>> I've been wondering if we should add some kind of "strict" JSON. My big
>>> concern is throwing an error if you try to provide duplicate keys, but
>>> it seems reasonable that json_strict would throw an error if you try to
>>> reference something that doesn't exist.
>>
>> Only if there's demand for it.  Is there?
>
> I'm certainly worried (paranoid?) about it. Postgres is very good
> about not silently dropping data and this seems a glaring departure
> from that. I haven't looked yet but I'm hoping this could at least be
> added as an extension without duplicating a bunch of the existing JSON
> stuff.



What on earth does this have to do with duplicate keys? The jsonb input 
rules on duplicate keys are very clear, incidentally, and follow the 
practice of most JSON processors (duplicates are discarded, last one 
wins.) But that has nothing at all to do with this feature, really.


This feature can NOT be added as an extension, since it requires grammar 
processing modification.


cheers

andrew







Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Feng Tian
On Thu, Aug 20, 2015 at 1:16 PM, Peter Geoghegan  wrote:

> On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian  wrote:
> > Just a quick piece of anecdotal evidence. I did a similar experiment about
> > three years ago. The conclusion was that if you have an SSD, just do a
> > quicksort and forget the longer runs, but if you are using hard drives,
> > longer runs are the winner (and safer, to avoid cliffs). I did not
> > experiment with RAID0/5 on many spindles though.
> >
> > Not limited to sorting; more generally, SSD is different enough from HDD
> > that it may be worth the effort for the backend to "guess" what storage
> > device it has, and then choose the right thing to do.
>
> The devil is in the details. I cannot really comment on such a general
> statement.
>
> I would be willing to believe that that's true under
> unrealistic/unrepresentative conditions. Specifically, when multiple
> passes are required with a sort-merge strategy where that isn't the
> case with replacement selection. This could happen with a tiny
> work_mem setting (tiny in an absolute sense more than a relative
> sense). With an HDD, where sequential I/O is so much faster, this
> could be enough to make replacement selection win, just as it would
> have in the 1970s with magnetic tapes.
>
> As I've said, the solution is to simply avoid multiple passes, which
> should be possible in virtually all cases because of the quadratic
> growth in a classic hybrid sort-merge strategy's capacity to avoid
> multiple passes (growth relative to work_mem's growth). Once you
> ensure that, then you probably have a mostly I/O bound workload, which
> can be made faster by adding sequential I/O capacity (or, on the
> Postgres internals side, adding asynchronous I/O, or with memory
> prefetching). You cannot really buy a faster CPU to make a degenerate
> heapsort faster.
>
> --
> Peter Geoghegan
>

Agreed with everything in principle, except one thing -- no, random I/O on HDD
in the 2010s (relative to CPU/memory/SSD) is not any faster than tape in the
1970s. :-)


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 1:28 PM, Feng Tian  wrote:
> Agreed with everything in principle, except one thing -- no, random I/O on HDD
> in the 2010s (relative to CPU/memory/SSD) is not any faster than tape in the
> 1970s. :-)

Sure. The advantage of replacement selection could be a deciding
factor in unrepresentative cases, as I mentioned, but even then it's
not going to be a dramatic difference as it would have been in the
past.

By the way, please don't top-post.

-- 
Peter Geoghegan




Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian  wrote:
> Just a quick piece of anecdotal evidence. I did a similar experiment about
> three years ago. The conclusion was that if you have an SSD, just do a
> quicksort and forget the longer runs, but if you are using hard drives,
> longer runs are the winner (and safer, to avoid cliffs). I did not
> experiment with RAID0/5 on many spindles though.
>
> Not limited to sorting; more generally, SSD is different enough from HDD
> that it may be worth the effort for the backend to "guess" what storage
> device it has, and then choose the right thing to do.

The devil is in the details. I cannot really comment on such a general
statement.

I would be willing to believe that that's true under
unrealistic/unrepresentative conditions. Specifically, when multiple
passes are required with a sort-merge strategy where that isn't the
case with replacement selection. This could happen with a tiny
work_mem setting (tiny in an absolute sense more than a relative
sense). With an HDD, where sequential I/O is so much faster, this
could be enough to make replacement selection win, just as it would
have in the 1970s with magnetic tapes.

As I've said, the solution is to simply avoid multiple passes, which
should be possible in virtually all cases because of the quadratic
growth in a classic hybrid sort-merge strategy's capacity to avoid
multiple passes (growth relative to work_mem's growth). Once you
ensure that, then you probably have a mostly I/O bound workload, which
can be made faster by adding sequential I/O capacity (or, on the
Postgres internals side, adding asynchronous I/O, or with memory
prefetching). You cannot really buy a faster CPU to make a degenerate
heapsort faster.
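
To put rough numbers on that quadratic growth (the ~256kB of buffer per input
tape below is an assumed, illustrative figure, not the exact constant
tuplesort uses):

-- Merge order ~ work_mem / per-tape buffer; one merge pass combines that many
-- work_mem-sized runs, so one-pass capacity grows quadratically with work_mem.
SELECT pg_size_pretty(work_mem_bytes)                                   AS work_mem,
       work_mem_bytes / (256 * 1024)                                    AS approx_merge_order,
       pg_size_pretty((work_mem_bytes / (256 * 1024)) * work_mem_bytes) AS approx_one_pass_capacity
FROM (VALUES (64::bigint * 1024 * 1024),
             (128::bigint * 1024 * 1024)) AS t(work_mem_bytes);
-- 64MB  -> ~256 mergeable runs, ~16GB in one pass
-- 128MB -> ~512 mergeable runs, ~64GB: doubling work_mem quadruples capacity.

Even modest work_mem settings therefore cover far more data in a single merge
pass than most sorts will ever see, which is why the multiple-pass case should
be rare in practice.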

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DBT-3 with SF=20 got failed

2015-08-20 Thread Tomas Vondra

Hi,

On 08/19/2015 03:53 PM, Tom Lane wrote:


I don't see anything very wrong with constraining the initial
allocation to 1GB, or even less. That will prevent consuming insane
amounts of work_mem when the planner's rows estimate is too high
rather than too low. And we do have the ability to increase the hash
table size on the fly.


Perhaps. Limiting the initial allocation to 1GB might help with lowering 
memory consumed in case of over-estimation, and it's high enough so that 
regular queries don't run into that.


My initial thought was "if the memory consumption is a problem, lower 
the work_mem" but after thinking about it for a while I don't see a 
reason to limit the memory consumption this way, if possible.


But what is the impact on queries that actually need more than 1GB of 
buckets? I assume we'd only limit the initial allocation and still allow 
the resize based on the actual data (i.e. the 9.5 improvement), so the 
queries would start with 1GB and then resize once finding out the 
optimal size (as done in 9.5). The resize is not very expensive, but 
it's not free either, and with so many tuples (requiring more than 1GB 
of buckets, i.e. ~130M tuples) it's probably just noise in the total 
query runtime. But it'd be nice to see some proof of that ...
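
For reference, the ~130M figure falls straight out of the allocation limit:
bucket entries are 8-byte pointers and the palloc cap is just under 1GB, so
with NTUP_PER_BUCKET = 1 the bucket array tops out at roughly 134M entries.
A trivial sanity check:

SELECT (1024::bigint * 1024 * 1024) / 8 AS buckets_at_1gb;
-- => 134217728, matching the "~130M tuples" above at one tuple per bucket.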


Also, why 1GB and not 512MB (which is effectively the current limit on 
9.4, because we can't allocate 1GB there so we end up with 1/2 of that)? 
Why not some small percentage of work_mem, e.g. 5%?


Most importantly, this is mostly orthogonal to the original bug report. 
Even if we do this, we still have to fix the palloc after the resize.


So I'd say let's do a minimal bugfix of the actual issue, and then 
perhaps improve the behavior in case of significant overestimates. The 
bugfix should happen ASAP so that it gets into 9.5.0 (and backported). 
We already have patches for that.


Limiting the initial allocation deserves more thorough discussion and 
testing, and I'd argue it's 9.6 material at this point.



The real problem here is the potential integer overflow in
ExecChooseHashTableSize. Now that we know there is one, that should
be fixed (and not only in HEAD/9.5).


I don't think there's any integer overflow. The problem is that we end 
up with nbuckets so high that (nbuckets*8) overflows 1GB-1.


There's a protection against integer overflow in place (it was there for 
a long time), but there never was any protection against the 1GB limit. 
Because we've been using much smaller work_mem values and 
NTUP_PER_BUCKET=10, we could not actually reach it before.



But I believe Kaigai-san withdrew his initial proposed patch, and we
don't have a replacement as far as I saw.


I believe the patch proposed by KaiGai-san is the right one to fix the 
bug discussed in this thread. My understanding is KaiGai-san withdrew 
the patch as he wants to extend it to address the over-estimation issue.


I don't think we should do that - IMHO that's an unrelated improvement 
and should be addressed in a separate patch.



kind regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] allowing wal_level change at run time

2015-08-20 Thread Andres Freund
On 2015-08-20 15:11:02 -0400, Peter Eisentraut wrote:
> It seems to me that this would effectively replace the wal_level
> parameter with the presence or absence of a magic replication slot.
> That doesn't seem like a net improvement.  It just replaces one
> well-known configuration mechanism with a new ad hoc one.

Well, with the difference that it can be changed automatically. We could
alternatively automagically use ALTER SYSTEM, but that's not really
guaranteed to work and isn't available via the replication protocol
currently. But I could live with that as well.

> >> Also note that this scheme and any similar one requires merging the
> >> archive and hot_standby levels, which was the core of your original
> >> proposal [1]
> > 
> > It doesn't really, we could continue to keep them separate. I'm
> > proposing to maintain wal_level automatically, so it'd not be
> > configurable anymore, so it'd not matter much, as long as we're not less
> > efficient.
> 
> But, under any scheme to set wal_level automatically, how will the
> primary know whether it needs to use level archive or hot_standby?

I'd have said archive_mode triggered archive and everything else
hot_standby.  I am still of the opinion though that the difference
between those two levels is meaningless and that we should remove
archive.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Feng Tian
On Thu, Aug 20, 2015 at 10:41 AM, Peter Geoghegan  wrote:

> On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs 
> wrote:
> > On 20 August 2015 at 03:24, Peter Geoghegan  wrote:
> >>
> >>
> >> The patch is ~3.25x faster than master
> >
> >
> > I've tried to read this post twice and both times my work_mem overflowed.
> > ;-)
> >
> > Can you summarize what this patch does? I understand clearly what it
> doesn't
> > do...
>
> The most important thing that it does is always quicksort runs, that
> are formed by simply filling work_mem with tuples in no particular
> order, rather than trying to make runs that are twice as large as
> work_mem on average. That's what the ~3.25x improvement concerned.
> That's actually a significantly simpler algorithm than replacement
> selection, and appears to be much faster. You might even say that it's
> a dumb algorithm, because it is less sophisticated than replacement
> selection. However, replacement selection tends to use CPU caches very
> poorly, while its traditional advantages have become dramatically less
> important due to large main memory sizes in particular. Also, it hurts
> that we don't currently dump tuples in batches, for several reasons.
> Better to do memory intense operations in batch, rather than having a
> huge inner loop, in order to minimize or prevent instruction cache
> misses. And we can better take advantage of asynchronous I/O.
>
> The complicated aspect of considering the patch is whether or not it's
> okay to not use replacement selection anymore -- is that an
> appropriate trade-off?
>
> The reason that the code has not actually been simplified by this
> patch is that I still want to use replacement selection for one
> specific case: when it is anticipated that a "quicksort with
> spillover" can occur, which is only possible with incremental
> spilling. That may avoid most I/O, by spilling just a few tuples using
> a heap/priority queue, and quicksorting everything else. That's
> compelling when you can manage it, but no reason to always use
> replacement selection for the first run in the common case where there
> well be several runs in total.
>
> Is that any clearer? To borrow a phrase from the processor
> architecture community, from a high level this is a "Brainiac versus
> Speed Demon" [1] trade-off. (I wish that there was a widely accepted
> name for this trade-off.)
>
> [1]
> http://www.lighterra.com/papers/modernmicroprocessors/#thebrainiacdebate
> --
> Peter Geoghegan
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Hi, Peter,

Just some quick anecdotal evidence.  I did a similar experiment about three
years ago.  The conclusion was that if you have an SSD, just do quicksort and
forget the longer runs, but if you are using hard drives, longer runs are
the winner (and safer, to avoid cliffs). I did not experiment with RAID0/5
on many spindles though.

Not limited to sorting: more generally, SSD is different enough from HDD
that it may be worth the effort for the backend to "guess" what storage
device it has, and then choose the right thing to do.

Cheers.


Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Jim Nasby

On 8/17/15 4:25 PM, Josh Berkus wrote:

On 08/17/2015 02:18 PM, Jim Nasby wrote:

>On 8/17/15 3:33 PM, Josh Berkus wrote:

>>Again, how do we handle missing keys?  Just return NULL?  or ERROR?  I'd
>>prefer the former, but there will be arguments the other way.

>
>I've been wondering if we should add some kind of "strict" JSON. My big
>concern is throwing an error if you try to provide duplicate keys, but
>it seems reasonable that json_strict would throw an error if you try to
>reference something that doesn't exist.

Only if there's demand for it.  Is there?


I'm certainly worried (paranoid?) about it. Postgres is very good about 
not silently dropping data and this seems a glaring departure from that. 
I haven't looked yet but I'm hoping this could at least be added as an 
extension without duplicating a bunch of the existing JSON stuff.
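
To make the silent-drop behaviour concrete (stock behaviour, easy to verify):

-- jsonb keeps only the last value for a duplicated key, with no warning:
SELECT '{"a": 1, "a": 2}'::jsonb;
--  => {"a": 2}

-- plain json preserves the input text, so the duplicate at least stays visible:
SELECT '{"a": 1, "a": 2}'::json;
--  => {"a": 1, "a": 2}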

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: function parse_ident

2015-08-20 Thread Jim Nasby

On 8/19/15 7:22 PM, Tom Lane wrote:

Jim Nasby  writes:

Don't say "parse names for things other than tables".  Only a minority
of the types of objects used in the database have names that meet this
specification.



Really? My impression is that almost everything that's not a shared
object allows for a schema...


Tables meet this naming spec.  Columns, functions, operators, operator
classes/families, collations, constraints, and conversions do not (you
need more data to name them).  Schemas, databases, languages, extensions,
and some other things also do not, because you need *less* data to name
them.  Types also don't really meet this naming spec, because you need to
contend with special cases like "int[]" or "timestamp with time zone".
So this proposal doesn't seem very carefully thought-through to me,
or at least the use case is much narrower than it could be.

Also, if "object does not exist" isn't supposed to be an error case,
what of "name is not correctly formatted"?  It seems a bit arbitrary
to me to throw an error in one case but not the other.


I think the important point here is this is *parse*_ident(). It's not 
meant to guarantee an object actually exists, what kind of object it is, 
or whether it's syntactically correct. It's meant only to separate an 
identifier into its 3 (or in some cases 2) components. If this was as 
simple as string_to_array(foo, '.') then it'd be a bit pointless, but 
it's obviously a lot more complex than that.
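
For example -- parse_ident() is of course still only a proposal here, the
point is just that the naive split gets quoting wrong:

-- string_to_array knows nothing about identifier quoting:
SELECT string_to_array('"my.schema".mytable', '.');
--  => {"\"my","schema\"",mytable}    -- three broken pieces instead of two names
-- whereas the proposed parse_ident() would be expected to return the two
-- components, my.schema and mytable, with the quoting resolved.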

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] allowing wal_level change at run time

2015-08-20 Thread Peter Eisentraut
On 8/19/15 9:32 AM, Andres Freund wrote:
>> I agree that we want both.  But requiring a restart is a hard stop,
>> whereas making configuration easier is a soft feature.
> 
> I don't think it makes that much of a difference for people new to
> postgres.

People new to postgres are not the only audience for this change.

>>> To deal with streaming replication, we automatically create slots for
>>> replicas, but, by default, *without* having them reserve WAL. The slot
>>> name would be determined in some automatic fashion (uuid or something)
>>> and stored in a new file in the data directory.  That allows us to
>>> increase the internal wal_level to hot_standby/archive whenever a
>>> replica has connected (and thus a physical slot exists), and to logical
>>> whenever a logical slot exists.
>>
>> That seems kind of weird.  So every time a replication client connects,
>> we create a new replication slot?  Won't they accumulate?
> 
> No, that's not what I was thinking of. The name of the slot would be
> stored somewhere in the data directory, and then be reused henceforth.

It seems to me that this would effectively replace the wal_level
parameter with the presence or absence of a magic replication slot.
That doesn't seem like a net improvement.  It just replaces one
well-known configuration mechanism with a new ad hoc one.

>> Also note that this scheme and any similar one requires merging the
>> archive and hot_standby levels, which was the core of your original
>> proposal [1]
> 
> It doesn't really, we could continue to keep them separate. I'm
> proposing to maintain wal_level automatically, so it'd not be
> configurable anymore, so it'd not matter much, as long as we're not less
> efficient.

But, under any scheme to set wal_level automatically, how will the
primary know whether it needs to use level archive or hot_standby?
There is no way to know.  So if we are going to keep these levels
separate, there would need to be a manual way to switch between them.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs  wrote:
> On 20 August 2015 at 03:24, Peter Geoghegan  wrote:
>>
>>
>> The patch is ~3.25x faster than master
>
>
> I've tried to read this post twice and both times my work_mem overflowed.
> ;-)
>
> Can you summarize what this patch does? I understand clearly what it doesn't
> do...

The most important thing that it does is always quicksort runs, that
are formed by simply filling work_mem with tuples in no particular
order, rather than trying to make runs that are twice as large as
work_mem on average. That's what the ~3.25x improvement concerned.
That's actually a significantly simpler algorithm than replacement
selection, and appears to be much faster. You might even say that it's
a dumb algorithm, because it is less sophisticated than replacement
selection. However, replacement selection tends to use CPU caches very
poorly, while its traditional advantages have become dramatically less
important due to large main memory sizes in particular. Also, it hurts
that we don't currently dump tuples in batches, for several reasons.
Better to do memory intense operations in batch, rather than having a
huge inner loop, in order to minimize or prevent instruction cache
misses. And we can better take advantage of asynchronous I/O.
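
(For anyone who wants to watch this from the SQL level while testing the
patch, the sort method and -- with trace_sort enabled, which relies on the
default TRACE_SORT build option -- the runs written to tape are all visible;
the table and column names below are placeholders:)

SET work_mem = '64MB';
SET trace_sort = on;   -- per-run messages go to the server log
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM some_big_table ORDER BY some_column;
-- EXPLAIN ANALYZE reports e.g. "Sort Method: external merge  Disk: ...",
-- and the log shows how many runs each strategy wrote.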

The complicated aspect of considering the patch is whether or not it's
okay to not use replacement selection anymore -- is that an
appropriate trade-off?

The reason that the code has not actually been simplified by this
patch is that I still want to use replacement selection for one
specific case: when it is anticipated that a "quicksort with
spillover" can occur, which is only possible with incremental
spilling. That may avoid most I/O, by spilling just a few tuples using
a heap/priority queue, and quicksorting everything else. That's
compelling when you can manage it, but no reason to always use
replacement selection for the first run in the common case where there
will be several runs in total.

Is that any clearer? To borrow a phrase from the processor
architecture community, from a high level this is a "Brainiac versus
Speed Demon" [1] trade-off. (I wish that there was a widely accepted
name for this trade-off.)

[1] http://www.lighterra.com/papers/modernmicroprocessors/#thebrainiacdebate
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests are badly named

2015-08-20 Thread Andrew Dunstan



On 08/16/2015 08:30 PM, Michael Paquier wrote:

On Mon, Aug 17, 2015 at 7:15 AM, Noah Misch  wrote:

On Sun, Aug 16, 2015 at 05:08:56PM -0400, Andrew Dunstan wrote:

On 08/16/2015 02:23 PM, Noah Misch wrote:

-sub tapcheck
+sub tap_check
  {
-   InstallTemp();
+   die "Tap tests not enabled in configuration"
+ unless $config->{tap_tests};

I endorse Heikki's argument for having omitted the configuration flag:
http://www.postgresql.org/message-id/55b90161.5090...@iki.fi


That argument is not correct.  None of the tap tests can be run via make if
--enable-tap-tests is not set. This doesn't just apply to the check-world
target as Heikki asserted. Have a look at the definitions of prove_check and
prove_installcheck in src/Makefile.global.in if you don't believe me.

While that one piece of Heikki's argument was in error, I didn't feel that it
affected the conclusion.  Please reply to the aforementioned email to dispute
the decision; some involved parties may not be following this thread.

FWIW, I agree with Andrew's approach here. We are going to need this
parameter switch once TAP routines are used in another code path than
src/bin to control if a test can be run or not based on the presence
of t/ and the value of this parameter. See for example the patch aimed
at testing dumpable tables with pg_dump which should be part of the
target modulescheck as argued upthread.
My 2c.



I spoke to Heikki about this the other day, and he's fine with using the 
test if there's a need for it. In addition to Michael's point, the 
buildfarm has a need for it - if the flag isn't set it won't run the 
checks, so the flag should be supported.  I'm therefore going to stick 
with the code above.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Josh Berkus
On 08/20/2015 06:19 AM, David Fetter wrote:
> On Thu, Aug 20, 2015 at 06:58:24PM +0900, Amit Langote wrote:
>> Do you mean ATTACH and DETACH, if they require access exclusive lock on
>> the parent, should not be in the first cut? Or am I misreading?
> 
> Sorry I was unclear.
> 
> ATTACH and DETACH should be in the first cut even if they require an
> access exclusive lock.
> 
> Cheers,
> David.

I don't see a way for them to *ever* not require an access exclusive lock.

We could eventually implement:

DETACH PARTITION CONCURRENTLY

... but that's the only way I can see around it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 6:54 AM, Tom Lane  wrote:
> Greg Stark  writes:
>> Alternately, has anyone tested whether Timsort would work well?
>
> I think that was proposed a few years ago and did not look so good
> in simple testing.

I tested it in 2012. I got as far as writing a patch.

Timsort is very good where comparisons are expensive -- that's why
it's especially compelling when your comparator is written in Python.
However, when testing it with text, even though there were
significantly fewer comparisons, it was still slower than quicksort.
Quicksort is cache oblivious, and that's an enormous advantage. This
was before abbreviated keys; these days, the difference must be
larger.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] deparsing utility commands

2015-08-20 Thread Alvaro Herrera
Shulgin, Oleksandr wrote:

> A particularly nasty one is:
> 
> ERROR:  index "cwi_replaced_pkey" does not exist
> 
> The test statement that's causing it is this one:
> 
> ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx,
> ADD CONSTRAINT cwi_replaced_pkey PRIMARY KEY
> USING INDEX cwi_uniq2_idx;
> 
> Which gets deparsed as:
> 
> ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx,
> ADD CONSTRAINT cwi_replaced_pkey PRIMARY KEY
> USING INDEX cwi_replaced_pkey;
> 
> The problem is that during constraint transformation, the index is being
> renamed to match the constraint name and at the deparse stage the original
> index name appears to be lost completely...  I haven't figure out if
> there's a way around unless we introduce a new field in IndexStmt
> (originalName?) to cover exactly this corner case.

Oh :-(  Hmm, modifying parse nodes should normally be considered last
resort, and at the same time identifying objects based on names rather
than OIDs is frowned upon.  But I don't see any way to handle this
otherwise.  I'm not sure what's the best solution for this one.

> The updated versions of the core-support patch and the contrib modules are
> attached.

Please try to split things up, or at least don't mix more than it
already is.  For instance, the tsconfig mapping stuff should be its own
patch; we can probably make a case for pushing that on its own.

Also I see you added one caller of EventTriggerInhibitCommandCollection.
I don't like that crap at all and I think we would be better off if we
were able to remove it completely.  Please see whether it's possible to
handle this case in some other way.

You seem to have squashed the patches?  Please keep the split out.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Annotating pg_dump/pg_restore

2015-08-20 Thread Kevin Burke
Hi,
Normally I try to annotate incoming queries, to make it easier to diagnose
slow ones. For example:

-- Users.findByPhoneNumber
SELECT * FROM 

The pg_dump and pg_restore commands issue a COPY with no possibility of
adding a comment. It would be useful to know who or what exactly is
performing a COPY against a database - maybe a nightly backup script, maybe
a developer copying a table.

I was wondering if you could have a command line flag that lets you attach a
comment to the query?
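
In the meantime, the closest thing that already works is tagging the
connection rather than the statement: pg_dump goes through libpq, so setting
application_name (e.g. via the PGAPPNAME environment variable) makes the
backup show up identifiably next to its COPY in pg_stat_activity -- and I
believe recent pg_dump versions already set a fallback application_name of
"pg_dump". For example, assuming the dump was started with application_name
set to 'nightly-backup':

-- Who is running a COPY right now, and under what application_name?
SELECT pid, usename, application_name, query_start, query
FROM pg_stat_activity
WHERE query ILIKE 'copy %';

That still doesn't annotate the statement text itself, of course, which is
what the flag would add.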

Thanks!
Kevin

-- 
kevin


Re: [HACKERS] deparsing utility commands

2015-08-20 Thread Alvaro Herrera
Shulgin, Oleksandr wrote:

> Another quirk of ALTER TABLE is that due to multi-pass processing in
> ATRewriteCatalogs, the same command might be collected a number of times.
> For example, in src/test/regress/sql/inherit.sql:
> 
> alter table a alter column aa type integer using bit_length(aa);
> 
> the "alter column type" then appears 4 times in the deparsed output as
> identical subcommands of a single ALTER TABLE command.

Yeah, I had a hack somewhere in the collection code that if the relation
ID was different from what was specified, then the command was ignored.
I removed that before commit because it seemed possible that for some
cases you actually want the command reported separately for each child.

I think our best option in this case is to ignore commands that are
reported for different relations during JSON deparsing.  Not sure how
easy that is.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Corey Huinker
>
>
> My experiences with Oracle's hash function were generally not good -
> there's a reason many hash algorithms exist.  If/when we do hash
> partitioning in Postgres I'd like to see the hash function be
> user-definable.
>

+1

In my experience, hash partitioning had one use: When you had run out of
ways to logically partition the data, AND you had two tables in a foreign
key relationship, AND that relationship was the most common use-case for
selecting the two tables. In which case, your one big join became 8 or 16
little ones, and all was well in the universe...

...until those little joins started to spill to disk, and now you need to
repartition by 2x hashes, and that basically means a reload of your primary
fact table, and a talk with a boss about why you painted yourself into a
corner when you first did that.




>
> Meanwhile, I think list and range are a good start.  I'd prefer to see
> sub-partitioning before hash partitioning.
>

+1


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread David Brownlee
On 20 August 2015 at 14:54, Greg Stark  wrote:
> On Wed, Jun 25, 2014 at 6:05 PM, John Klos  wrote:
>> While I wouldn't be surprised if you remove the VAX code because not many
>> people are going to be running PostgreSQL, I'd disagree with the assessment
>> that this port is broken. It compiles, it initializes databases, it runs, et
>> cetera, albeit not with the default postgresql.conf.
>
> So I've been playing with this a bit. I have simh running on my home
> server as a Vax  3900 with NetBSD 6.1.5. My home server was mainly
> intended to be a SAN and its cpu is woefully underpowered so the
> resulting VAX is actually very very slow. So slow I wonder if there's
> a bug in the emulator but anyways.

I've run NetBSD/vax in simh on a laptop with a 2.5GHz i5-2520M and it
feels "reasonably fast for a VAX" (make of that what you will :)

> I'm coming to the conclusion that the port doesn't really work
> practically speaking but the failures are more interesting than I
> expected. They come in a few varieties:

Mmm. edge cases and failing (probably reasonable :) assumptions.

> 1) Vax does not have IEEE fp. This manifests in a few ways, some of
> which may be our own bugs or missing expected outputs. The numeric
> data type maths often produce numbers rounded off differently, the
> floating point tests print numbers one digit shorter than our expected
> results expect and sometimes in scientific notation where we don't
> expect. The overflow tests generate floating point exceptions rather
> than overflows. Infinity and NaN don't work. The Json code in
> particular generates large numbers where +/- Infinity literals are
> supplied.
>
> There are some planner tests that fail with floating point exceptions
> -- that's probably a bug on our part. And I've seen at least one
> server crash (maybe two) apparently caused by one as well which I
> don't believe is expected.

Sounds like some useful test cases there.

> 2) The initdb problem is actually not our fault. It looks like a
> NetBSD kernel bug when allocating large shared memory blocks on a
> machine without lots of memory. There's not much initdb can do with a
> kernel panic...

That should definitely be fixed...

> NetBSD still has the problem of kern.maxfiles defaulting
> to a value low enough that even two connections causes the regression
> tests to run out of file descriptors. That's documented and it would
> be a right pain for initdb to detect that case.

Is initdb calling ulimit() to check/set open files? It's probably worth
it as a sanity check if nothing else.

I think the VAX default open_max is 128. The 'bigger' ports have a
default of 1024, and I think they should probably all be updated to
that, though that is orthogonal to a ulimit() check.

> 3) The tests take so long to run that autovacuum kicks in and the
> tests start producing rows in inconsistent orderings. I assume that's
> a problem we've run into on the CLOBBER_CACHE animals as well?

Can the autovacuum daemon settings be bumped/disabled while running the tests?
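
They can, at least in principle -- either per table or instance-wide for a
throwaway test cluster. A sketch:

-- Per-table: keep autovacuum away from a specific regression table.
ALTER TABLE gstest1 SET (autovacuum_enabled = false);

-- Or for the whole (disposable) test instance:
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();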

> 4) One of the tablesample tests seems to freeze indefinitely. I
> haven't looked into why yet. That might indeed indicate that the
> spinlock code isn't working?
>
> So my conclusion tentatively is that while the port doesn't actually
> work practically speaking it is nevertheless uncovering some
> interesting bugs.

Good to hear. Looks like bugs in both the OS and software side, so fun for all.

Thanks for taking the time to do this!

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] deparsing utility commands

2015-08-20 Thread Shulgin, Oleksandr
On Thu, Aug 20, 2015 at 4:28 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:


> Which gets deparsed as:
>
> ALTER TABLE cwi_test DROP CONSTRAINT cwi_uniq_idx,
> ADD CONSTRAINT cwi_replaced_pkey PRIMARY KEY
> USING INDEX cwi_replaced_pkey;
>
> The problem is that during constraint transformation, the index is being
> renamed to match the constraint name and at the deparse stage the original
> index name appears to be lost completely...  I haven't figure out if
> there's a way around unless we introduce a new field in IndexStmt
> (originalName?) to cover exactly this corner case.
>
> The updated versions of the core-support patch and the contrib modules are
> attached.
>

Another quirk of ALTER TABLE is that due to multi-pass processing in
ATRewriteCatalogs, the same command might be collected a number of times.
For example, in src/test/regress/sql/inherit.sql:

alter table a alter column aa type integer using bit_length(aa);

the "alter column type" then appears 4 times in the deparsed output as
identical subcommands of a single ALTER TABLE command.


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Tom Lane
Stefan Kaltenbrunner  writes:
> I wonder if we should have a default of capping the dump to say 1k lines 
> or such and only optionally do a full one.

-1.  It's worked like this for the last fifteen years or thereabouts,
and you're the first one to complain.  I suspect some weirdness in
your logging setup, rather than any systemic problem that we
need to lobotomize our diagnostic output in order to prevent.

(The reason I say "lobotomize" is that there's no particularly good
reason to assume that the first N lines will tell you what you need
to know.  And the filter rule would have to be *very* stupid, because
we can't risk trying to allocate any additional memory to track what
we're doing here.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Joshua D. Drake


On 08/20/2015 08:51 AM, Stefan Kaltenbrunner wrote:


This is 9.1.14 on Debian Wheezy/amd64 FWIW - but I don't think we have
made relevant changes in more recent versions.


It seems we may also want to consider a way to drop those prepared 
queries after some period of non-use.
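
There is at least a manual escape hatch today -- a session can inspect and
drop its own prepared statements, which the driver in this case evidently
never did:

-- What the current backend has accumulated:
SELECT name, prepare_time, statement
FROM pg_prepared_statements
ORDER BY prepare_time;

-- Drop them all for the current session:
DEALLOCATE ALL;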


JD





regards


Stefan





--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Andres Freund
On 2015-08-20 16:42:21 +0100, Greg Stark wrote:
> Ah, I was wrong. It's not the tablesample test -- I think that was the
> last one to complete. Annoyingly we don't seem to print test names
> until they finish.
> 
> It was groupingsets. And it's stuck again on the same query:
> 
> regression=# select
> pid,now()-query_start,now()-state_change,waiting,state,query from
> pg_stat_activity where pid <> pg_backend_pid();
> +--+-+-+-++--+
> | pid  |?column? |?column? | waiting | state  |
> query |
> +--+-+-+-++--+
> | 9185 | 00:53:38.571552 | 00:53:38.571552 | f   | active | select
> a, b, grouping(a,b), sum(v), count(*), max(v)#|
> |  | | | ||   from
> gstest1 group by rollup (a,b);|
> +--+-+-+-++--+
> 
> It's only been stuck an hour so it's possible it's still running but
> this morning it was the same query that was running for 7 hours so I'm
> guessing not.

Interesting.

> Unfortunately I appear to have built without debugging symbols so
> it'll be a couple days before I can rebuild with symbols to get a back
> trace. (I vaguely remember when builds took hours but I don't recall
> ever having to wait 48 hours for a build even back then)

Without any further clues I'd guess it's stuck somewhere in
bipartite_match.c. That's the only place where floating point problems
would possibly result in getting stuck.


I'm all for making sure these issues are indeed caused by platform
specific float oddities, and not a more fundamental problem. But to me
the state of this port, as evidenced in this thread, seems to be too bad
to be worthwhile keeping alive. Especially since there's really no
imaginable use case except for playing around.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Corey Huinker
>
>
> It seems the way of specifying per-partition definition/constraint,
> especially for range partitioning, would have a number of interesting
> alternatives.
>
> By the way, the [USING opclass_name] bit is just a way of telling that a
> particular key column has user-defined notion of "ordering" in case of
> range partitioning and "equality" for list partitioning. The opclass would
> eventually determine which WHERE clauses (looking at operators, operand
> types) are candidates to help prune partitions. If we use the range_op
> range_literal::range_type notation to describe partition constraint for
> each partition, it might not offer much beyond the readability. We are not
> really going to detect range operators being applied in WHERE conditions
> to trigger partition pruning, for example. Although I may be missing
> something...
>

I don't think it's important that ranges be used internally for partition
pruning, though I don't see a reason why they couldn't. I was making the
case for range types being used at partition creation/declaration time,
because you were proposing new syntax to describe a range of values in text
when we already have that in range types. We should eat our own dog food.

But mostly, I wanted to make sure that range partitions could have
inclusive and exclusive bounds.
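
The existing range machinery already expresses inclusive/exclusive bounds,
plus the adjacency and overlap tests that splitting and merging partitions
would need. A small illustration (not a syntax proposal):

-- Inclusive/exclusive bounds, adjacency and overlap are all built in:
SELECT daterange('2015-01-01', '2015-02-01', '[)') -|-
       daterange('2015-02-01', '2015-03-01', '[)')   AS adjacent,
       daterange('2015-01-01', '2015-02-01', '[)') &&
       daterange('2015-01-15', '2015-02-15', '[)')   AS overlapping;
--  adjacent = t, overlapping = t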


>
> > - No partitioning scheme survives first contact with reality. So you will
> > need a facility for splitting and joining existing partitions. For
> > splitting partitions, it's sufficient to require that the new partition
> > share either a upper/lower bound (with the same inclusivity/exclusivity)
> of
> > an existing partition, thus uniquely identifying the partition to be
> split,
> > and require that the other bound be within the range of the partition to
> be
> > split. Similarly, it's fair to require that the partitions to be joined
> be
> > adjacent in range. In both cases, range operators make these tests
> simple.
> >
>
> SPLIT/MERGE can be done in later patches/release, I think.
>

Later patches for sure. When a partition "fills up", it's a critical
performance issue, so the fewer steps needed to amend it the better.


[HACKERS] (full) Memory context dump considered harmful

2015-08-20 Thread Stefan Kaltenbrunner

Hi all!


We just had a case of a very long-running process of ours that creates 
a lot of prepared statements through Perl's DBD::Pg, running into:


https://rt.cpan.org/Public/Bug/Display.html?id=88827

This resulted in millions of prepared statements created, but not 
removed, in the affected backends over the course of 1-2 hours, until the 
backends in question ran out of memory.
The "out of memory" condition resulted in one memory context dump 
generated per occurrence, each consisting of >2M lines(!) (basically a 
line of CachedPlan/CachedPlanSource per statement/function).
In the 20 minutes or so it took monitoring to alert and operations to 
react, this caused a followup incident because repeated "out of memory" 
conditions produced over 400M(!!) log lines amounting to some 15GB of 
data, running the log partition dangerously close to full.


An example memory context dump looks like this:


TopMemoryContext: 582728880 total in 71126 blocks; 6168 free (52 
chunks); 582722712 used
  TopTransactionContext: 8192 total in 1 blocks; 6096 free (1 chunks); 
2096 used

ExecutorState: 8192 total in 1 blocks; 5392 free (0 chunks); 2800 used
  ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
SPI Exec: 0 total in 0 blocks; 0 free (0 chunks); 0 used
SPI Proc: 8192 total in 1 blocks; 5416 free (0 chunks); 2776 used
  PL/pgSQL function context: 8192 total in 1 blocks; 1152 free (1 
chunks); 7040 used
  PL/pgSQL function context: 24576 total in 2 blocks; 11400 free (1 
chunks); 13176 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5 
chunks); 12688 used
  PL/pgSQL function context: 8192 total in 1 blocks; 1120 free (1 
chunks); 7072 used
  PL/pgSQL function context: 24576 total in 2 blocks; 10984 free (1 
chunks); 13592 used
  PL/pgSQL function context: 57344 total in 3 blocks; 29928 free (2 
chunks); 27416 used
  PL/pgSQL function context: 57344 total in 3 blocks; 28808 free (2 
chunks); 28536 used
  PL/pgSQL function context: 24576 total in 2 blocks; 5944 free (3 
chunks); 18632 used
  RI compare cache: 24576 total in 2 blocks; 15984 free (5 chunks); 
8592 used
  RI query cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 
used
  PL/pgSQL function context: 57344 total in 3 blocks; 31832 free (2 
chunks); 25512 used
  PL/pgSQL function context: 57344 total in 3 blocks; 29600 free (2 
chunks); 27744 used
  PL/pgSQL function context: 57344 total in 3 blocks; 39688 free (5 
chunks); 17656 used

  CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 
chunks); 6512 used
  PLpgSQL function cache: 24520 total in 2 blocks; 3744 free (0 
chunks); 20776 used
  Prepared Queries: 125886512 total in 25 blocks; 4764208 free (91 
chunks); 121122304 used

  TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 
chunks); 12688 used

  MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 
6512 used
  smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 
18880 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 
chunks); 32 used

  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
  ExecutorState: 57344 total in 3 blocks; 21856 free (2 chunks); 
35488 used

ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
  Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks); 
11744 used
  CacheMemoryContext: 42236592 total in 28 blocks; 7160904 free (298 
chunks); 35075688 used

CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 
3264 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 
3264 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 15360 total in 4 blocks; 6440 free (1 chunks); 8920 used
CachedPlanSource: 7168 total in 3 blocks; 1352 free (0 chunks); 
5816 used

CachedPlan: 7168 total in 3 blocks; 1544 free (0 chunks); 5624 used
CachedPlanSource: 7168 total in 3 blocks; 3904 free (1 chunks); 
3264 used

CachedPlan: 153

Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 3:29 PM, Tom Lane  wrote:
>
>> 4) One of the tablesample tests seems to freeze indefinitely. I
>> haven't looked into why yet. That might indeed indicate that the
>> spinlock code isn't working?
>
> The tablesample tests seem like a not-very-likely first place for such a
> thing to manifest.  What I'm thinking is that there are places in there
> where we loop till we get an expected result.  Offhand I thought they were
> all integer math; but if one was float and the VAX code wasn't doing what
> was expected, maybe we could blame this on float discrepancies as well.

Ah, I was wrong. It's not the tablesample test -- I think that was the
last one to complete. Annoyingly we don't seem to print test names
until they finish.

It was groupingsets. And it's stuck again on the same query:

regression=# select
pid,now()-query_start,now()-state_change,waiting,state,query from
pg_stat_activity where pid <> pg_backend_pid();
+------+-----------------+-----------------+---------+--------+--------------------------------------------------------+
| pid  |    ?column?     |    ?column?     | waiting | state  |                         query                          |
+------+-----------------+-----------------+---------+--------+--------------------------------------------------------+
| 9185 | 00:53:38.571552 | 00:53:38.571552 | f       | active | select a, b, grouping(a,b), sum(v), count(*), max(v)  +|
|      |                 |                 |         |        |   from gstest1 group by rollup (a,b);                  |
+------+-----------------+-----------------+---------+--------+--------------------------------------------------------+

It's only been stuck an hour so it's possible it's still running but
this morning it was the same query that was running for 7 hours so I'm
guessing not.

Unfortunately I appear to have built without debugging symbols so
it'll be a couple days before I can rebuild with symbols to get a back
trace. (I vaguely remember when builds took hours but I don't recall
ever having to wait 48 hours for a build even back then)

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 4:13 PM, David Brownlee  wrote:
>> 2) The initdb problem is actually not our fault. It looks like a
>> NetBSD kernel bug when allocating large shared memory blocks on a
>> machine without lots of memory. There's not much initdb can do with a
>> kernel panic...
>
> That should definitely be fixed...

cf
http://mail-index.netbsd.org/port-vax/2015/08/19/msg002524.html
http://comments.gmane.org/gmane.os.netbsd.ports.vax/5773

It's possible it's a simh bug, but it smells more like a simple overflow to me.

>> NetBSD still has the problem of kern.maxfiles defaulting
>> to a value low enough that even two connections causes the regression
>> tests to run out of file descriptors. That's documented and it would
>> be a right pain for initdb to detect that case.
>
> Is initdb calling ulimit() to check/set open files? Its probably worth
> it as a sanity check if nothing else.

Yup, we do that.

> I think the VAX default open_max is 128. The 'bigger' ports have a
> default of 1024, and I think they should probably all be updated to
> that, though that is orthogonal to a ulimit() check.

That's the problem. initdb tests how many connections can start up
when writing the default config. But we assume that each process can
use up to the rlimit file descriptors without running into a
system-wide limit. Raising it to 1024 lets me get two processes
running which is how I'm running them currently.

Also I forgot to mention, I also have to raise the stack limit with
ulimit. The default is so small that Postgres calculates that the maximum
safe value for its max_stack_depth setting is 0.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 03:24, Peter Geoghegan  wrote:

>
> The patch is ~3.25x faster than master


I've tried to read this post twice and both times my work_mem overflowed.
;-)

Can you summarize what this patch does? I understand clearly what it
doesn't do...

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-08-20 Thread Joe Conway

On 08/20/2015 06:59 AM, Andrew Dunstan wrote:
> Is there any significant interest in either of these?
> 
> Josh Berkus tells me that he would like pg_controldata information,
> and I was a bit interested in pg_config information, for this
> reason: I had a report of someone who had configured using
> --with-libxml but the xml tests actually returned the results that
> are expected without xml being configured. The regression tests
> thus passed, but should not have. It occurred to me that if we had
> a test like
> 
> select pg_config('configure') ~ '--with-libxml' as has_xml;
> 
> in the xml tests then this failure mode would be detected.

I've found use for them both in the past. A fair amount of bit-rot has
set in no doubt, and these were quick and dirty to begin with, but I
have these hacks from a while back:

https://github.com/jconway/pg_config
https://github.com/jconway/pg_controldata

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-08-20 Thread Alvaro Herrera
Jim Nasby wrote:

> I think things like pageinspect are very different; I really can't see any
> use for those beyond debugging (and debugging by an expert at that).

I don't think that necessarily means it must continue to be in contrib.
Quite the contrary, I think it is a tool critical enough that it should
not be relegated to be a second-class citizen as it is now (let's face
it, being in contrib *is* second-class citizenship).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: contrib module - generic command scheduler

2015-08-20 Thread Alvaro Herrera
Pavel Stehule wrote:

Hi,

> Job schedulers are important and sometimes very complex part of any
> software. PostgreSQL miss it. I propose new contrib module, that can be
> used simply for some tasks, and that can be used as base for other more
> richer schedulers. I prefer minimalist design - but strong enough for
> enhancing when it is necessary. Some complex logic can be implemented in PL
> better than in C. Motto: Simply to learn, simply to use, simply to
> customize.

Have you made any progress on this?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Paul Ramsey

On August 20, 2015 at 7:21:10 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:

> I'm not sure that the situation you describe can be expected to work
> reliably; the problems are far wider than just GUC variables. If two
> different .so's are exposing broadly the same set of C function names,
> how can we be sure which one will get called by the dynamic linker?
> Isn't it likely that we'd end up continuing to call some functions
> out of the old .so, probably leading to disaster?

Well, when you put it that way it sounds pretty scary :) 

For whatever it’s worth, we’ve had versioned .so’s for a very long time, and 
every time an upgrade happens there is a period during which a backend will 
have two versions loaded simultaneously. But we only noticed recently when we 
got the GUC collision (our first GUC was only introduced in PostGIS 2.1). 
Perhaps because after upgrading most people disconnect, and then the next time 
they connect they only get the new library henceforth. In some cases (start a 
fresh backend and then do the upgrade immediately) it’s even possible to do an 
upgrade without triggering the double-load situation.

> I don't necessarily object to providing some solution for GUCs, but
> I think first we need to question whether that isn't just the tip of
> a large iceberg.

There’s no doubt that the GUC issue is just a symptom of a potentially awful 
larger disease, but so far it’s the only symptom we’ve observed. Maybe because 
only a small % of functionality ever changes in a given release, combined with 
the relatively short lifespan of the double-loaded condition, and the fact the 
problem goes away immediately on re-connect, any problems have always just been 
ignored.

P.

 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Tom Lane
Greg Stark  writes:
> So I've been playing with this a bit. I have simh running on my home
> server as a Vax  3900 with NetBSD 6.1.5. My home server was mainly
> intended to be a SAN and its cpu is woefully underpowered so the
> resulting VAX is actually very very slow. So slow I wonder if there's
> a bug in the emulator but anyways.

Fun fun!

> There are some planner tests that fail with floating point exceptions
> -- that's probably a bug on our part. And I've seen at least one
> server crash (maybe two) apparently caused by one as well which I
> don't believe is expected.

That seems worth poking into.

> 3) The tests take so long to run that autovacuum kicks in and the
> tests start producing rows in inconsistent orderings. I assume that's
> a problem we've run into on the CLOBBER_CACHE animals as well?

I'd tentatively bet that it's more like planner behavioral differences
due to different floating-point roundoff.

> 4) One of the tablesample tests seems to freeze indefinitely. I
> haven't looked into why yet. That might indeed indicate that the
> spinlock code isn't working?

The tablesample tests seem like a not-very-likely first place for such a
thing to manifest.  What I'm thinking is that there are places in there
where we loop till we get an expected result.  Offhand I thought they were
all integer math; but if one was float and the VAX code wasn't doing what
was expected, maybe we could blame this on float discrepancies as well.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Marko Tiikkaja

On 8/20/15 4:25 PM, Magnus Hagander wrote:

On Aug 20, 2015 4:16 PM, "Tom Lane"  wrote:

Our expectations about forward compatibility for postgresql.conf entries
have always been pretty low; even more so for not-widely-used settings.

In any case, wouldn't what you describe simply put off the pain for awhile
(replacing it with confusion in the short term)?  Eventually we're going
to break it.



Well, that's true of any depreciation. And if we throw a log message then
people will know about it...

That said, I'm not against a clean break with compatibility either. As long
as it's clean - like renaming the parameters.


All right.  I'll submit a patch for that.

Thanks for the input, both.


.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Magnus Hagander
On Aug 20, 2015 4:16 PM, "Tom Lane"  wrote:
>
> Magnus Hagander  writes:
> > On Thu, Aug 20, 2015 at 2:36 AM, Marko Tiikkaja  wrote:
> >> On 2015-08-20 02:29, Tom Lane wrote:
> >>> Why add new GUCs for that?  Can't we just redefine radiusserver as a
list
> >>> of servers to try in sequence, and similarly split radiusport into a
list?
>
> > We could change it to radius_servers and radius_ports, and deprecate but
> > keep accepting the old parameters for a release or two. To make it
easy, we
> > make sure that both parameter names accepts the same format parameter,
so
> > it's easy enough to just replace it once deprecated.
>
> Considering that we did no such thing for listen_address, which is of
> concern to probably two or three orders-of-magnitude more users than
> radiusserver, I don't really see why we'd do it for the RADIUS settings.
>
> Our expectations about forward compatibility for postgresql.conf entries
> have always been pretty low; even more so for not-widely-used settings.
>
> In any case, wouldn't what you describe simply put off the pain for awhile
> (replacing it with confusion in the short term)?  Eventually we're going
> to break it.
>

Well, that's true of any deprecation. And if we throw a log message then
people will know about it...

That said, I'm not against a clean break with compatibility either. As long
as it's clean - like renaming the parameters.

/Magnus


Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Tom Lane
Paul Ramsey  writes:
> On August 20, 2015 at 2:17:31 AM, Simon Riggs (si...@2ndquadrant.com) wrote:
>> Sounds like we need RedefineCustomStringVariable() 

> Yes, if that had existed we would not have had any problems (as long as it 
> delegated back to Define..() in the case where the variable hadn’t been 
> created yet…, since one of the problems is knowing if/to-what-extent a 
> custom variable has already been defined).

I'm not sure that the situation you describe can be expected to work
reliably; the problems are far wider than just GUC variables.  If two
different .so's are exposing broadly the same set of C function names,
how can we be sure which one will get called by the dynamic linker?
Isn't it likely that we'd end up continuing to call some functions
out of the old .so, probably leading to disaster?

I don't necessarily object to providing some solution for GUCs, but
I think first we need to question whether that isn't just the tip of
a large iceberg.

regards, tom lane




Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Aug 20, 2015 at 2:36 AM, Marko Tiikkaja  wrote:
>> On 2015-08-20 02:29, Tom Lane wrote:
>>> Why add new GUCs for that?  Can't we just redefine radiusserver as a list
>>> of servers to try in sequence, and similarly split radiusport into a list?

> We could change it to radius_servers and radius_ports, and deprecate but
> keep accepting the old parameters for a release or two. To make it easy, we
> make sure that both parameter names accept the same format parameter, so
> it's easy enough to just replace it once deprecated.

Considering that we did no such thing for listen_address, which is of
concern to probably two or three orders-of-magnitude more users than
radiusserver, I don't really see why we'd do it for the RADIUS settings.

Our expectations about forward compatibility for postgresql.conf entries
have always been pretty low; even more so for not-widely-used settings.

In any case, wouldn't what you describe simply put off the pain for awhile
(replacing it with confusion in the short term)?  Eventually we're going
to break it.

regards, tom lane




[HACKERS] exposing pg_controldata and pg_config as functions

2015-08-20 Thread Andrew Dunstan

Is there any significant interest in either of these?

Josh Berkus tells me that he would like pg_controldata information, and 
I was a bit interested in pg_config information, for this reason: I had 
a report of someone who had configured using --with-libxml but the xml 
tests actually returned the results that are expected without xml being 
configured. The regression tests thus passed, but should not have. It 
occurred to me that if we had a test like


select pg_config('configure') ~ '--with-libxml' as has_xml;

in the xml tests then this failure mode would be detected.

cheers

andrew




Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

2015-08-20 Thread Greg Stark
On Wed, Jun 25, 2014 at 6:05 PM, John Klos  wrote:
> While I wouldn't be surprised if you remove the VAX code because not many
> people are going to be running PostgreSQL, I'd disagree with the assessment
> that this port is broken. It compiles, it initializes databases, it runs, et
> cetera, albeit not with the default postgresql.conf.


So I've been playing with this a bit. I have simh running on my home
server as a VAX 3900 with NetBSD 6.1.5. My home server was mainly
intended to be a SAN and its CPU is woefully underpowered, so the
resulting VAX is actually very, very slow. So slow that I wonder if
there's a bug in the emulator, but anyway.

I'm coming to the conclusion that the port doesn't really work
practically speaking but the failures are more interesting than I
expected. They come in a few varieties:

1) The VAX does not have IEEE floating point. This manifests in a few
ways, some of which may be our own bugs or missing expected outputs.
The numeric data type's math often produces numbers rounded off
differently, the floating point tests print numbers one digit shorter
than our expected output assumes, and sometimes in scientific notation
where we don't expect it. The overflow tests generate floating point
exceptions rather than overflows. Infinity and NaN don't work. The
JSON code in particular generates large numbers where +/- Infinity
literals are supplied.

There are some planner tests that fail with floating point exceptions
-- that's probably a bug on our part. And I've seen at least one
server crash (maybe two) apparently caused by one as well which I
don't believe is expected.

2) The initdb problem is actually not our fault. It looks like a
NetBSD kernel bug when allocating large shared memory blocks on a
machine without lots of memory. There's not much initdb can do with a
kernel panic...  BSD still has the problem of kern.maxfiles defaulting
to a value low enough that even two connections cause the regression
tests to run out of file descriptors. That's documented and it would
be a right pain for initdb to detect that case.

3) The tests take so long to run that autovacuum kicks in and the
tests start producing rows in inconsistent orderings. I assume that's
a problem we've run into on the CLOBBER_CACHE animals as well?

4) One of the tablesample tests seems to freeze indefinitely. I
haven't looked into why yet. That might indeed indicate that the
spinlock code isn't working?

So my tentative conclusion is that while the port doesn't actually
work, practically speaking, it is nevertheless uncovering some
interesting bugs.

-- 
greg




Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Tom Lane
Greg Stark  writes:
> Alternately, has anyone tested whether Timsort would work well?

I think that was proposed a few years ago and did not look so good
in simple testing.

regards, tom lane




Re: [HACKERS] Declarative partitioning

2015-08-20 Thread David Fetter
On Thu, Aug 20, 2015 at 06:58:24PM +0900, Amit Langote wrote:
> On 2015-08-19 PM 09:52, David Fetter wrote:
> > On Wed, Aug 19, 2015 at 04:30:39PM +0900, Amit Langote wrote:
> > 
> > There are use cases where we need to warn people that their assertions
> > need to be true, and if those assertions are not true, this will
> > explode, leaving them to pick the resulting shrapnel out of their
> > faces.  There are other parts of the system where this is true, as
> > when people write UDFs in C.
> > 
> > As I understand it, NOT VALID means, "I assert that the tuples already
> > here fit the constraint.  Any changes will be checked against the
> > constraint."
> > 
> > I've seen cases where a gigantic amount of data is coming out of some
> > distributed system which holds the constraint as an invariant.  This
> > let a DBA decide to add a NOT VALID constraint in order not to take
> > the hit of a second full scan of the data, which might have made the
> > import, and possibly the entire project, untenable.
> 
> Ah, I understand the point of parameterization (TRUST). Seems like it
> would be good to have, with appropriate documentation of the same. Perhaps
> it might as well be a parameter to step 1 itself.

So TRUST would obviate step 2?  Better still!

>  5. Detach partition
> 
>  ALTER TABLE partitioned_table
>  DETACH PARTITION partition_name [USING table_name]
> 
>  This removes partition_name as partition of partitioned_table.
>  The table continues to exist with the same name or 'table_name',
>  if specified.  pg_class.relispartition is set to false for the
>  table, so it behaves like a normal table.
> >>>
> >>> Could this take anything short of an access exclusive lock on the
> >>> parent?
> >>
> >> Yes, both step 1 of the ATTACH command and the DETACH command take an
> >> access exclusive lock on the parent. They are rather quick metadata
> >> changes, so should not stall others significantly, I think.
> > 
> > So no.  Weakening required locks has been something of an ongoing
> > project, project-wide, and need not be part of the first cut of this
> > long-needed feature.
> > 
> 
> Do you mean ATTACH and DETACH, if they require access exclusive lock on
> the parent, should not be in the first cut? Or am I misreading?

Sorry I was unclear.

ATTACH and DETACH should be in the first cut even if they require an
access exclusive lock.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 3:24 AM, Peter Geoghegan  wrote:
> I believe, in general, that we should consider a multi-pass sort to be
> a kind of inherently suspect thing these days, in the same way that
> checkpoints occurring 5 seconds apart are: not actually abnormal, but
> something that we should regard suspiciously. Can you really not
> afford enough work_mem to only do one pass? Does it really make sense
> to add far more I/O and CPU costs to avoid that other tiny memory
> capacity cost?

I think this is the crux of the argument. And I think you're
basically, but not entirely, right.

The key metric there is not how cheap memory has gotten but rather
what the ratio is between the system's memory and disk storage. The
use case I think you're leaving out is the classic "data warehouse"
with huge disk arrays attached to a single host running massive
queries for hours. In that case reducing run size will reduce I/O
requirements directly, and halving the amount of I/O a sort takes will
halve the time it takes regardless of CPU efficiency. And I have a
suspicion that typical data distributions get much better than a 2x
speedup.

But I think you're basically right that this is the wrong use case to
worry about for most users. Even those users that do have large batch
queries are probably not processing so much that they should be doing
multiple passes. The ones that do are probably more interested in
parallel query, federated databases, column stores, and so on rather
than worrying about just how many hours it takes to sort their
multiple terabytes on a single processor.

I am quite suspicious of quicksort though. It has an O(n^2) worst case
and I think it's only a matter of time before people start worrying
about DoS attacks from users able to influence the data ordering. It's
also not very suitable for GPU processing. Quicksort gets most of its
advantage from cache efficiency; it isn't a super efficient algorithm
otherwise. Are there not other cache-efficient algorithms to consider?

Alternately, has anyone tested whether Timsort would work well?

-- 
greg




Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 13:21, Paul Ramsey  wrote:

> On August 20, 2015 at 2:17:31 AM, Simon Riggs (si...@2ndquadrant.com) wrote:
>
> > On 18 August 2015 at 21:03, Paul Ramsey wrote:
> >
> > > So I need a way to either (a) notice when I already have a (old) copy
> > > of the library loaded and avoid trying to setup the GUC in that case
> > > or (b) set-up the GUC in a somewhat less brittle way than
> > > DefineCustomStringVariable() allows, something that can overwrite
> > > things instead of just erroring out.
> >
> > Are you trying to preserve the in-memory state across upgrade as well?
> It sounds unlikely we can support that directly in the general case.
>
> I’m not sure what you mean by this.
>

The value of the global variable can't be maintained across upgrade.


> > Sounds like we need RedefineCustomStringVariable()
>
> Yes, if that had existed we would not have had any problems (as long as it
> delegated back to Define..() in the case where the variable hadn’t been
> created yet…, since one of the problems is knowing if/to-what-extent a
> custom variable has already been defined).
>
> We do now have a fix, which involved copying about 100 lines of core code
> (find_option, guc_var_compare, guc_name_compare) up, that does a low level
> search to see if there is a config_generic for a particular variable name,
> and if so whether it’s a placeholder or not. The presence of a
> non-placeholding definition is used as a “uh oh, there’s already a library
> in here” warning which keeps us from re-defining the variable and causing
> trouble.
>

I'm sure we all agree PostGIS is an important use case. Core is the right
place to put such code.

Please submit a patch that does that - better than someone else trying to
get it right for you. Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread David Steele
On 8/20/15 5:45 AM, Amit Langote wrote:
> On 2015-08-20 PM 06:27, Pavan Deolasee wrote:
>> On Tue, Aug 18, 2015 at 4:00 PM, Amit Langote >> wrote:
>>>
>>> PARTITION BY LIST ON (name)
>>> PARTITION BY RANGE ON (year, month)
>>>
>>> PARTITION BY LIST ON ((lower(left(name, 2)))
>>> PARTITION BY RANGE ON ((extract(year from d)), (extract(month from d)))
>>>
>>>
>>
>> How about HASH partitioning? Are there plans to support foreign tables as
>> partitions?
>>
> 
> I've not given HASH partitioning a lot of thought yet. About the latter,
> it seems it should not be too difficult to incorporate into the proposed
> partitioning internals.

Hash partitioning seems like it could wait.  In fact, I've nearly always
implemented hash partitions as list partitions.  This gives me control
over the hash function and allows me to use properties of the key to my
advantage.  For instance - if your key is a sha1 hash there's no need to
rehash, just grab the required number of bits off the end of the key.

My experiences with Oracle's hash function were generally not good -
there's a reason many hash algorithms exist.  If/when we do hash
partitioning in Postgres I'd like to see the hash function be
user-definable.

Meanwhile, I think list and range are a good start.  I'd prefer to see
sub-partitioning before hash partitioning.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Paul Ramsey
On August 20, 2015 at 2:17:31 AM, Simon Riggs (si...@2ndquadrant.com) wrote:

> On 18 August 2015 at 21:03, Paul Ramsey wrote:
>  
> > So I need a way to either (a) notice when I already have a (old) copy
> > of the library loaded and avoid trying to setup the GUC in that case
> > or (b) set-up the GUC in a somewhat less brittle way than
> > DefineCustomStringVariable() allows, something that can overwrite
> > things instead of just erroring out.
>  
> Are you trying to preserve the in-memory state across upgrade as well? It 
> sounds unlikely we can support that directly in the general case. 

I’m not sure what you mean by this.

> Sounds like we need RedefineCustomStringVariable() 

Yes, if that had existed we would not have had any problems (as long as it 
delegated back to Define..() in the case where the variable hadn’t been created 
yet…, since one of the problems is knowing if/to-what-extent a custom variable 
has already been defined).

We do now have a fix, which involved copying about 100 lines of core code 
(find_option, guc_var_compare, guc_name_compare) up, that does a low level 
search to see if there is a config_generic for a particular variable name, and 
if so whether it’s a placeholder or not. The presence of a non-placeholding 
definition is used as a “uh oh, there’s already a library in here” warning 
which keeps us from re-defining the variable and causing trouble.
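
To illustrate the shape of it, here is a minimal sketch of that check (not
the actual PostGIS code; it assumes find_option() and friends have been
copied up from guc.c as described, with their bodies elided here):

#include "postgres.h"

#include "utils/guc.h"
#include "utils/guc_tables.h"   /* struct config_generic */

/*
 * Copied up from guc.c as described above (definition elided): returns
 * the config_generic for "name", or NULL if no such entry exists.
 */
static struct config_generic *find_option(const char *name,
                                          bool create_placeholders,
                                          int elevel);

/*
 * True if "name" already has a real (non-placeholder) definition, meaning
 * a previously loaded copy of the library has already defined it and we
 * must not call DefineCustomStringVariable() for it again.
 */
static bool
custom_guc_already_defined(const char *name)
{
    struct config_generic *record = find_option(name, false, WARNING);

    return record != NULL &&
           (record->flags & GUC_CUSTOM_PLACEHOLDER) == 0;
}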

P.

>  
 > --
> Simon Riggs  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Marko Tiikkaja

On 8/20/15 12:57 PM, Magnus Hagander wrote:

I mean that you could write radius_server=foo or radius_servers=foo as well
as radius_server=foo,bar and radius_servers=foo,bar. As long as you don't
specify both radius_server and radius_servers, either one of them should
accept either one server or multiple servers.


I think if we do it this way, I'd prefer if "radiusserver" was kept 
fully backwards compatible.  So either specify "radiusserver" or 
"radius_servers", but only the second one accepts a list.



.m




Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Magnus Hagander
On Thu, Aug 20, 2015 at 12:55 PM, Marko Tiikkaja  wrote:

> On 8/20/15 12:53 PM, Magnus Hagander wrote:
>
>> We could change it to radius_servers and radius_ports, and deprecate but
>> keep accepting the old parameters for a release or two.
>>
>
> That's one option..
>
> To make it easy, we
>> make sure that both parameter names accept the same format parameter, so
>> it's easy enough to just replace it once deprecated.
>>
>
> I'm not sure I understand what you mean by this.


I mean that you could write radius_server=foo or radius_servers=foo as well
as radius_server=foo,bar and radius_servers=foo,bar. As long as you don't
specify both radius_server and radius_servers, either one of them should
accept either one server or multiple servers.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Marko Tiikkaja

On 8/20/15 12:53 PM, Magnus Hagander wrote:

We could change it to radius_servers and radius_ports, and deprecate but
keep accepting the old parameters for a release or two.


That's one option..


To make it easy, we
make sure that both parameter names accept the same format parameter, so
it's easy enough to just replace it once deprecated.


I'm not sure I understand what you mean by this.


.m




Re: [HACKERS] Supporting fallback RADIUS server(s)

2015-08-20 Thread Magnus Hagander
On Thu, Aug 20, 2015 at 2:36 AM, Marko Tiikkaja  wrote:

> On 2015-08-20 02:29, Tom Lane wrote:
>
>> Marko Tiikkaja  writes:
>>
>>> So I'm developing a patch to fix this issue, but I'm not
>>> exactly sure what the configuration should look like.  I see multiple
>>> options, but the one I like the best is the following:
>>>
>>
>> Add two new HBA configuration options: radiusfallbackservers and
>>> radiusfallbackports; both lists parsed with SplitIdentifierString (à la
>>> listen_addresses).
>>>
>>
>> Why add new GUCs for that?  Can't we just redefine radiusserver as a list
>> of servers to try in sequence, and similarly split radiusport into a list?
>>
>> Or maybe better, rename it radius_servers.  But what you have here seems
>> weird, and it certainly doesn't follow the precedent of what we did when
>> we replaced listen_address with listen_addresses.
>>
>
> If we were adding RADIUS support today, this would be the best option. But
> seeing that we still only support one server today, this seems like a
> backwards incompatibility which practically 100% of users won't benefit
> from.  But I don't care much either way.  If we think breaking
> compatibility here is acceptable, I'd say we go for radius_servers and
> radius_ports.


We could change it to radius_servers and radius_ports, and deprecate but
keep accepting the old parameters for a release or two. To make it easy, we
make sure that both parameter names accept the same format parameter, so
it's easy enough to just replace it once deprecated.

And we could consider throwing a WARNING or at least a LOG when the old
name is used, indicating that it's deprecated.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-20 Thread Andres Freund
On 2015-08-20 15:38:36 +0530, Amit Kapila wrote:
> On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund  wrote:
> > I spent some time today reviewing the commited patch. So far my only
> > major complaint is that I think the comments are only insufficiently
> > documenting the approach taken:
> > Stuff like avoiding ABA type problems by clearing the list entirely and
> > it being impossible that entries end up on the list too early absolutely
> > needs to be documented explicitly.
> >
> 
> I think more comments can be added to explain such behaviour if it is
> not clear via looking at current code and comments.

It's not mentioned at all, so yes.

> I think you are right, and here we need to use something like what is
> suggested below by you.  Originally the code was similar to what you
> have written below, but it was using a different (new) variable to achieve
> what you have achieved with lwWaiting, and to avoid the use of a new
> variable the code was refactored into its current form.  I think we should
> make this change (I can write a patch) unless Robert feels otherwise.

I think we can just rename lwWaiting to something more generic.


> >Consider what happens if such a follower enqueues in another
> >transaction. It is not, as far as I could find out, guaranteed on all
> >types of cpus that a third backend can actually see nextClearXidElem
> >as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
> >sockets. If the write to nextClearXidElem is entered into the local
> >store buffer (leader #1) a hyper-threaded process (leader #2) can
> >possibly see it (store forwarding) while another backend doesn't
> >yet.
> >
> >I think this is very unlikely to be an actual problem due to
> >independent barriers until enqueued again, but I don't want to rely
> >on it undocumentedly. It seems safer to replace
> >+wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
> >+pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
> >with a pg_atomic_exchange_u32().
> >
> 
> I didn't follow this point: if we ensure that the follower can never return
> before the leader wakes it up, then why would it be a problem to update
> nextClearXidElem like above?

Because it doesn't generally enforce that *other* backends have seen the
write as there's no memory barrier.
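
For illustration, the exchange-based form of that wakeup loop would look
something like this (a sketch of the suggestion only, not committed code;
as a read-modify-write operation, pg_atomic_exchange_u32() also acts as a
full barrier):

    while (wakeidx != INVALID_PGPROCNO)
    {
        PGPROC     *proc = &allProcs[wakeidx];

        /* atomically fetch the next link and reset our own entry */
        wakeidx = pg_atomic_exchange_u32(&proc->nextClearXidElem,
                                         INVALID_PGPROCNO);

        if (proc != MyProc)
            PGSemaphoreUnlock(&proc->sem);
    }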


> > +/*
> > + * ProcArrayGroupClearXid -- group XID clearing
> > + *
> > + * When we cannot immediately acquire ProcArrayLock in exclusive mode at
> > + * commit time, add ourselves to a list of processes that need their XIDs
> > + * cleared.  The first process to add itself to the list will acquire
> > + * ProcArrayLock in exclusive mode and perform ProcArrayEndTransactionInternal
> > + * on behalf of all group members.  This avoids a great deal of context
> > + * switching when many processes are trying to commit at once, since the lock
> > + * only needs to be handed from the last share-locker to one process waiting
> > + * for the exclusive lock, rather than to each one in turn.
> > + */
> > +static void
> > +ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
> > +{
> >
> > This comment, in my opinion, is rather misleading. If you have a
> > workload that primarily is slowed down due to transaction commits, this
> > patch doesn't actually change the number of context switches very
> > much. Previously all backends enqueued in the lwlock and got woken up
> > one-by-one. Save for backends 'jumping' the queue while a lock has been
> > released but the woken-up backend doesn't yet run, there'll be exactly
> > as many context switches as today.
> >
> 
> I think this is debatable, as in the previous mechanism all the backends
> clear their transaction ids one-by-one (which is nothing but context
> switching), which leads to contention not only with read lockers
> but write lockers as well.

Huh? You can benchmark it; there's barely any change in the number of
context switches.

I am *not* saying that there's no benefit to the patch; I am saying that
context switches are the wrong explanation. Reduced contention (due to
shorter lock holding times, fewer cacheline moves, etc.) is the reason
this is beneficial.


> > I don't think it's a good idea to use the variable name in PROC_HDR and
> > PGPROC, it's confusing.

> What do you mean by this, are you not happy with variable name?

Yes. I think it's a bad idea to have the same variable name in PROC_HDR
and PGPROC.

struct PGPROC
{
...
/* Support for group XID clearing. */
volatile pg_atomic_uint32   nextClearXidElem;
...
}

typedef struct PROC_HDR
{
...
/* First pgproc waiting for group XID clear */
volatile pg_atomic_uint32 nextClearXidElem;
...
}

PROC_HDR's variable imo isn't well named.

Greetings,

Andres Freund




Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-20 Thread Amit Kapila
On Thu, Aug 20, 2015 at 3:38 PM, Amit Kapila 
wrote:
>
> On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund  wrote:
> >
> >
> > How hard did you try checking whether this causes regressions? This
> > increases the number of atomics in the commit path a fair bit. I doubt
> > it's really bad, but it seems like a good idea to benchmark something
> > like a single full-throttle writer and a large number of readers.
> >
>
> I think the case which you want to stress is when the patch doesn't
> have any benefit (like single writer case)
>

I mean to say single writer, multiple readers.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reduce ProcArrayLock contention

2015-08-20 Thread Amit Kapila
On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund  wrote:
>
> Hi,
>
> > On Wed, Aug 5, 2015 at 10:59 AM, Amit Kapila 
wrote:
> > OK, committed.
>
> I spent some time today reviewing the commited patch. So far my only
> major complaint is that I think the comments are only insufficiently
> documenting the approach taken:
> Stuff like avoiding ABA type problems by clearing the list entirely and
> it being impossible that entries end up on the list too early absolutely
> needs to be documented explicitly.
>

I think more comments can be added to explain such behaviour if it is
not clear via looking at current code and comments.

> I think I found a few minor correctness issues. Mostly around the fact
> that we, so far, tried to use semaphores in a way that copes with
> unrelated unlocks "arriving". I actually think I removed all of the
> locations that caused that to happen, but I don't think we should rely
> on that fact. Looking at the following two pieces of code:
> /* If the list was not empty, the leader will clear our XID. */
> if (nextidx != INVALID_PGPROCNO)
> {
> /* Sleep until the leader clears our XID. */
> while (pg_atomic_read_u32(&proc->nextClearXidElem) != INVALID_PGPROCNO)
> {
> extraWaits++;
> PGSemaphoreLock(&proc->sem);
> }
>
> /* Fix semaphore count for any absorbed wakeups */
> while (extraWaits-- > 0)
> PGSemaphoreUnlock(&proc->sem);
> return;
> }
> ...
> /*
>  * Now that we've released the lock, go back and wake everybody up.  We
>  * don't do this under the lock so as to keep lock hold times to a
>  * minimum.  The system calls we need to perform to wake other processes
>  * up are probably much slower than the simple memory writes we did while
>  * holding the lock.
>  */
> while (wakeidx != INVALID_PGPROCNO)
> {
> PGPROC  *proc = &allProcs[wakeidx];
>
> wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
> pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
>
> if (proc != MyProc)
> PGSemaphoreUnlock(&proc->sem);
> }
>
> There's a bunch of issues with those two blocks afaics:
>
> 1) The first block (in one backend) might read INVALID_PGPROCNO before
>ever locking the semaphore if a second backend quickly enough writes
>INVALID_PGPROCNO. That way the semaphore will be permanently out of
>"balance".
>

I think you are right, and here we need to use something like what is
suggested below by you.  Originally the code was similar to what you
have written below, but it was using a different (new) variable to achieve
what you have achieved with lwWaiting, and to avoid the use of a new
variable the code was refactored into its current form.  I think we should
make this change (I can write a patch) unless Robert feels otherwise.
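
For reference, the follower-side wait would then look roughly like the
lwlock.c pattern, e.g. something like this sketch (the flag name
procArrayGroupMember is only illustrative; the idea is that it is set
before adding ourselves to the list and cleared by the leader just before
it unlocks our semaphore):

    if (nextidx != INVALID_PGPROCNO)
    {
        int         extraWaits = 0;

        /*
         * Sleep until the leader clears our flag, absorbing any unrelated
         * semaphore wakeups instead of miscounting them.
         */
        for (;;)
        {
            PGSemaphoreLock(&proc->sem);
            if (!proc->procArrayGroupMember)
                break;
            extraWaits++;
        }

        /* Fix semaphore count for any absorbed wakeups */
        while (extraWaits-- > 0)
            PGSemaphoreUnlock(&proc->sem);
        return;
    }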

> 2) There's no memory barriers around dealing with nextClearXidElem in
>the first block. Afaics there needs to be a read barrier before
>returning, otherwise it's e.g. not guaranteed that the woken up
>backend sees its own xid set to InvalidTransactionId.
>
> 3) If a follower returns before the leader has actually finished woken
>that respective backend up we can get into trouble:
>

Surely that can lead to a problem, and I think the reason and fix
for this are the same as for the first point.

>Consider what happens if such a follower enqueues in another
>transaction. It is not, as far as I could find out, guaranteed on all
>types of cpus that a third backend can actually see nextClearXidElem
>as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
>sockets. If the write to nextClearXidElem is entered into the local
>store buffer (leader #1) a hyper-threaded process (leader #2) can
>possibly see it (store forwarding) while another backend doesn't
>yet.
>
>I think this is very unlikely to be an actual problem due to
>independent barriers until enqueued again, but I don't want to rely
>on it undocumentedly. It seems safer to replace
>+wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
>+pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
>with a pg_atomic_exchange_u32().
>

I didn't follow this point: if we ensure that the follower can never return
before the leader wakes it up, then why would it be a problem to update
nextClearXidElem like above?

>
> I think to fix these ProcArrayGroupClearXid() should use a protocol
> similar to lwlock.c. E.g. make the two blocks somethign like:
> while (wakeidx != INVALID_PGPROCNO)
> {
> PGPROC  *proc = &allProcs[wakeidx];
>
> wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
>   

Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-19 PM 09:52, David Fetter wrote:
> On Wed, Aug 19, 2015 at 04:30:39PM +0900, Amit Langote wrote:
>>>
>>> One small change to make this part more efficient:
>>>
>>> 1. Take the access exclusive lock on table_name.
>>> 2. Check for a matching constraint on it.
>>> 3. If it's there, mark it as a valid partition.
>>> 4. If not, check for values outside the boundaries as above.
>>>
>>
>> That's an interesting idea. Thanks!
> 
> I hope I'm advancing this feature rather than bogging it down...
> 

Definitely advancing.

>> By a matching constraint, I guess you mean a 'valid' constraint from
>> which the declared partition constraint can be proven to follow. For
>> (a simple) example, from a CHECK (a >= 100 AND a < 150) on
>> table_name, the partition constraint implied by FOR VALUES START
>> (100) END (200) can be assumed to hold.
> 
> Well, I was assuming an exact match, but a stricter match seems like a
> nice-to-have...possibly later.
> 
>>> Should the be a *valid* constraint?  Perhaps that should be
>>> parameterized, as I'm not yet seeing a compelling argument either
>>> direction.  I'm picturing something like:
>>>
>>> ALTER TABLE table_name SET VALID PARTITION OF  [TRUST]
>>>
>>> where TRUST would mean that an existing constraint need not be VALID.
>>
>> Hmm, I'd think this step must be able to assert the partition
>> constraint beyond any doubt.  If the DBA added the constraint and
>> marked it invalid, she should first VALIDATE the constraint to make
>> it valid by performing whatever steps necessary before. IOW, a full
>> heap scan at least once is inevitable (the reason why we might want
>> to make this a two step process at all). Am I missing something?
> 
> There are use cases where we need to warn people that their assertions
> need to be true, and if those assertions are not true, this will
> explode, leaving them to pick the resulting shrapnel out of their
> faces.  There are other parts of the system where this is true, as
> when people write UDFs in C.
> 
> As I understand it, NOT VALID means, "I assert that the tuples already
> here fit the constraint.  Any changes will be checked against the
> constraint."
> 
> I've seen cases where a gigantic amount of data is coming out of some
> distributed system which holds the constraint as an invariant.  This
> let a DBA decide to add a NOT VALID constraint in order not to take
> the hit of a second full scan of the data, which might have made the
> import, and possibly the entire project, untenable.
> 
> See above.
> 

Ah, I understand the point of parameterization (TRUST). Seems like it
would be good to have, with appropriate documentation of the same. Perhaps
it might as well be a parameter to step 1 itself.

 5. Detach partition

 ALTER TABLE partitioned_table
 DETACH PARTITION partition_name [USING table_name]

 This removes partition_name as partition of partitioned_table.
 The table continues to exist with the same name or 'table_name',
 if specified.  pg_class.relispartition is set to false for the
 table, so it behaves like a normal table.
>>>
>>> Could this take anything short of an access exclusive lock on the
>>> parent?
>>
>> Yes, both step 1 of the ATTACH command and the DETACH command take an
>> access exclusive lock on the parent. They are rather quick metadata
>> changes, so should not stall others significantly, I think.
> 
> So no.  Weakening required locks has been something of an ongoing
> project, project-wide, and need not be part of the first cut of this
> long-needed feature.
> 

Do you mean ATTACH and DETACH, if they require access exclusive lock on
the parent, should not be in the first cut? Or am I misreading?

If so, there is no way to drop partitions. With the patch, it would be
achieved with detach and drop (if required).

> Thanks so much for working on this!
> 

Thanks for the feedback and suggestions!

Regards,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-19 AM 02:59, Corey Huinker wrote:
> 
> Quick thoughts borne of years of slugging it out with partitions on Oracle:
> 
> - Finally!!!
> 
> - Your range partitioning will need to express exclusive/inclusive bounds,
> or go to the Oracle model where every partition is a cascading "values less
> than" test context dependent on the partitions defined before it. I would
> suggest that leveraging existing range types (or allowing the user to
> specify a range type, like for a specific collation of a text range) would
> allow for the most flexible and postgres-ish range definition. You seem to
> do this with the "[USING] opclass_name" bit, but don't follow through on
> the START(...) and END(...).  Something like FOR VALUES <@
> '[''2014-01-01'',''2015-01-01)'::daterange would cover most needs
> succinctly, though I admit the syntax for complex ranges could be
> cumbersome, though something like FOR VALUES <@
> '[(''a'',1),(''b'',1))'::letter_then_number_range is still readable.
> 

It seems the way of specifying per-partition definition/constraint,
especially for range partitioning, would have a number of interesting
alternatives.

By the way, the [USING opclass_name] bit is just a way of telling that a
particular key column has user-defined notion of "ordering" in case of
range partitioning and "equality" for list partitioning. The opclass would
eventually determine which WHERE clauses (looking at operators, operand
types) are candidates to help prune partitions. If we use the range_op
range_literal::range_type notation to describe the partition constraint for
each partition, it might not offer much beyond readability. We are not
really going to detect range operators being applied in WHERE conditions
to trigger partition pruning, for example. Although I may be missing
something...

> - No partitioning scheme survives first contact with reality. So you will
> need a facility for splitting and joining existing partitions. For
> splitting partitions, it's sufficient to require that the new partition
> share either a upper/lower bound (with the same inclusivity/exclusivity) of
> an existing partition, thus uniquely identifying the partition to be split,
> and require that the other bound be within the range of the partition to be
> split. Similarly, it's fair to require that the partitions to be joined be
> adjacent in range. In both cases, range operators make these tests simple.
> 

SPLIT/MERGE can be done in later patches/release, I think.

> - Your features 4 and 5 are implemented in Oracle with SWAP PARTITION,
> which is really neat for doing ETLs and index rebuilds offline in a copy
> table, and then swapping the data segment of that table with the partition
> specified. Which could be considered cheating because none of the partition
> metadata changed, just the pointers to the segments. We already do this
> with adding removing INHERIT. I'm not saying they can't be separate
> functionality, but keeping an atomic SWAP operation would be grand.
> 

I think we can manage to find ways to make the proposed ATTACH/DETACH
useful and convenient. Thanks for reminding me of SWAP PARTITION.

Regards,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 PM 06:34, Simon Riggs wrote:
> On 20 August 2015 at 10:10, Amit Langote 
> wrote:
> 
>> On 2015-08-20 AM 05:10, Josh Berkus wrote:
PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
 START WITH value;
>>>
>>> Oh, I like that syntax!
>>>
>>> How would it work if there were multiple columns?  Maybe we don't want
>>> to allow that for this form?
>>>
>>
>> Yea, we could simply restrict it to the single column case, which does not
>> sound like a major restriction.
>>
> 
> PARTITION BY ...
> SUBPARTITION BY ...
> 
> We should plan for that in the way we develop the internals, but full
> support can wait until later patches.
> 

At the moment, a form of SUBPARTITION BY is to allow PARTITION BY in a
partition definition. But I can see that may not be what people would expect.

> My view has long been that the internals are the key aspect here, not the
> syntax. We need to be able to have a very fast partition-selection
> mechanism that can be used in the planner or executor for each tuple.
> Working backwards, we need a relcache representation that allows that, and
> a catalog representation that allows that and syntax to match.
> 

Agreed.

Thanks,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 PM 06:27, Pavan Deolasee wrote:
> On Tue, Aug 18, 2015 at 4:00 PM, Amit Langote > wrote:
>>
>> PARTITION BY LIST ON (name)
>> PARTITION BY RANGE ON (year, month)
>>
>> PARTITION BY LIST ON ((lower(left(name, 2)))
>> PARTITION BY RANGE ON ((extract(year from d)), (extract(month from d)))
>>
>>
> 
> How about HASH partitioning? Are there plans to support foreign tables as
> partitions?
> 

I've not given HASH partitioning a lot of thought yet. About the latter,
it seems it should not be too difficult to incorporate into the proposed
partitioning internals.

Thanks,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 10:10, Amit Langote 
wrote:

> On 2015-08-20 AM 05:10, Josh Berkus wrote:
> > On 08/19/2015 04:59 AM, Simon Riggs wrote:
> >> I like the idea of a regular partitioning step because it is how you
> >> design such tables - "lets use monthly partitions".
> >>
> >> This gives sanely terse syntax, rather than specifying pages and pages
> >> of exact values in DDL
> >>
> >>PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
> >> START WITH value;
> >
> > Oh, I like that syntax!
> >
> > How would it work if there were multiple columns?  Maybe we don't want
> > to allow that for this form?
> >
>
> Yea, we could simply restrict it to the single column case, which does not
> sound like a major restriction.
>

PARTITION BY ...
SUBPARTITION BY ...

We should plan for that in the way we develop the internals, but full
support can wait until later patches.

My view has long been that the internals are the key aspect here, not the
syntax. We need to be able to have a very fast partition-selection
mechanism that can be used in the planner or executor for each tuple.
Working backwards, we need a relcache representation that allows that, and
a catalog representation that allows that and syntax to match.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Pavan Deolasee
On Tue, Aug 18, 2015 at 4:00 PM, Amit Langote  wrote:

>
> Hi,
>
> I would like propose $SUBJECT for this development cycle. Attached is a
> WIP patch that implements most if not all of what's described below. Some
> yet unaddressed parts are mentioned below, too. I'll add this to the
> CF-SEP.
>
> Syntax
> ==
>
> 1. Creating a partitioned table
>
> CREATE TABLE table_name
> PARTITION BY {RANGE|LIST}
> ON (column_list);
>
> Where column_list consists of simple column names or expressions:
>
> PARTITION BY LIST ON (name)
> PARTITION BY RANGE ON (year, month)
>
> PARTITION BY LIST ON ((lower(left(name, 2)))
> PARTITION BY RANGE ON ((extract(year from d)), (extract(month from d)))
>
>

How about HASH partitioning? Are there plans to support foreign tables as
partitions?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Extension upgrade and GUCs

2015-08-20 Thread Simon Riggs
On 18 August 2015 at 21:03, Paul Ramsey  wrote:


> So I need a way to either (a) notice when I already have a (old) copy
> of the library loaded and avoid trying to setup the GUC in that case
> or (b) set-up the GUC in a somewhat less brittle way than
> DefineCustomStringVariable() allows, something that can overwrite
> things instead of just erroring out.
>

Are you trying to preserve the in-memory state across upgrade as well? It
sounds unlikely we can support that directly in the general case.

Sounds like we need RedefineCustomStringVariable()

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Amit Langote
On 2015-08-20 AM 05:10, Josh Berkus wrote:
> On 08/19/2015 04:59 AM, Simon Riggs wrote:
>> I like the idea of a regular partitioning step because it is how you
>> design such tables - "lets use monthly partitions".
>>
>> This gives sanely terse syntax, rather than specifying pages and pages
>> of exact values in DDL
>>
>>PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
>> START WITH value;
> 
> Oh, I like that syntax!
> 
> How would it work if there were multiple columns?  Maybe we don't want
> to allow that for this form?
> 

Yea, we could simply restrict it to the single column case, which does not
sound like a major restriction.

Thanks,
Amit





Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 03:16, Amit Langote 
wrote:


> Sorry, should have added tests and docs already. I will add them in the
> next version of the patch. Thanks for willing to review.
>

Thanks for picking up this challenge. It's easier if you have someone
interested all the way.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] proposal: function parse_ident

2015-08-20 Thread Pavel Stehule
2015-08-20 2:22 GMT+02:00 Tom Lane :

> Jim Nasby  writes:
> >> Don't say "parse names for things other than tables".  Only a minority
> >> of the types of objects used in the database have names that meet this
> >> specification.
>
> > Really? My impression is that almost everything that's not a shared
> > object allows for a schema...
>
> Tables meet this naming spec.  Columns, functions, operators, operator
> classes/families, collations, constraints, and conversions do not (you
> need more data to name them).  Schemas, databases, languages, extensions,
> and some other things also do not, because you need *less* data to name
> them.  Types also don't really meet this naming spec, because you need to
> contend with special cases like "int[]" or "timestamp with time zone".
> So this proposal doesn't seem very carefully thought-through to me,
> or at least the use case is much narrower than it could be.
>
> Also, if "object does not exist" isn't supposed to be an error case,
> what of "name is not correctly formatted"?  It seems a bit arbitrary
> to me to throw an error in one case but not the other.
>

When I am working with a live (existing) object, then the behaviour of a cast
to regclass is correct, but I can also be working with an object that will be
created in the future, and I need to take some other information about this
future object - and then the cast has to fail.

Regards

Pavel



>
> regards, tom lane
>


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Michael Paquier
On Thu, Aug 20, 2015 at 11:16 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2015-08-19 PM 09:23, Simon Riggs wrote:
> > We'll need regression tests that cover each restriction and docs that
> > match. This is not something we should leave until last. People read the
> > docs to understand the feature, helping them to reach consensus. So it is
> > for you to provide the docs before, not wait until later. I will begin a
> > code review once you tell me docs and tests are present. We all want the
> > feature, so its all about the details now.
> >
>
> Sorry, should have added tests and docs already. I will add them in the
> next version of the patch.
>

Yes, those would be really good to have before any review, so that it is
possible to get an understanding of what this patch does. I would like to
look at it as well in more depth.


> Thanks for willing to review.
>

Really thanks for working on that! I am sure you are going to get a lot of
feedback.
Regards,
-- 
Michael