Re: [HACKERS] 64-bit queryId?

2017-10-02 Thread Gavin Flower

On 03/10/17 04:02, Joshua D. Drake wrote:

On 10/01/2017 04:22 PM, Robert Haas wrote:

On Sun, Oct 1, 2017 at 3:48 PM, Greg Stark  wrote:

Well these kinds of monitoring systems tend to be used by operations
people who are a lot more practical and a lot less worried about
theoretical concerns like that.


+1, well said.


In context the point was merely that the default
pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit
values are enough. It wouldn't be hard for there to be 64k different
queries over time and across all the databases in a fleet and at that
point it becomes likely there'll be a 32-bit collision.


Yeah.

I think Alexander Korotkov's points are quite good, too.



+1 to both of these as well.

jD


Did a calculation:

# probability of collision
54561    0.43
54562    0.55

Essentially, you hit a greater than 50% chance of a collision before you 
get to 55 thousand statements.
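For reference, the figure can be sanity-checked with the standard "birthday" 
approximation for hash collisions. A minimal sketch (illustrative only; the 
exact crossover point depends on which approximation is used, so it need not 
match the numbers above exactly):

```python
import math

def collision_probability(n, bits=32):
    """Approximate P(at least one collision) when n distinct queries
    are hashed into 2**bits possible queryId values (birthday bound)."""
    buckets = 2.0 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * buckets))

# A fleet accumulating tens of thousands of distinct queries already has
# a substantial chance of a 32-bit queryId collision:
for n in (5000, 65536, 77163):
    print(n, round(collision_probability(n), 3))
```

With the default pg_stat_statements.max of 5000 the risk is negligible, but 
it grows roughly quadratically with the number of distinct queries, which is 
the thread's point.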



Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Gavin Flower

On 13/08/17 16:19, Thomas Munro wrote:

On Sat, Aug 12, 2017 at 3:24 AM, Tom Lane  wrote:

[...]


I'd vote for including this in v10.  There doesn't seem to be any
downside to this: it's a no brainer to avoid our exploding hash table
case when we can see it coming.


But explosions are fun!
< ducks, and runs away VERY rapidly>

More seriously:
Up to now I'd been avoiding hash indexes, so great news!


Cheers,
Gavin





Re: [HACKERS] Why does logical replication launcher exit with exit code 1?

2017-08-01 Thread Gavin Flower

On 02/08/17 11:03, Andres Freund wrote:

Hi,

On 2017-08-02 10:58:32 +1200, Thomas Munro wrote:

When I shut down a cluster that isn't using logical replication, it
always logs a line like the following.  So do the build farm members I
looked at.  I didn't see anything about this in the open items list --
isn't it a bug?

2017-08-02 10:39:25.007 NZST [34781] LOG:  worker process: logical
replication launcher (PID 34788) exited with exit code 1

Exit code 0 signals that a worker should be restarted. Therefore
graceful exit can't really use that.  I think a) we really need to
improve bgworker infrastructure around that b) shows the limit of using
bgworkers for this kinda thing - we should probably have a more bgworker
like infrastructure for internal workers.

- Andres


Returning zero to indicate success is a holdover from the time when 
computers could only run one program at a time.  At the end of the code 
there was a jump table of 4-byte entries.  The first entry, with a 
displacement of zero, was the location to jump to for a normal exit; 
subsequent entries were for various error conditions.  This is why 
return codes were often positive multiples of 4.  Since we don't use 
jump tables now, more & more people are using whatever integers they want.


So apart from convention, returning zero is no longer held to be sacred 
as an indicator that something exited okay.  In fact, since zero could 
simply mean a value was never explicitly assigned, it is actually a very 
dangerous value to use to indicate success!
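The zero-means-success convention itself is easy to observe from any 
process-spawning API; a minimal Python sketch (purely illustrative - it does 
not model the bgworker restart semantics Andres describes):

```python
import subprocess
import sys

def run_child(code):
    """Spawn a child process that exits with the given status and
    return the status the parent observes."""
    result = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return result.returncode

print(run_child(0))  # 0: the conventional "success"
print(run_child(1))  # 1: what the logical replication launcher logs
```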



Cheers,
Gavin





Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?

2017-06-05 Thread Gavin Flower

On 06/06/17 10:12, Gavin Flower wrote:

On 06/06/17 09:41, Mark Kirkwood wrote:

On 05/06/17 09:30, Tom Lane wrote:


I've been thinking about the behavior discussed in
https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org 

and it seems to me that there are a couple of things we ought to do 
about

it.

First, I think we need a larger hard floor on the number of occurrences
of a value that're required to make ANALYZE decide it is a "most common
value".  The existing coding is willing to believe that anything that
appears at least twice in the sample is a potential MCV, but that 
design
originated when we were envisioning stats samples of just a few 
thousand
rows --- specifically, default_statistics_target was originally just 
10,
leading to a 3000-row sample size.  So accepting two-appearance 
values as

MCVs would lead to a minimum MCV frequency estimate of 1/1500. Now it
could be a tenth or a hundredth of that.

As a round number, I'm thinking that a good floor would be a frequency
estimate of 1/1000.  With today's typical sample size of 30,000 rows,
a value would have to appear at least 30 times in the sample to be
believed to be an MCV.  That seems like it gives us a reasonable margin
of error against the kind of sampling noise seen in the above-cited
thread.

Second, the code also has a rule that potential MCVs need to have an
estimated frequency at least 25% larger than what it thinks the 
"average"
value's frequency is.  A rule of that general form seems like a good 
idea,
but I now think the 25% threshold is far too small to do anything 
useful.
In particular, in any case like this where there are more distinct 
values

than there are sample rows, the "average frequency" estimate will
correspond to less than one occurrence in the sample, so that this 
rule is
totally useless to filter anything that we would otherwise consider 
as an
MCV.  I wonder if we shouldn't make it be "at least double the 
estimated

average frequency".



Or possibly calculate the sample standard deviation and make use of 
that to help decide on a more flexible cutoff than twice the avg 
frequency?


Are there any research papers that might help us here (I'm drowning 
in a sea of barely relevant search results for most phrases I've 
tried so far)? I recall there were some that Tom referenced when this 
stuff was originally written.


On the other hand I do have access to some mathematicians 
specializing in statistics - so can get their thoughts on this issue 
if you feel it would be worthwhile.


Cheers

Mark


The standard deviation (sd) is proportional to the square root of the 
number in the sample in a Normal Distribution.


In a Normal Distribution, about 2/3 the values will be within plus or 
minus one sd of the mean.


There seems to be an implicit assumption that the distribution of 
values follows the Normal Distribution - has this been verified? I 
suspect that real data will have a skewed distribution of values, and 
may even be multi-modal (multiple peaks).  The Normal Distribution has 
one central peak with 2 tails of the same shape & size.


So in a sample of 100 the sd is proportional to 10%,
for 10,000 the sd is proportional to 1%.

So essentially, the larger the sample size, the more reliable our 
knowledge of the most common values (ignoring pathologically extreme 
distributions!) - the measure of reliability depends on the distribution.


How about selecting the cut-off as the mean plus one sd, or something 
of that nature?  Note that the cut-off point may result in no MCVs 
being selected - especially for small samples.


If practicable, it would be good to sample real datasets. Suggest 
looking at datasets where the current mechanism looks reasonable, and 
ones where the estimates are too far off.  Also, if possible, try any 
new selection method on the datasets and see what the difference is.


The above is based on what I remember from my university statistics 
papers; I took it up to 4th-year level many moons ago.
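The "mean plus one sd" cut-off suggested above can be sketched in a few 
lines (a toy model over raw sample frequencies, not ANALYZE's actual 
algorithm; the function name is mine):

```python
from collections import Counter
from statistics import mean, stdev

def candidate_mcvs(sample):
    """Return values whose count in the sample exceeds the mean plus
    one standard deviation of all per-value counts."""
    freqs = Counter(sample)
    counts = list(freqs.values())
    if len(counts) < 2:
        return list(freqs)          # too few distinct values for an sd
    cutoff = mean(counts) + stdev(counts)
    return [v for v, c in freqs.items() if c > cutoff]

# A skewed sample: one very common value, a moderately common one, and
# many singletons.  Only the truly common value clears the cut-off.
sample = ["a"] * 50 + ["b"] * 3 + list("cdefghij")
print(candidate_mcvs(sample))  # ['a']
```

As noted, a small or uniform sample can yield no MCVs at all: for four 
distinct single-occurrence values the cut-off equals the mean, so nothing 
clears it.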



Cheers,
Gavin




Suddenly realized, that the distribution of the FREQUENCIES of values in 
a Normal Distribution are probably NOT normally distributed!


For some shapes, and fancy names for them (like 'leptokurtic'), see: 
http://onlinestatbook.com/2/introduction/distributions.html


For multi modal examples: 
http://www.statisticshowto.com/multimodal-distribution


I tried looking for useful stuff to clarify things without success - so 
I think that asking a practising statistician is probably a very good 
idea!  :-)



Cheers,
Gavin


Cheers,
Gavin





Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?

2017-06-05 Thread Gavin Flower

On 06/06/17 09:41, Mark Kirkwood wrote:

On 05/06/17 09:30, Tom Lane wrote:


I've been thinking about the behavior discussed in
https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org 

and it seems to me that there are a couple of things we ought to do 
about

it.

First, I think we need a larger hard floor on the number of occurrences
of a value that're required to make ANALYZE decide it is a "most common
value".  The existing coding is willing to believe that anything that
appears at least twice in the sample is a potential MCV, but that design
originated when we were envisioning stats samples of just a few thousand
rows --- specifically, default_statistics_target was originally just 10,
leading to a 3000-row sample size.  So accepting two-appearance 
values as

MCVs would lead to a minimum MCV frequency estimate of 1/1500. Now it
could be a tenth or a hundredth of that.

As a round number, I'm thinking that a good floor would be a frequency
estimate of 1/1000.  With today's typical sample size of 30,000 rows,
a value would have to appear at least 30 times in the sample to be
believed to be an MCV.  That seems like it gives us a reasonable margin
of error against the kind of sampling noise seen in the above-cited
thread.

Second, the code also has a rule that potential MCVs need to have an
estimated frequency at least 25% larger than what it thinks the 
"average"
value's frequency is.  A rule of that general form seems like a good 
idea,
but I now think the 25% threshold is far too small to do anything 
useful.
In particular, in any case like this where there are more distinct 
values

than there are sample rows, the "average frequency" estimate will
correspond to less than one occurrence in the sample, so that this 
rule is
totally useless to filter anything that we would otherwise consider 
as an

MCV.  I wonder if we shouldn't make it be "at least double the estimated
average frequency".



Or possibly calculate the sample standard deviation and make use of 
that to help decide on a more flexible cutoff than twice the avg 
frequency?


Are there any research papers that might help us here (I'm drowning in 
a sea of barely relevant search results for most phrases I've tried so 
far)? I recall there were some that Tom referenced when this stuff was 
originally written.


On the other hand I do have access to some mathematicians specializing 
in statistics - so can get their thoughts on this issue if you feel it 
would be worthwhile.


Cheers

Mark


The standard deviation (sd) is proportional to the square root of the 
number in the sample in a Normal Distribution.


In a Normal Distribution, about 2/3 the values will be within plus or 
minus one sd of the mean.


There seems to be an implicit assumption that the distribution of values 
follows the Normal Distribution - has this been verified? I suspect that 
real data will have a skewed distribution of values, and may even be 
multi-modal (multiple peaks).  The Normal Distribution has one central 
peak with 2 tails of the same shape & size.


So in a sample of 100 the sd is proportional to 10%,
for 10,000 the sd is proportional to 1%.

So essentially, the larger the sample size, the more reliable our 
knowledge of the most common values (ignoring pathologically extreme 
distributions!) - the measure of reliability depends on the distribution.


How about selecting the cut-off as the mean plus one sd, or something of 
that nature?  Note that the cut-off point may result in no MCVs being 
selected - especially for small samples.


If practicable, it would be good to sample real datasets. Suggest 
looking at datasets where the current mechanism looks reasonable, and 
ones where the estimates are too far off.  Also, if possible, try any new 
selection method on the datasets and see what the difference is.


The above is based on what I remember from my university statistics 
papers; I took it up to 4th-year level many moons ago.



Cheers,
Gavin






Re: [HACKERS] password_encryption, default and 'plain' support

2017-05-06 Thread Gavin Flower

On 06/05/17 22:44, Vik Fearing wrote:

On 05/05/2017 02:42 PM, Michael Paquier wrote:

+This option is obsolete but still accepted for backwards
+compatibility.
Isn't that incorrect English?

No.


It seems to me that this should be non-plural,
as "for backward compatibility".

"Backwards" is not plural, it's a regional variation of "backward" (or
vice versa depending on which region you come from).  Both are correct.


I am English, born & bred, and 'Backwards' feels a lot more natural to me.


Cheers,
Gavin





Re: [HACKERS] CTE inlining

2017-05-04 Thread Gavin Flower

On 05/05/17 06:39, Tomas Vondra wrote:



On 5/4/17 8:03 PM, Joe Conway wrote:

On 05/04/2017 10:56 AM, Andrew Dunstan wrote:



On 05/04/2017 01:52 PM, Joe Conway wrote:

On 05/04/2017 10:33 AM, Alvaro Herrera wrote:

I'm not sure what your point is.  We know that for some cases the
optimization barrier semantics are useful, which is why the 
proposal is

to add a keyword to install one explicitely:

  with materialized r as
  (
 select json_populate_record(null::mytype, myjson) as x
 from mytable
  )
  select (x).*
  from r;

this would preserve the current semantics.

I haven't been able to follow this incredibly long thread, so please
excuse me if way off base, but are we talking about that a CTE 
would be

silently be rewritten as an inline expression potentially unless it is
decorated with some new syntax?

I would find that very disconcerting myself. For example, would 
this CTE

potentially get rewritten with multiple evaluation as follows?

DROP SEQUENCE IF EXISTS foo_seq;
CREATE SEQUENCE foo_seq;

WITH a(f1) AS (SELECT nextval('foo_seq'))
SELECT a.f1, a.f1 FROM a;
  f1 | ?column?
 ----+----------
   1 |        1
(1 row)

ALTER SEQUENCE foo_seq RESTART;
SELECT nextval('foo_seq'), nextval('foo_seq');
  nextval | ?column?
 ---------+----------
        1 |        2
(1 row)



I think that would be a change in semantics, which we should definitely
not be getting. Avoiding a change in semantics might be an interesting
exercise, but we have lots of clever coders ...


Well I think my point is that I always have understood CTEs to be
executed precisely once producing a temporary result set that is then
referenced elsewhere. I don't think that property of CTEs should change.
Somewhere else in the thread someone mentioned predicate push down --
that makes sense and maybe some clever coder can come up with a patch
that does that, but I would not be in favor of CTEs being inlined and
therefore evaluated multiple times.



I agree with this, but there's a difference between "executed exactly 
once" and "producing the same result as if executed exactly once".


I may be misunderstanding what other people proposed in this thread, 
but I think the plan was to only inline CTEs where we know it won't 
change the results, etc. So e.g. CTEs with volatile functions would 
not get inlined, which includes nextval() for example.


regards

It was the behaviour of "producing the same result as if executed 
exactly once" that I was thinking of - I think this is still valid for 
triggers & volatile functions, but such behaviour should be clearly 
documented.  This is what I implicitly thought about CTEs when I first 
came across them - to me it is the intuitively obvious behaviour.  
However, limiting the rows based on the body of the SELECT would often 
be a very useful optimisation.



Cheers,
Gavin





Re: [HACKERS] CTE inlining

2017-05-04 Thread Gavin Flower

On 30/04/17 16:28, Tom Lane wrote:

Craig Ringer  writes:

- as you noted, it is hard to decide when it's worth inlining vs
materializing for CTE terms referenced more than once.

[ raised eyebrow... ]  Please explain why the answer isn't trivially
"never".

There's already a pretty large hill to climb here in the way of
breaking peoples' expectations about CTEs being optimization
fences.  Breaking the documented semantics about CTEs being
single-evaluation seems to me to be an absolute non-starter.

regards, tom lane


Could not each CTE be only evaluated once, but restricted (as far as is 
practicable) to the rows actually needed by the body of the SELECT?
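Evaluate-once-but-only-as-needed is essentially lazy, pull-based evaluation. 
A toy Python sketch of the idea (purely illustrative; this is not how the 
PostgreSQL executor streams CTE results):

```python
from itertools import islice

# Count how many "CTE" rows are actually computed.
evaluations = 0

def cte_rows():
    """Pretend this generator is an expensive CTE producing many rows."""
    global evaluations
    for i in range(1_000_000):
        evaluations += 1
        yield i

# The outer "SELECT ... LIMIT 3" pulls only three rows, so only three
# rows of the CTE are ever computed - yet it still runs at most once.
first_three = list(islice(cte_rows(), 3))
print(first_three, evaluations)  # [0, 1, 2] 3
```

The generator runs at most once, yet only the three rows the consumer pulls 
are ever computed - single-evaluation semantics are preserved while avoiding 
full materialisation.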



Cheers,
Gavin





Re: [HACKERS] CTE inlining

2017-05-04 Thread Gavin Flower

On 04/05/17 05:33, Alvaro Herrera wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

+1 (Mario, proxied by me)

I've been asked to pass on comments by my colleague Mario of True Group
regarding the pain of the current behaviour of CTEs being optimisation
fences (as he doesn't normally follow this email group, I'll pass on any
feedback):

"Greetings,

Jumping onto the bandwagon here. At True Group in Auckland, we use
PostgreSQL as the platform for large-scale software development, often
with extensive and complex schemas, views and queries.

We frequently encounter poor query performance due to CTEs being an
optimisation barrier. We are forced to use workarounds such as a
set-returning function (parameterised view) in place of a view, and
manually placing the needed quals (WHERE clauses) into each CTE. This
creates headaches in large systems, where writing expressive SQL is
essential, rather than employing workarounds that make code harder to
understand and reuse.

Our general assumption is that we should write SQL that describes what
we want to achieve. The planner's job is to determine how to do that
efficiently.

There is an argument that pushing quals into CTEs could reduce
performance for some queries, especially if the qual is expensive. My
answer is twofold:

a) most queries will either experience no change or benefit from qual
push-down, as usually the expensive part is the query subtree, not the
top-level qual.

b) if a small proportion of queries are negatively affected, this is
better addressed by improving the planner's cost estimation. At worst,
an explicit OPTIMIZATION BARRIER hint could be added. But I know there
is much philosophical objection in the PostgreSQL community to planner
hints. The irony is that a main argument for retaining current CTE
behaviour is that people rely on CTEs as implicit barrier hints!

As an aside, we also encounter several other instances where qual
push-down fails, including where rewording a query in a way that is
semantically identical can change whether push-down takes place. But
probably the greatest bugbear is inability to push-down a qual into 
more

than one subtree, meaning if a query has two or more subtrees, each of
which is expensive, but only one row is needed from each, determined by
one qual at the top level, one subtree will get the qual pushed into it
and run fast, while the others will do a full (expensive) table 
scan and
be filtered only afterwards. There are also gains to be had in 
improving

qual push-down in sub-selects with aggregates, and others. But I think
these issues are only due to lack of resource to implement, rather than
philosophical/political objections.

We would like to see a Postgres version in the future that does much
better planning in the areas I've mentioned, including but not limited
to the present issue of CTEs. Our organisation may be able to 
contribute

materially to this, if the political will is there, and others affected
pitch in to achieve common goals. Not being expert in Postgres
internals, I am not certain how difficult each of the problems is.

I hope we have contributed usefully to the conversation, and invite
feedback.


Sincerely,
Mario"


Cheers,
Gavin




Re: [HACKERS] CTE inlining

2017-05-03 Thread Gavin Flower

On 04/05/17 05:33, Alvaro Herrera wrote:

David Fetter wrote:


When we add a "temporary" GUC, we're taking on a gigantic burden.
Either we support it forever somehow, or we put it on a deprecation
schedule immediately and expect to be answering questions about it for
years after it's been removed.

-1 for the GUC.

Absolutely.

So ISTM we have three choices:

1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
likely to happen for a user that upgrades to pg11 is that 5 out of 10
CTE-using queries are going to become faster than with pg10, and they
are going to be happy; 4 out of five are going to see no difference, but
they didn't have to do anything about it; and the remaining query is
going to become slower, either indistinguishably so (in which case they
don't care and they remain happy because of the other improvements) or
notably so, in which case they can easily figure where to add the
MATERIALIZED option and regain the original performance.

+1

I've watched a colleague spend hours trying to optimise a complicated 
query with nested views, then find that this 'optimisation fence' was 
the heart of the problem.



Cheers,
Gavin





Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

2017-04-14 Thread Gavin Flower

On 15/04/17 13:44, Andreas Karlsson wrote:

On 04/14/2017 11:54 PM, Tom Lane wrote:

I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki.  So I've pushed a patch that does
that.  We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.


Looked at this some, and what takes time now for me seems to mainly be 
these four things (out of a total runtime of 560 ms).


1. setup_conversion:140 ms
2. select_default_timezone:  90 ms
3. bootstrap_template1:  80 ms
4. setup_schema: 65 ms

These four take up about two thirds of the total runtime, so it seems 
likely that we may still have relatively low hanging fruit (but not 
worth committing for PostgreSQL 10).


I have not done profiling of these functions yet, so am not sure how 
they best would be fixed but maybe setup_conversion could be converted 
into bki entries to speed it up.


Andreas



How much could be done concurrently?


Cheers.
Gavin





Re: [HACKERS] Parallel Index Scans

2017-03-06 Thread Gavin Flower

On 07/03/17 02:46, Amit Kapila wrote:

On Mon, Mar 6, 2017 at 6:49 PM, Robert Haas  wrote:

On Mon, Mar 6, 2017 at 6:33 AM, Amit Kapila  wrote:

I was going to do it after index and index-only scans and parallel
bitmap heap scan were committed, but I've been holding off on
committing parallel bitmap heap scan waiting for Andres to fix the
simplehash regressions, so maybe I should just go do an update now and
another one later once that goes in.


As you wish, but one point to note is that as of now parallelism for
index scans can be influenced by table level parameter
parallel_workers.  It sounds slightly awkward, but if we want to keep
that way, then maybe we can update the docs to indicate the same.
Another option is to have a separate parameter for index scans.



My immediate gut feeling was to have separate parameters.

On thinking about it, I think they serve different use cases.  I don't 
think of workers when I think of Index scans, and I suspect I'd find 
more reasons to keep them separate if I looked deeper.



Cheers,
Gavin






Re: [HACKERS] case_preservation_and_insensitivity = on

2017-02-24 Thread Gavin Flower

On 25/02/17 09:02, Jim Nasby wrote:

On 2/24/17 12:28 AM, Robert Haas wrote:

On Thu, Feb 23, 2017 at 6:59 PM, Tom Lane  wrote:

I think these are straw-man arguments, really.  Consider the actual use
case for such a feature: it's for porting some application that was not
written against Postgres to begin with.

I'm not sure that's totally true.  I think at least some requests for
this feature are intended at satisfying somebody's sense of
aesthetics.


If I had $1 for every time I had to chase someone away from using 
camelcase I'd be able to sponsor a key at the next conference. And 
honestly I'd actually like to be able to use camelcase and still get 
easy-to-read output from \d & co.


IOW, this is definitely NOT driven just by porting efforts. I think 
the only reason we don't hear more requests about it is people 
(grudgingly) just muddle on without it.


I'd love to be able to successfully use camelcase for things like 
variable and table names in pg, without having to quote everything - but 
never felt it worthwhile to ask for it.



Cheers,

Gavin





Re: [HACKERS] proposal: EXPLAIN ANALYZE formatting

2017-01-28 Thread Gavin Flower

On 29/01/17 05:31, Pavel Stehule wrote:



2017-01-28 17:09 GMT+01:00 Tom Lane:


Pavel Stehule writes:
> Now EXPLAIN ANALYZE produce too wide rows for usage in presentations

> What do you think about possibility to implement >>optional<<
alternative
> formatting.
> Now:
>   node name (estimation) (actual)
> Alternative:
>   node name (estimation)
>(actual)

Seems like that would make a difference in only a tiny minority of
situations.  In a deeply nested plan you'll have trouble no matter
what, and it's not uncommon that the node name line isn't the widest
thing anyway.


It is related to presentation where you have to use large type - and 
where usually don't present complex nested plans, or you present only 
fragments.


The output of EXPLAIN is usually OK - EXPLAIN ANALYZE does overflow.

This feature is in the nice-to-have category - probably interesting for 
lecturers or presenters only - it can help and doesn't need a lot of 
work.  So I am asking for the community's opinion.


The result should not be exactly as I proposed - any form that is 
more friendly for a tiny monitor (projectors) is welcome.


Regards

Pavel


regards, tom lane



How about have a GUC to control the formatting of how it is displayed?

Could also include maximum line width (default 'infinite'), and word 
wrapping rules, ...



Cheers,
Gavin





Re: [HACKERS] RustgreSQL

2017-01-08 Thread Gavin Flower

On 09/01/17 11:31, Michael Paquier wrote:

On Mon, Jan 9, 2017 at 6:51 AM, Jan de Visser  wrote:

Your fear of C is unfounded. The definitive C89 reference is a little book of
about 250 pages, more than half of which is about the standard library, of
which you'll never use more than half. If you have some notepaper lying about
on which to scribble pointer diagrams you can be a C programmer too :-)

The reference guide of Brian Kernighan and Dennis Ritchie? Definitely
a must-have!


I learnt C from the original version of K&R, and bought the ANSI version 
when that came out - at the time I was a COBOL programmer on a mighty 
mainframe (with a 'massive' 1 MB of core memory and a 'fast' 2 MHz 
processor).  Now I use Java.



Cheers,
Gavin





Re: [HACKERS] RustgreSQL

2017-01-08 Thread Gavin Flower

On 08/01/17 22:09, Joel Jacobson wrote:

Hi all,

Is anyone working on porting PostgreSQL to Rust?

Corrode looks a bit limited for the task, but maybe it can be a start.
It doesn't support goto or switch, but maybe the goto patterns are 
not too complicated.


My motivation is primarily that I don't want to learn all the 
over-complicated details of C, but at the same time I would like to be 
productive in a safe systems language - a category in which Rust seems 
to be alone.

Porting PostgreSQL to Rust would be a multi-year project,
and it could only be done if the process could be fully automated,
by supporting all the coding patterns used by the project,
otherwise a Rust-port would quickly fall behind the master branch.
But if all git commits could be automatically converted to Rust,
then the RustgreSQL project could pull all commits from upstream
until all development has switched over to Rust among all developers.

Is this completely unrealistic or is it carved in stone PostgreSQL 
will always be a C project forever and ever?


From my very limited understanding, PostgreSQL is more likely to be 
converted to C++!



Cheers,
Gavin





Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-12-05 Thread Gavin Flower

On 05/12/16 17:00, Mithun Cy wrote:
[...]
errorMessage even outside PQconnectPoll. But that seems not required. 
Attacting the new patch which fixes the same.


--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com 

[...]

Is that meant to be 'attaching' or 'attacking'???



MORE SERIOUSLY
thanks for your and others' efforts for making pg great!


Cheers,
Gavin




Re: [HACKERS] emergency outage requiring database restart

2016-10-17 Thread Gavin Flower

On 18/10/16 14:12, Michael Paquier wrote:

On Tue, Oct 18, 2016 at 4:21 AM, Alvaro Herrera
 wrote:

Merlin Moncure wrote:


We had several good backups since the previous outage so it's not
clear the events are related but after months of smooth operation I
find that coincidence highly suspicious. As always, we need to suspect
hardware problems but I'm highly abstracted from them -- using esx +
san.

Ah, so you're subject not only to hardware flaws but also to
virtualization layer bugs :-)

Wait a couple of more years, and we'll get more complains about
Postgres running in containers running in VMs. Even more fun waiting
ahead.


that started life on different hardware with a different O/S






Re: [HACKERS] multivariate statistics (v19)

2016-10-04 Thread Gavin Flower

On 04/10/16 20:37, Dean Rasheed wrote:

On 4 October 2016 at 04:25, Michael Paquier  wrote:

OK. A second thing was related to the use of schemas in the new system
catalogs. As mentioned in [1], those could be removed.
[1]: 
https://www.postgresql.org/message-id/cab7npqtu40q5_nsghvomjfbyh1hdtqmbfdj+kwfjspam35b...@mail.gmail.com.


That doesn't work, because if the intention is to be able to one day
support statistics across multiple tables, you can't assume that the
statistics are in the same schema as the table.

In fact, if multi-table statistics are to be allowed in the future, I
think you want to move away from thinking of statistics as depending
on and referring to a single table, and handle them more like views --
i.e, store a pg_node_tree representing the from_clause and add
multiple dependencies at statistics creation time. That was what I was
getting at upthread when I suggested the alternate syntax, and also
answers Tomas' question about how JOIN might one day be supported.

Of course, if we don't think that we will ever support multi-table
statistics, that all goes away, and you may as well make the
statistics name local to the table, but I think that's a bit limiting.
One way or the other, I think this is a question that needs to be
answered now. My vote is to leave expansion room to support
multi-table statistics in the future.

Regards,
Dean


I can see multi-table statistics being useful if one is trying to 
optimise indexes for multiple joins.


I'm assuming that the statistics can be accessed by the user as well as 
the planner? (I've only lightly followed this thread, so I might have 
missed significant relevant details!)



Cheers,
Gavin





Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-10-04 Thread Gavin Flower

On 04/10/16 20:15, Michael Paquier wrote:

On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite  wrote:

Wouldn't pgbench benefit from it?
It was mentioned some time ago [1], in relationship to the
\into construct, how client-server latency was important enough to
justify the use of a "\;" separator between statements, to send them
as a group.

But with the libpq batch API, maybe this could be modernized
with meta-commands like this:
   \startbatch
   ...
   \endbatch

Or just \batch [on|off], which sounds like a damn good idea to me for
some users willing to test some workloads before integrating it in an
application.


+1

'\batch' is a bit easier to find, and to remember, than '\startbatch'





Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-09-14 Thread Gavin Flower

On 15/09/16 03:45, Robert Haas wrote:

On Wed, Sep 14, 2016 at 5:22 AM, Thomas Berger  wrote:

Today, i found the time to read all the mails in this thread, and i think i 
have to explain, why we decided to open a bug for this behavior.

On Tuesday, 23. August 2016, 13:30:29 Robert Haas wrote:

J. Random User: I'm having a problem!
Mailing List: Gee, how big are your tables?
J. Random User: Here's some pg_size_pretty output.
Mailing List: Gosh, we don't know what that means, what do you have
this obscure GUC set to?
J. Random User: Maybe I'll just give up on SQL and use MongoDB.

In fact, we had just the other way around. One of our most critical databases 
had some extreme bloat.
Some of our internal customers was very confused, about the sizes reported by 
the database.
This confusion has led to wrong decisions. (And a long discussion about the 
choice of DBMS btw)

I think there is a point missing in this whole discussion, or i just didn't see 
it:

Yeah, the behavior of "kB" is defined in the "postgresql.conf" documentation.
But no _user_ reads this. There is no link or hint in the documentation of 
"pg_size_pretty()" [1].

Interesting.  I think that our documentation should only describe the
way we use unit suffixes in one central place, but other places (like
pg_size_pretty) could link to that central place.

I don't believe that there is any general unanimity among users or
developers about the question of which suffixes denote units
denominated in units of 2^10 bytes vs. 10^3 bytes.  About once a year,
somebody makes an argument that we're doing it wrong, but the evidence
that I've seen is very mixed.  So when people say that there is only
one right way to do this and we are not in compliance with that one
right way, I guess I just don't believe it.  Not everybody likes the
way we do it, but I am fairly sure that if we change it, we'll make
some currently-unhappy people happy and some currently-happy people
unhappy.  And the people who don't care but wanted to preserve
backward compatibility will all be in the latter camp.

However, that is not to say that the documentation couldn't be better.

Well, I started programming in 1968, and was taught that 1 kilobyte was 
1024 (2^10) bytes.


I object to Johnny-come-latelies who try and insist it is 1000.

As regards 'kB' vs 'KB', I'm not too worried either way - I think 
consistency is more important.
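The binary-vs-decimal ambiguity being argued over here is easy to demonstrate with a small sketch. This is illustrative only - it is not pg_size_pretty()'s actual algorithm, and the unit list and one-decimal formatting are assumptions:

```python
# Sketch contrasting binary (2^10) and decimal (10^3) size units.
# Illustrative only: NOT pg_size_pretty()'s actual algorithm; the unit
# list and formatting choices here are assumptions.

def _pretty(n_bytes, base):
    units = ["bytes", "kB", "MB", "GB", "TB"]
    value = float(n_bytes)
    for unit in units:
        if value < base or unit == units[-1]:
            if unit == "bytes":
                return f"{value:.0f} {unit}"
            return f"{value:.1f} {unit}"
        value /= base

def pretty_binary(n_bytes):
    """kB = 1024 bytes - the convention Gavin (and PostgreSQL) use."""
    return _pretty(n_bytes, 1024)

def pretty_decimal(n_bytes):
    """kB = 1000 bytes - the SI convention some users expect."""
    return _pretty(n_bytes, 1000)

# The same byte count reads differently under the two conventions:
print(pretty_binary(10_000_000))   # 9.5 MB
print(pretty_decimal(10_000_000))  # 10.0 MB
```

The divergence grows with each unit step, which is exactly why a bloated table's size can "confuse internal customers" depending on which convention they assume.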



Cheers,
Gavin





Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Gavin Flower

On 12/09/16 12:16, Gavin Flower wrote:
[...]
 two blocks would be logically adjacent (which means they are likely 
to be physically close together, but not guaranteed!).



[...]

Actual disk layouts are quite complicated - the above is an 
oversimplification - but the message is still valid.


There are various tricks of disc layout (& low-level handling) that can 
be used to minimise the time taken to read 2 blocks that are logically 
adjacent.  I had to know this stuff for the discs that mainframe computers 
used in the 1980's - modern disk systems are way more complicated, but 
the conclusions are still valid.


I am extremely glad that I no longer have to concern myself with 
understanding the precise low-level layout of discs these days - there is no 
longer a one-to-one correspondence between what the O/S thinks is a disk 
block and how the data is physically recorded on the disc.



Cheers,
Gavin






Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Gavin Flower

On 12/09/16 10:13, Peter Geoghegan wrote:

On Sun, Sep 11, 2016 at 8:47 AM, Heikki Linnakangas  wrote:

[...]

I don't know what the difference is between accessing 10 pages
randomly, and accessing a random set of 10 single pages sequentially,
in close succession. As Tom would say, that's above my pay grade. I
suppose it comes down to how close "close" actually is (but in any
case, it's all very fudged).


If you select ten pages at random and sort them, then consecutive reads 
of the sorted list are more likely to access pages in the same block or 
closely adjacent (is my understanding).


eg

blocks:  |-----| |-----| |-----| |-----| |-----|
pages:    0  1    2  3    4  5    6  7    8  9

if the ten 'random pages' were selected in the random order:
6 1 7 8 4 2 9 3 0
Consecutive reads would always read new blocks, but the sorted list 
would have blocks read sequentially.


In practice, it would rarely be this simple.  However, if any of the 
random pages were in the same block, then that block would only need to 
be fetched once - similarly, if 2 of the random pages were in 
consecutive blocks, then the two blocks would be logically adjacent 
(which means they are likely to be physically close together, but not 
guaranteed!).
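The argument above can be put into a toy model: read the same ten pages in random versus sorted order, with only the most recently fetched block cached. The two-pages-per-block layout, the single-block cache, and the particular ten-page ordering are assumptions for illustration only - this is not PostgreSQL's actual I/O path:

```python
# Toy model of the point above: reading a random set of pages in sorted
# order clusters accesses to the same block, so a reader that caches
# only the most recently fetched block performs fewer block fetches.
# Assumes 2 pages per block, as in the diagram; illustrative only.

PAGES_PER_BLOCK = 2

def block_fetches(pages):
    """Count block fetches when only the last-read block is cached."""
    fetches = 0
    last_block = None
    for page in pages:
        block = page // PAGES_PER_BLOCK
        if block != last_block:
            fetches += 1
            last_block = block
    return fetches

random_order = [6, 1, 7, 8, 4, 2, 9, 3, 0, 5]   # an illustrative ordering
print(block_fetches(random_order))          # 10 - every access switches block
print(block_fetches(sorted(random_order)))  # 5  - each block fetched once
```

With this particular ordering every consecutive pair of pages lands in a different block, so the unsorted scan fetches 10 blocks while the sorted scan fetches each of the 5 blocks exactly once.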


[...]


Cheers,
Gavin




Re: [HACKERS] Long options for pg_ctl waiting

2016-09-07 Thread Gavin Flower

On 08/09/16 09:08, Vik Fearing wrote:

On 09/07/2016 10:41 PM, Alvaro Herrera wrote:

Gavin Flower wrote:


possibly '--nosync' (& any similar) should have a '--no-sync' variation
added, with the '--nosync' variation documented as depreciated?

I agree -- I would go as far as just documenting --no-sync only and
keeping the --nosync one working with minimal (if any) visibility in
docs.

Okay, here's a patch to do that.  I don't think it's the other patch's
job to do it.

I also changed --noclean to --no-clean, and --no-locale was already correct.


Suggest a comment along the lines of: "Where flags of the form --xxx have a 
negated form, the preferred negated form is --no-xxx; any 
existing use of the form --noxxx should be converted to --no-xxx, as the 
non-hyphenated form is now deprecated & will be removed in a future 
version of Postgres."


Possibly generate warnings when the non-hyphenated form is used?

Cheers,
Gavin





Re: [HACKERS] Long options for pg_ctl waiting

2016-09-07 Thread Gavin Flower

On 08/09/16 07:31, Robert Haas wrote:

On Sat, Sep 3, 2016 at 7:13 PM, Michael Paquier
 wrote:

On Sun, Sep 4, 2016 at 5:57 AM, Vik Fearing  wrote:

One thing that has been irking me ever since I came to PostgreSQL is the
fact that pg_ctl -w (and -W) don't have longhand equivalents.  I like to
use the long version in scripts and such as extra documentation, and
I've never been able to with these.  What's more, I keep forgetting that
--wait (and --no-wait) aren't a thing.

Trivial patch attached.

Nit: Like --nosync we could use --nowait, without a hyphen.

But is that actually better?  I think that the idea of omitting the
dash here is one of those things that sounds good at first, and then
later you realize that it was actually a dumb idea all along.  If
somebody has an option for --body or --on or --table and has to negate
it by running --nobody or --noon or --notable, some confusion may
result, because in each case you get a word that is not really the
logical inverse of the original option.   Also, if you end up with any
multi-word options, like --save-backup-files, then users wonder why
the opposite, --nosave-backup-files, has a dash between words 2 and 3
and between words 3 and 4, but not between words 1 and 2.  I suggest
we'd do better to standardize on always including a dash in such
cases.


+1

possibly '--nosync' (& any similar) should have a '--no-sync' variation 
added, with the '--nosync' variation documented as depreciated?



Cheers,
Gavin





Re: [HACKERS] sequence data type

2016-09-03 Thread Gavin Flower

On 04/09/16 06:41, Vik Fearing wrote:

On 08/31/2016 06:22 AM, Peter Eisentraut wrote:

Here is a patch that adds the notion of a data type to a sequence.  So
it might be CREATE SEQUENCE foo AS integer.  The types are restricted to
int{2,4,8} as now.

This patch does not apply cleanly to current master (=600dc4c).


Must admit I first thought of: "2, 4, 8, who do we appreciate!"


MORE SERIOUSLY:

Would a possible future expansion be to include numeric?

Offhand, I can't see any value in using other than integers of 2, 4, & 
8 bytes (not a criticism! - it may simply be a failure of imagination on my 
part).


I am curious as to the use cases for other possibilities.


Cheers,
Gavin





Re: [HACKERS] System load consideration before spawning parallel workers

2016-09-01 Thread Gavin Flower

On 02/09/16 05:01, Peter Eisentraut wrote:

On 8/16/16 3:39 AM, Haribabu Kommi wrote:

[...]


All of this seems very platform specific, too.  You have
Windows-specific code, but the rest seems very Linux-specific.  The
dstat tool I had never heard of before.  There is stuff with cgroups,
which I don't know how portable they are across different Linux
installations.  Something about Solaris was mentioned.  What about the
rest?  How can we maintain this in the long term?  How do we know that
these facilities actually work correctly and not cause mysterious problems?

[...]
I think that we should not hobble pg on Linux because of the limitations of 
other O/S's, like those from Microsoft!


On the safe side: if a feature has insufficient evidence of working in a 
particular O/S, then it should not be enabled by default for that O/S.


If a feature is useful on Linux but not elsewhere, then pg should still 
run on the other O/S's, but the documentation should reflect that.



Cheers,
Gavin




Re: [HACKERS] System load consideration before spawning parallel workers

2016-09-01 Thread Gavin Flower

On 02/09/16 04:44, Peter Eisentraut wrote:

On 8/1/16 2:17 AM, Gavin Flower wrote:

Possibly look how make does it with the '-l' flag?

'-l 8' don't start more process when load is 8 or greater, works on
Linux at least...

The problem with that approach is that it takes about a minute for the
load averages figures to be updated, by which time you have already
thrashed your system.

You can try this out by building PostgreSQL this way.  Please save your
work first, because you might have to hard-reboot your system.

Hmm...  I've built several versions of pg this way, without any obvious 
problems!


Looking at top, suggests that the load averages never go much above 8, 
and are usually less.


This is the bash script I use:


#!/bin/bash
# postgresql-build.sh


VERSION='9.5.0'

TAR_FILE="postgresql-$VERSION.tar.bz2"
echo 'TAR_FILE['$TAR_FILE']'
tar xvf $TAR_FILE

PORT='--with-pgport=5433'   # std is 5432

BASE_DIR="postgresql-$VERSION"
echo 'BASE_DIR['$BASE_DIR']'
cd $BASE_DIR

PREFIX="--prefix=/usr/local/lib/postgres-$VERSION"
echo 'PREFIX['$PREFIX']'

LANGUAGES='--with-python'
echo 'LANGUAGES['$LANGUAGES']'

SECURITY='--with-openssl --with-pam --with-ldap'
echo 'SECURITY['$SECURITY']'

XML='--with-libxml --with-libxslt'
echo 'XML['$XML']'

TZDATA='--with-system-tzdata=/usr/share/zoneinfo'
echo 'TZDATA['$TZDATA']'

##DEBUG='--enable-debug'
##echo 'DEBUG['$DEBUG']'


./configure $PREFIX $LANGUAGES $SECURITY $XML $TZDATA $DEBUG

time make -j7 -l8 && time make -j7 -l8 check


Cheers,
Gavin





Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-27 Thread Gavin Flower

On 27/08/16 20:33, Michael Paquier wrote:

On Sat, Aug 27, 2016 at 6:34 AM, Andres Freund  wrote:

On 2016-08-26 17:31:14 -0400, Peter Eisentraut wrote:

I agree with all that.  But the subject line is specifically about
moving pg_xlog.  So if your opinion is that we shouldn't move pg_xlog,
then that is noted.  But if we were to move it, we can think about a
good place to move it to.

I think it's probably worth moving pg_xlog, because the benefit also
includes preventing a few users from shooting themselves somewhere
vital. That's imo much less the case for some of the other moves.  But I
still don't think think a largescale reorganization is a good idea,
it'll just stall and nothing will happen.

OK, so let's focus only on the renaming mentioned in $subject. So far
as I can see on this thread, here are the opinions of people who
clearly gave one:
- Rename them, hard break is OK: Michael P, Bruce, Stephen (depends on
David's input),  Magnus
- Rename them, hard break not OK: Fujii-san (perhaps do nothing?)
- Do nothing: Simon (add a README), Tom, Peter E

As far as I can see, there is a consensus to not rename pg_xlog to
pg_journal and avoid using a third meaning, but instead use pg_wal. I
guess that now the other renaming would be pg_clog -> pg_xact. Other
opinions? Forgot you here?


I think if there are going to be things in pg that break software - for 
good reasons, like making future usage easier at the cost of an initial 
sharp pain - then to do so in version '10.0.0' is very appropriate!  IMHO


And better to do so in 10.0.0 (especially if closely related), rather 
than 10.1.0 (or whatever the next version after that is named).  So, if 
other things might cause breakages, do them in 10.0.0 - rather than holding 
back - assuming that there won't be hundreds or more major breakages!!!



Cheers,
Gavin





Re: [HACKERS] increasing the default WAL segment size

2016-08-25 Thread Gavin Flower

On 26/08/16 05:43, Josh Berkus wrote:

On 08/25/2016 01:12 PM, Robert Haas wrote:

I agree that #4 is best. I'm not sure it's worth the cost. I'm not worried

at all about the risk of master/slave sync thing, per previous statement.
But if it does have performance implications, per Andres suggestion, then
making it configurable at initdb time probably comes with a cost that's not
worth paying.

At this point it's hard to judge, because we don't have any idea what
the cost might be.  I guess if we want to pursue this approach,
somebody will have to code it up and benchmark it.  But what I'm
inclined to do for starters is put together a patch to go from 16MB ->
64MB.  Committing that early this cycle will give us time to
reconsider if that turns out to be painful for reasons we haven't
thought of yet.  And give tool authors time to make adjustments, if
any are needed.

The one thing I'd be worried about with the increase in size is folks
using PostgreSQL for very small databases.  If your database is only
30MB or so in size, the increase in size of the WAL will be pretty
significant (+144MB for the base 3 WAL segments).  I'm not sure this is
a real problem which users will notice (in today's scales, 144MB ain't
much), but if it turns out to be, it would be nice to have a way to
switch it back *just for them* without recompiling.


Let such folk use Microsoft Access???  


More seriously:
Surely most such people would be using very old hardware & are not likely to 
be upgrading to the most recent version of pg in the near future?  And 
for the ones using modern hardware: either they have enough resources 
not to notice, or they very probably will know enough to hunt round for a way 
to reduce the WAL size - I strongly suspect.


Currently, I'm not supporting pg in any production environment; I'm using 
it for testing & keeping up-to-date with pg.  So it would affect me - 
however, I have enough resources, so it is no problem in practice.




Cheers,
Gavin





Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-23 Thread Gavin Flower

On 24/08/16 12:02, neha khatri wrote:

>Andres Freund writes:
>> On 2016-08-22 13:54:43 -0400, Robert Haas wrote:
>> On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane wrote:

 I'm inclined to suggest you forget this approach and propose a single
 counter for "SQL commands executed", which avoids all of the above
 definitional problems.  People who need more detail than that are
 probably best advised to look to contrib/pg_stat_statements, anyway.

>>> I disagree.  I think SQL commands executed, lumping absolutely
>>> everything together, really isn't much use.

>> I'm inclined to agree. I think that's a quite useful stat when looking
>> at an installation one previously didn't have a lot of interaction with.


>Well, let's at least have an "other" category so you can add up the
>counters and get a meaningful total.

How would that meaningful total help a user? What could a user 
analyse with the counter in the 'other' category?



Neha

The user could then judge whether there were a significant number of examples 
not covered by the other categories - this may, or may not, be a 
problem, depending on the use case.



Cheers,
Gavin





Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-08-22 Thread Gavin Flower

On 23/08/16 09:40, Andres Freund wrote:

Hi,

as noted in [1] I started hacking on removing the current implementation
of SRFs in the targetlist (tSRFs henceforth). IM discussion brought the
need for a description of the problem, need and approach to light.

There are several reasons for wanting to get rid of tSRFs. The primary
ones in my opinion are that the current behaviour of several SRFs in one
targetlist is confusing, and that the implementation burden currently is
all over the executor.  Especially the latter is what is motivating me
working on this, because it blocks my work on making the executor faster
for queries involving significant amounts of tuples.  Batching is hard
if random places in the querytree can increase the number of tuples.

The basic idea, hinted at in several threads, is, at plan time, to convert a 
query like
SELECT generate_series(1, 10);
into
SELECT generate_series FROM ROWS FROM(generate_series(1, 10));

thereby avoiding the complications in the executor (c.f. execQual.c
handling of isDone/ExprMultipleResult and supporting code in many
executor nodes / node->*.ps.ps_TupFromTlist).

There are several design questions along the way:

1) How to deal with the least-common-multiple behaviour of tSRFs. E.g.
=# SELECT generate_series(1, 3), generate_series(1,2);
returning
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │               1 │
│               1 │               2 │
│               2 │               1 │
│               3 │               2 │
└─────────────────┴─────────────────┘
(6 rows)
but
=# SELECT generate_series(1, 3), generate_series(5,7);
returning
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               5 │
│               2 │               6 │
│               3 │               7 │
└─────────────────┴─────────────────┘

discussion in this thread came, according to my reading, to the
conclusion that that behaviour is just confusing and that the ROWS FROM
behaviour of
=# SELECT * FROM ROWS FROM(generate_series(1, 3), generate_series(1,2));
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │          (null) │
└─────────────────┴─────────────────┘
(3 rows)

makes more sense.
I had always implicitly assumed that having 2 generated sequences would 
act as equivalent to:


SELECT
sa,
sb
FROM
ROWS FROM(generate_series(1, 3)) AS sa,
ROWS FROM(generate_series(5, 7)) AS sb
ORDER BY
sa,
sb;

 sa | sb
----+----
  1 |  5
  1 |  6
  1 |  7
  2 |  5
  2 |  6
  2 |  7
  3 |  5
  3 |  6
  3 |  7


Obviously I was wrong - but to me, my implicit assumption makes more sense!
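The three behaviours in play in this thread - the historical least-common-multiple targetlist-SRF behaviour, ROWS FROM's NULL-padded zip, and the cross join Gavin had implicitly assumed - can be sketched side by side. This is illustrative Python only, not how Postgres implements any of them:

```python
# Three ways of combining two set-returning results, mirroring the
# examples above.  Illustrative only - not Postgres internals.

from itertools import product, zip_longest
from math import gcd

def tsrf_lcm(a, b):
    """Historical tSRF behaviour: run to the least common multiple
    of the two lengths, cycling each list."""
    n = len(a) * len(b) // gcd(len(a), len(b))
    return [(a[i % len(a)], b[i % len(b)]) for i in range(n)]

def rows_from(a, b):
    """ROWS FROM(...) behaviour: zip, padding the shorter with NULLs."""
    return list(zip_longest(a, b))

def cross_join(a, b):
    """The cross-join behaviour Gavin had implicitly assumed."""
    return list(product(a, b))

print(tsrf_lcm([1, 2, 3], [1, 2]))      # 6 rows, as in the first table
print(rows_from([1, 2, 3], [1, 2]))     # 3 rows, NULL-padded
print(cross_join([1, 2, 3], [5, 6, 7])) # 9 rows, every combination
```

Note that for equal-length inputs, such as generate_series(1, 3) with generate_series(5, 7), the LCM behaviour and the zip behaviour coincide, which is why only the unequal-length case looks surprising.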
[...]


Cheers,
Gavin




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-22 Thread Gavin Flower

On 23/08/16 08:27, Tom Lane wrote:

Andres Freund  writes:

On 2016-08-22 13:54:43 -0400, Robert Haas wrote:

On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane  wrote:

I'm inclined to suggest you forget this approach and propose a single
counter for "SQL commands executed", which avoids all of the above
definitional problems.  People who need more detail than that are
probably best advised to look to contrib/pg_stat_statements, anyway.

I disagree.  I think SQL commands executed, lumping absolutely
everything together, really isn't much use.

I'm inclined to agree. I think that's a quite useful stat when looking
at an installation one previously didn't have a lot of interaction with.

Well, let's at least have an "other" category so you can add up the
counters and get a meaningful total.

regards, tom lane


Initially I thought of something I thought was facetious, but then 
realized it might actually be both practicable & useful...


How about 2 extra categories (if appropriate!!!):

1. Things that might fit in 2 or more of the
   existing categories, or where there is an ambiguity as to which category
   is appropriate.

2. Things that don't fit into any existing category

This was inspired by a real use case, in a totally unrelated area - one 
where I attempted to ensure counts were in the most useful categories I 
was able to provide.  The user had listed categories, but I found that 
the data didn't always fit neatly into them as specified.



Cheers,

Gavin






Re: [HACKERS] [GENERAL] C++ port of Postgres

2016-08-17 Thread Gavin Flower

On 17/08/16 23:40, Aleksander Alekseev wrote:

I'm sure this wasn't your intent, but the tone of your response is
part of why people don't get involved with Postgres development...

Please note that you're the only person in the entire thread that's
said anything to the effect of a holy war...

OTOH, if the community takes the stance of "WTF WHY DO WE NEED
THIS?!",  we've just driven Joy and anyone else that's a C++ fan away.

I'm sorry for being maybe too emotional. It was not (and never is!) my
intent to offend anyone. Also I would like to note that I don't speak
for community, I speak for myself.

What I saw was: "hey, lets rewrite PostgreSQL in C++ without any good
reason except (see [1] list)". Naturally I thought (and still think) that
you people are just trolls. Or maybe "everything should be written in
C++ because it's the only right language and anyone who thinks
otherwise is wrong" type of fanatics. Thus I don't think you are here to
help.

Give a concrete reason. Like "hey, we rewrote this part of code in C++
and look, its much more readable, twice as fast as code in C (how to do
benchmarks right is a separate good topic!) and it still compiles fast
even on Raspberry Pi, works on all platforms you are supporting, etc".
Or "hey, we solved xid wraparound problem once and for all, but solution
is in C++, so its for you to decide whether to merge it or not".

If you really want to help, just solve _real_ problems using instruments
you prefer instead of reposting an obviously holy-war topic from general@
mailing list.

[1] https://en.wikipedia.org/wiki/List_of_fallacies

My main language is Java, and there are a lot of very good reasons for 
rewriting Postgres in Java, but I'd never push that - as there are also 
many good reasons for NOT rewriting Postgres in Java!


I am not an expert in C++, but I'm interested in its development and 
growing usage.  I've read enough about C++ to think it worthwhile to 
consider rewriting Postgres in C++.  If I had time, I would get deeper 
into C++, but for now pressures push me towards getting deeper into 
JavaScript - despite having an intense dislike for JavaScript!


The first 2 languages I used commercially were FORTRAN & COBOL back in 
the 70's, and I've been paid to teach C to experienced programmers.


As far as I am concerned, there is no one language that is perfect.  I 
have written programs in over 20 different languages.


So I did not suggest C++ because I'm a C++ fanatic!!!


Cheers,
Gavin






Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-15 Thread Gavin Flower

On 16/08/16 16:19, Andrew Dunstan wrote:



On 08/15/2016 02:23 PM, Tom Lane wrote:

Andrew Dunstan  writes:

On 08/15/2016 10:19 AM, Tom Lane wrote:

Andrew Dunstan  writes:
We should probably specify -bext='/', which would cause the backup 
files

to be deleted unless an error occurred.

Really?  That seems a bit magic, and it's certainly undocumented.

We must be using different versions.

Hmm ... I'm using the RHEL6 version, which claims to be v20090616,
which is what pgindent/README says to use.



Ah. I have Fedora 22's v20140711. 2009 seems a bit ancient :-) Anyway, 
what you've done seems fine.


cheers

andrew



A vintage year, but for the best taste, it has to have been aged 
on oak!  :-)






Re: [HACKERS] System load consideration before spawning parallel workers

2016-08-01 Thread Gavin Flower

On 01/08/16 18:08, Haribabu Kommi wrote:

On Fri, Jul 29, 2016 at 8:48 PM, Amit Kapila  wrote:

On Fri, Jul 29, 2016 at 11:26 AM, Haribabu Kommi
 wrote:

we observed that spawning the specified number of parallel workers for
every query that satisfies for parallelism is sometimes leading to
performance drop compared to improvement during the peak system load
with other processes. Adding more processes to the system is leading
to more context switches, thus reducing the performance of other SQL
operations.


Have you consider to tune using max_worker_processes, basically I
think even if you have kept the moderate value for
max_parallel_workers_per_gather, the number of processes might
increase if total number allowed is much bigger.

Are the total number of parallel workers more than number of
CPU's/cores in the system? If yes, I think that might be one reason
for seeing performance degradation.

Tuning max_worker_processes may work. But the problem here is, During the
peak load test, it is observed that setting parallel is leading to
drop in performance.

The main point here is, even if user set all the configurations properly to use
only the free resources as part of parallel query, in case if a sudden
load increase
can cause some performance problems.


In order to avoid this problem, how about adding some kind of system
load consideration into account before spawning the parallel workers?


Hook could be a possibility, but not sure how users are going to
decide the number of parallel workers, there might be other backends
as well which can consume resources.  I think we might need some form
of throttling w.r.t assignment of parallel workers to avoid system
overload.

There are some utilities and functions that are available to calculate the
current system load, based on the available resources and system load,
the module can allow the number of parallel workers that can start. In my
observation, adding this calculation will add some overhead for simple
queries. Because of this reason, i feel this can be hook function, only for
the users who want it, can be loaded.


Regards,
Hari Babu
Fujitsu Australia



Possibly look how make does it with the '-l' flag?

'-l 8' doesn't start more processes when the load is 8 or greater; works on 
Linux at least...
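A minimal sketch of that make-style '-l' check, assuming a worker launcher that consults os.getloadavg() (available on Linux and most Unixes). This is hypothetical throttling code, not anything in PostgreSQL, and it inherits the caveat raised elsewhere in these threads: the 1-minute load average reacts slowly, so a burst of workers can already have started before the figure climbs.

```python
# Hypothetical make-style "-l" throttle: refuse to start another
# worker when the 1-minute load average is at or above a threshold.

import os

def may_start_worker(max_load=8.0):
    """Return True if the 1-minute load average is below max_load."""
    try:
        load1, _load5, _load15 = os.getloadavg()
    except OSError:
        return True  # platform doesn't expose load averages; don't throttle
    return load1 < max_load

if may_start_worker():
    pass  # spawn the next parallel worker here
```

The fallback on OSError reflects the portability concern raised in the thread: on platforms without load averages, the sketch simply declines to throttle rather than failing.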



Cheers,
Gavin






Re: [HACKERS] \timing interval

2016-07-09 Thread Gavin Flower

On 10/07/16 12:08, Andrew Gierth wrote:

"Gavin" == Gavin Flower <gavinflo...@archidevsys.co.nz> writes:

  >> How about
  >>
  >> Time: 1234567.666 ms (20m 34.6s)

  Gavin> I like that, but I think the human form should retain the 3
  Gavin> decimal places.

Scale it.

Time: 12.345 ms (0.012345s)

Time: 1234.567 ms (1.235s)

Time: 98765.432 ms (98.8s)

Time: 123456.789 ms (2m 3.5s)

Time: 12345678.666 ms (3h 24m 46s)

  Gavin> In a few years, we may well have enormously multiprocessor
  Gavin> computers with massive very fast permanent 'RAM' where the
  Gavin> entire database is always in memory, so timing to the nearest
  Gavin> microsecond could be useful.

But the original microsecond-resolution value is still right there, so I
don't see the issue.


Sorry misunderstood, thought it was milliseconds!
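For what it's worth, the scaling Andrew shows could be sketched roughly like this. A toy illustration only: the unit thresholds and precisions are assumptions, not psql's actual \timing code:

```c
#include <stdio.h>

/* Render a millisecond duration in a human-friendly unit,
 * e.g. "2m 3.5s"; thresholds and precision here are assumptions. */
static void
format_interval(double ms, char *buf, size_t buflen)
{
    double s = ms / 1000.0;

    if (s < 60.0)
        snprintf(buf, buflen, "%.3fs", s);
    else if (s < 3600.0)
    {
        int m = (int) (s / 60.0);

        snprintf(buf, buflen, "%dm %.1fs", m, s - m * 60.0);
    }
    else
    {
        int h = (int) (s / 3600.0);
        int m = (int) ((s - h * 3600.0) / 60.0);

        snprintf(buf, buflen, "%dh %dm %.0fs",
                 h, m, s - h * 3600.0 - m * 60.0);
    }
}
```

So 123456.789 ms comes out as "2m 3.5s", while the raw millisecond value can still be printed alongside for awk-style averaging.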






Re: [HACKERS] \timing interval

2016-07-09 Thread Gavin Flower

On 10/07/16 08:00, Andrew Gierth wrote:

"Tom" == Tom Lane  writes:

  > Peter Eisentraut  writes:
  >> I'm not quite sure what you mean by wanting to do arithmetic on the
  >> numbers.  My phrasing of the problem is that after a long query, you
  >> might get output like this:
  >> Time: 1234567.666 ms
  >> which is pretty useless.

  Tom> What I mean by that is that not infrequently, I'll run the same
  Tom> query several times and then want to average the results.  That's
  Tom> easy with awk or similar scripts as long as the numbers are in
  Tom> straight decimal.

  Tom> I don't mind if we provide a way to print in Babylonian-inspired
  Tom> notation(s) as well, but I'm going to be seriously annoyed if
  Tom> that's the only way to get the output.

How about

Time: 1234567.666 ms (20m 34.6s)

?


I like that, but I think the human form should retain the 3 decimal places.

In a few years, we may well have enormously multiprocessor computers 
with massive very fast permanent 'RAM' where the entire database is 
always in memory, so timing to the nearest microsecond could be useful.  
Obviously microsecond precision would be silly now, and would probably 
never warrant being the default (I'd be happy to be proved wrong here!), 
but it might be worthwhile putting in as an option - while people are 
looking at the relevant areas of the code.


Am inspired by the man page for 'ls': [...] The  SIZE  argument is  an  
integer and optional unit (example: 10K is 10*1024). Units are 
K,M,G,T,P,E,Z,Y  (powers  of  1024) [...]" Obviously learnt from the 
lessons of "640KB should be enough for everyone" stupidity!



Cheers,
Gavin





Re: [HACKERS] MVCC overheads

2016-07-08 Thread Gavin Flower

Please see comment at the bottom of this post.

On 08/07/16 10:48, Pete Stevenson wrote:
Good info, thanks for the note. Agreed that it is difficult to pull 
things apart to isolate these features for offload — so actually 
running experiments with offload is not possible, as you point out 
(and for other reasons).


Maybe I could figure out the lines of code that add versions into a 
table and then those that collect old versions (they do get collected, 
right?). Anyway, thought being I could profile while running TPC-C or 
similar. I was hoping that someone might be able to jump on this with 
a response that they already did something similar. I know that 
Stonebraker has done some analysis along these lines, but I’m looking 
for an independent result that confirms (or not) his work.


Thank you,
Pete Stevenson


On Jul 7, 2016, at 3:43 PM, Simon Riggs wrote:


On 7 July 2016 at 20:50, Pete Stevenson wrote:


Hi Simon -

Thanks for the note. I think it's fair to say that I didn't
provide enough context, so let me try and elaborate on my question.

I agree, MVCC is a benefit. The research angle here is about
enabling MVCC with hardware offload. Since I didn't explicitly
mention it, the offload I refer to will respect all consistency
guarantees also.

It is the case that for the database to implement MVCC it must
provide consistent read to multiple different versions of data,
i.e. depending on the version used at transaction start. I'm not
an expert on postgresql internals, but this must have some cost.
I think the cost related to MVCC guarantees can roughly be
categorized as: creating new versions (linking them in), version
checking on read, garbage collecting old versions, and then there
is an additional cost that I am interested in (again, not claiming
it is unnecessary in any sense): the cost of generating
the log.

Thanks, by the way, for the warning about lab vs. reality. That's
why I'm asking this question here. I want to keep the
hypothetical tagged as such, but find defensible and realistic
metrics where those exist, i.e. in this instance, we do have a
database that can use MVCC. It should be possible to figure out
how much work goes into maintaining that property.


PostgreSQL uses a no overwrite storage mechanism, so any additional 
row versions are maintained in the same table alongside other rows. 
The MVCC actions are mostly mixed in with other aspects of the 
storage, so not isolated much for offload.


Oracle has a different mechanism that does isolate changed row 
versions into a separate data structure, so would be much more 
amenable to offload than PostgreSQL, in its current form.


Maybe look at SLRUs (clog etc) as a place to offload something?

--
Simon Riggs http://www.2ndQuadrant.com/ 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


In this list, the convention is to post replies at the end (with some 
rare exceptions), or interspersed when appropriate, and to omit parts no 
longer relevant.


The motivation for bottom posting like this is that people get to see 
the context before the reply, AND emails don't end up getting longer & 
longer as people reply at the beginning, forgetting to trim the now 
irrelevant stuff at the end.



Cheers,
Gavin




Re: [HACKERS] to_date_valid()

2016-07-03 Thread Gavin Flower

On 04/07/16 15:19, Pavel Stehule wrote:



2016-07-04 4:25 GMT+02:00 Craig Ringer:


On 3 July 2016 at 09:32, Euler Taveira wrote:

On 02-07-2016 22:04, Andreas 'ads'
Scherbaum wrote:
> The attached patch adds a new function "to_date_valid()"
which will
> validate the date and return an error if the input and
output date do
> not match. Tests included, documentation update as well.
>
Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could
default to
false to not break compatibility.


because


   SELECT to_date('blah', 'pattern', true)

is less clear to read than

   SELECT to_date_valid('blah', 'pattern')

and offers no advantage. It's likely faster to use a separate
function too.


personally I prefer first variant - this is same function with 
stronger check.


The name to_date_valid sounds little bit strange - maybe 
to_date_strict should be better.


Regards

Pavel

-- 
 Craig Ringer http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services



Yeah, my feeling too, is that 'to_date_strict' would be better!


Cheers,
Gavin





Re: [HACKERS] 10.0

2016-06-29 Thread Gavin Flower

On 21/06/16 03:53, Mark Dilger wrote:

On Jun 18, 2016, at 5:48 PM, Josh Berkus  wrote:

On 06/16/2016 11:01 PM, Craig Ringer wrote:

I thought about raising this, but I think in the end it's replacing one
confusing and weird versioning scheme for another confusing and weird
versioning scheme.

It does have the advantage that tools that compare a two-part major like
090401 vs 090402 won't be confused when they compare 100100 and 100200,
since it'll be 11 and 12. So it's more backward-compatible. But
ugly.


Realistically, though, we're more likely to end up with 10.0.1 than
10.1.  I don't think we're anywhere near plumbing the depths of the
stuff which will break because folks are parsing our version numbers
with regexes.  In more major software, this will break nagios
check_postgres.

Having a 3 part versioning scheme where the middle portion is always
zero makes the least sense to me of any of the proposals.  If we're going
to have the pain and hurting of moving to a 2 part versioning scheme,
and breaking nagios and what not, then lets just get on with it.  If we're
going to keep all three parts, can we please use my proposal earlier in
this thread where A.B.C are allocated for:

C++:  bug fixes only
B++:  new features added, but still backward compatible with prior version
A++:  new features added, not backward compatible, pg_upgrade required

If every new feature release ends up requiring pg_upgrade, then B will
always be zero, which is no worse than your proposal.  But at least users can
refer to part B to learn something useful, namely whether they will need to
run pg_upgrade as part of upgrading their existing cluster.

This has the advantage that new minor features, like adding a new guc, can
be released sooner than the next major release, but does not have to be
released in disguise as if it were a bug fix.

This is not a plea for keeping the three part versioning system.  It's just
a plea not to have a 2 part versioning system masquerading as a three
part versioning system, or vice versa.

mark



I agree with this!

I hate the rampant inflation associated with numbering schemes like 
FireFox - the numbers have no meaning at all, other than that something 
non-trivial has been changed, with no indication at all of how 
non-trivial!




Cheers,
Gavin





Re: [HACKERS] Bug in to_timestamp().

2016-06-24 Thread Gavin Flower

On 25/06/16 08:33, Robert Haas wrote:

On Fri, Jun 24, 2016 at 12:26 PM, Steve Crawford
 wrote:

My observation has been that the PostgreSQL development group aims for
correctness and the elimination of surprising results. This was part of the
reason to eliminate a number of automatic casts to dates in earlier
versions.

To me, 2016-02-30 is an invalid date that should generate an error.
Automatically and silently changing it to be 2016-03-01 strikes me as a
behavior I'd expect from a certain other open-source database, not
PostgreSQL.

I don't particularly disagree with that, but on the other hand, as
mentioned earlier, to_timestamp() is here for Oracle compatibility,
and if it doesn't do what Oracle's function does, then (1) it's not
useful for people migrating from Oracle and (2) we're making up the
behavior out of whole cloth.  I think things that we invent ourselves
should reject stuff like this, but in a compatibility function we
might want to, say, have compatibility.


How about a to_timestamp_strict(), in addition?

Its very existence will help people (who bother to read the 
documentation) to look more closely at the differences between the 
definitions, and allow them to choose the most appropriate for their use 
case.



Cheers,
Gavin





Re: [HACKERS] Does people favor to have matrix data type?

2016-05-30 Thread Gavin Flower

On 31/05/16 12:01, Kouhei Kaigai wrote:

On 05/29/2016 04:55 PM, Kouhei Kaigai wrote:

For the closer integration, it may be valuable if PL/R and PL/CUDA can exchange
the data structure with no serialization/de-serialization when PL/R code tries
to call SQL functions.

I had been thinking about something similar. Maybe PL/R can create an
extension within the R environment that wraps PL/CUDA directly or at the
least provides a way to use a fast-path call. We should probably try to
start out with one common use case to see how it might work and how much
benefit there might be.


My thought is the second option above. If SPI interface supports fast-path
like 'F' protocol, it may become a natural way for other PLs also to
integrate SQL functions in other languages.


IIUC, pg.spi.exec("SELECT my_function(...)") is the only way to call SQL
functions inside PL/R scripts.

Correct (currently).

BTW, this is starting to drift off topic I think -- perhaps we should
continue off list?


Some elements are common for PostgreSQL (matrix data type and fastpath SPI
interface). I like to keep the discussion on the list.
Regarding to the PoC on a particular use case, it might be an off-list
discussion.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

Possibly there should be two matrix types in Postgres. The first would 
be the default, optimized for small dense matrices; the second would 
store large sparse matrices efficiently in memory at the expense of 
speed (possibly with one or more parameters relating to how sparse it is 
likely to be?) - for appropriate definitions of 'small' & 'large'. Though 
memory savings for the latter type might not kick in unless the matrices 
are big enough (so a small sparse matrix might consume more memory than 
a nominally larger dense matrix type, & a sparse matrix might have to be 
sufficiently sparse to make real memory savings at any size).


Probably good to think of 2 types at the start, even if only the 
first is implemented initially.
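To make the dense-vs-sparse trade-off concrete, here is a back-of-the-envelope sketch. The layouts are assumptions for illustration (a plain dense array vs. a CSR-style index), not a proposal for an actual PostgreSQL type:

```c
#include <stddef.h>

/* CSR ("compressed sparse row") layout: nrows+1 row offsets,
 * plus one column index and one value per non-zero cell. */
typedef struct SparseMatrix
{
    int     nrows;
    int     ncols;
    int     nnz;        /* number of non-zero cells */
    int    *row_ptr;    /* nrows + 1 entries */
    int    *col_idx;    /* nnz entries */
    double *vals;       /* nnz entries */
} SparseMatrix;

/* Bytes needed by a dense matrix: one double per cell. */
static size_t
dense_bytes(int nrows, int ncols)
{
    return (size_t) nrows * ncols * sizeof(double);
}

/* Bytes needed by the CSR layout above. */
static size_t
csr_bytes(int nrows, int nnz)
{
    return ((size_t) nrows + 1 + nnz) * sizeof(int) + nnz * sizeof(double);
}
```

Assuming 4-byte int and 8-byte double: a 1000x1000 matrix with 1% non-zeros needs ~8 MB dense but only ~124 kB in CSR, while a fully populated 4x4 matrix costs more in CSR than dense - matching the point that the savings only kick in for big-and-sparse inputs.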



Cheers,
Gavin





Re: [HACKERS] [PATCH] Add EXPLAIN (ALL) shorthand

2016-05-19 Thread Gavin Flower

On 20/05/16 10:11, David G. Johnston wrote:

[...]


EXPLAIN ABCTV (might need permission to document this variation though)
What has Australian Broadcasting Corporation Television got to do with 
this???  :-)


[...]




Re: [HACKERS] Reviewing freeze map code

2016-05-17 Thread Gavin Flower

On 18/05/16 09:34, Vik Fearing wrote:

On 17/05/16 21:32, Alvaro Herrera wrote:

Is SCAN_ALL really the best we can do here?  The business of having an
underscore in an option name has no precedent (other than
CURRENT_DATABASE and the like).

ALTER DATABASE has options for ALLOW_CONNECTIONS, CONNECTION_LIMIT, and
IS_TEMPLATE.


How about COMPLETE, TOTAL, or WHOLE?

Sure, I'll play this game.  I like EXHAUSTIVE.


I prefer 'WHOLE', as it seems more obvious (and not because of the pun 
relating to 'wholesomeness'!!!)






Re: [HACKERS] 10.0

2016-05-13 Thread Gavin Flower

On 14/05/16 09:31, David G. Johnston wrote:
On Friday, May 13, 2016, Tom Lane wrote:


Robert Haas writes:
> On Fri, May 13, 2016 at 2:49 PM, Tom Lane wrote:

> If we don't want to stick with the current practice of debating when
> to bump the same digit, then let's agree that 10.0 will follow
9.6 and
> after that we'll bump the first digit after X.4, as we did with 7.X
> and 8.X.

It was absolute, utter chance that the 7.x and 8.x series both ended
with .4.  I see no reason to turn that into some kind of standard,
especially when it didn't work like that for 6.x or 9.x.


The underlying premise, for me, of choosing .4 or .5  was that 
presently we discontinue support after 5 years/releases.  A new .0 
would come out just as we discontinue the previous .0


An added idea, if switching to major.patch, would be to make 10.0.x 
but then go to 11.x. Though that's somewhat easier cognitively, it 
would probably be harder for developers than just ripping off the bandaid.


David J.

I think the idea of incrementing the first digit whenever the last 
major version reaches EOL is probably the best compromise. If that were 
adopted, then people could plan changes that trigger major breakage in 
established usage accordingly.


I like the version 'x.y.z' approach, where 'z' mostly means bug fixes 
and minor tweaks, 'y' has more significant enhancements, and 'x' may 
mean major breakages.  Let's not go down the Mozilla version-inflation route.


For now, probably best if we have 9.5.n, 9.6.n, 10.0.n (where obviously 
n starts at zero for each series!).



Cheers,

Gavin






Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Gavin Flower

On 12/05/16 02:20, Bruce Momjian wrote:

I am giving a keynote at an IEEE database conference in Helsinki next
week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
because I accepted the Helsinki conference invitation before the PGCon
Ottawa date was changed from June to May).

As part of the keynote, I would like to mention areas where academia can
help us.  The topics I can think of are:

Query optimization
Optimizer statistics
Indexing structures
Reducing function call overhead
CPU locality
Sorting
Parallelism
Sharding

Any others?


optimization of performance under very heavy loads
ranging from almost all reads to almost all writes/updates, & other usage 
profiles
single box, and multiple boxen

large numbers of CPU's
most efficient use of SSD's
best use of insanely large amounts of RAM

optimization of handling arrays & JSON structures


Cheers,
Gavin





Re: [HACKERS] First-draft release notes for next week's back-branch releases

2016-05-08 Thread Gavin Flower

On 09/05/16 10:22, Tom Lane wrote:

Gavin Flower <gavinflo...@archidevsys.co.nz> writes:

On 09/05/16 08:56, Tom Lane wrote:

Hmm, "which see" is perfectly good English to my knowledge, and I'm not
sure that other possible ways of wording this would be less awkward.

To me the phrase "which see" is plain weird, at least in this context!
Is this some American usage I've not heard on TV nor films???

Don't think so.  AFAIK it's a translation of the Latin "q.v." (quod vide),
and is used in more or less the same way.  It's not hard to find examples
by googling.

regards, tom lane


Well, I've come across many examples of bad grammar, so 
finding an example of usage via Google is not proof the usage is valid!


Even at best, it doesn't flow and is awkward.


Cheers,
Gavin






Re: [HACKERS] First-draft release notes for next week's back-branch releases

2016-05-08 Thread Gavin Flower

On 09/05/16 08:56, Tom Lane wrote:

[ I think you meant to attach this to the other thread, but anyway... ]

"David G. Johnston"  writes:

"...replacement_sort_tuples, which see for further details." needs
rewording.

Hmm, "which see" is perfectly good English to my knowledge, and I'm not
sure that other possible ways of wording this would be less awkward.


[...]

To me the phrase "which see" is plain weird, at least in this context!  
Is this some American usage I've not heard on TV nor films???


English is my first language, I was born in England and now reside in 
New Zealand.



Cheers,
Gavin




Re: [HACKERS] Naming of new tsvector functions

2016-05-05 Thread Gavin Flower

On 06/05/16 07:44, Tom Lane wrote:

Stas Kelvich  writes:

On 04 May 2016, at 20:15, Tom Lane  wrote:
Also, I'd supposed that we'd rename to tsvector_something, since
the same patch also introduced tsvector_to_array() and
array_to_tsvector().  What's the motivation for using ts_ as the
prefix?

There is already several functions named ts_* (ts_rank, ts_headline, ts_rewrite)
and two named starting from tsvector_* (tsvector_update_trigger, 
tsvector_update_trigger_column).
Personally I’d prefer ts_ over tsvector_ since it is shorter, and still keeps 
semantics.

Yeah, I see we're already a bit inconsistent here.  The problem with using
a ts_ prefix, to my mind, is that it offers no option for distinguishing
tsvector from tsquery, should you need to do that.  Maybe this isn't a
problem for functions that have tsvector as input.

regards, tom lane



use tsv_ and tsq_?


Cheers,
Gavin





Re: [HACKERS] Naming of new tsvector functions

2016-05-05 Thread Gavin Flower

On 05/05/16 21:20, Stas Kelvich wrote:

On 04 May 2016, at 20:15, Tom Lane  wrote:

Stas Kelvich  writes:

On 04 May 2016, at 16:58, Tom Lane  wrote:
The other ones are not so problematic because they do not conflict with
SQL keywords.  It's only delete() and filter() that scare me.

Okay, so changed functions to ts_setweight, ts_delete, ts_unnest, ts_filter.

Somehow, I don't think you read what I wrote.

Renaming the pre-existing setweight() function to ts_setweight() is
not going to happen; it's been like that for half a dozen years now.
It would make no sense to call the new variant ts_setweight() while
keeping setweight() for the existing function, either.

Oh, I accidentally renamed one of the old functions, my mistake.


I also don't see that much point in ts_unnest(), since unnest()
in our implementation is a function not a keyword.  I don't have
a strong opinion about that one, though.

Just to keep some level of uniformity in function names. But also I’m
not insisting.


Also, I'd supposed that we'd rename to tsvector_something, since
the same patch also introduced tsvector_to_array() and
array_to_tsvector().  What's the motivation for using ts_ as the
prefix?

There is already several functions named ts_* (ts_rank, ts_headline, ts_rewrite)
and two named starting from tsvector_* (tsvector_update_trigger, 
tsvector_update_trigger_column).

Personally I’d prefer ts_ over tsvector_ since it is shorter, and still keeps 
semantics.


regards, tom lane


I've not been involved in doing any tsvector stuff, nor am I likely to be in the 
near future - but if I was, I think I'd find it simpler to get into if 
tsvector-specific functions followed a common naming pattern, as 
Stas is suggesting.



Cheers,
Gavin





Re: [HACKERS] Rename max_parallel_degree?

2016-04-25 Thread Gavin Flower

On 26/04/16 06:38, Joshua D. Drake wrote:

On 04/25/2016 09:04 AM, Robert Haas wrote:
On Mon, Apr 25, 2016 at 11:27 AM, Joshua D. Drake 
 wrote:

max_parallel_nodes


I hope you are trolling me.


Actually I wasn't. It is usually a little more obvious when I troll, 
subtlety is not exactly my strong suit ;)



[...]

I've been reading the pg mailing lists for over ten years...

Just when I've noticed tempers flaring, and a flame war starting - it is 
already over!  In essence, I've never seen any genuine trolling or 
flamewars - people are just too friendly & cooperative.  Though I have 
seen strong disagreements, people are very careful to keep the tone from 
getting too heated, yet still manage to show their passion.



Cheers,
Gavin






Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-23 Thread Gavin Flower

On 22/04/16 17:36, Amit Kapila wrote:
On Fri, Apr 22, 2016 at 1:31 AM, Gavin Flower 
<gavinflo...@archidevsys.co.nz> wrote:


On 22/04/16 06:07, Robert Haas wrote:

On Thu, Apr 21, 2016 at 1:48 PM, Tom Lane <t...@sss.pgh.pa.us
<mailto:t...@sss.pgh.pa.us>> wrote:

Robert Haas <robertmh...@gmail.com
<mailto:robertmh...@gmail.com>> writes:

On Wed, Apr 20, 2016 at 2:28 PM, Tom Lane
<t...@sss.pgh.pa.us <mailto:t...@sss.pgh.pa.us>> wrote:

Andres Freund <and...@anarazel.de
<mailto:and...@anarazel.de>> writes:

max_parallel_degree currently defaults to 0. 
I think we should enable

it by default for at least the beta period.
Otherwise we're primarily
going to get reports back after the release.

So, I suggest that the only sensible non-zero values
here are probably
"1" or "2", given a default pool of 8 worker processes
system-wide.
Andres told me yesterday he'd vote for "2".  Any other
opinions?

It has to be at least 2 for beta purposes, else you are
not testing
situations with more than one worker process at all, which
would be
rather a large omission no?

That's what Andres, thought, too.  From my point of view, the big
thing is to be using workers at all.  It is of course possible
that
there could be some bugs where a single worker is not enough, but
there's a lot of types of bug where even one worker would probably
find the problem.  But I'm OK with changing the default to 2.

I'm curious.

Why not 4?


IIUC, the idea to change max_parallel_degree for beta is to catch any 
bugs in parallelism code, not to do any performance testing of same.  
So, I think either 1 or 2 should be sufficient to hit the bugs if 
there are any.  Do you have any reason to think that we might miss 
some category of bugs if we don't use higher max_parallel_degree?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
No.  Just felt that 4 would not be too great a load for the type of processor 
chips used in servers to handle.


For complications, such as race conditions and implied logical 
assumptions - I tend to think of 0, 1, 2, 3, many.


Essentially just a gut feeling that 4 might reveal more corner cases.


Cheers,
Gavin





Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-23 Thread Gavin Flower

On 23/04/16 00:56, Robert Haas wrote:

On Thu, Apr 21, 2016 at 7:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

Robert Haas <robertmh...@gmail.com> writes:

On Thu, Apr 21, 2016 at 4:01 PM, Gavin Flower
<gavinflo...@archidevsys.co.nz> wrote:

Why not 4?  As most processors now have at least 4 physical cores, surely
it'd be more likely to flush out race conditions.

Because if we did that, then it's extremely likely that people would
end up writing queries that are faster only if workers are present,
and then not get any workers.

Is that because max_worker_processes is only 8 by default?  Maybe we
need to raise that, at least for beta purposes?

I'm not really in favor of that.  I mean, almost all of our default
settings are optimized for running PostgreSQL on, for example, a
Raspberry Pi 2, so it would seem odd to suddenly swing the other
direction and assume that there are more than 8 unused CPU cores.  It
doesn't make sense to me to roll out settings in beta that we wouldn't
be willing to release with if they work out.  That's why, honestly, I
would prefer max_parallel_degree=1, which I think would be practical
for many real-world deployments.  max_parallel_degree=2 is OK.  Beyond
that, we're just setting people up to fail, I think.  Higher settings
should probably only be used on substantial hardware, and not
everybody has that.

If Java has been able to find out how many processors are available to it 
since JDK 1.4, then surely PostgreSQL can do the same?


So how about the default being half the available processors rounded up 
to the nearest integer?


Perhaps the GUC for workers should be a percentage of the available 
processors, with the minimum & maximum workers optionally specified - or 
something of that nature?
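One possible shape for such a percentage-based setting, sketched under assumptions (the function names and clamping policy are hypothetical, and sysconf(_SC_NPROCESSORS_ONLN) is POSIX's rough analogue of Java's availableProcessors()):

```c
#include <unistd.h>

/* Map a configured percentage of online CPUs to a worker count,
 * clamped to optional minimum/maximum bounds. */
static int
workers_from_percent(long ncpus, int percent, int min_workers, int max_workers)
{
    long n = (ncpus * percent + 99) / 100;  /* round up */

    if (n < min_workers)
        n = min_workers;
    if (n > max_workers)
        n = max_workers;
    return (int) n;
}

/* Count the CPUs currently online; fall back to 1 if undetectable. */
static long
online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? n : 1;
}
```

E.g. "50% of CPUs, at least 1, at most 16" gives 4 workers on an 8-core box and 2 on a 4-core one, matching the "half the available processors rounded up" default suggested above.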



Cheers,
Gavin




Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-04-21 Thread Gavin Flower

On 22/04/16 06:07, Robert Haas wrote:

On Thu, Apr 21, 2016 at 1:48 PM, Tom Lane  wrote:

Robert Haas  writes:

On Wed, Apr 20, 2016 at 2:28 PM, Tom Lane  wrote:

Andres Freund  writes:

max_parallel_degree currently defaults to 0.  I think we should enable
it by default for at least the beta period. Otherwise we're primarily
going to get reports back after the release.

So, I suggest that the only sensible non-zero values here are probably
"1" or "2", given a default pool of 8 worker processes system-wide.
Andres told me yesterday he'd vote for "2".  Any other opinions?

It has to be at least 2 for beta purposes, else you are not testing
situations with more than one worker process at all, which would be
rather a large omission no?

That's what Andres, thought, too.  From my point of view, the big
thing is to be using workers at all.  It is of course possible that
there could be some bugs where a single worker is not enough, but
there's a lot of types of bug where even one worker would probably
find the problem.  But I'm OK with changing the default to 2.


I'm curious.

Why not 4?  As most processors now have at least 4 physical cores, 
surely it'd be more likely to flush out race conditions.



Cheers,
Gavin





Re: [HACKERS] 2016-03 Commitfest

2016-04-08 Thread Gavin Flower

On 09/04/16 02:58, Magnus Hagander wrote:
On Fri, Apr 8, 2016 at 4:56 PM, Robert Haas wrote:


> On Fri, Apr 8, 2016 at 10:06 AM, David Steele wrote:
> On 4/8/16 10:00 AM, Tom Lane wrote:
>> David Steele writes:
>>> So the commitfest is 84% complete with less than twelve hours
to go.
>>
>> Have we set a particular time-of-day for closing the CF, and if so
>> what is it exactly?
>
> From the referenced email:
>
> "Accordingly, the release management has decided that all
> feature patches destined for PostgreSQL 9.6 must be committed no
later
> than April 8, 2016.  Any patch not committed prior to 2016-04-09
> 00:00:00 GMT may not be committed to PostgreSQL 9.6 unless (a)
it is a
> bug fix, (b) it represents essential cleanup of a
previously-committed
> patch, or (c) the release management team has approved an
extension to
> the deadline for that particular patch."

IOW, the deadline is 8pm US/Eastern time, or about 9 hours from now.

Let's try not to introduce more bugs in the next 9 hours than we have
in the preceding 9 months.


What? Where's the fun in that?!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Yes, even worse, Robert might attract the ire of the BLL (Bug Liberation 
League) - and then where would we be???







Re: [HACKERS] Alter or rename enum value

2016-03-25 Thread Gavin Flower

On 26/03/16 08:17, Jim Nasby wrote:

On 3/24/16 10:27 PM, Tom Lane wrote:

It's conceivable that we could do something like adding an "isdead"
column to pg_enum and making enum_in reject new values that're marked
isdead.  But I can't see that we'd ever be able to support true
removal of an enum value at reasonable cost.  And I'm not really sure
where the use-case argument is for working hard on it.


I wonder if we could handle this by allowing foreign keys on enum 
columns back to pg_enum. Presumably that means we'd have to treat 
pg_enum as a regular table and not a catalog table. Due to locking 
concerns I don't think we'd want to put the FKs in place by default 
either.


I've certainly heard people avoiding ENUMs because of their 
limitations, so it'd be nice if there was a way to lift them.

Well, I use Enums extensively in Java.

However, I totally avoid using ENUMs in pg, due to their inflexibility!


Cheers,
Gavin




Re: [HACKERS] Make primnodes.h gender neutral

2016-03-19 Thread Gavin Flower

On 18/03/16 09:41, Joshua D. Drake wrote:

On 03/17/2016 01:36 PM, Alvaro Herrera wrote:


[...]




(*) I'm probably going to be expelled from the project for saying this,
but I very much doubt that female coders stay away from PostgreSQL just
because some files say "he" in comments rather than "she" or "he or she"
or "one" or "they".  They probably have different reasons for staying
away from the project.


Wanna bet? There is a very loud movement about this. We can either:

A. Start fighting about it

B. Just fix it, it doesn't matter anyway and it doesn't hurt the 
quality of the code or the documentation


JD


I strongly think that gender should not be mentioned unless it is 
relevant - as constructs like 'he or she' are clumsy and distract from 
what is being commented on, not to mention that some rare people are: 
neither, both, or ambiguous (from research I did when I read a rather 
curious article).


Other than 'one', 'their', 'they', & 'them' - there are role-specific 
references like 'user', 'developer', & 'DBA', ... that can be used where 
appropriate.


I tend to prefer the term 'Gender Appropriate' rather than 'Gender 
Neutral', as sometimes mentioning gender IS very relevant!




Cheers,
Gavin




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-12 Thread Gavin Flower

On 12/03/16 23:27, Michael Paquier wrote:

On Fri, Mar 11, 2016 at 5:28 AM, Craig Ringer  wrote:

On 11 March 2016 at 03:07, Igal @ Lucee.org  wrote:

I noticed that you usually don't put html in the emails here, but I think that 
it's appropriate here to show the information in a clear way (also, according 
to my computer it's 2016).

Pretty sure we have at least one person here using mailreader software that's 
old enough to vote in most countries, but I tend to share the sentiment. At 
least when there's actually a functional reason like this :)

This one made me smile. Extracting a quote from another project's
slogan: All mail clients s**k. Some of them just s**k less.


s**k == sulk???





Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Gavin Flower

On 11/03/16 08:48, Igal @ Lucee.org wrote:

On 3/10/2016 11:44 AM, Robert Haas wrote:
On Thu, Mar 10, 2016 at 2:35 PM, Simon Riggs  
wrote:

But I still don't know what "meh" means.

Maybe this helps?

https://en.wikipedia.org/wiki/Meh


LOL
https://en.wikipedia.org/wiki/LOL



I'm pretending to work, so will you and Robert stop making that pretence 
more difficult!!!  :-)





Re: [HACKERS] 2016-03 Commitfest Manager

2016-03-01 Thread Gavin Flower

On 02/03/16 17:01, Michael Paquier wrote:

On Wed, Mar 2, 2016 at 11:25 AM, David Steele  wrote:

Agreed.  I see you created the new CF so no reason to keep it open.

OK. Done. May the force to manage all those patches be with you, manager.

May the Source be with you Luke^H^H^H^HMichael!




Re: [HACKERS] Figures in docs

2016-02-17 Thread Gavin Flower

On 18/02/16 10:38, Peter Eisentraut wrote:

On 2/16/16 8:17 PM, Tatsuo Ishii wrote:

It seems there's no figures/diagrams in our docs. I vaguely recall that
we used to have a few diagrams in our docs. If so, was there any
technical reason to remove them?

Because no one has been able to propose a good format for storing and
editing pictures.




SVG?





Re: [HACKERS] Releasing in September

2016-01-20 Thread Gavin Flower

On 21/01/16 06:40, Tom Lane wrote:

Magnus Hagander  writes:

That's pretty much what I suggested :)
Except that we need to do it for the last one as well. With the only
exception that the last one might be a bit longer. But the fact is that the
recent of CFs *and* releases, we've taken the approach of closing the CF
when everything in it is done or explicitly reviewed-and-bumped, and tried
really really hard not to bump things because nobody had time to look at
them. That's what I'm suggesting we change, and actually just cut them.
Yes, one of the reasons for the CFs was to allow people a fair chance to
get reviewed.  But given that there isn't actually a deadline in practice
doesn't help with that.

Yeah.  It's certainly unfair if someone's patch doesn't get reviewed,
but there are only 24 hours in a day, and we have a limited pool of
reviewer and committer manpower.  I think we just have to say that
sometimes life is unfair.

I think it might also be a good idea if we could somehow distinguish
"nobody had time for that yet" from "nobody is interested", with an eye
to flat-out rejecting patches that no one cares enough about to review.
Maybe we could reduce the workload by inventing some kind of up-front
filter whereby a patch isn't even a candidate to get reviewed unless, say,
at least two people besides the author say "this sounds like it's worth
pursuing".

One of the other things I do not like about the current CF process is that
it's created a default assumption that most submitted patches should get
accepted eventually.  It is *very* hard to reject a patch altogether in
the CF process: you more or less have to show cause why it should not get
accepted.  That default is backwards.  Maybe this isn't the process' fault
exactly, but it sure seems like that's how we approach patches now.

regards, tom lane


Possibly the emphasis should be on what patches should be ACCEPTED, 
rather than on what patches should be REJECTED?


Then there is less stigma in a patch missing out, and people don't have 
to justify rejecting patches.



Cheers,
Gavin





Re: [HACKERS] WIP: multivariate statistics / proof of concept

2016-01-19 Thread Gavin Flower

On 12/12/14 05:53, Heikki Linnakangas wrote:

On 10/13/2014 01:00 AM, Tomas Vondra wrote:

Hi,

attached is a WIP patch implementing multivariate statistics.


Great! Really glad to see you working on this.


+ * FIXME This sample sizing is mostly OK when computing stats for
+ *   individual columns, but when computing multi-variate stats
+ *   for multivariate stats (histograms, mcv, ...) it's rather
+ *   insufficient. For small number of dimensions it works, but
+ *   for complex stats it'd be nice use sample proportional to
+ *   the table (say, 0.5% - 1%) instead of a fixed size.


I don't think a fraction of the table is appropriate. As long as the 
sample is random, the accuracy of a sample doesn't depend much on the 
size of the population. For example, if you sample 1,000 rows from a 
table with 100,000 rows, or 1000 rows from a table with 100,000,000 
rows, the accuracy is pretty much the same. That doesn't change when 
you go from a single variable to multiple variables.


You do need a bigger sample with multiple variables, however. My gut 
feeling is that if you sample N rows for a single variable, with two 
variables you need to sample N^2 rows to get the same accuracy. But 
it's not proportional to the table size. (I have no proof for that, 
but I'm sure there is literature on this.)

[...]

I did stage III statistics at University many moons ago...

The accuracy of the sample only depends on the value of N, not the total 
size of the population, with the obvious constraint that N <= population 
size.


The standard error of an estimate from a random sample is inversely 
proportional to the square root of N.  So using N = 100 would have a 
standard error of about 10%, so to reduce it to 5% you would need N = 400.


For multiple variables, it will also be a function of N - I don't recall 
precisely how, I suspect it might be M * N where M is the number of 
parameters (but I'm not as certain).  I think M^N might be needed if you 
want all the possible correlations between sets of variables to be 
reasonably significant - but I'm mostly just guessing here.


So using a % of table size is somewhat silly, looking at the above. 
However, if you want to detect frequencies that occur at the 1% level, 
then you will need to sample 1% of the table or greater.  So which 
approach is 'best' depends on what you are trying to determine.  The 
sample size is more useful when you need to decide between two different 
hypotheses.


The sampling methodology is far more important than the ratio of N to 
population size - consider the bias imposed by using random telephone 
numbers, even before the advent of mobile phones!
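Gavin's figures can be checked with a quick simulation - an illustrative sketch only, not code from the thread. The standard error of a sample proportion is sqrt(p*(1-p)/N), which depends on the sample size N but not on the population size, matching the ~10% (N = 100) vs ~5% (N = 400) figures above:

```python
import random

def sample_proportion_sd(population_size, n, p=0.5, trials=1000):
    """Empirical standard deviation of the sample proportion
    when drawing samples of size n from the given population."""
    ones = int(population_size * p)
    population = [1] * ones + [0] * (population_size - ones)
    estimates = []
    for _ in range(trials):
        estimates.append(sum(random.sample(population, n)) / n)
    mean = sum(estimates) / trials
    return (sum((e - mean) ** 2 for e in estimates) / trials) ** 0.5

random.seed(1)
# Same sample size n = 400, populations differing by 20x:
sd_small = sample_proportion_sd(50_000, 400)
sd_large = sample_proportion_sd(1_000_000, 400)
# Both land near the theoretical sqrt(0.5 * 0.5 / 400) = 0.025,
# i.e. accuracy tracks the sample size, not the population size.
```

Both sd_small and sd_large come out near the theoretical 0.025 despite the 20x difference in population size.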



Cheers,
Gavin




Re: [HACKERS] Truncation of identifiers

2016-01-13 Thread Gavin Flower

On 14/01/16 13:05, Tom Lane wrote:

Thomas Munro  writes:

Wouldn't it be better to raise an error when identifiers are too long,
rather than accepting but truncating them?

I wouldn't think so.


I'm not aware of any other database that does this.

It's standard practice in most programming languages AFAIK.  And SQL is
surely a programming language.


If you're using oversized identifiers you
could finish up using more than one way to refer to the same database
object, and then your queries will have a different meaning if
NAMEDATALEN ever changes.

No, they'd just start failing if they didn't match the object (which
there can be only one of, else you'd have gotten other errors).

Another argument against comes from the fact that NAMEDATALEN is usually
less than what SQL says is the minimum required length (viz, 128
characters).  Your proposal would have us throwing entirely needless
errors on queries that are fully spec-conformant.

regards, tom lane



I would prefer a database to be more strict.

How about a GUC to control this behaviour, with the default to be lax?
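A toy model of the options being debated - silent truncation versus raising an error, selected by a GUC-like switch - could look like this. NAMEDATALEN = 64 is PostgreSQL's default, but strict_length and the rest are purely illustrative, not real PostgreSQL code or settings:

```python
NAMEDATALEN = 64  # PostgreSQL default; identifiers keep NAMEDATALEN - 1 bytes

def accept_identifier(name, strict_length=False):
    """Toy model: truncate an over-long identifier, or raise if strict.

    strict_length stands in for the hypothetical GUC discussed above;
    it is not a real PostgreSQL setting.
    """
    limit = NAMEDATALEN - 1
    encoded = name.encode("utf-8")
    if len(encoded) <= limit:
        return name
    if strict_length:
        raise ValueError(f"identifier too long: {name!r}")
    # PostgreSQL truncates at a character boundary; 'ignore' approximates that.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

With the lax default, a 100-character name silently becomes 63 characters; with strict_length=True the same name raises an error instead.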


Cheers,
Gavin




Re: [HACKERS] Commit fest status for 2015-11

2015-12-24 Thread Gavin Flower

On 25/12/15 01:45, Michael Paquier wrote:
[...]

And the CF is now closed. Here is the final score:
Committed: 30.
Moved to next CF: 42.
Rejected: 9.
Returned with Feedback: 22.
Total: 103.
Regards,


You didn't say how many regards...


[More seriously]
Many thanks to you, and the other Postgres developers, for all your hard 
work & dedication, much appreciated!




Cheers,
Gavin




Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Gavin Flower

On 08/12/15 05:27, David G. Johnston wrote:
On Mon, Dec 7, 2015 at 8:35 AM, Jim Nasby wrote:


On 12/6/15 10:38 AM, Tom Lane wrote:

I said "in most cases".  You can find example cases to support
almost any
weird planner optimization no matter how expensive and
single-purpose;
but that is the wrong way to think about it.  What you have to
think about
is average cases, and in particular, not putting a drag on
planning time
in cases where no benefit ensues.  We're not committing any
patches that
give one uncommon case an 1100X speedup by penalizing every
other query 10%,
or even 1%; especially not when there may be other ways to fix it.


This is a problem that seriously hurts Postgres in data
warehousing applications. We can't keep ignoring optimizations
that provide even as little as 10% execution improvements for 10x
worse planner performance, because in a warehouse it's next to
impossible for planning time to matter.

Obviously it'd be great if there was a fast, easy way to figure
out whether a query would be expensive enough to go the whole 9
yards on planning it but at this point I suspect a simple GUC
would be a big improvement.


Something like "enable_equivalencefilters" but that defaults to false 
unlike every one existing "enable_*" GUC?


​It would be a lot more user-friendly to have something along the 
lines of "planner_mode (text)" with labels like "star, transactional, 
bulk_load, etc..." because I suspect there are other things we'd want 
to add if we start identifying queries by their type/usage and 
optimize accordingly.  Having the knobs available is necessary but 
putting on a façade would make the user more straight-forward for the 
common cases.


David J.


How about:

planning_time_base 10       # Default effort, may be increased or decreased 
as required - must be at least 1
planning_time_<feature>  0  # By default, planner makes no (or minimal) 
effort to optimise for <feature>


So for some people, adjusting planning_time_base may be sufficient - but 
for more specialised cases, people can tell the planner to consider 
expending more effort.



Cheers,
Gavin





Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-07 Thread Gavin Flower

On 08/12/15 08:34, Evgeniy Shishkin wrote:

On 07 Dec 2015, at 22:27, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:

On 08/12/15 05:27, David G. Johnston wrote:

On Mon, Dec 7, 2015 at 8:35 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:

On 12/6/15 10:38 AM, Tom Lane wrote:

I said "in most cases".  You can find example cases to support
almost any
weird planner optimization no matter how expensive and
single-purpose;
but that is the wrong way to think about it.  What you have to
think about
is average cases, and in particular, not putting a drag on
planning time
in cases where no benefit ensues.  We're not committing any
patches that
give one uncommon case an 1100X speedup by penalizing every
other query 10%,
or even 1%; especially not when there may be other ways to fix it.


This is a problem that seriously hurts Postgres in data
warehousing applications. We can't keep ignoring optimizations
that provide even as little as 10% execution improvements for 10x
worse planner performance, because in a warehouse it's next to
impossible for planning time to matter.

Obviously it'd be great if there was a fast, easy way to figure
out whether a query would be expensive enough to go the whole 9
yards on planning it but at this point I suspect a simple GUC
would be a big improvement.


Something like "enable_equivalencefilters" but that defaults to false unlike every one 
existing "enable_*" GUC?

​It would be a lot more user-friendly to have something along the lines of "planner_mode 
(text)" with labels like "star, transactional, bulk_load, etc..." because I suspect 
there are other things we'd want to add if we start identifying queries by their type/usage and 
optimize accordingly. Having the knobs available is necessary but putting on a façade would make 
the user more straight-forward for the common cases.

David J.


How about:

planning_time_base 10       # Default effort, may be increased or decreased as 
required - must be at least 1
planning_time_<feature>  0  # By default, planner makes no (or minimal) effort to 
optimise for <feature>

So for some people, adjusting planning_time_base may be sufficient - but for 
more specialised cases, people can tell the planner to consider expending more 
effort.


Mysql have now 19 optimizer_switch parameters
https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html
I notice that they are either on or off, I suspect that it is better to 
have some sort of measure of how much extra effort the planner should make.




Please don't do that.
I think having some might be useful - though in most situations, having 
a general indicator to the planner might be sufficient.


From reading the thread, I have the impression that for some extreme 
workloads some extra twiddling would be useful, even though for most 
people it would simply be an unnecessary complication.


In over twenty years I've never needed such knobs, but I might get a 
project next year where they might be useful.  So I agree that for most 
situations, such extra stuff is not needed - but I'd like additional 
options available if I ever needed them.




I'd rather like some sort of pg_stat_statements, which would track execution 
and planning time.
On new query, we can lookup if query can benefit from more planning time.
But i don't know how costly this can be.


Cheers,
Gavin





Re: [HACKERS] custom function for converting human readable sizes to bytes

2015-11-23 Thread Gavin Flower

On 24/11/15 06:31, Pavel Stehule wrote:



2015-11-23 18:04 GMT+01:00 Tom Lane:


Jim Nasby  writes:
> On 11/23/15 3:11 AM, Corey Huinker wrote:
>> +1 to both pg_size_bytes() and ::bytesize. Both contribute to
making the
>> statements more self-documenting.

> The function seems like overkill to me if we have the type. Just my
> opinion though. I'm thinking the type could just be called
'size' too
> (or prettysize?). No reason it has to be tied to bytes (in
particular
> this would work for bits too).

Please, no.  That's *way* too generic a name.

I do not actually agree with making a type for this anyway.  I can
tolerate a function, but adding a datatype is overkill; and it will
introduce far more definitional issues than it's worth. (eg, which
other types should have casts to/from it, and at what level)


so pg_size_bytes is good enough for everybody?

Regards

Pavel


regards, tom lane


perhaps pg_size_bites for those people who want: KiB,  MiB, GiB, TiB, 
PiB, ,..   ???:-)
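For the curious, the conversion itself is simple; here is a hedged Python sketch of a pg_size_bytes-style parser. The unit names and the 1024-based multipliers are assumptions for illustration, not necessarily what was eventually committed:

```python
import re

# Binary (1024-based) multipliers, mirroring pg_size_pretty's units;
# these are illustrative assumptions, not the committed behaviour.
UNITS = {"bytes": 1, "kb": 1024, "mb": 1024**2,
         "gb": 1024**3, "tb": 1024**4}

def size_bytes(text):
    """Parse a human-readable size like '10 MB' into a byte count."""
    m = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([a-zA-Z]+)?\s*", text)
    if not m:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = m.groups()
    factor = UNITS.get((unit or "bytes").lower())
    if factor is None:
        raise ValueError(f"unknown unit in: {text!r}")
    return int(float(number) * factor)
```

For example, size_bytes("10 MB") gives 10485760, and a bare number like "42" is taken as bytes.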




Cheers,
Gavin




Re: [HACKERS] Parallel Seq Scan

2015-11-15 Thread Gavin Flower

On 16/11/15 12:05, Robert Haas wrote:

On Fri, Nov 13, 2015 at 10:46 AM, Thom Brown  wrote:

And perhaps associated PIDs?

Yeah, that can be useful, if others also feel like it is important, I can
look into preparing a patch for the same.

Thanks.

Thom, what do you think the EXPLAIN output should look like,
specifically?  Or anyone else who feels like answering.

I don't think it would be very useful to repeat the entire EXPLAIN
output n times, once per worker.  That sounds like a loser.  But we
could add additional lines to the output for each node, like this:

  Parallel Seq Scan on foo  (cost=0.00..XXX rows=YYY width=ZZZ) (actual
time=AAA..BBB rows=CCC loops=1)
   Leader: actual time=AAA..BBB rows=CCC loops=1
   Worker 0: actual time=AAA..BBB rows=CCC loops=1
   Worker 1: actual time=AAA..BBB rows=CCC loops=1
   Worker 2: actual time=AAA..BBB rows=CCC loops=1

If "buffers" is specified, we could display the summary information
after the Parallel Seq Scan as normal and then display an additional
per-worker line after the "Leader" line and each "Worker N" line.  I
think displaying the worker index is more useful than displaying the
PID, especially if we think that a plan tree like this might ever get
executed multiple times with different PIDs on each pass.

Like?  Dislike?  Other ideas?


Possibly have an option to include the PID?

Consider altering the format field width of the Worker number (depending 
on the number of workers) so you don't get:

   Worker 9 ...
   Worker 10 ...
but something like
   Worker  9 ...
   Worker 10 ...
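Gavin's alignment suggestion amounts to sizing the field from the largest worker index; a hypothetical Python sketch (names are illustrative, not from any patch):

```python
def worker_lines(n_workers):
    """Right-align worker indices so columns line up past 9 workers."""
    width = len(str(n_workers - 1))  # digits in the largest worker index
    return [f"Worker {i:>{width}}: actual time=..." for i in range(n_workers)]

# With 11 workers, 'Worker  9' and 'Worker 10' line up in the same column.
lines = worker_lines(11)
```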



Cheers,
Gavin





Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-11 Thread Gavin Flower

On 12/11/15 13:52, Greg Stark wrote:

On Wed, Nov 11, 2015 at 6:57 PM, Gavin Flower
<gavinflo...@archidevsys.co.nz> wrote:

Don't you realize that 400MB is over 4 million of the old 100Kb floppy
disks, and even with the new big 1.44MB 3.5 " disks, you'd need about 280!!!

Don't be silly. It's only four thousand 100Kb floppies.



You're right, of course!

So you won't mind helping me back up 400MB on 100 kB floppies then???





Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-11 Thread Gavin Flower

On 12/11/15 02:07, Craig Ringer wrote:



On 11 November 2015 at 16:02, Torsten Zühlsdorff wrote:


From my experience most databases are just tpo small. Their
operations finish before there can be a deadlock. Same for race
conditions - most developer don't know about them, because they
never stumbled upon them. I am matching regularly discussions if a
database is already to big when holding 10.000 records in the
whole cluster...


Ha. Yes. So true.

I see Stack Overflow posts where somebody explains that their query 
takes ages on their Huge!!1! database. Then it turns out the query 
takes 0.2 seconds on a 400MB table.


Huge. Right.

--
 Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


I'll say its huge.

Don't you realize that 400MB is over 4 million of the old 100Kb floppy 
disks, and even with the new big 1.44MB 3.5 " disks, you'd need about 280!!!


Though, I suspect that the people who are saying that 400MB is huge, 
never actually used those floppies I mentioned above.


Now-a-days I would not regard 400GB as huge, even though when I started 
programming, mainframes often had less than 1MB of core memory, and the 
big 12" tape reels could hold a max of 35MB of data.  Now I'm sitting 
at a Linux box where even my SSD has hundreds of times the 
storage (let alone the capacity of my HDs) the entire New Zealand Post 
Office had in 1980!  How times change...


The reality, is that people tend to compare things in their direct 
experience, and don't have a feeling for the 'size' of the computers 
they use in terms of storage & processing power.  The above mainframe I 
mentioned (an ICL 4/72) had washing machine sized boxes each with a 60MB 
disk - everything was impressively sized, now you can fit a TB HD in a 
match box.



Cheers,
Gavin




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Gavin Flower

On 09/11/15 23:08, kawami...@tkl.iis.u-tokyo.ac.jp wrote:

Hi guys,

I’ve been using Postgres for research at a university,

Great!

[...]

・Postgres 9.4.1

[..]

More knowledgeable people are sure to reply in more detail!

However, they would probably appreciate it if you can run with 9.4.5 
(the latest released version).  Running it with the beta of 9.5 would be 
a bonus!


Note that I don't know enough to say for sure that later versions would 
make any difference in this case, but at least using later versions 
would kill lots of Red Herrings!  :-)



Cheers,
Gavin




Re: [HACKERS] patch for geqo tweaks

2015-11-06 Thread Gavin Flower

On 07/11/15 09:59, Nathan Wagner wrote:
[...]
My day to make a fool of myself in public I guess. You're right of 
course. I can only plead distraction by having too many projects in 
mind at once and not focusing properly. Sorry for taking up your time 
on things I should have checked better.

[...]

There are two types of people:
those that don't bother
and
those that try, & fail!


Cheers,
Gavin




Re: [HACKERS] fortnight interval support

2015-11-02 Thread Gavin Flower

On 02/11/15 07:06, Tom Lane wrote:

David Fetter <da...@fetter.org> writes:

On Tue, Oct 27, 2015 at 07:30:13PM +, Nathan Wagner wrote:

On Wed, Oct 28, 2015 at 08:17:25AM +1300, Gavin Flower wrote:


I had actually written:

> You trying to get PostgreSQL banned in France???
>
> When I was learning French many years ago, I was told that the French
> consider their fortnight to be 15 days!!!

The following 2 lines were from Nathan, not me!

What, it's a "fortnight", not a "quinzaine".

You have no idea how hard it was to resist updating the patch...

Well, if you won't do it, I will.

Please tell me this is a joke.

(FWIW, I don't have a problem with "fortnight", but I draw the line
at units that are not used in English.)

regards, tom lane








Re: [HACKERS] fortnight interval support

2015-10-27 Thread Gavin Flower

On 28/10/15 02:52, Nathan Wagner wrote:

On Mon, Oct 26, 2015 at 01:58:52PM -0400, Robert Haas wrote:

Aw, you're no fun.  select '1 fortnight'::interval => '14 days' would be cool.

Patch attached...

:)


[...]

You trying to get PostgreSQL banned in France???  :-)

When I was learning French many years ago, I was told that the French 
consider their fortnight to be 15 days!!!


see:
https://nz.answers.yahoo.com/question/index?qid=20100920093722AAc54Xo
https://en.wikipedia.org/wiki/Fortnight
http://www.wordsense.eu/fortnight


Cheers,
Gavin





Re: [HACKERS] questions about PG update performance

2015-10-26 Thread Gavin Flower

On 27/10/15 11:37, Jim Nasby wrote:

On 10/25/15 9:36 PM, Kisung Kim wrote:

I want to explain for our clients that PG's update performance is
comparable to Oracle's.


There's really only 2 ways you can answer that. You can either 
handwave the question away ("Yes, update performance is comparable."), 
or you have to do actual benchmarking. Trying to answer this from a 
theoretical standpoint is completely useless because there's an absurd 
number of things that will affect this:


Number of columns
Data types
Size of overall transaction
Percent of transactions that roll back
Size of table
What % of table is updated every day
Underlying hardware
What OS the database is running on
What filesystem the database is running on

... and that's just off the top of my head.

Or to look at it another way, I guarantee you can create a scenario 
where Postgres beats the pants off Oracle, *or vice versa*. So you 
have to either go with an answer along the lines of "For most 
workloads the performance of both databases is similar." or you have 
to benchmark the actual application in question. Most performance 
issues you find will probably be correctable with a moderate amount of 
work.


To me, the real tradeoff between Postgres and Oracle (or any other 
commercial database) is whether you'd rather spend money on expert 
employees or software contracts.


And of course, on how you alter the tuning parameters in 
postgresql.conf, like temp_buffers and work_mem.  The 'correct' values 
will depend on your workload and amount of RAM etc.







Re: [HACKERS] Release of CVEs

2015-10-13 Thread Gavin Flower

On 14/10/15 18:19, Tom Lane wrote:

I wrote:

Michael Paquier  writes:

On Mon, Oct 12, 2015 at 2:54 AM, Josh Berkus wrote:

I don't know that there's anything the PostgreSQL project can do about
it.  If anyone on this list is connected with MITRE, please ask them
what they need to be more prompt.

http://cve.mitre.org/ has a "Contact Us" tab linking to the address I
mentioned. That may be a start as at this state this is far more than
6 weeks.

I'm inclined to start by asking the Red Hat security guys, from whom
we obtained all these CVE numbers to begin with.  Will check into it
tomorrow.

According to the Red Hat guys, the fundamental problem is that Mitre like
to research and write up the official CVE descriptions themselves ...
which would be fine if they had adequate resources to do it in a timely
fashion, but they don't really.  Apparently, most of our bugs are of low
enough severity to be way down their priority list.  (Maybe we should
consider that a good thing.)

However, Red Hat did also point out a possible alternative: instead of
linking to the Mitre website, we could link to Red Hat's own repository
of CVE descriptions at
   https://access.redhat.com/security/cve/
for example
   https://access.redhat.com/security/cve/CVE-2015-5289

This is not as unofficial as it might seem, because for several years now
Mitre has officially delegated responsibility for initial assignment of
CVE numbers for all open-source issues to Red Hat.  (It's just final
wording of the descriptions that they're insisting on doing themselves.)

A quick browse through some of the relevant items says that this is at
least as good as cve.mitre.org in terms of the descriptions of the
security issues, but it is a bit Red-Hat-centric in that there's info
about which Red Hat package releases include a fix, but not about package
releases from other vendors such as Ubuntu.

As a former wearer of the red fedora, I'm not going to pretend to have
an unbiased opinion on whether we should switch our security-page links
to point to Red Hat's entries instead of Mitre's.  But it's something
worth considering, given that we're seeing as much as a year's lag in
Mitre's pages.

regards, tom lane


Would it be possible to link to the Red Hat pages, and (at least semi-) 
automate their replacement by the official pages when they become available?



Cheers,
Gavin




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Gavin Flower

On 29/09/15 11:54, Joshua D. Drake wrote:

On 09/28/2015 03:40 PM, Alvaro Herrera wrote:

Tom Lane wrote:


Now, running gitlab on community-owned hardware would potentially be an
option, if we find gitlab attractive from a functionality standpoint.
The question I'd have about that is whether it has a real development
community, or is open-source in name only.  If github did go belly up,
would we find ourselves maintaining the gitlab code all by ourselves?
That might not be the end of the world, but it wouldn't be a good use
of community time either.

Fundamentally, we're playing the long game here.  We do not want to make
a choice of tools that we're going to regret ten years from now.


We already made a similar choice some years ago when we started
depending on the then-recently open sourced SourceForge code for
pgFoundry.  That didn't turn out all that well in the long run.


I think we need to look at long standing FOSS projects with a large 
and extended user base (Redmine, RT). Anything that is fairly 
community specific (Debbugs) likely will cause more heartache than it 
is worth in the long run.


JD



The Linux kernel project uses Bugzilla (https://bugzilla.kernel.org)
and so does LibreOffice (https://bugs.documentfoundation.org).

I think they are both fairly big projects that are in it for the long haul.


Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use gender-neutral language in documentation

2015-09-24 Thread Gavin Flower


An example from a book on PostgreSQL server programming that I'm working 
through (Note that it is obviously awkward to write with gender pronouns 
when gender is irrelevant, note the "he she" in one place and "he/she" 
in another!):


   "If the user is a superuser, then he she has permission to see the
   full query. If the user is a regular user, he/she will only see the
   full query for his/her queries."

Written in 'Gender Appropriate' style (only refer to gender when it is 
relevant):


   "If the user is a superuser, then they have permission to see the
   full query. If the user is a regular user, they will only see the
   full query for their queries."

I think the second version is easier to read - and in this case, shorter!


-Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use gender-neutral language in documentation

2015-09-24 Thread Gavin Flower

On 24/09/15 22:41, Geoff Winkless wrote:
On 24 September 2015 at 11:33, Gavin Flower 
<gavinflo...@archidevsys.co.nz> wrote:


An example from a book on PostgreSQL server programming that I'm
working through (Note that it is obviously awkward to write with
gender pronouns when gender is irrelevant, note the "he she" in
one place and "he/she" in another!):

   "If the user is a superuser, then they have permission to see the
   full query. If the user is a regular user, they will only see the
   full query for their queries."

Can I quietly suggest "Users with superuser permissions can always see 
the full query, while regular users will only see the full query for 
their own queries."?


Geoff

By all means say it quietly!  :-)

But I was simply trying to change it into Gender Appropriate form, 
rather than improve it in other aspects.


However, your rephrasing is better still!


-Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use gender-neutral language in documentation

2015-09-22 Thread Gavin Flower

On 23/09/15 08:17, Christopher Browne wrote:
[...]


"The user should do [something]; he or she or it, or the cis/trans/asexual
personage or connection used by a web application, whether written in
C, Java, Perl, PHP, running on Windows or Unix or ..." as the increasing
inclusions heads to some pathological limit.

[...]

order to be politically correct" will irritate people; instead, make the
documentation *better*.  Replacing "he" with "he/she/cis/trans/Unix/Windows"
(or some such) wouldn't make it *better*.  (And probably that's a phrase
that's missing some legalistic wherefores and whereases!)

[...]

You also have to include "they" as some people have multiple 
personalities, I actually met one (but only - as far as I can tell - saw 
one of them) - in the early 1990's I conversed with several people on 
alt.sexual.abuse.recovery, so got more insights into these types of 
complexities than most people.  I was doing a project in network 
traffic, and got to look at some high volume usenet groups, of which 
that group was one.


Don't forget GNU/Linux,  & GNU/Hurd, plus many others...  :-)

I'll settle for avoiding unnecessary use of gender!


Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use gender-neutral language in documentation

2015-09-22 Thread Gavin Flower

On 22/09/15 21:33, Geoff Winkless wrote:
On 22 September 2015 at 09:28, Albe Laurenz wrote:


Peter Geoghegan wrote:
> On Mon, Sep 21, 2015 at 9:32 PM, Erik Rijkers wrote:
>> I think this compulsive 'he'-avoiding is making the text worse.
>>
>>
>> -  environment variable); any user can make such a change
for his session.
>> +  environment variable); any user can make such a change
for their session.
>
> -1. It seems fine to me.

(Disclaimer: I am not a native speaker.)

Using the pronoun of the third person plural as a replacement for
"his or her"
has become widely used, at least in the U.S., and the OED condones
that use:
http://www.oxforddictionaries.com/definition/english/they


Without wanting to get into a grammar war, I'm not so sure I agree 
that it "condones" it. Dictionaries reflect the current state of 
usage, they don't act as arbiters of correctness. The abuse of 
"literally" as an emphasiser (which usage is now listed in the OED) is 
a prime example.


As an Englishman I would prefer "his or her" over "their". Perhaps 
our American cousins might disagree though.


WRT the second, it probably doesn't help that "might not be the same 
as the database user that is to be connect as" is incorrect anyway - 
it should perhaps be "that is to be connect*ed *as" (although I still 
find the construction clumsy).


Geoff

I am an Englishman.

I prefer "their" rather than "his or her", it is less clumsy & there is 
no point in specifying gender unless it is relevant!


Besides, some people are neither, or their biological gender is 
ambiguous - so a few people fit into neither the male nor the female 
category (depending on precise definitions, about 0.5%)!



Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use gender-neutral language in documentation

2015-09-22 Thread Gavin Flower

On 22/09/15 22:17, Geoff Winkless wrote:
On 22 September 2015 at 10:52, Gavin Flower 
<gavinflo...@archidevsys.co.nz> wrote:


On 22/09/15 21:33, Geoff Winkless wrote:

​ ​
Without wanting to get into a grammar war, ​I'm not so sure I
agree that it "condones" it. Dictionaries reflect the current
state of usage, they don't act as arbiters of correctness. The
abuse of "literally" as an emphasiser (which usage is now
listed in the OED) is a prime example.

I would prefer "his or her" over "their". Perhaps our American
cousins might disagree though.

I prefer "their" rather than "his or her", it is less clumsy &
there is no point in specifying gender unless it is relevant!


I agree in that I prefer "their" in informal speech; however in a 
formal document I would find it sloppy. I don't think "his or her" is 
inherently clumsy; maybe I'm just showing my age.

Besides, some people are neither, or their biological gender is
ambiguous - so a few people fit into neither the male nor the
female category (depending on precise definitions, about 0.5%)!


My understanding is that most intersex (and certainly all trans) 
people would identify with one or the other, and even those who don't 
select exclusively identify with a mix of both (and would therefore 
still be covered by "his or her", no?) although I don't pretend to be 
an expert.


Perhaps it would be easier to avoid the controversy by actually 
rewording into the plural, where possible?


So

"any user can make such a change for his session."

becomes

"Users can make such a change for their individual sessions"

or similar?

Geoff
To me, the key thing is NOT to specify gender, unless it is relevant - 
and I don't think gender is relevant in describing how to use a database.


I was using "Gender Appropriate" language long before the Politically 
Correct craze started (over 50 years ago)!  I was told references to 
"he" in rules included females, which I thought was daft!



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2015-09-14 Thread Gavin Flower

On 15/09/15 09:44, Jim Nasby wrote:

On 9/14/15 1:50 PM, Thomas Munro wrote:

CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}}]
INDEX ON
table_name (column_name1, column_name2 ...);


I would use the first (simple) syntax and just throw an error if the
user tries to skip a column on the UNIQUE clause.

Seems the second option looks like a more natural extension of CREATE
UNIQUE INDEX


True, but it's awfully verbose. :( And...


It surprised me that you can INCLUDE extra columns on non-UNIQUE
indexes, since you could just add them as regular indexed columns for
the same effect.  It looks like when you do that in SQL Server, the
extra columns are only stored on btree leaf pages and so can't be used
for searching or ordering.  I don't know how useful that is or if we
would ever want it... but I just wanted to note that difference, and
that the proposed UNIQUE ON FIRST n COLUMNS syntax and catalog change
can't express that.


... we might want to support INCLUDE at some point. It enhances 
covering scans without bloating the heck out of the btree. (I'm not 
sure if it would help other index types...) So it seems like a bad 
idea to preclude that.


I don't see that UNIQUE ON FIRST precludes also supporting INCLUDE. 
Presumably we could do either


CREATE INDEX ... ON table (f1, f2, f3) UNIQUE(f1, f2) INCLUDE(f4);

Of the formats I've seen so far, I prefer this one.

I think using "[ALSO] INCLUDE(f4)" might be potentially more readable 
than using just "INCLUDE(f4)".  Even if not used, the noise word ALSO 
would help people understand that the other fields mentioned are already 
covered.


If not too difficult then allowing the unique fields to be separated by 
other fields could be useful - in the example allowing "UNIQUE(f1, 
f3)".  Especially if the index is likely to be used to CLUSTER a table, 
where the order f1, f2, ... is important.




or
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ... ON table (f1, f2, f3) 
INCLUDE(f4);


Personally, I find the first form easier to read.

Are we certain that no index type could ever support an index on (f1, 
f2, f3) UNIQUE(f1, f3)? Even if it doesn't make sense for btree, 
perhaps some other index could handle it.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-09-03 Thread Gavin Flower

On 04/09/15 12:11, Bruce Momjian wrote:

On Thu, Sep  3, 2015 at 11:37:09PM +0200, Petr Jelinek wrote:

I don't understand.  I'm just proposing that the source code for the
extension to live in src/extensions/, and have the shared library
installed by toplevel make install; I'm not suggesting that the
extension is installed automatically.  For that, you still need a
superuser to run CREATE EXTENSION.


+! for this

OK, what does "+!" mean?  (I know it is probably a shift-key mistype,
but it looks interesting.)

It obviously signifies a Good Move that involved a check - at least, 
that is what it would mean when annotating a Chess Game!  :-)



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan

2015-08-26 Thread Gavin Flower

On 27/08/15 13:36, Kouhei Kaigai wrote:
[...]
My measurement is done on a v9.5-based system. So, it also seems to me 
that replacement of CHAR(n) by VARCHAR(n) will make sense.


Is there any reason to not simply use text instead of CHAR(n) or VARCHAR(n)?

[...]


-Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] allow 'semester' in functions EXTRACT, DATE_PART, TO_CHAR and so on.

2015-08-14 Thread Gavin Flower

On 15/08/15 11:58, Dickson S. Guedes wrote:

Hello hackers,

* Description

This patch is a proposal to allow the use of word 'semester' to
extract it from date in functions like EXTRACT, DATE_PART, etc and
adds the letter 'S' to format the date output in to_char.

** Example

SELECT EXTRACT(semester FROM DATE '2015-07-07');

date_part
---
  2

* Motivation

The term is used in a school or college to represent a half-year.
Actually it could be evaluated from a date using some math with
'quarter' but could not be extracted from date since the API to define
a reserved word for EXTRACT is in the core, rather than in SQL.
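The "some math with 'quarter'" mentioned above amounts to a trivial mapping; a rough Python sketch of the equivalent arithmetic (purely illustrative of the proposal, not the patch itself):

```python
from datetime import date

def semester(d: date) -> int:
    """Calendar-based semester: Jan-Jun -> 1, Jul-Dec -> 2.
    Equivalent to ceil(EXTRACT(quarter FROM d) / 2)."""
    return (d.month - 1) // 6 + 1

# SELECT EXTRACT(semester FROM DATE '2015-07-07') would yield 2
print(semester(date(2015, 7, 7)))  # -> 2
```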

* Syntax

I'm using the word 'semester' since it is common in college and school
but I wondering if it should be other word (like 'halfyear', for
example)

Is this a feature that is worthwhile?

BTW, I'll put this in the commit fest (2015-09).

Thanks!
Note that the start and end points for a semester might well be 
different between universities in the Northern and Southern Hemispheres of 
this planet - also between institutions in the same hemisphere.


The first semester in 2015 for the University of Auckland begins Monday 
2 March 2015:

https://www.auckland.ac.nz/en/for/current-students/cs-academic-information/cs-academic-dates.html

The first semester in 2015 for ANU (Australian National University) 
begins Monday 16 February 2015:

http://www.anu.edu.au/directories/university-calendar

So semesters don't appear to align with normal half year boundaries.


Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] allow 'semester' in functions EXTRACT, DATE_PART, TO_CHAR and so on.

2015-08-14 Thread Gavin Flower

On 15/08/15 13:06, Dickson S. Guedes wrote:

2015-08-14 21:32 GMT-03:00 Gavin Flower gavinflo...@archidevsys.co.nz:
...

So semesters don't appear to align with normal half year boundaries.

Interesting links, thanks!

Which sounds better to a native English speaker: 'half' or 'halfyear'?

For example:


SELECT date_trunc('halfyear', current_date);

date_trunc

  2015-07-01 00:00:00-03
(1 row)

Thanks!
I would go with 'halfyear', but I don't have a use case for it, so don't 
put too much weight on my opinion of this!



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes

2015-08-07 Thread Gavin Flower

On 08/08/15 06:43, Bruce Momjian wrote:

On Fri, Aug  7, 2015 at 11:53:30AM -0400, Robert Haas wrote:

[...]

Well, we could just throw a Postgres 9.5 is faster release note item
in there and call it a day.  ;-)

Nah! Just say it is Much Shinier, I'll buy it.

Unfortunately, we have to have much more professional sounding reasons 
to upgrade, to tell our managers - so I guess you DO need more details...


[...]



(I realize now that compiling the release notes must be a somewhat
thankless task, so let me just say that I do appreciate the work
you've put into this very much and the comments above shouldn't be
understood to take anything away from that.  The fact that we don't
agree on what the criteria ought to be does not mean that I don't
appreciate you doing the work.)

Considering the number of almost-arbitrary decisions I have to make to
write the major release notes, I am surprised at how few complaints I
get.  Of course, I have been clearly told by core that no one else wants
this job.

All joking aside, I appreciate your efforts.  I read the release notes, 
even though currently I don't have an immediate need to use PostgreSQL.



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 64-bit XIDs again

2015-07-30 Thread Gavin Flower

On 31/07/15 02:24, Heikki Linnakangas wrote:

On 07/30/2015 04:26 PM, Alexander Korotkov wrote:

Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in 
t_infomask2 to

distinguish old and new formats.


I think we should move to 64-bit XIDs in in-memory structs snapshots, 
proc array etc. And expand clog to handle 64-bit XIDs. But keep the 
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field 
to the page header so that logically the xmin/xmax fields on the page 
are 64 bits wide, but physically stored in 32 bits. That's possible as 
long as no two XIDs on the same page are more than 2^31 XIDs apart. So 
you still need to freeze old tuples on the page when that's about to 
happen, but it would make it possible to have more than 2^32 XID 
transactions in the clog. You'd never be forced to do anti-wraparound 
vacuums, you could just let the clog grow arbitrarily large.
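A rough sketch of the reconstruction described here (the names and layout are illustrative assumptions, not actual PostgreSQL code): the page header carries a 64-bit base, tuple headers keep 32-bit xmin/xmax, and the logical 64-bit XID can be recovered as long as every XID on the page lies within 2^31 of the base:

```python
def logical_xid(page_base: int, xid32: int) -> int:
    """Recover a logical 64-bit XID from a hypothetical page-wide
    64-bit base and the physical 32-bit value in the tuple header."""
    delta = (xid32 - (page_base & 0xFFFFFFFF)) % (1 << 32)
    # If the distance exceeds 2^31, the page would have needed
    # freezing before such an XID could be stored on it.
    assert delta < (1 << 31), "page must be frozen first"
    return page_base + delta

base = (5 << 32) + 100           # hypothetical per-page base
print(logical_xid(base, 300) == base + 200)  # -> True
```

The modulo arithmetic handles the case where the 32-bit values wrap around while the page base does not, which is why the "no two XIDs more than 2^31 apart" restriction is what forces the occasional freeze.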


There is a big downside to expanding xmin/xmax to 64 bits: it takes 
space. More space means more memory needed for caching, more memory 
bandwidth, more I/O, etc.


- Heikki



I think having a special case to save 32 bits per tuple would cause 
unnecessary complications, and the savings are minimal compared to the 
size of current modern storage devices and the typical memory used in 
serious database servers.


I think it is too much pain for very little gain, especially when 
looking into the future growth in storage capacity and bandwidth.


The early mainframes used a base-displacement technique to keep the size 
of addresses down in instructions: 16-bit addresses, comprising 4 bits 
for a base register and 12 bits for the displacement (hence the use of 
4KB page sizes now!).  That was necessary at a time when mainframes often 
had less than 128 KB of memory!  Now it would be ludicrous to do that for 
modern servers!
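As a toy illustration of that addressing scheme (purely illustrative, not any particular machine's instruction format):

```python
def effective_address(base_regs, field16):
    """Decode a 16-bit base-displacement address field:
    the top 4 bits select one of 16 base registers, the low
    12 bits are the displacement (0..4095, i.e. 4 KB reach)."""
    base = (field16 >> 12) & 0xF
    displacement = field16 & 0xFFF
    return base_regs[base] + displacement

regs = [0] * 16
regs[3] = 0x10000                # base register 3 points at 64 KB
print(hex(effective_address(regs, (3 << 12) | 0xABC)))  # -> 0x10abc
```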



Cheers,
Gavin

(Who is ancient enough to have programmed such mainframes!)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-21 Thread Gavin Flower

On 22/07/15 09:01, Josh Berkus wrote:
[...]


My question for Hackers is: should this be Alpha2 or Beta 1?



Yes!  :-)

(The decision, as to which, I'll leave to the REAL Developers.)

More seriously, I expect that the alpha probably generated a lot of 
useful feedback.  Was this actually the case?



Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-19 Thread Gavin Flower

On 20/07/15 15:00, Pavel Stehule wrote:



2015-07-19 21:39 GMT+02:00 Josh Berkus <j...@agliodbs.com>:


Pavel, All:

Just to be clear, the idea of a global temp table is that the
table def
is available to all users, but the data is private to each session?


yes.

Pavel


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Just wondering...

Would it be difficult to add the ability for one user to share the 
contents with a list of named other users (roles)?


-Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-07-01 Thread Gavin Flower

On 01/07/15 17:37, Amit Kapila wrote:
On Tue, Jun 30, 2015 at 4:00 AM, Jeff Davis <pg...@j-davis.com> wrote:


 [Jumping in without catching up on entire thread.]


[...]


 2. Where is the speedup coming from? How much of it is CPU and IO
 overlapping (i.e. not leaving disk or CPU idle while the other is
 working), and how much from the CPU parallelism? I know this is
 difficult to answer rigorously, but it would be nice to have some
 breakdown even if for a specific machine.


Yes, you are right and we have done quite some testing (on the hardware
available) with this patch (with different approaches) to see how much
difference it creates for IO and CPU, with respect to IO we have found
that it doesn't help much [1], though it helps when the data is cached
and there are really good benefits in terms of CPU [2].


[...]

I assume your answer refers to a table on one spindle of spinning rust.


QUESTIONS:

1. what about I/O using an SSD?

2. what if the table is in a RAID array (of various types), would
   having the table spread over multiple spindles help?



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [CORE] Restore-reliability mode

2015-06-08 Thread Gavin Flower

On 09/06/15 00:59, David Gould wrote:

I think Alphas are valuable and useful and even more so if they have release
notes. For example, some of my clients are capable of fetching sources and
building from scratch and filing bug reports and are often interested in
particular new features. They even have staging infrastructure that could
test new postgres releases with real applications. But they don't do it.
They also don't follow -hackers, they don't track git, and they don't have
any easy way to tell if if the new feature they are interested in is
actually complete and ready to test at any particular time. A lot of
features are developed in multiple commits over a period of time and they
see no point in testing until at least most of the feature is complete and
expected to work. But it is not obvious from outside when that happens for
any given feature. For my clients the value of Alpha releases would
mainly be the release notes, or some other mark in the sand that says "As of
Alpha-3, feature X is included and expected to mostly work."

-dg



RELEASE NOTES

I think that having:

1. release notes

2. an Alpha people can simply install without having to compile

Would encourage more people to get involved.  Such people would be 
unlikely to have the time and inclination to use 'nightlies', even if 
compiling was not required.


I have read other posts in this thread, that support the above.

Surely, it would be good for pg to have some more people checking 
quality at an earlier stage?  So reducing barriers to do so is a good thing?



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [CORE] Restore-reliability mode

2015-06-06 Thread Gavin Flower

On 06/06/15 21:07, Simon Riggs wrote:
On 5 June 2015 at 17:20, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:


Simon Riggs wrote:
 On 5 June 2015 at 15:00, Robert Haas <robertmh...@gmail.com> wrote:

  Stamping it a beta implies that we think it's something fairly
  stable that we'd be pretty happy to release if things go well,
which
  is a higher bar to clear.

 We don't have a clear definition of what Beta means. For me,
Beta has
 always meant trial software, please test.

I think that definition *is* the problem, actually.  To me, beta
means
trial software, please test, but final product will be very
similar to
what you see here.  What we need to convey at this point is what you
said, but I think a better word for that is alpha. There may be more
mobility in there than in a beta, in users's perception, which is the
right impression we want to convey.

Another point is that historically, once we've released a beta, we're
pretty reluctant to bump catversion.  We're not ready for that at this
stage, which is one criteria that suggests to me that we're not ready
for beta.

So I think the right thing to do at this point is to get an alpha out,
shortly after releasing upcoming minors.


OK, I can get behind that.

My only additional point is that it is a good idea to release an Alpha 
every time, not just this release.


And if it's called Alpha, let's release it immediately. We can allow 
Alpha1, Alpha2 as needed, plus we allow catversion and file format 
changes between Alpha versions.


Proposed definitions

Alpha: This is trial software, please actively test and report bugs. 
Your feedback is sought on usability and performance, which may result 
in changes to the features included here. Not all known issues have 
been resolved but work continues on resolving them. Multiple Alpha 
versions may be released before we move to Beta. We reserve the right 
to change internal API definitions, file formats and increment the 
catalog version between Alpha versions and Beta, so we do not 
guarantee an easy upgrade path from this version to later versions of 
this release.


Beta: This is trial software, please actively test and report bugs and 
performance issues. Multiple Beta versions may be released before we 
move to Release Candidate. We will attempt to maintain APIs, file 
formats and catversions.


--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
As a 'user' I am very happy with the idea of having Alphas; it gives me a 
feeling that there will be less chance of problems being released in the 
final version.


Because not only does it give more chances to test, it might also encourage 
more people to get involved in contributing, either ideas for minor 
tweaks or simple patches etc. (the release being not quite finished, there 
is an expectation that minor functional changes have a possibility of being 
accepted for the version, if there is sufficient merit).



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is it possible to have a fast-write Index?

2015-06-05 Thread Gavin Flower

On 06/06/15 04:07, deavid wrote:
There are several use cases where I see an index as useful, but adding it 
will slow down inserts and updates too much.
For example, when we have 10 million rows in a table, and it's a table 
which has frequent updates, we need several indexes to speed up selects, 
but then we'll slow down updates a lot, especially when we have 10 or 
more indexes.
Other cases involve indexes for text search, which are used only for 
user search and aren't that important, so we want to have them, but we 
don't want the overhead they impose whenever we write to the table.
I know different approaches that already solve some of those problems 
in some ways (table partitioning, partial indexes, etc), but I don't 
feel they are the solution to every problem of this kind.


Some people already asked for delayed-write indexes, but the idea 
got discarded because the index could get out of sync, so it can omit 
results and this is unacceptable. But I think maybe that could be 
fixed in several ways and we can have a fast and reliable index (but 
maybe not so fast on selects).


Since I do not know every internal of Postgres, I feel it is simpler to 
share here and ask which things can or cannot be done.


Let's imagine there is a new type of index called weird_btree, where 
we trade-off simplicity for speed. In almost every mode, we will rely 
on VACUUM to put our index in optimal state.


Mode 1: on aminsert mark the index as INVALID. So, if you modified 
the table you need to run REINDEX/CREATE INDEX CONCURRENTLY before 
doing SELECT. This is almost the same as create index concurrently, 
the main difference is you don't have to remember to drop the index 
before writing. (I don't see much benefit here)


Mode 2: on aminsert, put the new entry in a plain, unordered list 
instead of the btree. Inserting at the end of a list should be faster 
than big btrees and you'll know later which entries you missed indexing.


Mode 2.a: on index scan (amrescan, amgettuple), pretend that after the 
btree there is the list and output every row, out of order. You will 
have to tell Postgres that our index isn't sorted and it will have to 
recheck every row.


Mode 2.b: mark the index invalid instead. When doing the next vacuum, 
sort the list and insert it to the btree in a bulk operation. If it's 
ok, mark the index valid.


Mode 3: on aminsert, put the new entry on a second btree; leaving 
the first one untouched. Because the second btree is new, will be 
small, and writes should be faster. When doing a index scan, read 
tuples from both at same time (like merge sort). On vacuum, merge the 
second btree onto the first. On this mode, the index is sorted and 
there's no need of recheck.
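Mode 3's index scan - reading the two btrees simultaneously, "like merge sort" - is essentially a streaming merge of two sorted sequences; a minimal sketch of the idea (illustrative only, nothing to do with PostgreSQL internals):

```python
import heapq

main_btree = [1, 4, 7, 9]    # large, settled, read-optimised index
second_btree = [2, 3, 8]     # small, fast-to-write index

# An index scan yields entries from both trees in sorted order,
# so the result needs no recheck; "vacuum" would bulk-merge the
# second tree into the first in much the same way.
merged = list(heapq.merge(main_btree, second_btree))
print(merged)  # -> [1, 2, 3, 4, 7, 8, 9]
```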


Does anyone think this would be an interesting feature for PostgreSQL?
Did I miss something?

PS: Maybe it's also possible to take advantage of clustering, and have 
indexes whose entries are ranges of TIDs; but I'm not sure if this is 
too exotic, or if it will make a difference.


Sincerely,
David.

How about a hybrid indexing system, with 2 parts:

(1) existing index system which is checked first and has been mostly 
optimised for speed of reading.  If there are only a few inserts/updates 
and the system is not heavily loaded, then it gets modified 
immediately.  The threshold for being too busy, and few enough changes, 
could be configurable.


(2) overflow index optimised for writing.  Possibly in memory and not 
backed by permanent storage.  A crash would require a complete index 
rebuild - but only when there were entries in it (or at least more than 
some configurable threshold, to allow for cases where some missing index 
entries are acceptable).


So where the index is needed for a query, part 1 is checked first, and 
then part 2 if necessary.


Have a background process that will move entries from part 2 to part 1 
when the system is less busy.
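The two-part scheme sketched above fits in a few lines of illustrative Python (a hypothetical structure, not an implementation proposal): a sorted read-optimised part, an append-only overflow, and a "background" merge:

```python
import bisect

class HybridIndex:
    def __init__(self, keys=()):
        self.part1 = sorted(keys)  # read-optimised, kept sorted
        self.part2 = []            # write-optimised overflow

    def insert(self, key):
        # O(1) append; a real system would apply configurable
        # load/size thresholds before deferring the sorted insert
        self.part2.append(key)

    def contains(self, key):
        # Check the sorted main part first, then the small overflow
        i = bisect.bisect_left(self.part1, key)
        if i < len(self.part1) and self.part1[i] == key:
            return True
        return key in self.part2

    def merge(self):
        # Done by a background process when the system is less busy
        self.part1 = sorted(self.part1 + self.part2)
        self.part2 = []
```

Lookups stay correct between merges because both parts are consulted; the trade-off is a (bounded) linear scan of the overflow until the merge runs.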



Cheers,
Gavin





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade resets timeline to 1

2015-05-28 Thread Gavin Flower

On 29/05/15 12:59, Noah Misch wrote:

On Thu, May 28, 2015 at 05:26:56PM +0200, Christoph Berg wrote:

Re: Noah Misch 2015-05-28 20150528150234.ga4111...@tornado.leadboat.com

To clarify for the archives, the 2015-05-16 changes did not revert to 9.3 and
earlier behavior.  Rather, they standardized on the {9.0,9.1,9.2}-to-{9.3,9.4}
upgrade behavior, bringing that behavior to all supported branches and source
versions.  Here is the history of timeline restoration in pg_upgrade:

Ok, sorry for the noise then. It's not a regression, but I still think
the behavior needs improvement, but this is indeed 9.6 material.

No, thank you for the report.  It had strong signs of being a regression,
considering recent changes and the timing of your discovery.


From my experience, I would far rather a user raise concerns that are 
important to them and find there is no real problem, than have users not 
raise things and let a serious bug or system shortcoming go unnoticed.


This is a major concern of mine.  For example, in my current project, 
users not raising problems in a timely manner caused unnecessary work 
later in the project than I would have liked!


So not just for PostgreSQL, but in general if a user has concerns, 
please raise them!!!



Cheers,
Gavin




Re: [HACKERS] Problems with question marks in operators (JDBC, ECPG, ...)

2015-05-19 Thread Gavin Flower

On 20/05/15 07:37, Jan de Visser wrote:

On May 19, 2015 07:04:56 PM Greg Sabino Mullane wrote:

Bruno Harbulot asked for a devil's advocate by saying:

My main point was that this is not specific to JDBC. Considering that even
PostgreSQL's own ECPG is affected, the issue goes probably deeper than it
seems. I'm just not convinced that passing the problem onto connectors,
libraries and ultimately application developers is the right thing to do
here.

Well, one could argue that it *is* their problem, as they should be using
the standard Postgres way for placeholders, which is $1, $2, $3...

Shirley you are joking: Many products use JDBC as an abstraction layer
facilitating (mostly) seamless switching between databases. I know the product
I worked on did. Are you advocating that every single statement should use
SELECT * FROM foo WHERE bar = $1 on pg and SELECT * FROM foo WHERE bar = ?
on every other database?

A database is only as valuable as the the part of the outside world it can
interact with. Large parts of the data-consuming world are developed in java
using JDBC. If your opinion is that JDBC developers should adapt themselves to
pg then you instantaneously diminish the value of pg.

jan



I prefer the $1 approach, but others can't use that, and there are 
situations where I couldn't either.


So, how about defaulting to the '?' approach, but having a method to 
explicitly set the mode - to switch to using '$'?
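As an illustration of why a '?'-to-'$n' mode is messier than it sounds, here is a deliberately naive sketch of the rewrite (function name is mine; a real driver would also have to cope with dollar-quoting, comments, escaped quotes, and the '?'-containing operators that are the crux of this thread):

```python
def qmarks_to_dollars(sql):
    """Rewrite JDBC-style '?' placeholders as PostgreSQL '$1', '$2', ...

    Skips '?' inside single-quoted string literals -- but nothing more,
    so operators such as jsonb's '?|' would still be mangled.
    """
    out, n, in_string = [], 0, False
    for c in sql:
        if c == "'":
            in_string = not in_string       # naive: ignores escaped quotes
            out.append(c)
        elif c == '?' and not in_string:
            n += 1
            out.append('$%d' % n)           # number placeholders left to right
        else:
            out.append(c)
    return ''.join(out)

print(qmarks_to_dollars("SELECT * FROM foo WHERE bar = ? AND baz = '?' AND qux = ?"))
# -> SELECT * FROM foo WHERE bar = $1 AND baz = '?' AND qux = $2
```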




Cheers,
Gavin




Re: [HACKERS] Triaging the remaining open commitfest items

2015-05-14 Thread Gavin Flower

On 15/05/15 10:58, Bruce Momjian wrote:

On Thu, May 14, 2015 at 06:57:24PM -0400, Tom Lane wrote:

Stephen Frost sfr...@snowman.net writes:

* Bruce Momjian (br...@momjian.us) wrote:

I will call for a vote that the freeze deadline be changed if this patch
is rejected to due to time.  I might lose the vote, but I am going to
try because if we lose our reputation for fairness, we have lost a lot
more than a week/month of release time.

I'm guessing the vote is core-only, but +1 from me in any case.  I fully
agree that this patch has had a serious measure of effort put behind it
from the author and is absolutely a capability we desire and need to
have in core.

I should think we'd have learned by now what happens when we delay a
release date to get in some extra feature.  It hasn't worked well in
the past and I see no reason to believe the results would be any more
desirable this time.

Right, the importance of the feature is not a reason to delay the
feature freeze.

Following rules like this is very important, but so is making valid 
exceptions.


Though I'm in no position to judge the importance of this patch, so I 
won't attempt to!



Cheers,
Gavin




  1   2   3   >