Re: [HACKERS] Proposal of Table Partition

2015-08-31 Thread Ashutosh Bapat
There is already a recent proposal on hackers about partition support in
PostgreSQL by Amit Langote. You will find the thread at
http://www.postgresql.org/message-id/55d3093c.5010...@lab.ntt.co.jp. Maybe
you can collaborate with the ongoing work.

On Sun, Aug 30, 2015 at 4:28 PM, My Life  wrote:

> Hi, everyone! I'd like to propose a postgres partition implementation.
> First, I would show the design to everyone and talk about it. If we think
> the design is not very bad and it can be committed to the PostgreSQL
> baseline, then I will post the code to the community.
> (note: my English is not very good.)
>
> Table Partition Design
> ======================
> In this design, partitions are normal tables in inheritance hierarchies,
> with the same table structure as the partitioned table.
>
> In pg_class we have an additional relpartition field which has the
> following values:
>     's'    /* single regular table */
>     'r'    /* partitioned table by range */
>     'l'    /* partitioned table by list */
>     'h'    /* partitioned table by hash */
>     'c'    /* child partition table */
>
> We add a new system schema named 'pg_partition', just like 'pg_toast'; in
> it we create a partition catalog table to store the partition entries.
> Let's assume the partition catalog's name is pg_partition_2586 (2586 is
> the partitioned table's OID in pg_class).
> A range or interval partition catalog's structure is as follows:
>
>     column      data type                       comment
>     partname    name                            the partition's name (primary key)
>     partid      oid                             the partition's OID in pg_class
>     interval    text                            an interval partition's interval
>                                                 (maybe an expression)
>     partkey1    depends on partitioned table
>     ...
>     partkeyN    depends on partitioned table
>
> partkey1, ..., partkeyN form a partition's upper bound.
> Finally, we make a unique constraint on (partkey1, ..., partkeyN).
> Every time we create a new partition, we insert a new tuple into this
> partition catalog.
> Every time we drop an old partition, we delete the related tuple in this
> partition catalog.
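
A minimal SQL sketch of what such a per-table catalog could look like for a
table range-partitioned on a single date column (everything except the
pg_partition_2586 name is illustrative, not taken from the proposal):

    CREATE TABLE pg_partition.pg_partition_2586 (
        partname  name PRIMARY KEY,  -- the partition's name
        partid    oid,               -- the partition's OID in pg_class
        interval  text,              -- interval step, if interval-partitioned
        partkey1  date UNIQUE        -- upper bound; type follows the partitioned table
    );
    -- creating a partition inserts its catalog entry:
    INSERT INTO pg_partition.pg_partition_2586
    VALUES ('sales_2015_09', 'sales_2015_09'::regclass, NULL, DATE '2015-10-01');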
>
> For a partitioned table's CREATE action, we transform it into CREATE
> actions for the partitioned table and its partitions, plus the INSERT
> actions into the partition catalog.
>
> For the INSERT action, we implement a RelationGetTuplePartid method, which
> finds the partition a tuple belongs to. It does an index scan on the
> partition catalog table (assume it is pg_partition_2586) to find the
> partition. We also implement an ExecGetPartitionResultRel method, which
> returns the partition's ResultRelInfo for executing the INSERT action.
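
Given the unique index on the partkey columns, that lookup is presumably
equivalent to the following SQL (single-key case; the key value is
illustrative):

    -- find the first partition whose upper bound lies above the new row's key
    SELECT partid
      FROM pg_partition.pg_partition_2586
     WHERE partkey1 > DATE '2015-09-15'
     ORDER BY partkey1
     LIMIT 1;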
>
> For a partitioned table's scan and JOIN actions, we implemented a plan
> node named 'PartitionExpand'. The plan node can expand the partitioned
> table scan node into a list of partition scan nodes according to the
> filter conditions, and it can likewise expand a partitioned table JOIN
> node into a list of per-partition JOIN nodes.
>
> For the pg_dump backup action, we should dump the partition catalog and
> the relpartition field in pg_class.
>
> So these are the main points of the design; I can show any detail you
> wonder about later.
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Proposal of Table Partition

2015-08-31 Thread Amit Langote

Hello,

On 2015-08-30 PM 07:58, My Life wrote:
> Hi, everyone! I'd like to propose a postgres partition implementation.
> First, I would show the design to everyone and talk about it. If we think
> the design is not very bad and it can be committed to the PostgreSQL
> baseline, then I will post the code to the community.

As Ashutosh suggested, could you please take a look at my proposal at the
link he posted?

Thanks,
Amit





Re: [HACKERS] checkpointer continuous flushing

2015-08-31 Thread Fabien COELHO


Hello Amit,


IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB


Wow! Thanks for trying the patch on such high-end hardware!

About the disks: what kind of HDD (RAID? speed?)? HDD write cache?

What is the OS? The FS?


warmup=60


Quite short, but probably ok.


scale=300


Means about 4-4.5 GB base.


time=7200
synchronous_commit=on



shared_buffers=8GB


This is small wrt hardware, but given the scale setup I think that it 
should not matter much.



max_wal_size=5GB


Hmmm... Maybe quite small given the average performance?


checkpoint_timeout=2min


This seems rather small. Are the checkpoints xlog or time triggered?

You did not update checkpoint_completion_target, which means 0.5, so the 
checkpoint is scheduled to run in at most 1 minute, which suggests at 
least 130 MB/s of write performance for the checkpoint (presumably assuming 
most of the 8 GB of shared_buffers is dirty: 8 GB / 60 s is roughly 137 MB/s).



parallelism - 128 clients, 128 threads


Given 192 hw threads, I would have tried 128 clients & 64 threads, so 
that each pgbench client has its own dedicated postgres process, and the 
postgres processes are not competing with pgbench. Now as pgbench is 
mostly sleeping, probably that does not matter much... I may also be 
totally wrong:-)



Sort - off
avg over 7200: 8256.382528 ± 6218.769282 [0.00, 76.05,
10975.50, 13105.95, 21729.00]
percent of values below 10.0: 19.5%


The max performance is consistent with 128 threads * 200 (random) writes 
per second (128 * 200 = 25,600, the same order as the observed ~21,700 tps peak).



Sort - on
avg over 7200: 8375.930639 ± 6148.747366 [0.00, 84.00,
10946.00, 13084.00, 20289.90]
percent of values below 10.0: 18.6%


This is really a small improvement, probably within the error interval of 
the measurement. I would not put much trust in 1.5% tps or 0.9% availability 
improvements.


I think that we could conclude that on your (great) setup, with these 
configuration parameters, this patch does not harm performance. This is a 
good thing, even if I would have hoped to see better performance.



Before going to the conclusion, let me try to explain the above data (I am
explaining again even though Fabien has explained, to make it clear
in case someone has not read his mail).

Let's try to understand with data for sorting - off option

avg over 7200: 8256.382528 ± 6218.769282

8256.382528 - average tps for 7200s pgbench run
6218.769282 - standard deviation on per second figures

[0.00, 84.00, 10946.00, 13084.00, 20289.90]

These 5 values can be read as minimum TPS, q1, median TPS, q3, and
maximum TPS over the 7200s pgbench run.  As far as I understand, q1
and q3 are the medians of subsets of the values (the lower and upper
quartiles), which I didn't focus on much.


q1 = 84 means that 25% of the time the performance was below 84 tps, about 
1% of the average performance, which I would translate as "pg is pretty 
unresponsive 25% of the time".


This is the kind of issue I really want to address; any tps 
improvements are just a side effect.



percent of values below 10.0: 19.5%

The above means the percentage of time the result is below 10 tps.


Which means "postgres is really unresponsive 19.5% of the time".

If you count zeros, you will get "postgres was totally unresponsive X% of 
the time".



Now about the test results: these tests are done for pgbench full speed runs,
and the above results indicate that there is approximately a 1.5%
improvement in avg. TPS and a ~1 percentage point improvement in the
fraction of tps values below 10 with sorting on, and there is almost no
improvement in median or maximum TPS values; instead they are slightly
lower when sorting is on, which could be due to run-to-run variation.


Yes, I agree.


I have done more tests as well by varying the time and number of clients,
keeping the other configuration the same as above, but the results are quite
similar.


Given the hardware, I would suggest raising checkpoint_timeout, 
shared_buffers and max_wal_size, and using checkpoint_completion_target=0.8. 
I would expect that to improve performance both with and without 
sorting.
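
Expressed as SQL, those suggestions might look like this (the values are
illustrative only, and shared_buffers requires a server restart to take
effect):

    ALTER SYSTEM SET checkpoint_timeout = '15min';
    ALTER SYSTEM SET shared_buffers = '32GB';               -- restart needed
    ALTER SYSTEM SET max_wal_size = '32GB';
    ALTER SYSTEM SET checkpoint_completion_target = 0.8;
    SELECT pg_reload_conf();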


It would be interesting to have information from the checkpoint logs 
(especially how many buffers were written in how long, and whether 
checkpoints are time or xlog triggered, ...).
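
Those details show up in the server log once log_checkpoints is enabled, and
aggregate counters are available in pg_stat_bgwriter:

    ALTER SYSTEM SET log_checkpoints = on;
    SELECT pg_reload_conf();
    -- time- vs. xlog-triggered checkpoints, and buffers written by them:
    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
      FROM pg_stat_bgwriter;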


The results of sorting patch for the tests done indicate that the win is 
not big enough with just doing sorting during checkpoints,


ISTM that you generalize too much: the win is not big "under this 
configuration and hardware".


I think that the patch may have very small influence under some 
conditions, but should not degrade performance significantly, and on the 
other hand it should provide great improvements under some (other) 
conditions.


So having no performance degradation is a good result, even if I would 
have hoped for better.  It would be interesting to understand why 
random disk writes do not perform too poorly on this box: size of the I/O 
queue, kind of (expensive:-) disks, write caches, file system, RAID 
level...



we should consider flush patch along with sorting.


I also think that it would be interesting.

I would like to perform some tests with both 

Re: [HACKERS] Potential GIN vacuum bug

2015-08-31 Thread Jeff Janes
On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane  wrote:
> >> Your earlier point about how the current design throttles insertions to
> >> keep the pending list from growing without bound seems like a bigger
> deal
> >> to worry about.  I think we'd like to have some substitute for that.
> >> ...
>
> > If the goal is to not change existing behavior (like for back patching)
> > the margin should be 1, always wait.
>
> The current behavior is buggy, both as to performance and correctness,
> so I'm not sure how come "don't change the behavior" should be a
> requirement.
>

I'm not confident the new behavior, with regards to performance, is an
absolute win.  We usually don't backpatch performance changes unless they
have no or very little trade-off.  The only margin I can confidently say
that for is the margin of 1.0.


>
> > But we would still have to deal with the
> > fact that an unconditional acquire attempt by the backends will cause a
> > vacuum to cancel itself, which is undesirable.
>
> Good point.
>
> > If we define a new namespace for
> > this lock (like the relation extension lock has its own namespace) then
> > perhaps the cancellation code could be made to not cancel on that
> > condition.  But that too seems like a lot of work to backpatch.
>
> We could possibly teach the autocancel logic to distinguish this lock type
> from others without using a new namespace.  That seems a bit klugy, but
> maybe better than adding a new namespace.  (For example, there are
> probably only a couple of modes in which we take page-level locks at
> present.  Choosing a currently unused, but self-exclusive, mode for taking
> such a lock might serve.)
>

Like the attached?  (The conditional variant for user backends was
unceremoniously yanked out.)


>
> > Would we bother to back-patch a theoretical bug which there is no
> > evidence is triggering in the field?
>
> What's theoretical about it?  You seemed pretty sure that the issue in
>
> http://www.postgresql.org/message-id/flat/CA+bfosGVGVQhMAa=0-mue6coo7dbsgayxb-xsnr5vm-s39h...@mail.gmail.com
> was exactly this.
>

I was adamant that the broken concurrency was not helping him
performance-wise, and also introduces correctness bugs.  But I don't think
the unfortunate performance is a bug, just an issue highly related to one
that is a bug.  I don't think a margin of 2, or even 20, would help him.
It would just build a bigger time bomb with a longer fuse.  What he needs
is to turn fastupdate off, or get SSDs, or get some other improvements that
aren't going to be backpatched.  If we don't know what setting to use to
fix one person's performance problem, I'd rather set it to something that
at least is known not to cause other people problems that
they didn't use to have.


>
> > If we want to improve the current behavior rather than fix a bug, then I
> > think that if the list is greater than threshold*2 and the cleaning lock
> is
> > unavailable, what it should do is proceed to insert the tuple's keys into
> > the index itself, as if fastupdate = off.  That would require some major
> > surgery to the existing code, as by the time it invokes the clean up, it
> is
> > too late to not insert into the pending list.
>
> Meh.  That's introducing a whole new behavioral regime, which quite aside
> from correctness bugs might introduce new performance bugs of its own.
> It certainly doesn't sound like a better back-patch candidate than the
> other thing.
>

Right, the start of the paragraph was meant to transition from backpatch to
forward looking.

Continuing the forward looking part:

I've given up on fastupdate for 9.4, and turned it off for my indexes.  As
implemented it seems like a rather niche solution.  So it is kind of
unfortunate that it is on by default, and that there is no way to turn it
off except for each individual index separately.  Hopefully we can make it
less niche.  Or maybe the niche is what I (and apparently Mr. Kehlet)
are trying to do.
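
For reference, turning it off per index looks like this (index, table and
column names are illustrative; the column is assumed to be a tsvector):

    -- at creation time:
    CREATE INDEX my_gin_idx ON my_table USING gin (my_tsv)
        WITH (fastupdate = off);
    -- or on an existing index (this does not flush an already-populated
    -- pending list):
    ALTER INDEX my_gin_idx SET (fastupdate = off);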

There seem to be two ways for fastupdate to help:

1) Let someone else do it, in the background.

That is pretty much useless from my perspective, because there is no way to
get someone else to actually do it as often as it needs to be done.  I can
create an extension to expose the cleanup call to SQL, and then set up a
cron job (or a bgworker) to run it very frequently, or frequently poll to
decide whether it should be run.  That works, kind of, if this type of thing
is important enough for you to go through all that (it is not important
enough to me, for production use, at this point; I'd rather just set
fastupdate to off, but at least it is available if I need it).  Except this
is still a serial job, and there is no way around that without turning
fastupdate off.  You can have a RAID of 30 disks, and the cleanup process is
still going to have 1 IO outstanding at a time.  With 

Re: [HACKERS] [PROPOSAL] Table Partition

2015-08-31 Thread Amit Langote

Hello,

On 2015-08-30 PM 10:42, My Life wrote:
> 
> For a partitioned table's scan and JOIN actions, we implemented a plan
> node named 'PartitionExpand'. The plan node can expand the partitioned
> table scan node into a list of partition scan nodes according to the
> filter conditions, and it can likewise expand a partitioned table JOIN
> node into a list of per-partition JOIN nodes.
> We implemented a DynamicPrunePartition method, which can expand the
> partitioned table's scan node into a list of partition scan nodes.
> We implemented a DynamicPrunePartitionJoin method, which can expand
> the partitioned table's JOIN node into a list of partition JOIN nodes.
> These expand actions happen in the ExecInitPartitionExpand function when
> the executor is initialized, and all these actions are implemented based
> on the partition catalog.
> 

In your design, can an index scan be used for an individual partition? If
yes, could you share how it is handled?

Thanks,
Amit





Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Alvaro Herrera
Bruce Momjian wrote:

> My hope is that many FDW improvements will benefit sharding and
> non-sharding workloads, but I bet some improvements are going to be
> sharding-specific.  I would say we are still in the exploratory stage,
> but based on the number of people who care about this feature and want
> to be involved, I think we are off to a very good start.  :-)

Having lots of interested people doesn't help with some problems,
though.  The Citus document says:

And the issue with these four limitations wasn't with foreign
data wrappers. We wrote mongo_fdw and cstore_fdw, and we're
quite happy with the contract FDWs provide. The problem was that
we were trying to retrofit an API for something that it was
fundamentally not designed to do.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Etsuro Fujita

On 2015/09/01 9:54, Bruce Momjian wrote:

On Mon, Aug 31, 2015 at 05:10:11PM -0700, Josh Berkus wrote:

As far as (3) is concerned, why
wouldn't we use the foreign data wrapper interface, and specifically
postgres_fdw?  That interface was designed for the explicit purpose of
allowing access to remote data sources, and a lot of work has been put
into it, so it would be highly surprising if we decided to throw that
away and develop something completely new from the ground up.


Well, query hooks are also a capability which we already have, and is
mature.  Citus has already posted about why they chose to use them instead.

As long as you recognize that the FDW API (not just the existing fdws)
will need to expand to make this work, it's a viable path.


Uh, we already have a list of things we need to add to FDWs to make them
work, and Citus Data has provided a document of more things that are
needed, https://goo.gl/vJWF85.


My top priority is postgres_fdw join pushdown, but I also plan to work 
on update pushdown [1] for 9.6, which couldn't make it into 9.5.  I 
think that would resolve the following issue mentioned in the document:


UPDATE and DELETE operations are performed by first fetching
records from the table scanning functions, and then going over
the fetched records. If the user wanted to update a single row,
this involved first pulling rows and then updating related
records.

Best regards,
Etsuro Fujita

[1] https://commitfest.postgresql.org/4/162/




[HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Satoshi Nagayasu
Hi,

I have a question on jumbling queries in pg_stat_statements.

I found that JumbleRangeTable() uses the relation OID in
RangeTblEntry.

Obviously, this results in a different queryid when the table
gets re-created (dropped and created).

Why don't we use the relation name (by looking up the catalog)
in query jumbling? For performance reasons?
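
The effect is easy to demonstrate (a sketch; the table name is illustrative):

    CREATE TABLE t (a int);
    SELECT count(*) FROM t;   -- recorded under one queryid
    DROP TABLE t;
    CREATE TABLE t (a int);
    SELECT count(*) FROM t;   -- identical text, but the new OID gives a new queryid
    SELECT queryid, calls, query
      FROM pg_stat_statements
     WHERE query LIKE '%FROM t%';   -- shows two rows for the same query text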

Regards,
-- 
NAGAYASU Satoshi 





Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Peter Geoghegan
On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu  wrote:
> Why don't we use relation name (with looking up the catalog)
> on query jumbling? For performance reason?

I think that there is a good case for preferring this behavior. While
it is a little confusing that pg_stat_statements does not change the
representative query string, renaming a table does not make it a
substantively different table.

There is, IIRC, one case where a string is jumbled directly (CTE
name). It's usually not the right thing, IMV.

-- 
Peter Geoghegan




Re: [HACKERS] Better detection of staled postmaster.pid

2015-08-31 Thread Tom Lane
Kevin Grittner  writes:
> Pavel Raiskup  wrote:
>> It's been reported [1] that postmaster fails to start against a stale
>> postmaster.pid after (e.g.) a power outage on Fedora,

> Was the other newly started process another PostgreSQL cluster?
> Was it launched under the same OS user?

Yes, that's what the bugzilla report indicated.

> (Those are the only
> conditions under which I've seen this.)  I think it is wise to use
> a separate OS user for each cluster.

That's my recommendation too.  The only other thing you could do to
prevent this would be to manually blow away postmaster.pid files,
and please listen to this: that cure is a lot worse than the disease.
It's almost never implemented safely (that is, in a way that guarantees
the forcible removal can *only* happen at system boot and never later).

regards, tom lane




Re: [HACKERS] Adding since-version tags to the docs?

2015-08-31 Thread Tom Lane
"Shulgin, Oleksandr"  writes:
> On Mon, Aug 31, 2015 at 4:01 PM, Tom Lane  wrote:
>> TBH, I think this is a horrid idea.  We occasionally manually add remarks
>> like "since version x.y, Postgres does this".  Inevitably, that just bulks
>> up the documentation; and it starts to look seriously silly in a few years
>> when x.y and all its predecessors are out of support.

> Well, I wouldn't call it outright silly: what's so bad about knowing that
> a certain feature has been there since 9.0, for example?

Right now, you might well care about whether a feature arrived in 9.3 vs
9.4, for instance; but it's highly unlikely that you care whether a
feature arrived in 7.1 or 7.2.  The problem with this proposal is that
it will add far more bloat of the latter sort than currently-useful
information; and the ratio will get worse over time.

regards, tom lane




Re: [HACKERS] Adding since-version tags to the docs?

2015-08-31 Thread David G. Johnston
On Mon, Aug 31, 2015 at 10:39 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Mon, Aug 31, 2015 at 4:01 PM, Tom Lane  wrote:
>
>>
>> It'll be a real
>> mess if we do that for everything.
>>
>
> I share the fear that it could become messy, but it doesn't necessary
> *have to* be a mess.
>
>
The extensions and SQL command sections are quite amenable to this kind of
note - and also have the lesser need, since those pages are already
standalone and you can browse the versions at the top to figure out when
the page was added.  The function tables and, to a lesser extent, types are
likely to see the greater benefit from this but are also most likely to have
information overload.

I would be content with a policy that only version tags corresponding to
active releases be included and that tags pointing to older releases be
removed.  If we are consistent with the corresponding tagging and wording,
the removal aspect can be fully automated.

I have interest in this but not enough to cobble together a patch
containing a sufficient number of examples that the community can review
and vote upon.

David J.


Re: [HACKERS] Adding since-version tags to the docs?

2015-08-31 Thread Andres Freund
On 2015-08-31 10:48:01 -0400, Tom Lane wrote:
> The problem with this proposal is that it will add far more bloat of
> the latter sort than currently-useful information; and the ratio will
> get worse over time.

If we add that information in a sane way, we should be able to remove it
automatically after de-supporting old versions.

Greetings,

Andres Freund




Re: [HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

2015-08-31 Thread Tomas Vondra

Hi,

On 08/31/2015 09:41 AM, Anastasia Lubennikova wrote:

Hi, hackers!
I'm going to begin work on effective storage of duplicate keys in B-tree
index.
The main idea is to implement posting lists and posting trees for B-tree
index pages as it's already done for GIN.

In a nutshell, effective storage of duplicates in GIN is organised as
follows.
The index stores a single index tuple for each unique key. That index tuple
points to a posting list which contains pointers to heap tuples (TIDs). If
too many rows have the same key, multiple pages are allocated for the
TIDs, and these constitute a so-called posting tree.
You can find wonderful detailed descriptions in the gin readme and in
articles.
It also makes it possible to apply a compression algorithm to the posting
list/tree and significantly decrease index size. Read more in the
presentation (part 1).

Now a new B-tree index tuple must be inserted for each table row that we
index.
It can possibly cause a page split. Because of MVCC, even a unique index
could contain duplicates.
Storing duplicates in a posting list/tree helps to avoid superfluous splits.

So it seems to be a very useful improvement. Of course it requires a lot
of changes in the B-tree implementation, so I need approval from the
community.


In general, index size is often a serious issue - cases where indexes 
need more space than the tables are not uncommon in my experience. So 
I think the efforts to lower space requirements for indexes are good.


But if we introduce posting lists into btree indexes, how different are 
they from GIN? It seems to me that if I create a GIN index (using 
btree_gin), I do get mostly the same thing you propose, no?
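
A sketch of that comparison (btree_gin is a real contrib extension; the
table and index names are illustrative):

    CREATE EXTENSION btree_gin;
    CREATE TABLE events (device_id int, payload text);
    -- plain btree: one index tuple per heap row, duplicate keys stored repeatedly
    CREATE INDEX events_btree ON events (device_id);
    -- btree_gin: one entry per distinct key, plus a compressed posting list/tree of TIDs
    CREATE INDEX events_gin ON events USING gin (device_id);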


Sure, there are differences - GIN indexes don't handle UNIQUE indexes, 
but the compression can only be effective when there are duplicate rows. 
So either the index is not UNIQUE (so the b-tree feature is not needed), 
or there are many updates.


Which brings me to the other benefit of btree indexes - they are 
designed for high concurrency. How much is this going to be affected by 
introducing the posting lists?


kind regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread Tomas Vondra



On 08/31/2015 12:54 PM, YUriy Zhuravlev wrote:

Hello hackers

Recently, we were given access to a test server from IBM, a 9119-MHE with 8
CPUs * 8 cores * 8 threads. We decided to take advantage of this and find
bottlenecks for read scalability (pgbench -S).

All the details you can read here:
http://www.postgrespro.ru/blog/pgsql/2015/08/30/p8scaling

Performance of 9.4 stopped growing after 100 clients, and 9.5 / 9.6 stopped
after 150 (at 4 NUMA nodes). After research using perf we saw that
ProcArrayLock in GetSnapshotData inhibits scaling, but inserting a stub
instead of GetSnapshotData did not significantly increase scalability.
Trying to find the bottleneck with gdb, we found another place: we noticed
s_lock in PinBuffer and UnpinBuffer. For the test we rewrote PinBuffer and
UnpinBuffer using atomic operations and we liked the result. The performance
degradation almost completely disappeared, and scaling went up to 400
clients (4 NUMA nodes with 256 "CPUs").

To scale Postgres to large NUMA machines, bufmgr must be ported to atomic
operations. During our tests we found no errors in our patch, but most
likely that is not conclusive, and this patch is only for testing.

Who has any thoughts?


Well, I could test the patch on an x86 machine with 4 sockets (64 cores), 
but I wonder whether it makes sense at this point, as the patch really 
is not correct (judging by what Andres says).


Also, what pgbench scale was used for the testing?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread Andres Freund
On 2015-08-31 17:43:08 +0200, Tomas Vondra wrote:
> Well, I could test the patch on a x86 machine with 4 sockets (64 cores), but
> I wonder whether it makes sense at this point, as the patch really is not
> correct (judging by what Andres says).

Additionally it's, for default pgbench, really mostly a bottleneck after
GetSnapshotData() is fixed. You can make it a problem much earlier if
you have index nested loops over a lot of rows.
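
For instance, a self-join like the following (illustrative; the enable_*
settings merely coax the planner into a nested loop with an inner index scan
on the pgbench database) performs one index lookup per outer row, so buffer
pins and unpins dominate:

    SET enable_hashjoin = off;
    SET enable_mergejoin = off;
    SELECT count(*)
      FROM pgbench_accounts a
      JOIN pgbench_accounts b ON b.aid = a.aid;   -- inner index scan per outer row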

- Andres




Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread Tomas Vondra



On 08/31/2015 05:48 PM, Andres Freund wrote:

On 2015-08-31 17:43:08 +0200, Tomas Vondra wrote:

Well, I could test the patch on an x86 machine with 4 sockets (64 cores), but
I wonder whether it makes sense at this point, as the patch really is not
correct (judging by what Andres says).


Additionally it's, for default pgbench, really mostly a bottleneck after
GetSnapshotData() is fixed. You can make it a problem much earlier if
you have index nested loops over a lot of rows.


[scratches head] So does this mean it's worth testing the patch on x86 
or not, in its current state? Or should we come up with another test 
case, exercising the nested loops?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread Andres Freund
On 2015-08-31 17:54:17 +0200, Tomas Vondra wrote:
> [scratches head] So does this mean it's worth testing the patch on x86 or
> not, in its current state?

You could try if you're interested. But I don't think it's super
meaningful. The patch is just a POC and rather widely incorrect.

Don't get me wrong, I think it's rather important that we fix this. But
that requires, imo, conceptual/development work right now, not
performance testing.

> Or should we come up with another test case, exercising the nested
> loops?

You'd need to do that, yes.

Greetings,

Andres Freund




Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread Dmitry Vasilyev
We did not get any effect on 64 cores with smt=8, and we do not have a
64-CPU x86 machine with the HT feature disabled.
You can set scale > 1000 and shared_buffers >> size of the index
pgbench_accounts_pkey.
You can also increase the concurrency: not only accessing the top of the
b-tree index, but also a specific buffer: select * from pgbench_accounts
where aid = 1;


On Mon, 2015-08-31 at 17:43 +0200, Tomas Vondra wrote:
> 
> On 08/31/2015 12:54 PM, YUriy Zhuravlev wrote:
> > Hello hackers
> > 
> > Recently, we were given access to a test server from IBM, a 9119-MHE
> > with 8 CPUs * 8 cores * 8 threads. We decided to take advantage of
> > this and find bottlenecks for read scalability (pgbench -S).
> > 
> > All the details you can read here:
> > http://www.postgrespro.ru/blog/pgsql/2015/08/30/p8scaling
> > 
> > Performance of 9.4 stopped growing after 100 clients, and 9.5 / 9.6
> > stopped after 150 (at 4 NUMA nodes). After research using perf we saw
> > that ProcArrayLock in GetSnapshotData inhibits scaling, but inserting
> > a stub instead of GetSnapshotData did not significantly increase
> > scalability. Trying to find the bottleneck with gdb, we found another
> > place: we noticed s_lock in PinBuffer and UnpinBuffer. For the test we
> > rewrote PinBuffer and UnpinBuffer using atomic operations and we liked
> > the result. The performance degradation almost completely
> > disappeared, and scaling went up to 400 clients (4 NUMA nodes with 256
> > "CPUs").
> > 
> > To scale Postgres to large NUMA machines, bufmgr must be ported to
> > atomic operations. During our tests we found no errors in our patch,
> > but most likely that is not conclusive, and this patch is only for
> > testing.
> > 
> > Who has any thoughts?
> 
> Well, I could test the patch on an x86 machine with 4 sockets (64 cores),
> but I wonder whether it makes sense at this point, as the patch really
> is not correct (judging by what Andres says).
> 
> Also, what pgbench scale was used for the testing?
> 
> regards
> 
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 




Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread YUriy Zhuravlev
On Monday 31 August 2015 17:54:17 Tomas Vondra wrote:
> So does this mean it's worth testing the patch on x86 
> or not, in it's current state?

It's really interesting. But you need a true 64 cores without HT
(32 cores + HT has no effect).

-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Missing latex-longtable value

2015-08-31 Thread Bruce Momjian
On Tue, Jul  7, 2015 at 03:21:50PM -0400, Bruce Momjian wrote:
> On Tue, Jul  7, 2015 at 01:05:09PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Tue, Jul  7, 2015 at 11:48:13AM -0400, Tom Lane wrote:
> > >> It's a bug.  Back-patch as needed.
> > 
> > > Doesn't that cause translation string differences that are worse than
> > > the original bug, e.g.:
> > >  psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, 
> > > html, asciidoc, latex, latex-longtable, troff-ms\n");
> > 
> > No.  When we've discussed this sort of thing in the past, people have been
> > quite clear that they'd rather have accurate messages that come out in
> > English than inaccurate-though-localized messages.  Certainly we should
> > avoid gratuitous changes in back-branch localized strings, but this is a
> > factual error in the message.
> 
> OK, good to know.

Patch applied back through 9.3, when longtable was added.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan

2015-08-31 Thread Qingqing Zhou
On Thu, Aug 27, 2015 at 1:01 PM, Qingqing Zhou
 wrote:
> On Wed, Aug 26, 2015 at 5:28 PM, Tom Lane  wrote:
>>
>> After looking at the code a bit, IMO the most reasonable thing to do is to
>> include this transformation in inline_set_returning_functions(), perhaps
>> renaming it to something like inline_srfs_and_ctes().
>>
>
> This is essentially the same as my current implementation (revised
> patch attached):
>

I tried the method Tom suggested (attached in the previous email) but
still got the same issue - does anybody see what I did wrong here? :-(

Thanks,
Qingqing




Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan

2015-08-31 Thread Andres Freund
On 2015-08-19 15:14:03 -0700, Josh Berkus wrote:
> Asking users to refactor their applications to add OFFSET 0 is a bit
> painful, if we could take care of it via a backwards-compatibility GUC.
>  We have many users who are specifically using the CTE optimization
> barrier to work around planner failures.

Agreed. I think we'll cause a lot of problems in migrations if we do
this unconditionally. I also think CTEs are a much cleaner optimization
barrier than OFFSET 0.

Some are probably going to hate me for this, but I think it'd be better
to change the grammar to something like

    name opt_name_list AS '(' PreparableStmt ')' OPTIONS '(' cte_option_list ')'

and allow specifying 'inline' 'off'/'on'. The GUC would simply change
the default value.
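
Under that purely hypothetical grammar (not implemented anywhere), a query
pinning today's barrier behavior might read:

    WITH w AS (SELECT * FROM big_table WHERE x > 10) OPTIONS (inline off)
    SELECT * FROM w WHERE y = 42;   -- keeps the CTE as an optimization barrier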

Greetings,

Andres Freund




Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread YUriy Zhuravlev
On Monday 31 August 2015 17:48:50 Andres Freund wrote:
> Additionally it's, for default pgbench, really mostly a bottlneck after
> GetSnapshotData() is fixed. You can make it a problem much earlier if
> you have index nested loops over a lot of rows.
100 000 000 is a lot? A simple select query from pgbench is a common task,
not for everyone, but...
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Shulgin, Oleksandr
On Mon, Aug 31, 2015 at 12:35 PM, Pavel Stehule 
wrote:

>
>>>
>>> http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com
>>>
>>
>> Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to
>> a commitfest back then I think?
>>
>
> I had no time to finish this patch - there are a few issues in signal
> handling and returning back the result - but I still want it :) - and as
> far as I know, almost all other SQL databases have similar functionality.
>

I've updated the patch for the current master and also added some
handling of unexpected parameters, so attached is a v2.

I'd say we should hide the so-designed pg_cmdstatus() interface behind more
friendly calls like pg_explain_backend() and pg_backend_progress() to give
some naming examples, to remove the need for magic numbers in the second
arg.

What I've found missing in this approach is the insight into nested
executor runs, so that if you're running a "SELECT my_func()", you only see
this outer query in the pg_cmdstatus() output.  With the auto_explain
approach, by hooking into executor I was able to capture the nested queries
and their plans as well.

It's conceptually trivial to add some code to use the Executor hooks here,
but I don't see any precedent for this except for contrib modules
(auto_explain and pg_stat_statements); I'm just not sure if that would be
OK-ish.

And when we solve that, there is another problem of having a sane interface
to query the nested plans.  For a psql user, probably the most interesting
would be the topmost (level=1) and the innermost (e.g. level=-1) plans.  We
might also want to provide a full nesting of plans in a structured format
like JSON or... *cough* XML, for programs to consume and display nicely
with folding and stuff.

And the most interesting would be making instrumentation work with all of
the above.

I'm adding this to the next CF.

--
Alex
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 0abde43..40db40d 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/cmdstatus.h"
 
 
 /*
@@ -296,6 +297,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_CMDSTATUS_INFO))
+		HandleCmdStatusInfoInterrupt();
+
 	if (set_latch_on_sigusr1)
 		SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ce4bdaf..5d5df58 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -67,6 +67,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/cmdstatus.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2991,6 +2992,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (CmdStatusInfoRequested)
+		ProcessCmdStatusInfoRequest();
 }
 
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 3ed0b44..2c8687c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -18,7 +18,7 @@ endif
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
-	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
+	bool.o cash.o char.o cmdstatus.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
diff --git a/src/backend/utils/adt/cmdstatus.c b/src/backend/utils/adt/cmdstatus.c
new file mode 100644
index 000..38c1947
--- /dev/null
+++ b/src/backend/utils/adt/cmdstatus.c
@@ -0,0 +1,508 @@
+/*-
+ *
+ * cmdstatus.c
+ *	  Definitions for pg_cmdstatus function.
+ *
+ * Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/cmdstatus.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/htup_details.h"
+#include "commands/explain.h"
+#include "lib/stringinfo.h"
+#include "storage/latch.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/shmem.h"
+#include "tcop/dest.h"
+#include "tcop/pquery.h"
+#include "utils/builtins.h"
+#include "utils/cmdstatus.h"
+
+
+#define CMDINFO_SLOTS		100
+#define BUFFER_SIZE			(8 * 1024)
+
+bool CmdStatusInfoRequested = false;
+
+typedef struct {
+	bool	is_valid;
+	bool	is_done;
+	int		

[HACKERS] Should \o mean "everything?"

2015-08-31 Thread David Fetter
Folks,

In a failed attempt to send the output of \pset to a pipe, I noticed
that for reasons I find difficult to explain, not every output gets
redirected with \o.

At first blush, I'd consider this inconsistency as a bug.

What have I missed?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




[HACKERS] Is "WIN32" #defined in Cygwin builds?

2015-08-31 Thread Tom Lane
I started wondering about $subject because we are fairly schizophrenic
about whether we believe this.  For example, only a few lines apart in
dirmod.c, there are

#if defined(WIN32) || defined(__CYGWIN__)

#if defined(WIN32) && !defined(__CYGWIN__)

Presumably, one of these could be simplified, but I'm not sure which.

regards, tom lane




Re: [HACKERS] [PATCH] SQL function to report log message

2015-08-31 Thread dinesh kumar
Hi,

On Sat, Aug 29, 2015 at 4:22 PM, Pavel Stehule 
wrote:

> Hi
>
> I am starting to work review of this patch
>
> 2015-07-13 9:54 GMT+02:00 dinesh kumar :
>
>> Hi All,
>>
>> Greetings for the day.
>>
>> Would like to discuss on below feature here.
>>
>> Feature:
>> Having an SQL function, to write messages to log destination.
>>
>> Justification:
>> As of now, we don't have an SQL function to write custom/application
>> messages to the log destination. We have the "RAISE" clause, which is
>> controlled by log_ parameters. If we have an SQL function which works
>> irrespective of log settings, that would be good for many log parsers.
>> What I mean is, from a DBA point of view, if we route all our native OS
>> stats to log files in a proper format, then we can have our log reporting
>> tools give more effective reports. Also, applications can log their own
>> messages to postgres log files, which can be monitored by DBAs too.
>>
>> Implementation:
>> Implemented a new function "pg_report_log" which takes one argument
>> as text, and returns void. I took the "LOG" prefix for all the reporting
>> messages. I wasn't sure whether to go with a new prefix for this, since
>> these are normal LOG messages. Let me know if I am wrong here.
>>
>> Here is the attached patch.
>>
>
> This patch is not complex, but the implementation doesn't cover
> "ereport" well.
>
> Although this functionality should be replaced by a custom function in any
> PL (now or in the near future), I am not against having this function in
> core. There are a lot of companies with strong resistance against stored
> procedures - and sometimes this functionality can help with SQL debugging.
>
> Issues:
>
> 1. It supports only the MESSAGE field in the exception - I expect it to
> support all fields: HINT, DETAIL, ...
>

Added these functionalities too.


> 2. Missing regress tests
>

Adding here.


> 3. the parsing ereport level should be public function shared with PLpgSQL
> and other PL
>

Sorry Pavel, I am not getting your point here. Would you give me an example?


> 4. should be hidestmt mandatory parameter?
>

I changed this argument's default value as "true".


> 5. the function declaration is strange
>
> postgres=# \sf pg_report_log (text, anyelement, boolean)
> CREATE OR REPLACE FUNCTION pg_catalog.pg_report_log(text, anyelement,
> boolean)
>  RETURNS void
>  LANGUAGE sql
>  STABLE STRICT COST 1
> AS $function$SELECT pg_report_log($1::pg_catalog.text,
> $2::pg_catalog.text, $3::boolean)$function$
>
> Why polymorphic? It is useless on any modern release
>
>
I took quote_ident(anyelement) as reference code for implementing this.
Could you guide me if I am doing something wrong here?


> postgres=# \sf pg_report_log (text, text, boolean)
> CREATE OR REPLACE FUNCTION pg_catalog.pg_report_log(text, text, boolean)
>  RETURNS void
>  LANGUAGE internal
>  IMMUTABLE STRICT
> AS $function$pg_report_log$function$
>
> Why stable, why immutable? This function should be volatile.
>
Fixed these to volatile.


> 6. using elog level enum as errcode is wrong idea - errcodes are defined
> in table http://www.postgresql.org/docs/9.4/static/errcodes-appendix.html
>

You mean, if the elevel is 'ERROR', then we need to allow an errcode? Let me
know your thoughts.

Adding new patch, with the above fixes.

Thanks in advance.

Regards,
Dinesh

>
> Regards
>
> Pavel
>
>
>>
>> Regards,
>> Dinesh
>> manojadinesh.blogspot.com
>>
>
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b3b78d2..1ee8945 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17925,6 +17925,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 Return information about a file.

   
+  
+   
+    pg_report_log(elevel text, message anyelement,
+                  ishidestmt boolean, detail text,
+                  hint text, context text)
+   
+   void
+   
+    Write message into the log file as per the log level.
+   
+  
  
 

@@ -17993,6 +18002,24 @@ SELECT (pg_stat_file('filename')).modification;
 

 
+   
+    pg_report_log
+   
+   
+    pg_report_log is useful to write custom messages
+    into the current log destination and returns void.
+    This function doesn't support the PANIC and FATAL log levels, due to
+    their unique internal DB usage, which may cause database instability.
+    Using ishidestmt, the function can write or ignore the current SQL
+    statement in the log file. Also, we can have DETAIL, HINT and CONTEXT
+    log messages by providing detail, hint and context as function
+    arguments. By default, all these parameter values are EMPTY.
+    Typical usage includes:
+
+postgres=# SELECT pg_report_log('NOTICE', 'Custom Message', true);
+NOTICE:  Custom Message
+ pg_report_log 
+---
+ 
+(1 row)
+
+   
+
   
 
   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ccc030f..4105252 100644
--- a/src/backend/catalog/system_views.sql
+++ 

Re: [HACKERS] Is "WIN32" #defined in Cygwin builds?

2015-08-31 Thread Andrew Dunstan



On 08/31/2015 02:21 PM, Tom Lane wrote:

I started wondering about $subject because we are fairly schizophrenic
about whether we believe this.  For example, only a few lines apart in
dirmod.c, there are

#if defined(WIN32) || defined(__CYGWIN__)

#if defined(WIN32) && !defined(__CYGWIN__)

Presumably, one of these could be simplified, but I'm not sure which.





No, and we've made sure not to do that ourselves, or at least I hope we 
have. Here's what the cygwin compiler says:


   andrew@antonio ~/bf
   $ touch xx.c

   andrew@antonio ~/bf
   $ gcc -E -dM xx.c | grep WIN
   #define __WINT_MAX__ 4294967295U
   #define __WINT_MIN__ 0U
   #define __SIZEOF_WINT_T__ 4
   #define __CYGWIN__ 1
   #define __WINT_TYPE__ unsigned int
   #define __CYGWIN32__ 1

   andrew@antonio ~/bf


cheers

andrew





Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Mason S
>
>
>  We are also a bit disappointed by Huawei's position on the CSN patch; we
> hoped to use it for our XTM.
>

Disappointed in what way? Moving to some sort of CSN approach seems to open
things up for different future ideas. In the short term, it would mean
replacing potentially large snapshots and longer visibility checks. In the
long term, perhaps CSN could help simplify the design of multi-master
replication schemes.


> FDW approach has been actively criticized by pg_shard people and that's
> also made me a bit suspicious.  It looks like  we are doomed to continue
> several development forks, so we decided to work on very important common
> project, XTM, which we hoped could be accepted by all parties and
> eventually committed to 9.6.  Now I see we were right, unfortunately.
>
>
I think the original XC project probably would have taken the FDW approach
as a basis if it had existed, with focus on push-down optimizations.

I assume that future work around PG sharding probably would be more likely
to be accepted with the FDW approach. One could perhaps work on pushing
down joins, aggregates and order by, then look at any optimizations gained
if code is moved outside of FDW.  It would make sense if some kind of
generic optimization for foreign tables for SQL-based sources could be
leveraged across all databases, rather than having to re-implement for each
FDW.

There are different approaches and related features that may need to be
improved.

Do we want multiple copies of shards, like the pg_shard approach? Or keep
things simpler and leave it up to the DBA to add standbys?

Do we want to leverage table inheritance? If so, we may want to spend time
improving the performance of what currently exists when the number of
shards becomes large. If using table inheritance, we could add the ability to
specify what node (er, foreign server) the subtable lives on. We could
create top level sharding expressions that allow these to be implicitly
created.

Should we allow arbitrary expressions for shards, not just range, list and
hash?

Maybe the most community-acceptable approach would look something like

- Use FDWs, and continue to optimize push-down operations, also for
non-PostgreSQL databases.

- Use table inheritance for defining the shards. Ideally allow for
specifying that some shards may be replicated to other foreign servers (and
itself) (for pushing down joins with lookup/static tables; at this point it
should be decent for star schema based data warehouses).

- XTM/GTM hooks. Preferably we move to CSN for snapshots in core PostgreSQL
though.

Longer term, efficient internode joins would require a lot more work.

The devil is in the details. There are things that have to be addressed,
for example, if using global XIDs via GTM, not every transaction is on
every node, so we need to make sure that new clog pages get added
properly.  There is also the potential to require a lot more code to be
added, like for cursor handling and stored functions. Perhaps some
limitations when using shards to foreign servers are acceptable if it is
desired to minimize code changes.  XC and XL code help.

Regards,

Mason


[HACKERS] Anybody have icc for IA64?

2015-08-31 Thread Tom Lane
After pushing 2c713d6e, I realized that the buildfarm isn't going to tell
me anything useful about it, because the change only makes a difference
for icc on ia64, and we have no such members in the buildfarm.  (We've
got icc, and we've got ia64, but not both in the same place.)

It's fairly worrisome that we don't have this case covered, considering
that there's a whole bunch of bespoke code for the case.  Now, I'm trying
to remove most of the special-case code, but it's hard to be sure if this
will work --- maybe icc is only compatible with gcc asm on x86[_64] and
not on ia64 :-(

Anyway, if anyone could see whether HEAD still works on such a platform,
I'd appreciate it.

regards, tom lane




Re: [HACKERS] Is "WIN32" #defined in Cygwin builds?

2015-08-31 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/31/2015 02:21 PM, Tom Lane wrote:
>> I started wondering about $subject because we are fairly schizophrenic
>> about whether we believe this.

> No, and we've made sure not to do that ourselves, or at least I hope we 
> have.

OK, thanks.  I was wondering whether I'd broken the Cygwin build with
a65e0864, but it should be okay.

Not sure if it's worth trying to clean up the #if tests that are redundant
given this knowledge.  It looks like there's about a dozen.

regards, tom lane




Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Alexander Korotkov
On Mon, Aug 31, 2015 at 9:48 PM, Mason S  wrote:

>
>>  We also a bit disappointed by Huawei position about CSN patch, we hoped
>> to use for  our XTM.
>>
>
> Disappointed in what way? Moving to some sort of CSN approach seems to
> open things up for different future ideas. In the short term, it would mean
> replacing potentially large snapshots and longer visibility checks. In the
> long term, perhaps CSN could help simplify the design of multi-master
> replication schemes.
>

We are disappointed because at the PGCon talk Huawei announced publishing of
their CSN patch and further work in this direction together with the
community. However, it's still not published despite all the promises.
Nobody from Huawei answers the CSN thread on hackers.
So, I think we got nothing from Huawei except teasing and should rely only
on ourselves. That is disappointing.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Anybody have icc for IA64?

2015-08-31 Thread Alvaro Herrera
Tom Lane wrote:
> After pushing 2c713d6e, I realized that the buildfarm isn't going to tell
> me anything useful about it, because the change only makes a difference
> for icc on ia64, and we have no such members in the buildfarm.  (We've
> got icc, and we've got ia64, but not both in the same place.)
> 
> It's fairly worrisome that we don't have this case covered, considering
> that there's a whole bunch of bespoke code for the case.  Now, I'm trying
> to remove most of the special-case code, but it's hard to be sure if this
> will work --- maybe icc is only compatible with gcc asm on x86[_64] and
> not on ia64 :-(
> 
> Anyway, if anyone could see whether HEAD still works on such a platform,
> I'd appreciate it.

I have pinged the EDB team that maintains the only IA64 machine we have.
It runs HP-UX.  I'm not sure how easy it is to get Intel's compiler
there ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Information of pg_stat_ssl visible to all users

2015-08-31 Thread Peter Eisentraut
On 8/31/15 9:13 AM, Andres Freund wrote:
> I'm just saying that we should strive to behave at least somewhat
> consistently, and change everything at once, not piecemeal. Because the
> latter will not decrease the pain of migrating to a new model in a
> relevant way while making the system harder to understand.

Well, we already hide a fair chunk of information from pg_stat_activity
from unprivileged users, including everything related to the connection
origin of other users.  So from that precedent, the entire SSL
information ought to be considered privileged.





Re: [HACKERS] Should \o mean "everything?"

2015-08-31 Thread Kevin Grittner
David Fetter  wrote:

> In a failed attempt to send the output of \pset to a pipe, I
> noticed that for reasons I find difficult to explain, not every
> output gets redirected with \o.
>
> At first blush, I'd consider this inconsistency as a bug.
>
> What have I missed?

The documentation says:

| Arranges to save future query results to the file filename or
| pipe future results to the shell command command. If no argument
| is specified, the query output is reset to the standard output.
|
| "Query results" includes all tables, command responses, and
| notices obtained from the database server, as well as output of
| various backslash commands that query the database (such as \d),
| but not error messages.

Are you seeing anything inconsistent with the documentation?  If
so, what?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Qingqing Zhou
On Mon, Aug 31, 2015 at 2:12 AM, Oleg Bartunov  wrote:
>
> AFAIK, XC/XL has already some customers and that is an additional pressure
> on their development team, which is now called X2. I don't exactly know how
> internal Huawei's MPPDB is connected to XC/XL.
>

Huawei's MPPDB is based on PG-XC, tailored more toward OLAP scenarios.

The basic idea is that OLAP needs a shared-nothing scale-out
architecture for reads and writes. It needs only OK TP performance and a
restricted set of functionality, and it thus avoids some problems, like
the GTM being a central scaling bottleneck.

I advocate merging scale-out features into PostgreSQL core, if we are
ready to face some long-term functional discrepancies between the two
deployments.

Regards,
Qingqing




Re: [HACKERS] Should \o mean "everything?"

2015-08-31 Thread David Fetter
On Mon, Aug 31, 2015 at 07:18:02PM +, Kevin Grittner wrote:
> David Fetter  wrote:
> 
> > In a failed attempt to send the output of \pset to a pipe, I
> > noticed that for reasons I find difficult to explain, not every
> > output gets redirected with \o.
> >
> > At first blush, I'd consider this inconsistency as a bug.
> >
> > What have I missed?
> 
> The documentation says:
> 
> | Arranges to save future query results to the file filename or
> | pipe future results to the shell command command. If no argument
> | is specified, the query output is reset to the standard output.
> |
> | "Query results" includes all tables, command responses, and
> | notices obtained from the database server, as well as output of
> | various backslash commands that query the database (such as \d),
> | but not error messages.
> 
> Are you seeing anything inconsistent with the documentation?  If
> so, what?

The documentation doesn't specify that only a proper subset of the
backslash commands goes through, and doesn't enumerate either the ones
that do or the ones that don't.

The way I see it, the fact that backslash commands aren't treated
consistently here would be a bug even if the inconsistency were
documented, which it is only vaguely by implication.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Josh Berkus
All, Bruce:

First, let me put out there that I think the horizontal scaling project
which has buy-in from the community and which we're working on is infinitely
better than one we're not working on, or than an under-resourced fork.
So we're in agreement on that.  However, I think there's a lot of room
for discussion; I feel like the FDW approach was decided in exclusive
meetings involving a very small number of people.  The FDW approach
*may* be the right approach, but I'd like to see some rigorous
questioning of that before it's final.

Particularly, I'm concerned that we already have two projects in process
aimed at horizontal scalability, and it seems like we could bring either
(or both) projects to production quality MUCH faster than we could make
an FDW-based solution work.  These are:

* pg_shard
* BDR

It seems worthwhile, just as a thought experiment, if we can get where
we want using those, faster, or by combining those with new FDW features.

It's also important to recognize that there are three major use-cases
for write-scalable clustering:

* OLTP: small-medium cluster, absolute ACID consistency,
  bottlenecked on small writes per second
* DW: small-large cluster, ACID optional,
  bottlenecked on bulk reads/writes
* Web: medium to very large cluster, ACID optional,
  bottlenecked on # of connections

We cannot possibly solve all of the above at once, but to the extent
that we recognize all 3 use cases, we can build core features which can
be adapted to all of them.

I'm also going to pontificate that, for a future solution, we should not
focus on write *IO*, but rather on CPU and RAM. The reason for this
thinking is that, with the latest improvements in hardware and 9.5
improvements, it's increasingly rare for machines to be bottlenecked on
writes to the transaction log (or the heap). This has some implications
for system design.  For example, solutions which require all connections
to go through a single master node do not scale sufficiently to be worth
bothering with.

On some other questions from Mason:

> Do we want multiple copies of shards, like the pg_shard approach? Or
> keep things simpler and leave it up to the DBA to add standbys? 

We want multiple copies of shards created by the sharding system itself.
Having a separate, and completely orthogonal, redundancy system to the
sharding system is overly burdensome on the DBA and makes low-data-loss
HA impossible.

> Do we want to leverage table inheritance? If so, we may want to spend
> time improving performance for when the number of shards becomes large
> with what currently exists. If using table inheritance, we could add the
> ability to specify what node (er, foreign server) the subtable lives on.
> We could create top level sharding expressions that allow these to be
> implicitly created.

IMHO, given that we're looking at replacing inheritance because of its
many documented limitations, building sharding on top of inheritance
seems unwise.  For example, many sharding systems are hash-based; how
would an inheritance system transparently use hash keys?

> Should we allow arbitrary expressions for shards, not just range, list
> and hash?

That seems like a 2.0 feature.  It also doesn't seem necessary to
support it for the moderately skilled user; that is, requiring a special
C sharding function for this seems fine to me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PROPOSAL] Table Partition

2015-08-31 Thread My Life
> There  is already a recent proposal on hackers about partition support in  
> PostgreSQL by Amit Langote. 
> You will find the thread at 
> http://www.postgresql.org/message-id/55d3093c.5010...@lab.ntt.co.jp.

Actually, I have seen this design before, and it was not just a design; it has
been implemented. I agree with it, although I still have some reservations,
because I think it is a little more complicated than necessary.
1. You store all partition info in two system catalogs: pg_partitioned_rel and
pg_partition. It will be less efficient to access and maintain the partition
info, including scan, add, delete, and modify actions, and perhaps concurrent
access. And the data volume will keep growing.
2. The column 'anyarray partrangebounds' in pg_partition is not very precise,
and we cannot use the index access operators associated with a given data type.

> In your design, can index scan be used for individual partition? If yes,
> can you share how it is handled?
Of course an index scan can be used for an individual partition,
because we create a unique constraint on partkey1, ..., partkeyN in the
partition catalog pg_partition.pg_partition_2586; the unique constraint is
really a unique index.
If an index scan's index condition involves the partkey, we can use this
unique index to choose the matching partitions in a specified order, and
expand the partitioned table's index scan node into a list of per-partition
index scan nodes. In this way, the 'PartitionExpand' plan node is like the
'Append' plan node, but with some differences.
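
To make the lookup concrete, here is a rough sketch of the kind of ordered
index probe such a RelationGetTuplePartid could perform over the partition
catalog. This is only an illustration: the attribute numbers, the single-key
signature, and the ">=" support procedure argument are hypothetical
placeholders, not the actual implementation.

/*
 * Hedged sketch: locate the partition whose upper bound covers "key" by
 * probing the partition catalog's unique index on (partkey1, ...).
 * Anum_part_partkey1 and Anum_part_partid are hypothetical attribute
 * numbers; ge_cmp_proc is the ">=" support procedure for the key type.
 */
static Oid
partition_lookup_sketch(Relation partcat, Oid partcat_key_index,
                        RegProcedure ge_cmp_proc, Datum key)
{
    ScanKeyData skey;
    SysScanDesc scan;
    HeapTuple   tup;
    Oid         result = InvalidOid;

    /* partkey1 >= key: the first tuple in index order is our partition */
    ScanKeyInit(&skey, Anum_part_partkey1,
                BTGreaterEqualStrategyNumber, ge_cmp_proc, key);

    scan = systable_beginscan(partcat, partcat_key_index, true,
                              NULL, 1, &skey);
    if ((tup = systable_getnext(scan)) != NULL)
    {
        bool        isnull;

        /* extract the hypothetical partid column (the partition's OID) */
        result = DatumGetObjectId(heap_getattr(tup, Anum_part_partid,
                                               RelationGetDescr(partcat),
                                               &isnull));
    }
    systable_endscan(scan);
    return result;
}

A real implementation would build one scan key per partition key column and
handle NULLs and bound semantics; the point is just that partition routing
reduces to a single ordered index probe on the partition catalog.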





-- Original --
From: "Amit Langote"
Date: Mon, Aug 31, 2015 03:14 PM
To: "My Life"; "pgsql-hackers"
Subject: Re: [HACKERS] [PROPOSAL] Table Partition




Hello,

On 2015-08-30 PM 10:42, My Life wrote:
> 
> For partitioned table's scan action, and JOIN action, we implemented
> a plan node named 'PartitionExpand'. the plan node can expand the
> partitioned table scan node into a list of partitions according to
> the filter and conditions. and it can expand partitioned table JOIN
> node into a list of partitions JOIN node wisely.
> We implemented a DynamicPrunePartition method, which can expand the
> partitioned table's scan node into a list of per-partition scan nodes.
> We implemented a DynamicPrunePartitionJoin method, which can expand
> the partitioned table's JOIN node into a list of per-partition JOIN nodes.
> These expand actions happen in the ExecInitPartitionExpand function when
> the executor is initialized, and all of these actions are implemented
> based on the partition catalog.
> 

In your design, can index scan be used for individual partition? If yes,
can you share how it is handled?

Thanks,
Amit

Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Joshua D. Drake

On 08/31/2015 01:16 PM, Josh Berkus wrote:

All, Bruce:




I'm also going to pontificate that, for a future solution, we should not
focus on write *IO*, but rather on CPU and RAM. The reason for this
thinking is that, with the latest improvements in hardware and 9.5
improvements, it's increasingly rare for machines to be bottlenecked on
writes to the transaction log (or the heap). This has some implications
for system design.  For example, solutions which require all connections
to go through a single master node do not scale sufficiently to be worth
bothering with.


We see this already: under very high concurrency (lots of connections,
many cores) we often see a significant drop in performance that is not
related to IO in any meaningful way.


JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] Should \o mean "everything?"

2015-08-31 Thread David Fetter
On Mon, Aug 31, 2015 at 07:18:02PM +, Kevin Grittner wrote:
> David Fetter  wrote:
> 
> > In a failed attempt to send the output of \pset to a pipe, I
> > noticed that for reasons I find difficult to explain, not every
> > output gets redirected with \o.
> >
> > At first blush, I'd consider this inconsistency as a bug.
> >
> > What have I missed?
> 
> The documentation says:
> 
> | Arranges to save future query results to the file filename or
> | pipe future results to the shell command command. If no argument
> | is specified, the query output is reset to the standard output.
> |
> | "Query results" includes all tables, command responses, and
> | notices obtained from the database server, as well as output of
> | various backslash commands that query the database (such as \d),
> | but not error messages.
> 
> Are you seeing anything inconsistent with the documentation?  If
> so, what?

Perhaps an example would help clarify...

postgres=# \o | perl -pE 's/^/PREFIXED!/'
postgres=# \dt
postgres=# PREFIXED!No relations found.

postgres=# \set 
AUTOCOMMIT = 'on'
ON_ERROR_STOP = ''
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = '>> '
VERBOSITY = 'default'
VERSION = 'PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc 
(Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit'
DBNAME = 'postgres'
USER = 'postgres'
HOST = '/var/run/postgresql'
PORT = '5432'
ENCODING = 'UTF8'
PSQL_EDITOR = '"/usr/local/bin/vim"'

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] security labels on databases are bad for dump & restore

2015-08-31 Thread Bruce Momjian
On Tue, Jul 28, 2015 at 04:23:36PM -0300, Alvaro Herrera wrote:
> Josh Berkus wrote:
> > On 07/28/2015 11:58 AM, Robert Haas wrote:
> > >> I'd be strongly in favour of teaching GRANT, SECURITY LABEL, COMMENT
> > >> ON DATABASE, etc to recognise CURRENT_DATABASE as a keyword. Then
> > >> dumping them in pg_dump --create, and in pg_dump -Fc .
> > >>
> > >> In practice I see zero real use of pg_dumpall without --globals-only,
> > >> and almost everyone does pg_dump -Fc . I'd like to see that method
> > >> case actually preserve the whole state of the system and do the right
> > >> thing sensibly.
> > >>
> > >> A pg_restore option to skip database-level settings could be useful,
> > >> but I think by default they should be restored.
> > 
> > +1
> > 
> > Let's get rid of pg_dumpall -g.
> 
> Quite the opposite, I think --- let's get rid of pg_dumpall EXCEPT when
> invoked as pg_dumpall -g.

Is this a TODO?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Robert Haas
On Mon, Aug 31, 2015 at 4:16 PM, Josh Berkus  wrote:
> First, let me put out there that I think the horizontal scaling project
> which has buy-in from the community and we're working on is infinitely
> better than the one we're not working on or is an underresourced fork.
> So we're in agreement on that.  However, I think there's a lot of room
> for discussion; I feel like the FDW approach was decided in exclusive
> meetings involving a very small number of people.  The FDW approach
> *may* be the right approach, but I'd like to see some rigorous
> questioning of that before it's final.

It seems to me that sharding consists of (1) breaking your data set up
into shards, (2) possibly replicating some of those shards onto
multiple machines, and then (3) being able to access the remote data
from local queries.   As far as (1) is concerned, we need declarative
partitioning, which is being worked on by Amit Langote.  As far as (2)
is concerned, I hope and expect BDR, or technology derived therefrom,
to eventually fill that need.  As far as (3) is concerned, why
wouldn't we use the foreign data wrapper interface, and specifically
postgres_fdw?  That interface was designed for the explicit purpose of
allowing access to remote data sources, and a lot of work has been put
into it, so it would be highly surprising if we decided to throw that
away and develop something completely new from the ground up.

It's true that postgres_fdw doesn't do everything we need yet.  The
new join pushdown hooks aren't used by postgres_fdw yet, and the API
itself has some bugs with EvalPlanQual handling.  Aggregate pushdown
is waiting on upper planner path-ification.   DML pushdown doesn't
exist yet, and the hooks that would enable pushdown of ORDER BY
clauses to the remote side aren't being used by postgres_fdw.  But all
of these things have been worked on.  Patches for many of them have
already been posted.  They have suffered from a certain amount of
neglect by senior hackers, and perhaps also from a shortage of time on
the part of the authors.  But an awful lot of the work that is needed
here has already been done, if only we could get it committed.
Aggregate pushdown is a notable exception, but abandoning the foreign
data wrapper approach in favor of something else won't fix that.

Postgres-XC developed a purpose-built system for talking to other
nodes instead of using the FDW interface, for the very good reason
that the FDW interface did not yet exist at the time that Postgres-XC
was created.  But several people associated with the XC project have
said, including one on this thread, that if it had existed, they
probably would have used it.  And it's hard to see why you wouldn't:
with XC's approach, the remote data source is presumed to be
PostgreSQL (or Postgres-XC/XL/X2/whatever); and you can only use the
facility as part of a sharding solution.  The FDW interface can talk
to anything, and it can be used for stuff other than sharding, like
making one remote table appear local because you just happen to want
that for some reason.  This makes the XC approach look rather brittle
by comparison.  I don't blame the XC folks for taking the shortest
path between two points, but FDWs are better, and we ought to try to
leverage that.

> Particularly, I'm concerned that we already have two projects in process
> aimed at horizontal scalability, and it seems like we could bring either
> (or both) projects to production quality MUCH faster than we could make
> an FDW-based solution work.  These are:
>
> * pg_shard
> * BDR
>
> It seems worthwhile, just as a thought experiment, if we can get where
> we want using those, faster, or by combining those with new FDW features.

I think it's abundantly clear that we need a logical replication
solution as part of any horizontal scalability story.  People will
want to do things like have 10 machines with each piece of data on 3
of them, and there won't be any reasonable way of doing that without
logical replication.  I assume that BDR, or some technology derived
from it, will end up in core and solve that problem.  I had actually
hoped we were going to get that in 9.5, but it didn't happen that way.
Still, I think that getting first single-master, and then eventually
multi-master, logical replication in core is absolutely critical.  And
not just for sharding specifically: replicating your whole database to
several nodes and load-balancing your clients across them isn't
sharding, but it does give you read scalability and is a good fit for
people with geographically dispersed data with good geographical
locality.  I think a lot of people will want that.

I'm not quite sure yet how we can marry declarative partitioning and
better FDW-pushdown and logical replication into one seamless, easy to
deploy solution that requires very low administrator effort.  But I am
sure that each of those things, taken individually, is very useful,
and that being able to construct a solution from those building blocks
would be a big improvement over what we have today.  I can't imagine
that trying to do one monolithic project that provides all of those
things, but only if you combine them in the specific way that the
designer had in mind, is ever going to be successful.  People _will_
want access to each of those features in an unbundled fashion.  And,
trying to do them altogether leads to trying to solve too many
problems at once.  I think the history of Postgres-XC is a cautionary
tale.

Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Bruce Momjian
On Tue, Jul 14, 2015 at 09:48:59AM -0700, Smitha Pamujula wrote:
> This error will go away only if I install the new json_build94.
> 

> > I was under the impression that we don't need to get the json_build
> > libraries for 9.4. But the upgrade won't go forward without that. Are we
> > missing something here, or is it necessary to get json_build94
> > before upgrading?

I am coming very late to this discussion, but I do have some
information.  Basically, pg_upgrade is trying to predict if the
dump/restore will fail by checking references to shared objects in the
old cluster.  It sees a reference to json_build in the old cluster and
assumes this is needed in the new cluster, when in fact it isn't.

We could have hard-coded this knowledge into 9.4 pg_upgrade if we had
known it.  The simple solution is to install json_build94 as you did,
run pg_upgrade, then just uninstall json_build94 as nothing depends on
it.  Not sure if this should be in the pg_upgrade docs or not.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] WIP: Access method extendability

2015-08-31 Thread Alexander Korotkov
Hackers,

here is the next revision of the patches providing access method extendability.
It is now based on another patch, which reworks the access method interface.
http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com

Besides the access method interface, the major change is the generic xlog
interface. The generic xlog interface is now more user friendly: generic
xlog compares the initial and changed versions of a page by itself. The only
thing it can't do is find data moves inside a page, because that would incur
too much overhead. So in order to get compact WAL records, one should use
GenericXLogMemmove(dst, src, size) to move data inside a page. If this
function isn't used, the WAL records will simply be less compact.

In general, the pattern of generic WAL usage is the following.

1) Start using generic WAL: specify relation

GenericXLogStart(index);

2) Register buffers

GenericXLogRegister(0, buffer1, false);
GenericXLogRegister(1, buffer2, true);

The first argument is a slot number, the second is the buffer, and the third
is a flag indicating whether the buffer is new.

3) Make the changes in the pages, using GenericXLogMemmove() where needed.

4) Finish using GenericXLogFinish(), or abort using GenericXLogAbort(). In
the case of an abort, the pages are reverted to their initial state.

Generic xlog takes care of the critical section, unlogged relations, setting
the LSN, and marking buffers dirty. The user code stays simple and clear.
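
Putting the four steps together, a minimal sketch of a caller might look like
this. The GenericXLog* calls are the API described above; the function name,
the buffers, and the "ok" condition are hypothetical.

static void
example_page_update(Relation index, Buffer buf, Buffer newbuf, bool ok)
{
    /* 1) start a generic WAL record for this relation */
    GenericXLogStart(index);

    /* 2) register the buffers: slot 0 exists already, slot 1 is new */
    GenericXLogRegister(0, buf, false);
    GenericXLogRegister(1, newbuf, true);

    /* 3) modify the pages; use GenericXLogMemmove() for moves inside a page */

    /* 4) finish (emits a compact diff-based record) or revert everything */
    if (ok)
        GenericXLogFinish();
    else
        GenericXLogAbort();
}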

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


create-am.2.patch.gz
Description: GNU Zip compressed data


generic-xlog.2.patch.gz
Description: GNU Zip compressed data


bloom-contrib.2.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Smitha Pamujula
Thank you, Bruce. So far installing it beforehand has been working well, so
we will continue with that plan.

I think it would help if this were noted somewhere in the documentation, as
it would have helped us save some time understanding why the upgrade was
failing and why it was looking for json_build.



On Mon, Aug 31, 2015 at 3:18 PM, Bruce Momjian  wrote:

> On Tue, Jul 14, 2015 at 09:48:59AM -0700, Smitha Pamujula wrote:
> > This error will go away only if I install the new json_build94.
> > 
>
> > I was under the impression that we don't need to get the json_build
> > libraries for 9.4. But the upgrade won't go forward without that. Are we
> > missing something here, or is it necessary to get json_build94
> > before upgrading?
>
> I am coming very late to this discussion, but I do have some
> information.  Basically, pg_upgrade is trying to predict if the
> dump/restore will fail by checking references to shared objects in the
> old cluster.  It sees a reference to json_build in the old cluster and
> assumes this is needed in the new cluster, when in fact it isn't.
>
> We could have hard-coded this knowledge into 9.4 pg_upgrade if we had
> known it.  The simple solution is to install json_build94 as you did,
> run pg_upgrade, then just uninstall json_build94 as nothing depends on
> it.  Not sure if this should be in the pg_upgrade docs or not.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
>   + Everyone has their own god. +
>



-- 
Smitha Pamujula
Database Administrator // The Watch Woman

Direct: 503.943.6764
Mobile: 503.290.6214 // Twitter: iovation
www.iovation.com


Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Bruce Momjian
On Mon, Aug 31, 2015 at 04:03:20PM -0700, Smitha Pamujula wrote:
> Thank you Bruce. So far installing it before have been working well so we will
> continue with that plan. 
> 
> I think it would help if its noted somewhere in the document as it would have
> helped us save some time understanding why it was failing and why it was
> looking for json_build. 

The problem is that this is a rare case where you had an extension that
was later included in Postgres.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread David E. Wheeler
On Aug 31, 2015, at 4:20 PM, Bruce Momjian  wrote:

>> I think it would help if its noted somewhere in the document as it would have
>> helped us save some time understanding why it was failing and why it was
>> looking for json_build. 
> 
> The problem is that this is a rare case where you had an extension that
> was later included in Postgres.

Maybe not so rare. Thanks to Andrew, we’ve had to do this for both 9.2-9.3 
(json_object) and 9.3-9.4 (json_build).

Best,

David





Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Andrew Dunstan



On 08/31/2015 07:21 PM, David E. Wheeler wrote:

On Aug 31, 2015, at 4:20 PM, Bruce Momjian  wrote:


I think it would help if its noted somewhere in the document as it would have
helped us save some time understanding why it was failing and why it was
looking for json_build.

The problem is that this is a rare case where you had an extension that
was later included in Postgres.

Maybe not so rare. Thanks to Andrew, we’ve had to do this for both 9.2-9.3 
(json_object) and 9.3-9.4 (json_build).




Yeah, a lot of people don't like to wait for new stuff. :-)

cheers

andrew




Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Sumedh Pathak
Hi Bruce,

Sumedh from Citus Data here.

> August, 2015:  While speaking at SFPUG, Citus Data approached me about joining
> the FDW sharding team.  They have been invited to the September 1 meeting,
> as have the XC and XL people.

I'd like to add a clarification. We already tried the FDW APIs for pg_shard
two years ago and failed. We figured sharing our learnings could contribute
to the technical discussion and that's why we wanted to be in the call.

Ozgun summarized our technical learnings in this design document:
https://goo.gl/vJWF85

In the document, we focused on one of the four learnings we had with FDW
APIs. For us, we switched to the hook API based approach, and things went
smoothly from there.

Best,
Sumedh

On Sat, Aug 29, 2015 at 7:17 PM, Bruce Momjian  wrote:

> I have recently increased my public statements about the idea of adding
> horizontal scaling/sharding to Postgres. I wanted to share with hackers
> a timeline of how we got here, and where I think we are going in the
> short term:
>
> 2012-2013:  As part of writing my scaling talk
> (http://momjian.us/main/presentations/overview.html#scaling), studying
> Oracle RAC, and talking to users, it became clear that an XC-like
> architecture (sharding) was the only architecture that was going to allow
> for write scaling.
>
> Users and conference attendees I talked to were increasingly concerned
> about the ability of Postgres to scale for high write volumes.  They didn't
> necessarily need that scale now, but they needed to know they could get
> it if they wanted it, and wouldn't need to switch to a new database in
> the future.  This is similar to wanting a car that can get you on a highway
> on-ramp fast --- even if you don't need it, you want to know it is there.
>
> 2014:  I started to shop around the idea that we could use FDWs,
> parallelism, and a transaction/snapshot manager to get XC features
> as built-in to Postgres.  (I don't remember where the original idea
> came from.)  It was clear that having separate forks of the source code
> in XC and XL was never going to achieve critical mass --- there just
> aren't enough people who need high write scale right now, and the fork
> maintenance overhead is a huge burden.
>
> I realized that we would never get community acceptance to dump the XC
> (or XL) code needed for sharding into community Postgres, but with FDWs,
> we could add the features as _part_ of improving FDWs, which would benefit
> FDWs _and_ would be useful for sharding.  (We already see some of those
> FDW features in 9.5.)
>
> October, 2014:  EDB and NTT started working together in the community
> to start improving FDWs as a basis for an FDW-based sharding solution.
> Many of the 9.5 FDW improvements that also benefit sharding were developed
> by a combined EDB/NTT team.  The features improved FDWs independent of
> sharding, so they didn't need community buy-in on sharding to get them
> accepted.
>
> June, 2015:  I attended the PGCon sharding unconference session and
> there was a huge discussion about where we should go with sharding.
> I think the big take-away was that most people liked the FDW approach,
> but had business/customer reasons for wanting to work on XC or XL because
> those would be production-ready faster.
>
> July, 2015:  Oleg Bartunov and his new company Postgres Professional (PP)
> started to think about joining the FDW approach, rather than working on
> XL, as they had stated at PGCon in June.  A joint NTT/EDB/PP phone-in
> meeting is scheduled for September 1.
>
> August, 2015:  While speaking at SFPUG, Citus Data approached me about
> joining the FDW sharding team.  They have been invited to the September
> 1 meeting, as have the XC and XL people.
>
> October, 2015:  EDB is sponsoring a free 3-hour summit about FDW sharding
> at the PG-EU conference in Vienna.   Everyone is invited, but it is hoped
> most of the September 1 folks can attend.
>
> February, 2016:  Oleg is planning a similar meeting at their February
> Moscow conference.
>
> Anyway, I wanted to explain the work that has been happening around
> sharding.  As things move forward, I am increasingly convinced that write
> scaling will be needed soon, that the XC approach is the only reasonable
> way to do it, and that FDWs are the cleanest way to get it into community
> Postgres.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
>   + Everyone has their own god. +
>
>
>



-- 
*Sumedh Pathak*
Citus Data
650.422.9797
sum...@citusdata.com


Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Bruce Momjian
On Mon, Aug 31, 2015 at 07:28:00PM -0400, Andrew Dunstan wrote:
> 
> 
> On 08/31/2015 07:21 PM, David E. Wheeler wrote:
> >On Aug 31, 2015, at 4:20 PM, Bruce Momjian  wrote:
> >
> >>>I think it would help if its noted somewhere in the document as it would 
> >>>have
> >>>helped us save some time understanding why it was failing and why it was
> >>>looking for json_build.
> >>The problem is that this is a rare case where you had an extension that
> >>was later included in Postgres.
> >Maybe not so rare. Thanks to Andrew, we’ve had to do this for both 9.2-9.3 
> >(json_object) and 9.3-9.4 (json_build).
> >
> 
> 
> Yeah, a lot of people don't like to wait for new stuff. :-)

It might make the most sense to mention this method in the release notes
of the extension.  However, I assume they are not using the extension in
the new server, so there is no release to look at.

Still, I don't know how many people are doing this, but the right fix is
to get the names of the modules that are superseded and tell pg_upgrade
to skip them.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Smitha Pamujula
pg_upgrade skipping the modules makes the most sense to me as well.

On Mon, Aug 31, 2015 at 4:32 PM, Bruce Momjian  wrote:

> On Mon, Aug 31, 2015 at 07:28:00PM -0400, Andrew Dunstan wrote:
> >
> >
> > On 08/31/2015 07:21 PM, David E. Wheeler wrote:
> > >On Aug 31, 2015, at 4:20 PM, Bruce Momjian  wrote:
> > >
> > >>>I think it would help if its noted somewhere in the document as it
> would have
> > >>>helped us save some time understanding why it was failing and why it
> was
> > >>>looking for json_build.
> > >>The problem is that this is a rare case where you had an extension that
> > >>was later included in Postgres.
> > >Maybe not so rare. Thanks to Andrew, we’ve had to do this for both
> 9.2-9.3 (json_object) and 9.3-9.4 (json_build).
> > >
> >
> >
> > Yeah, a lot of people don't like to wait for new stuff. :-)
>
> It might make the most sense to mention this method in the release notes
> of the extension.  However, I assume they are not using the extension in
> the new server so their is no release to look at.
>
> Still, I don't know how many people are doing this, but the right fix is
> to get the names of the modules that are superceeded and tell pg_upgrade
> to skip them.
>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
>   + Everyone has their own god. +
>



-- 
Smitha Pamujula
Database Administrator // The Watch Woman

Direct: 503.943.6764
Mobile: 503.290.6214 // Twitter: iovation
www.iovation.com


Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Andrew Dunstan



On 08/31/2015 07:32 PM, Bruce Momjian wrote:

On Mon, Aug 31, 2015 at 07:28:00PM -0400, Andrew Dunstan wrote:


On 08/31/2015 07:21 PM, David E. Wheeler wrote:

On Aug 31, 2015, at 4:20 PM, Bruce Momjian  wrote:


I think it would help if its noted somewhere in the document as it would have
helped us save some time understanding why it was failing and why it was
looking for json_build.

The problem is that this is a rare case where you had an extension that
was later included in Postgres.

Maybe not so rare. Thanks to Andrew, we’ve had to do this for both 9.2-9.3 
(json_object) and 9.3-9.4 (json_build).



Yeah, a lot of people don't like to wait for new stuff. :-)

It might make the most sense to mention this method in the release notes
of the extension.  However, I assume they are not using the extension in
the new server so their is no release to look at.

Still, I don't know how many people are doing this, but the right fix is
to get the names of the modules that are superceeded and tell pg_upgrade
to skip them.




I don't think this knowledge should be hardcoded in pg_upgrade. I could 
see some point in a switch that would tell pg_upgrade a list of 
extensions to ignore.


cheers

andrew




Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/31/2015 07:32 PM, Bruce Momjian wrote:
>> Still, I don't know how many people are doing this, but the right fix is
>> to get the names of the modules that are superceeded and tell pg_upgrade
>> to skip them.

> I don't think this knowledge should be hardcoded in pg_upgrade. I could 
> see some point in a switch that would tell pg_upgrade a list of 
> extensions to ignore.

That would not be terribly helpful for cases where the pg_upgrade call is
embedded in some wrapper script or other.

In any case, there is plenty of precedent for hard-coding knowledge about
specific version updates into pg_upgrade.  The question here is whether
it's feasible to handle extensions that way.  I think we could reasonably
expect to know about cases where a formerly separate extension got
integrated into core, but are there other cases where pg_upgrade would
need to ignore an extension in the old database?

regards, tom lane




Re: [HACKERS] pg_upgrade + Extensions

2015-08-31 Thread David E. Wheeler
On Aug 31, 2015, at 4:58 PM, Tom Lane  wrote:

> In any case, there is plenty of precedent for hard-coding knowledge about
> specific version updates into pg_upgrade.  The question here is whether
> it's feasible to handle extensions that way.  I think we could reasonably
> expect to know about cases where a formerly separate extension got
> integrated into core,

+1

> but are there other cases where pg_upgrade would
> need to ignore an extension in the old database?

Not that I can think of, unless it’s already present because it was in 
template1 or something.

David





Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Josh Berkus
On 08/31/2015 02:47 PM, Robert Haas wrote:
> On Mon, Aug 31, 2015 at 4:16 PM, Josh Berkus  wrote:
>> First, let me put out there that I think the horizontal scaling project
>> which has buy-in from the community and we're working on is infinitely
>> better than the one we're not working on or is an underresourced fork.
>> So we're in agreement on that.  However, I think there's a lot of room
>> for discussion; I feel like the FDW approach was decided in exclusive
>> meetings involving a very small number of people.  The FDW approach
>> *may* be the right approach, but I'd like to see some rigorous
>> questioning of that before it's final.
> 
> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries.   As far as (1) is concerned, we need declarative
> partitioning, which is being worked on by Amit Langote.  As far as (2)
> is concerned, I hope and expect BDR, or technology derived therefrom,
> to eventually fill that need.  

Well, maybe.  If you look at pg_shard, you'll see that it works by
multiplexing writes to all copies.  There's a good reason to do that; it
allows you to have a tight feedback loop between the success of writes
and the availability of "good" nodes.  If you're depending on a separate
replication system to handle getting row copies from one shard to
another, then you need a different way to deal with bad nodes and with
inconsistency between copies of shards.  That's why the existing
multinode non-relational databases don't separate replication from
writes, either.

For that matter, if what you want is transactional, fully ACID sharding,
I really don't see a way to do it via BDR, since BDR is purely
asynchronous replication, as far as I know.

Also, if we want BDR to do this, that's pretty far afield of what BDR is
currently capable of, so someone will need to put serious work into it
rather than just assuming functionality will show up.

> As far as (3) is concerned, why
> wouldn't we use the foreign data wrapper interface, and specifically
> postgres_fdw?  That interface was designed for the explicit purpose of
> allowing access to remote data sources, and a lot of work has been put
> into it, so it would be highly surprising if we decided to throw that
> away and develop something completely new from the ground up.

Well, query hooks are also a capability which we already have, and one that
is mature.  Citus has already posted about why they chose to use them instead.

As long as you recognize that the FDW API (not just the existing fdws)
will need to expand to make this work, it's a viable path.

Also consider that (3) includes both reads and writes.

> I think it's abundantly clear that we need a logical replication
> solution as part of any horizontal scalability story.  People will
> want to do things like have 10 machines with each piece of data on 3
> of them, and there won't be any reasonable way of doing that without
> logical replication.  I assume that BDR, or some technology derived
> from it, will end up in core and solve that problem.  I had actually
> hoped we were going to get that in 9.5, but it didn't happen that way.
> Still, I think that getting first single-master, and then eventually
> multi-master, logical replication in core is absolutely critical.  And
> not just for sharding specifically: replicating your whole database to
> several nodes and load-balancing your clients across them isn't
> sharding, but it does give you read scalability and is a good fit for
> people with geographically dispersed data with good geographical
> locality.  I think a lot of people will want that.

Well, the latter thing is something which BDR is designed for, so all
that needs to happen with that is getting the rest of the plumbing into
core.  Also documentation, packaging, productization, etc.  But the
heavy lifting has already been done.

However, integrating BDR with sharding has some major design issues
which aren't trivial and may be unresolvable, per above.

> I'm not quite sure yet how we can marry declarative partitioning and
> better FDW-pushdown and logical replication into one seamless, easy to
> deploy solution that requires very low administrator effort.  But I am
> sure that each of those things, taken individually, is very useful,
> and that being able to construct a solution from those building blocks
> would be a big improvement over what we have today.  I can't imagine
> that trying to do one monolithic project that provides all of those
> things, but only if you combine them in the specific way that the
> designer had in mind, is ever going to be successful.  People _will_
> want access to each of those features in an unbundled fashion.  And,
> trying to do them altogether leads to trying to solve too many
> problems at once.  I think the history of Postgres-XC is a cautionary
> tale.

Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Marc Munro
On Mon, 2015-08-31 at 22:21 +,  Robert Haas wrote:

> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries. [...]

I believe there is another aspect to sharding that I have not yet seen
mentioned, which is one of connection routing.  

One use case that I have been involved in is to simply partition the
application into entirely, or almost entirely, separate datasets running
on separate databases with little or no need for queries to access
remote data.

This allows each database to deal only with connections from clients
that actually want its local data, greatly reducing the number of
connections on any individual database.  If this works for your
application, your ability to scale is great.

The pain point comes from trying to route queries to the correct
database.  Inevitably, everyone taking this route builds custom
connection-selection layers into their apps.

It seems to me that even with the more sophisticated types of sharding
being discussed here, the ability to conditionally route a
query/connection to a suitable starting database could be quite
beneficial.

Although this is probably a job for the pgbouncer/pgpool developers
rather than the hackers on this list, this thread seems to be a good
place to mention it.

__
Marc






Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Bruce Momjian
On Mon, Aug 31, 2015 at 05:10:11PM -0700, Josh Berkus wrote:
> > As far as (3) is concerned, why
> > wouldn't we use the foreign data wrapper interface, and specifically
> > postgres_fdw?  That interface was designed for the explicit purpose of
> > allowing access to remote data sources, and a lot of work has been put
> > into it, so it would be highly surprising if we decided to throw that
> > away and develop something completely new from the ground up.
> 
> Well, query hooks are also a capability which we already have, and is
> mature.  Citus has already posted about why they chose to use them instead.
> 
> As long as you recognize that the FDW API (not just the existing fdws)
> will need to expand to make this work, it's a viable path.

Uh, we already have a list of things we need to add to FDWs to make them
work, and Citus Data has provided a document of more things that are
needed, https://goo.gl/vJWF85.  I am not sure how much bigger a red flag
you need as confirmation that everyone agrees that major FDW improvements are
a requirement for this.  

My hope is that many FDW improvements will benefit sharding and
non-sharding workloads, but I bet some improvements are going to be
sharding-specific.  I would say we are still in the exploratory stage,
but based on the number of people who care about this feature and want
to be involved, I think we are off to a very good start.  :-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-08-31 Thread Peter Eisentraut
On 8/20/15 9:59 AM, Andrew Dunstan wrote:
> The regression tests thus passed, but should not have. It occurred to me
> that if we had a test like
> 
> select pg_config('configure') ~ '--with-libxml' as has_xml;
> 
> in the xml tests then this failure mode would be detected.

This particular case could probably be addressed in a less roundabout
way by enhancing the mapping mechanism in pg_regress.




Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-08-31 Thread Peter Eisentraut
On 8/24/15 9:50 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 08/23/2015 08:58 PM, Michael Paquier wrote:
>>> I think that's a good thing to have, now I have concerns about making
>>> this data readable for non-superusers. Cloud deployments of Postgres
>>> are logically going to block the access of this view.
> 
>> I don't think it exposes any information of great security value.
> 
> We just had that kerfuffle about whether WAL compression posed a security
> risk; doesn't that imply that at least the data relevant to WAL position
> has to be unreadable by non-superusers?

We already have functions that expose the current (or recent, or
interesting) WAL position, so any new ones should probably follow the
existing ones.  Or possibly we don't need any new ones, because we
already have enough?





Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-08-31 Thread Peter Eisentraut
On 8/25/15 11:32 PM, Joe Conway wrote:
> 1.) pg_controldata() function and pg_controldata view added

I don't think dumping out whatever pg_controldata happens to print as a
bunch of text fields is very sophisticated.  We have functionality to
compute with WAL positions, for example, and they won't be of much use
if this is going to be all text.

Also, the GUC settings tracked in pg_control can already be viewed using
normal mechanisms, so we don't need a second way to see them.

The fact that some of this is stored in pg_control and some is not is
really an implementation detail.  We should be thinking of ways to
expose specific useful information in useful ways, not just dump out
everything we can find.  Ultimately, I think we would like to move away
from people parsing textual pg_controldata output, but this proposal is
not moving very far in that direction.





Re: [HACKERS] [patch] Proposal for \rotate in psql

2015-08-31 Thread Pavel Stehule
2015-08-29 5:57 GMT+02:00 Pavel Stehule :

>
>
> 2015-08-29 0:48 GMT+02:00 Daniel Verite :
>
>>  Hi,
>>
>> This is a reboot of my previous proposal for pivoting results in psql,
>> with a new patch that generalizes the idea further through a command
>> now named \rotate, and some examples.
>>
>> So the concept is: having an existing query in the query buffer,
>> the user can specify two column numbers C1 and C2 (by default the 1st
>> and 2nd) as an argument to a \rotate command.
>>
>> The query results are then displayed in a 2D grid such that each tuple
>> (vx, vy, va, vb,...) is shown as |va vb...| in a cell at coordinates
>> (vx,vy).
>> The values vx,vy come from columns C1,C2 respectively and are
>> represented in the output as an horizontal and a vertical header.
>>
>> A cell may hold several columns from several rows, growing horizontally
>> and
>> vertically (\n inside the cell) if necessary to show all results.
>>
>> The examples below should be read with a monospaced font as in psql,
>> otherwise they will look pretty bad.
>>
>> 1. Example with only 2 columns, querying login/group membership from the
>> catalog.
>>Query:
>>
>> SELECT r.rolname as username,r1.rolname as groupname
>>   FROM pg_catalog.pg_roles r LEFT JOIN pg_catalog.pg_auth_members m
>>   ON (m.member = r.oid)
>>   LEFT JOIN pg_roles r1 ON (m.roleid=r1.oid)
>>   WHERE r.rolcanlogin
>>   ORDER BY 1
>>
>> Sample results:
>>   username  | groupname
>> ------------+-----------
>>  daniel     | mailusers
>>  drupal     |
>>  dv         | admin
>>  dv         | common
>>  extc       | readonly
>>  extu       |
>>  foobar     |
>>  joel       |
>>  mailreader | readonly
>>  manitou    | mailusers
>>  manitou    | admin
>>  postgres   |
>>  u1         | common
>>  u2         | mailusers
>>  zaz        | mailusers
>>
>>
>> Applying \rotate gives:
>>Rotated query results
>>   username  | admin | common | mailusers | readonly
>> ------------+-------+--------+-----------+----------
>>  daniel     |       |        | X         |
>>  drupal     |       |        |           |
>>  dv         | X     | X      |           |
>>  extc       |       |        |           | X
>>  extu       |       |        |           |
>>  foobar     |       |        |           |
>>  joel       |       |        |           |
>>  mailreader |       |        |           | X
>>  manitou    | X     |        | X         |
>>  postgres   |       |        |           |
>>  u1         |       | X      |           |
>>  u2         |       |        | X         |
>>  zaz        |       |        | X         |
>>
>> The 'X' inside cells is automatically added as there are only
>> 2 columns. If there was a 3rd column, the content of that column would
>> be displayed instead (as in the next example).
>>
>> What's good in that \rotate display compared to the classic output is that
>> it's more visually apparent whether a given user belongs to a given
>> group or not.
>>
>> 2. Example with a unicode checkmark added as 3rd column, and
>>unicode linestyle and borders (to be seen with a mono-spaced font):
>>
>> SELECT r.rolname as username,r1.rolname as groupname, chr(10003)
>>   FROM pg_catalog.pg_roles r LEFT JOIN pg_catalog.pg_auth_members m
>>   ON (m.member = r.oid)
>>   LEFT JOIN pg_roles r1 ON (m.roleid=r1.oid)
>>   WHERE r.rolcanlogin
>>   ORDER BY 1
>>
>> Rotated query results
>> ┌────────────┬───────┬────────┬───────────┬──────────┐
>> │  username  │ admin │ common │ mailusers │ readonly │
>> ├────────────┼───────┼────────┼───────────┼──────────┤
>> │ daniel     │       │        │ ✓         │          │
>> │ drupal     │       │        │           │          │
>> │ dv         │ ✓     │ ✓      │           │          │
>> │ extc       │       │        │           │ ✓        │
>> │ extu       │       │        │           │          │
>> │ foobar     │       │        │           │          │
>> │ joel       │       │        │           │          │
>> │ mailreader │       │        │           │ ✓        │
>> │ manitou    │ ✓     │        │ ✓         │          │
>> │ postgres   │       │        │           │          │
>> │ u1         │       │ ✓      │           │          │
>> │ u2         │       │        │ ✓         │          │
>> │ zaz        │       │        │ ✓         │          │
>> └────────────┴───────┴────────┴───────────┴──────────┘
>>
>>
>> What I like in that representation is that it looks good enough
>> to be pasted directly into a document in a word processor.
>>
>> 3. It can be rotated easily in the other direction, with:
>>\rotate 2 1
>>
>> (Cut horizontally to fit in a mail, the actual output is 116 chars wide).
>>
>>Rotated query results
>> ┌───────────┬────────┬────────┬────┬──────┬──────┬────────┬──────┬──────
>> │ username  │ daniel │ drupal │ dv │ extc │ extu │ foobar │ joel │ mai...
>> ├───────────┼────────┼────────┼────┼──────┼──────┼────────┼──────┼──────

Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread Andres Freund
On 2015-08-31 13:54:57 +0300, YUriy Zhuravlev wrote:
> We have noticed s_lock in PinBuffer and UnpinBuffer. For the test we
> rewrote PinBuffer and UnpinBuffer using atomic operations, and we liked
> the result. The degradation of performance almost completely disappeared,
> and scaling continued up to 400 clients (4 NUMA nodes with 256 "CPUs").
> 
> To scale Postgres on large NUMA machines, bufmgr must be ported to atomic
> operations. During our tests we found no errors in our patch, but most
> likely it is not correct, and this patch is only for testing.

I agree that this is necessary, and it matches what I've
profiled.

Unfortunately I don't think the patch can be quite as simple as
presented here - we rely on the exclusion provided by the spinlock in a
bunch of places for more than the manipulation of individual values. And
those currently are only correct if there are no other possible
manipulations going on. But it's definitely doable.

I have an initial patch doing this, but to me it seemed necessary
to merge flags, usage count and refcount into a single value - otherwise
consistency is hard to maintain, because you can't do a CAS over all the
values.

> diff --git a/src/backend/storage/buffer/buf_init.c 
> b/src/backend/storage/buffer/buf_init.c
> index 3ae2848..5fdaca7 100644
> --- a/src/backend/storage/buffer/buf_init.c
> +++ b/src/backend/storage/buffer/buf_init.c
> @@ -95,9 +95,9 @@ InitBufferPool(void)
>   BufferDesc *buf = GetBufferDescriptor(i);
>  
>   CLEAR_BUFFERTAG(buf->tag);
> - buf->flags = 0;
> - buf->usage_count = 0;
> - buf->refcount = 0;
> + buf->flags.value = 0;
> + buf->usage_count.value = 0;
> + buf->refcount.value = 0;
>   buf->wait_backend_pid = 0;

That's definitely not correct, you should initialize the atomics using
pg_atomic_init_u32() and write to by using pg_atomic_write_u32() - not
access them directly. This breaks the fallback paths.
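
For illustration, here is a hedged sketch of both points - initializing
through the atomics API and doing a CAS retry loop over a single merged
state word. The combined "state" field and the function are hypothetical,
not from any posted patch; pg_atomic_init_u32(), pg_atomic_read_u32(), and
pg_atomic_compare_exchange_u32() are the real, fallback-safe API.

/*
 * Hypothetical: BufferDesc gets a single pg_atomic_uint32 "state" that
 * packs refcount, usage count and flags, initialized via the atomics API
 * (never by writing the value directly):
 *
 *     pg_atomic_init_u32(&buf->state, 0);
 *
 * Pinning then becomes a CAS retry loop that updates the whole word:
 */
static void
pin_buffer_sketch(BufferDesc *buf)
{
    uint32      old_state = pg_atomic_read_u32(&buf->state);

    for (;;)
    {
        uint32      new_state = old_state + 1;  /* bump packed refcount */

        if (pg_atomic_compare_exchange_u32(&buf->state, &old_state,
                                           new_state))
            break;              /* swapped atomically; pin acquired */
        /* on failure, old_state was reloaded with the current value */
    }
}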

Greetings,

Andres Freund




[HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

2015-08-31 Thread Anastasia Lubennikova

Hi, hackers!
I'm going to begin work on efficient storage of duplicate keys in B-tree
indexes.
The main idea is to implement posting lists and posting trees for B-tree
index pages, as is already done for GIN.


In a nutshell, effective storage of duplicates in GIN is organised as 
follows.
The index stores a single index tuple for each unique key. That index tuple 
points to a posting list which contains pointers to heap tuples (TIDs). If 
too many rows have the same key, multiple pages are allocated for the TIDs, 
and these constitute a so-called posting tree.
You can find wonderful detailed descriptions in the GIN readme and related 
articles.
It also makes it possible to apply a compression algorithm to the posting 
list/tree and significantly decrease the index size. Read more in the 
presentation (part 1).
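
For B-tree, one could imagine an on-disk layout along these lines (a purely
hypothetical sketch for illustration - not actual nbtree code, and the names
are invented):

	typedef struct BTPostingTupleData
	{
		IndexTupleData	itup;		/* ordinary keyed part of the tuple */
		uint16			nposting;	/* number of heap TIDs that follow */
		/* the posting list itself; a candidate for compression */
		ItemPointerData	posting[FLEXIBLE_ARRAY_MEMBER];
	} BTPostingTupleData;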


Currently a new B-tree index tuple must be inserted for each table row that 
we index, and that can cause a page split. Because of MVCC, even a unique 
index can contain duplicates.

Storing duplicates in a posting list/tree helps to avoid superfluous splits.

So it seems to be a very useful improvement. Of course it requires a lot 
of changes in the B-tree implementation, so I need approval from the 
community.


1. Compatibility.
It's important to keep compatibility with older index versions.
I'm going to change BTREE_VERSION to 3,
and use the new (posting) features for v3 while keeping the old
implementation for v2.
Any objections?

2. There are several tricks to handle non-unique keys in B-tree.
More info in the btree readme
(chapter: Differences to the Lehman & Yao algorithm).

In the new version they'll become useless. Am I right?

3. Microvacuum.
Killed items are marked LP_DEAD and can be deleted from the page at 
insertion time.
Now that's fine, because each item corresponds to a separate TID. But a 
posting list implementation requires another way. I've got two ideas:

The first is to mark LP_DEAD only those tuples in which all TIDs are 
invisible.
The second is to add an LP_DEAD flag to each TID in the posting list (tree). 
That way requires a bit more space, but allows microvacuum of the posting 
list/tree.

Which one is better?

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Oleg Bartunov
On Mon, Aug 31, 2015 at 5:48 AM, Bruce Momjian  wrote:

> On Sun, Aug 30, 2015 at 10:08:06PM -0400, Bruce Momjian wrote:
> > On Mon, Aug 31, 2015 at 09:53:57AM +0900, Michael Paquier wrote:
> > > Well, I have had many such discussions with XC/XL folks, and that
> was my
> > > opinion.  I have seen almost no public discussion about this
> because the
> > > idea had almost no chance of success.  If it was possible, someone
> would
> > > have already suggested it on this list.
> > >
> > >
> > > Or perhaps people invested in this area had other obligations or lacked
> > > motivation and/or time to work to push up for things in core. That's
> not
> > > possible to know, and what is done is done.
> >
> > Well, I have talked to everyone privately about this, and concluded that
> > while horizontal scalability/sharding is useful, it is unlikely that the
> > code volume of something like XC or XL would be accepted into the
> > community, and frankly, now that we have FDWs, it is hard to imagine why
> > we would _not_ go in the FDW direction.
>
> Actually, there was hope that XC or XL would get popular enough that it
> would justify adding their code into community Postgres, but that never
> happened.
>

AFAIK, XC/XL already has some customers, and that is additional pressure
on their development team, which is now called X2. I don't know exactly how
Huawei's internal MPPDB is connected to XC/XL.

We need a community test suite for clustering, and our company is working on
this. It's non-trivial work, but the community will never accept any cluster
solution without thorough testing of functionality and performance. Our
XC/XL experience was not good.



>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
>   + Everyone has their own god. +
>


Re: [HACKERS] Reducing ClogControlLock contention

2015-08-31 Thread Amit Kapila
On Wed, Aug 26, 2015 at 4:18 PM, Simon Riggs  wrote:
>
> On 26 August 2015 at 11:40, Amit Kapila  wrote:
>>
>> On Tue, Aug 25, 2015 at 2:21 PM, Simon Riggs 
wrote:
>>>
>>> On 22 August 2015 at 15:14, Andres Freund  wrote:
>>
>>

 TransactionIdSetPageStatus() calls TransactionIdSetStatusBit(), which
 writes an 8 byte variable (the lsn). That's not safe.
>>>
>>>
>>> Agreed, thanks for spotting that.
>>>
>>> I see this as the biggest problem to overcome with this patch.
>>
>>
>> How about using 64bit atomics or spinlock to protect this variable?
>
>
> Spinlock is out IMHO because this happens on every clog lookup. So it
must be an atomic read.
>

Agreed; however, using atomics is still an option. Yet another way could
be: before updating group_lsn, check whether we already hold CLogControlLock
in Exclusive mode and update it if so; otherwise release the lock,
re-acquire it in Exclusive mode, and update the variable.  The first thing
that comes to mind with this idea is that it will be less performant. Yes,
that's true, but it will only be done for asynchronous commits (a mode which
is generally not recommended for production use), and even then not on every
commit, so the impact may not be that high.  I have updated the patch
(attached to this mail) to show what I have in mind.
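
In pseudo-C, the idea looks roughly like this (a sketch only; the
"held_exclusive" flag is hypothetical bookkeeping by the caller, not
something taken from the attached patch):

	if (!held_exclusive)
	{
		/* trade the shared lock for an exclusive one before the update */
		LWLockRelease(CLogControlLock);
		LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
	}
	ClogCtl->shared->group_lsn[lsnindex] = lsn;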


Another point about the latest patch:

+ (write_ok ||
+ shared->page_status[slotno] != SLRU_PAGE_WRITE_IN_PROGRESS))

Do you think that, with the new locking protocol proposed in this
patch, it is safe to return if the page status is SLRU_PAGE_WRITE_IN_PROGRESS
even when write_ok is true?

I think the case where it can cause a problem is SlruInternalWritePage(),
which performs the following actions:
1. first marks the status of page as SLRU_PAGE_WRITE_IN_PROGRESS.
2. then take buffer lock in Exclusive mode.
3. release control lock.
4. perform the write
5. re-acquire the control lock in Exclusive mode.

Now consider another client which has to update the transaction status:
1. Control lock in Shared mode.
2. Get the slot
3. Acquire the buffer lock in Exclusive mode

Now suppose the client that has to update the transaction status performs
its step 1 after step 3 of the writer. If that happens, it could lead to a
deadlock: the writer will wait for the client to release the control lock,
and the client will wait for the writer to release the buffer lock.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


clog_optimize.v5.patch
Description: Binary data



Re: [HACKERS] Buildfarm failure from overly noisy warning message

2015-08-31 Thread Jeff Janes
On Tue, Jul 28, 2015 at 2:38 PM, Tom Lane  wrote:

> Kevin Grittner  writes:
> > Tom Lane  wrote:
> >> Kevin Grittner  writes:
> >>> I think a LOG entry when an autovacuum process is actually canceled
> >>> has value just in case it is happening on a particular table so
> >>> frequently that the table starts to bloat.  I see no reason to log
> >>> anything if there is an intention to cancel an autovacuum process
> >>> but it actually completes before we can do so.
>
> >> Hm.  By that logic, I'm not sure if we need anything to be logged here,
> >> because the autovacuum process will log something about having received
> >> a query cancel signal.
>
> > That seems sufficient to me for normal cases.
>
> Rather than remove the "sending signal" elog entirely, I reduced it to
> DEBUG1; that will avoid log chatter for normal cases but the info can
> still be obtained at need.
>

This part doesn't seem right to me.  Now the autovac worker logs that it
was cancelled, but we have no idea why it was cancelled, i.e. which lock
request caused it to be cancelled and which query caused the lock request.

(It looks like the build-farm failures were fixed by the other part of the
change, which I have no objection to)

This effectively undoes commit d7318d43d891bd63e82dcfc27.  I thought that
commit was a huge improvement and would prefer it not be undone.

commit d7318d43d891bd63e82dcfc27948113ed7b1db80
Author: Robert Haas 
Date:   Thu Jul 26 09:18:32 2012 -0400

Log a better message when canceling autovacuum.

The old message was at DEBUG2, so typically it didn't show up in the
log at all.  As a result, in most cases where autovacuum was canceled,
the only information that was logged was the table being vacuumed,
with no indication as to what problem caused the cancel.  Crank up
the level to LOG and add some more details to assist with debugging.

Although looking at the code from that patch, it is not clear to me why all
the string building needs to be done under the ProcArrayLock.  The string
depends only on the local state of the lock being blocked, not on the proc
doing the blocking.

Cheers,

Jeff


Re: [HACKERS] WIP: Rework access method interface

2015-08-31 Thread Petr Jelinek

On 2015-08-27 15:15, Alexander Korotkov wrote:

On Wed, Aug 26, 2015 at 7:20 PM, Alexander Korotkov wrote:

On Wed, Aug 26, 2015 at 6:50 PM, Tom Lane wrote:

One thought here is that we might not want to just blindly duplicate
the existing pg_am behavior anyway.  For example, the main use of the
amstrategies column was to allow validation of pg_amop.amopstrategy
entries --- but in 4 out of the 6 existing AMs, knowledge of the AM
alone isn't sufficient information to determine the valid set of
strategy numbers anyway.  So inventing a "pg_amstrategies(am oid)"
function seems like it would be repeating a previous failure.  Not
quite sure what to do instead, though.  We could imagine something
like "pg_amstrategies(am oid, opclass oid)", but I don't see how to
implement it without asking opclasses to provide a validation
function, which maybe is a change we don't want to take on here.


Could we add another function to the access method interface which would
validate an opclass?
The AM could this way validate not only strategies, but also support
functions. For instance, in GIN we now require an opclass to specify
at least one of consistent and triconsistent. ISTM it would be nice
to let the access method check such conditions. Also, we would be
able to check opclass correctness at creation time; currently one can
create an opclass with missing support functions, which doesn't work.
At the SQL level we can create a function which validates an opclass
using this new method. This function can be used in regression tests.


Should I try to implement such a new access method function, say 'amvalidate'?



Makes sense to me to do that, should be probably optional though.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Pavel Stehule
2015-08-31 11:30 GMT+02:00 Shulgin, Oleksandr:

> Do you still have the code somewhere around? Did it see production use?

>>> I sent it to mailing list year ago
>>
>>
>> http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com
>>
>
> Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to a
> commitfest back then I think?
>

I had no time to finish this patch - there are a few issues in signal
handling and returning the result back - but I still want it :) - and as far
as I know, almost all other SQL databases have similar functionality.

>
> --
> Alex
>
>


[HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread YUriy Zhuravlev
Hello hackers

Recently we were given access to a test server from IBM, a 9119-MHE with 8 
CPUs * 8 cores * 8 threads. We decided to take advantage of this and find 
the bottlenecks for read scalability (pgbench -S).

You can read all the details here: 
http://www.postgrespro.ru/blog/pgsql/2015/08/30/p8scaling

Performance of 9.4 stopped growing after 100 clients, and 9.5/9.6 stopped 
after 150 (at 4 NUMA nodes). Research using perf showed ProcArrayLock in 
GetSnapshotData to be the inhibitor, but inserting a stub instead of 
GetSnapshotData did not significantly increase scalability. Trying to find 
the bottleneck with gdb, we found another place: we noticed s_lock in 
PinBuffer and UnpinBuffer. For the test we rewrote PinBuffer and UnpinBuffer 
using atomic operations and we liked the result. The degradation of 
performance almost completely disappeared, and scaling continued up to 400 
clients (4 NUMA nodes with 256 "CPUs").

To scale Postgres on large NUMA machines, bufmgr must be ported to atomic 
operations. During our tests we found no errors in our patch, but most 
likely that is not true and this patch is for testing only.

Who has any thoughts?

-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 3ae2848..5fdaca7 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -95,9 +95,9 @@ InitBufferPool(void)
 			BufferDesc *buf = GetBufferDescriptor(i);
 
 			CLEAR_BUFFERTAG(buf->tag);
-			buf->flags = 0;
-			buf->usage_count = 0;
-			buf->refcount = 0;
+			buf->flags.value = 0;
+			buf->usage_count.value = 0;
+			buf->refcount.value = 0;
 			buf->wait_backend_pid = 0;
 
 			SpinLockInit(>buf_hdr_lock);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index cd3aaad..8cf97cb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -714,8 +714,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (isLocalBuf)
 		{
 			/* Only need to adjust flags */
-			Assert(bufHdr->flags & BM_VALID);
-			bufHdr->flags &= ~BM_VALID;
+			Assert(bufHdr->flags.value & BM_VALID);
+			bufHdr->flags.value &= ~BM_VALID;
 		}
 		else
 		{
@@ -727,8 +727,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			do
 			{
 LockBufHdr(bufHdr);
-Assert(bufHdr->flags & BM_VALID);
-bufHdr->flags &= ~BM_VALID;
+Assert(bufHdr->flags.value & BM_VALID);
+bufHdr->flags.value &= ~BM_VALID;
 UnlockBufHdr(bufHdr);
 			} while (!StartBufferIO(bufHdr, true));
 		}
@@ -746,7 +746,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 * it's not been recycled) but come right back here to try smgrextend
 	 * again.
 	 */
-	Assert(!(bufHdr->flags & BM_VALID));		/* spinlock not needed */
+	Assert(!(bufHdr->flags.value & BM_VALID));		/* spinlock not needed */
 
 	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
 
@@ -824,7 +824,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	if (isLocalBuf)
 	{
 		/* Only need to adjust flags */
-		bufHdr->flags |= BM_VALID;
+		bufHdr->flags.value |= BM_VALID;
 	}
 	else
 	{
@@ -952,10 +952,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 */
 		buf = StrategyGetBuffer(strategy);
 
-		Assert(buf->refcount == 0);
+		Assert(buf->refcount.value == 0);
 
 		/* Must copy buffer flags while we still hold the spinlock */
-		oldFlags = buf->flags;
+		oldFlags = buf->flags.value;
 
 		/* Pin the buffer and then release the buffer spinlock */
 		PinBuffer_Locked(buf);
@@ -1149,8 +1149,8 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 * recycle this buffer; we must undo everything we've done and start
 		 * over with a new victim buffer.
 		 */
-		oldFlags = buf->flags;
-		if (buf->refcount == 1 && !(oldFlags & BM_DIRTY))
+		oldFlags = buf->flags.value;
+		if (buf->refcount.value == 1 && !(oldFlags & BM_DIRTY))
 			break;
 
 		UnlockBufHdr(buf);
@@ -1171,12 +1171,12 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	 * 1 so that the buffer can survive one clock-sweep pass.)
 	 */
 	buf->tag = newTag;
-	buf->flags &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
+	buf->flags.value &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT);
 	if (relpersistence == RELPERSISTENCE_PERMANENT)
-		buf->flags |= BM_TAG_VALID | BM_PERMANENT;
+		buf->flags.value |= BM_TAG_VALID | BM_PERMANENT;
 	else
-		buf->flags |= BM_TAG_VALID;
-	buf->usage_count = 1;
+		buf->flags.value |= BM_TAG_VALID;
+	buf->usage_count.value = 1;
 
 	UnlockBufHdr(buf);
 
@@ -1268,7 +1268,7 @@ retry:
 	 * yet done StartBufferIO, WaitIO will fall through and we'll effectively

Re: [HACKERS] Scaling PostgreSQL at multicore Power8

2015-08-31 Thread YUriy Zhuravlev
On Monday 31 August 2015 13:03:07 you wrote:
> That's definitely not correct, you should initialize the atomics using
> pg_atomic_init_u32() and write to by using pg_atomic_write_u32() - not
> access them directly. This breaks the fallback paths.

You're right. Now it's just there to silence the compiler.
This patch is a concept only.

-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Shulgin, Oleksandr
>
> Do you still have the code somewhere around? Did it see production use?
>>>
>> I sent it to mailing list year ago
>
>
> http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com
>

Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to a
commitfest back then I think?

--
Alex


Re: [HACKERS] [patch] Proposal for \rotate in psql

2015-08-31 Thread Daniel Verite
  I wrote:

> What I like in that representation is that it looks good enough
> to be pasted directly into a document in a word processor.

And ironically, the nice unicode borders came out all garbled
in the mail, thanks to a glitch in my setup that mis-reformatted them
before sending.

Sorry about that, the results with unicode linestyle were supposed to be
as follows:

Example 2:

Rotated query results
┌┬───┬┬───┬──┐
│  username  │ admin │ common │ mailusers │ readonly │
├┼───┼┼───┼──┤
│ daniel │   ││ ✓ │  │
│ drupal │   ││   │  │
│ dv │ ✓ │ ✓  │   │  │
│ extc   │   ││   │ ✓│
│ extu   │   ││   │  │
│ foobar │   ││   │  │
│ joel   │   ││   │  │
│ mailreader │   ││   │ ✓│
│ manitou│ ✓ ││ ✓ │  │
│ postgres   │   ││   │  │
│ u1 │   │ ✓  │   │  │
│ u2 │   ││ ✓ │  │
│ zaz│   ││ ✓ │  │
└┴───┴┴───┴──┘


Example 3, rotated in the other direction

(Cut horizontally to fit in a mail, the actual output is 116 chars wide).

   Rotated query results
┌───┬┬┬┬──┬──┬┬──┬
│ username  │ daniel │ drupal │ dv │ extc │ extu │ foobar │ joel │ mai...
├───┼┼┼┼──┼──┼┼──┼
│ mailusers │ ✓  │││  │  ││  │
│ admin │││ ✓  │  │  ││  │
│ common│││ ✓  │  │  ││  │
│ readonly  ││││ ✓│  ││  │ ✓  
└───┴┴┴┴──┴──┴┴──┴




[HACKERS] Adding since-version tags to the docs?

2015-08-31 Thread Shulgin, Oleksandr
Hello,

I often find it a pity that our docs are missing information on when a
certain GUC setting, SQL-level command or function was introduced.

Clicking through the "this page in other versions" links at the top of a
webpage does help, but you still need to do some guessing (binary search?)
with the clicks. :-)

It would be nice if we could make a script that would parse the sgml files
and, for every symbol it finds, add a tag like "Since version 9.x".
Such a script could start by checking out REL9_0_STABLE and looking through
all symbols it can find, tagging them "Since 9.0".  Then it could commit
the result, check out the next version branch and apply said commit (some
manual effort to merge it might be required), and repeat the process,
assuming all newly found symbols must have been introduced in that new
version.

That covers the lists; a tabular representation might require adding a new
column, and I'm not sure what would be the best format.

After this process is done once, we can require that every newly
introduced symbol/command be tagged manually by the patch author.

Do you think such an approach will work?  Is there interest in having this
done?

--
Alex


Re: [HACKERS] perlcritic

2015-08-31 Thread Michael Paquier
On Tue, Sep 1, 2015 at 12:57 PM, Peter Eisentraut  wrote:
> We now have 80+ Perl files in our tree, and it's growing.  Some of those
> files were originally written for Perl 4, and the coding styles and
> quality are quite, uh, divergent.  So I figured it's time to clean up
> that code a bit.  I ran perlcritic over the tree and cleaned up all the
> warnings at level 5 (the default, least severe).

Do you think we should be concerned about the increased difficulty of
back-patching fixes if this patch is applied? I personally think it's
fine to do this cleanup on HEAD only; still, others may have a
different opinion.

> Testing guidelines:
> - msvc files need to be tested separately.  I tested as best as I could
> on a non-Windows system.

I have tested on Windows as well, and I am not seeing failures.
-- 
Michael




Re: [HACKERS] WIP: About CMake v2

2015-08-31 Thread Noah Misch
On Fri, Aug 28, 2015 at 01:28:49PM -0300, Alvaro Herrera wrote:
> If it allows us to get rid of our custom MSVC scripts, it's a huge
> benefit, for sure -- that has been a huge pain in the neck since day
> one.

Moreover, I suggest beginning with a patch that replaces the src/tools/msvc
build system with CMake.  One can test almost all Windows configurations using
two machines, so author and reviewer can test most everything.  src/tools/msvc
has more defects than the GNU make build system, so the minimum-quality
acceptable replacement is easier to deliver.  With CMake driving all MSVC
builds, it would be much easier to gradually extend to other platforms.

Thanks,
nm




Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Peter Geoghegan
On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu  wrote:
> BTW, I'm interested in improving the queryid portability now because
> I'd like to use it in other extensions. :)
> That's the reason why I'm looking at query jumbling here.

Are you interested in having the query fingerprinting/jumbling
infrastructure available to all backend code? That seems like a good
idea to me generally. I would like to be able to put queryId in
log_line_prefix, or to display it within EXPLAIN, and have it
available everywhere. I like the idea of per-query
log_min_duration_statement settings.

If you want to use the queryId field directly, which I recall you
mentioning before, then that's harder. There is simply no contract
among extensions for "owning" a queryId. But when the fingerprinting
code is moved into core, then I think at that point queryId may cease
to be even a thing that pg_stat_statements theoretically has the right
to write into. Rather, it just asks the core system to do the
fingerprinting, and finds it within queryId. At the same time, other
extensions may do the same, and don't need to care about each other.

Does that work for you?

-- 
Peter Geoghegan




Re: [HACKERS] [PATCH] SQL function to report log message

2015-08-31 Thread Pavel Stehule
2015-08-31 20:43 GMT+02:00 dinesh kumar :

> Hi,
>
> On Sat, Aug 29, 2015 at 4:22 PM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> I am starting to work review of this patch
>>
>> 2015-07-13 9:54 GMT+02:00 dinesh kumar :
>>
>>> Hi All,
>>>
>>> Greetings for the day.
>>>
>>> Would like to discuss the below feature here.
>>>
>>> Feature:
>>> Having an SQL function to write messages to the log destination.
>>>
>>> Justification:
>>> As of now, we don't have an SQL function to write custom/application
>>> messages to the log destination. We have the "RAISE" clause, which is
>>> controlled by log_ parameters. If we have an SQL function which works
>>> irrespective of log settings, that would be good for many log parsers.
>>> What I mean is, from a DBA point of view, if we route all our native OS
>>> stats to log files in a proper format, then our log reporting tools can
>>> give the most effective reports. Also, applications can log their own
>>> messages to postgres log files, which can be monitored by DBAs too.
>>>
>>> Implementation:
>>> Implemented a new function "pg_report_log" which takes one argument
>>> as text, and returns void. I took the "LOG" prefix for all the reporting
>>> messages. I wasn't sure whether to go with a new prefix for this, since
>>> these are normal LOG messages. Let me know if I am wrong here.
>>>
>>> Here is the attached patch.
>>>
>>
>> This patch is not complex, but the implementation doesn't cover
>> "ereport" well.
>>
>> Although this functionality could be replaced by a custom function in any
>> PL (now or in the near future), I am not against having this function in
>> core. There are a lot of companies with strong resistance to stored
>> procedures - and sometimes this functionality can help with SQL debugging.
>>
>> Issues:
>>
>> 1. Support only MESSAGE field in exception - I am expecting to support
>> all fields: HINT, DETAIL, ...
>>
>
> Added these functionalities too.
>
>
>> 2. Missing regress tests
>>
>
> Adding here.
>
>
>> 3. the parsing ereport level should be public function shared with
>> PLpgSQL and other PL
>>
>
> Sorry Pavel. I am not getting your point here. Would you give me an
> example.
>

The transformation text -> error level is a common task - PLpgSQL does it
in pl_gram.y. My idea is to add a new function to error utils named
"parse_error_level" and use it from PLpgSQL and from your code.

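A minimal sketch of such a shared helper (the body is illustrative only;
the real function would live next to the other error utilities and accept
the level names PLpgSQL already understands in RAISE):

	int
	parse_error_level(const char *levelname)
	{
		if (pg_strcasecmp(levelname, "debug") == 0)
			return DEBUG1;
		else if (pg_strcasecmp(levelname, "log") == 0)
			return LOG;
		else if (pg_strcasecmp(levelname, "info") == 0)
			return INFO;
		else if (pg_strcasecmp(levelname, "notice") == 0)
			return NOTICE;
		else if (pg_strcasecmp(levelname, "warning") == 0)
			return WARNING;
		else if (pg_strcasecmp(levelname, "exception") == 0)
			return ERROR;
		return -1;				/* unrecognized; the caller reports it */
	}
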

>
>
>> 4. should be hidestmt mandatory parameter?
>>
>
> I changed this argument's default value as "true".
>
>
>> 5. the function declaration is strange
>>
>> postgres=# \sf pg_report_log (text, anyelement, boolean)
>> CREATE OR REPLACE FUNCTION pg_catalog.pg_report_log(text, anyelement,
>> boolean)
>>  RETURNS void
>>  LANGUAGE sql
>>  STABLE STRICT COST 1
>> AS $function$SELECT pg_report_log($1::pg_catalog.text,
>> $2::pg_catalog.text, $3::boolean)$function$
>>
>> Why polymorphic? It is useless on any modern release
>>
>>
> I took quote_ident(anyelement) as reference code for implementing this.
> Could you guide me if I am doing something wrong here.
>

I was wrong - this is OK - it is an emulation of forced casting to text.


>
>
>> postgres=# \sf pg_report_log (text, text, boolean)
>> CREATE OR REPLACE FUNCTION pg_catalog.pg_report_log(text, text, boolean)
>>  RETURNS void
>>  LANGUAGE internal
>>  IMMUTABLE STRICT
>> AS $function$pg_report_log$function$
>>
>> Why stable, why immutable? This function should be volatile.
>>
> Fixed these to volatile.
>
>
>> 6. using elog level enum as errcode is wrong idea - errcodes are defined
>> in table http://www.postgresql.org/docs/9.4/static/errcodes-appendix.html
>>
>
> You mean, if the elevel is 'ERROR', then we need to allow errcode. Let me
> know your inputs.
>

I was blind, but the code was not good. Yes, ERROR and higher need an error
code. Per ANSI SQL anything can have an error code: "00" is OK, "01" is
warnings, ...

There are more possibilities - look at the PLpgSQL implementation - it can
be an optional parameter, and by default it can use ERRCODE_RAISE_EXCEPTION.


>
> Adding new patch, with the above fixes.
>
> Thanks in advance.
>
> Regards,
> Dinesh
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> Regards,
>>> Dinesh
>>> manojadinesh.blogspot.com
>>>
>>
>>
>


Re: [HACKERS] buffer README is out of date

2015-08-31 Thread Jim Nasby

On 8/29/15 5:52 PM, Jeff Janes wrote:

! Obtaining the necessary lock is done by the bufmgr routines
! LockBufferForCleanup() or ConditionalLockBufferForCleanup().
! They first get an exclusive lock and then check to see if the
shared pin
! count is currently 1.  If not,
ConditionalLockBufferForCleanup() releases
! the exclusive lock and then returns false, while
LockBufferForCleanup()
! releases the exclusive lock (but not the caller's pin) and
waits until
! signaled by another backend, whereupon it tries again.  The
signal will
! occur when UnpinBuffer decrements the shared pin count to 1.  As


I don't think that's true. If 2 other backends have a pin then AFAIK
you'd wake up twice. There's also this comment in LockBufferForCleanup:

/*
  * Remove flag marking us as waiter. Normally this will not be set
  * anymore, but ProcWaitForSignal() can return for other signals as
  * well.  We take care to only reset the flag if we're the waiter, as
  * theoretically another backend could have started waiting. That's
  * impossible with the current usages due to table level locking, but
  * better be safe.
  */


If 2 other backends have a pin, only the last one to drop it should do
the waking.  I don't see a way they could both try to do the waking, the
interlock on the buffer header seems to prevent that.  But if it did,
that would just be another way you could have a spurious wake-up, which
can already happen for other reasons.  I don't think the spurious wake
up needs to be documented in the higher level README file.


My concern is someone will read too much into "signal will occur ... 
count to 1" and think there are no other ways to be woken up. I realize 
I'm being pedantic here, but given the nasty race-condition bugs we've 
had lately, maybe it's warranted.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Information of pg_stat_ssl visible to all users

2015-08-31 Thread Michael Paquier
On Tue, Sep 1, 2015 at 4:23 AM, Peter Eisentraut  wrote:
> On 8/31/15 9:13 AM, Andres Freund wrote:
>> I'm just saying that we should strive to behave at least somewhat
>> consistently, and change everything at once, not piecemal. Because the
>> latter will not decrease the pain of migrating to a new model in a
>> relevant way while making the system harder to understand.
>
> Well, we already hide a fair chunk of information from pg_stat_activity
> from unprivileged users, including everything related to the connection
> origin of other users.  So from that precedent, the entire SSL
> information ought to be considered privileged.

That being said, we may as well want to bite the bullet and hide
more information in pg_stat_activity, like datname, usename and
application_name, or simply hide those tuples completely from
non-privileged users.
-- 
Michael




[HACKERS] perlcritic

2015-08-31 Thread Peter Eisentraut
We now have 80+ Perl files in our tree, and it's growing.  Some of those
files were originally written for Perl 4, and the coding styles and
quality are quite, uh, divergent.  So I figured it's time to clean up
that code a bit.  I ran perlcritic over the tree and cleaned up all the
warnings at level 5 (the default, least severe).

Testing guidelines:

- Many files are part of the regular build or test process.

- msvc files need to be tested separately.  I tested as best as I could
on a non-Windows system.

- There are a couple of one-offs in contrib and src/test that need to be
run manually.

- The stuff under utils/mb/Unicode/ has a makefile that is not part of
the normal build process.  I'll send in a few more patches to that in a
separate message that should help testing.

To install perlcritic, run

cpan -i Perl::Critic

and then run

perlcritic .

at the top of the tree (or a subdirectory).
From e38edbf5f911eb67750cf890cfd384758e43466e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 31 Aug 2015 23:06:07 -0400
Subject: [PATCH] Clean up Perl code according to perlcritic severity level 5

List of issues addressed:

123 Two-argument "open" used
114 Bareword file handle opened
 35 Loop iterator is not lexical
 26 "require" statement with library name as string
 21 Code before strictures are enabled
  3 Expression form of "eval"
  2 Package declaration must match filename
  1 Subroutine prototypes used
  1 Stricture disabled
  1 Glob written as <...>
  1 Don't modify $_ in list functions

Many additional fixes were the result of enabling strictures, especially
undeclared local variables.
---
 contrib/intarray/bench/create_test.pl  |  20 +-
 contrib/seg/seg-validate.pl|  35 +--
 contrib/seg/sort-segments.pl   |  10 +-
 doc/src/sgml/generate-errcodes-table.pl|   2 +-
 doc/src/sgml/mk_feature_tables.pl  |  14 +-
 src/backend/catalog/Catalog.pm |   8 +-
 src/backend/catalog/genbki.pl  |  64 ++---
 src/backend/parser/check_keywords.pl   |  30 +--
 src/backend/utils/Gen_fmgrtab.pl   |  24 +-
 src/backend/utils/generate-errcodes.pl |   2 +-
 src/backend/utils/mb/Unicode/UCS_to_BIG5.pl| 108 
 src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl  |  77 +++---
 .../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl| 297 ++---
 src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl  | 141 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl  |  77 +++---
 src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl  |  81 +++---
 src/backend/utils/mb/Unicode/UCS_to_GB18030.pl |  65 ++---
 .../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl  | 241 -
 src/backend/utils/mb/Unicode/UCS_to_SJIS.pl|  75 +++---
 src/backend/utils/mb/Unicode/UCS_to_most.pl|  85 +++---
 .../utils/mb/Unicode/{ucs2utf.pl => ucs2utf.pm}|   8 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl   |  20 +-
 src/bin/pg_ctl/t/001_start_stop.pl |  10 +-
 src/bin/psql/create_help.pl|  28 +-
 src/interfaces/ecpg/preproc/check_rules.pl |  12 +-
 src/interfaces/libpq/test/regress.pl   |  10 +-
 src/pl/plperl/plc_perlboot.pl  |   6 +-
 src/pl/plperl/plc_trusted.pl   |   2 +-
 src/pl/plperl/text2macro.pl|   8 +-
 src/pl/plpgsql/src/generate-plerrcodes.pl  |   2 +-
 src/pl/plpython/generate-spiexceptions.pl  |   2 +-
 src/test/locale/sort-test.pl   |   6 +-
 src/test/perl/TestLib.pm   |  38 +--
 src/test/ssl/ServerSetup.pm|  42 +--
 src/test/ssl/t/001_ssltests.pl |   6 +-
 src/tools/msvc/Install.pm  |  10 +-
 src/tools/msvc/Mkvcbuild.pm|   2 +-
 src/tools/msvc/Project.pm  |  28 +-
 src/tools/msvc/Solution.pm | 166 ++--
 src/tools/msvc/build.pl|  12 +-
 src/tools/msvc/builddoc.pl |   2 +-
 src/tools/msvc/gendef.pl   |  24 +-
 src/tools/msvc/install.pl  |   4 +-
 src/tools/msvc/mkvcbuild.pl|   4 +-
 src/tools/msvc/pgbison.pl  |   4 +-
 src/tools/msvc/pgflex.pl   |  18 +-
 src/tools/msvc/vcregress.pl|  19 +-
 src/tools/pginclude/pgcheckdefines |  91 ---
 src/tools/pgindent/pgindent|   4 +-
 src/tools/version_stamp.pl |  26 +-
 src/tools/win32tzlist.pl   |   6 +-
 51 files changed, 1061 insertions(+), 1015 deletions(-)
 rename 

Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Satoshi Nagayasu

On 2015/09/01 13:41, Peter Geoghegan wrote:

On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu  wrote:

BTW, I'm interested in improving the queryid portability now because
I'd like to use it in other extensions. :)
That's the reason why I'm looking at query jumbling here.


Are you interested in having the query fingerprinting/jumbling
infrastructure available to all backend code? That seems like a good
idea to me generally.


Yes. I've been working on the sql_firewall extension[1], which is
totally built on top of the pg_stat_statements.

[1] http://pgsnaga.blogspot.jp/2015/08/postgresql-sql-firewall.html

As of today, sql_firewall has duplicated code for query jumbling,
but having it in core would be fantastic.


I would like to be able to put queryId in
log_line_prefix, or to display it within EXPLAIN, and have it
available everywhere. I like the idea of per-query
log_min_duration_statement settings.


Sounds cool. :)


If you want to use the queryId field directly, which I recall you
mentioning before, then that's harder. There is simply no contract
among extensions for "owning" a queryId. But when the fingerprinting
code is moved into core, then I think at that point queryId may cease
to be even a thing that pg_stat_statements theoretically has the right
to write into. Rather, it just asks the core system to do the
fingerprinting, and finds it within queryId. At the same time, other
extensions may do the same, and don't need to care about each other.

Does that work for you?


Yes. I think so.

I need some query fingerprint to determine the query group. I want queryid
to keep the same value when query strings are the same (except for literal
values).


Another reason is just that I need to import/export query ids.

Regards,
--
NAGAYASU Satoshi 




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Pavel Stehule
Hi

2015-08-31 19:09 GMT+02:00 Shulgin, Oleksandr:

> On Mon, Aug 31, 2015 at 12:35 PM, Pavel Stehule 
> wrote:
>
>>

 http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com

>>>
>>> Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to
>>> a commitfest back then I think?
>>>
>>
>> I had no time to finish this patch - there are a few issues in signal
>> handling and returning the result back - but I still want it :) - and as
>> far as I know, almost all other SQL databases have similar functionality.
>>
>
> I've updated the patch for the current master and also added some
> unexpected parameters handling, so attached is a v2.
>

Thank you very much


>
> I'd say we should hide the so-designed pg_cmdstatus() interface behind
> more friendly calls like pg_explain_backend() and pg_backend_progress() to
> give some naming examples, to remove the need for magic numbers in the
> second arg.
>

I had a similar idea - this is good enough for a start, but the target
interface is based on integration with the EXPLAIN statement,

something like EXPLAIN PROCESS or EXPLAIN PID or EXPLAIN VERBOSE PID ...


>
> What I've found missing in this approach is the insight into nested
> executor runs, so that if you're running a "SELECT my_func()", you only see
> this outer query in the pg_cmdstatus() output.  With the auto_explain
> approach, by hooking into executor I was able to capture the nested queries
> and their plans as well.
>

I understand - originally I didn't think about nested queries, but it is a
good idea and probably not a problem:

It's not a problem for XML and JSON, where we can describe nesting simply.

It is a little bit harder for plain text - but we can use a format similar
to the one used for subplans, something like

top query:
  SELECT fx()

nested (1. level) query:
   SELECT 


>
> It's conceptually trivial to add some code to use the Executor hooks here,
> but I don't see any precedent for this except for contrib modules
> (auto_explain and pg_stat_statements), I'm just not sure if that would be
> OK-ish.
>
> And when we solve that, there is another problem of having a sane
> interface to query the nested plans.  For a psql user, probably the most
> interesting would be the topmost (level=1) and the innermost (e.g.
> level=-1) plans.  We might also want to provide a full nesting of plans in
> a structured format like JSON or... *cough* XML, for programs to consume
> and display nicely with folding and stuff.
>
> And the most interesting would be making instrumentation work with all of
> the above.
>

The important functionality is showing the complete text of the query - that
was my original motivation, because I had no way to get the complete query
before it finished.

The communication between processes should probably be more complex :( - an
SHM queue should be used there, because some plans can be terribly long.

Using one shared write buffer for everybody is probably too simple a
solution - good for a prototype, but not good for core.

I have an idea about the communication:

1. the caller prepares a buffer and shm queue and signals the target
process - the parameter is the caller's pid
2. the target process fills the write buffer and closes the queue
3. the caller shows the data and closes the buffer and the queue

Now almost all the code for this communication is upstream - the missing
part is dynamically injecting one end of the queue into an arbitrary
process.
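
For reference, a rough sketch of the caller side with the existing shm_mq
API (the function names are real; the DSM setup and the signalling mechanism
are omitted, and "mq_space", "queue_size" and "seg" are assumed to come from
that omitted setup):

	shm_mq	   *mq;
	shm_mq_handle *mqh;
	Size		nbytes;
	void	   *data;

	/* create the queue inside a DSM segment and become its receiver */
	mq = shm_mq_create(mq_space, queue_size);
	shm_mq_set_receiver(mq, MyProc);

	/* ... signal the target process, passing MyProcPid ... */

	mqh = shm_mq_attach(mq, seg, NULL);
	shm_mq_receive(mqh, &nbytes, &data, false);	/* waits for the plan text */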

Regards

Pavel

>
> I'm adding this to the next CF.
>
> --
> Alex
>


Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Satoshi Nagayasu
On 2015/09/01 14:01, Tom Lane wrote:
> Satoshi Nagayasu  writes:
>> On 2015/09/01 13:41, Peter Geoghegan wrote:
>>> If you want to use the queryId field directly, which I recall you
>>> mentioning before, then that's harder. There is simply no contract
>>> among extensions for "owning" a queryId. But when the fingerprinting
>>> code is moved into core, then I think at that point queryId may cease
>>> to be even a thing that pg_stat_statements theoretically has the right
>>> to write into. Rather, it just asks the core system to do the
>>> fingerprinting, and finds it within queryId. At the same time, other
>>> extensions may do the same, and don't need to care about each other.
>>>
>>> Does that work for you?
> 
>> Yes. I think so.
> 
>> I need some query fingerprint to determine query group. I want queryid
>> to keep the same value when query strings are the same (except literal
>> values).
> 
> The problem I've got with this is the unquestioned assumption that every
> application for query IDs will have exactly the same requirements for
> what the ID should include or ignore.

I'm not confident about that either, but at least I think we will be able
to cover the most common use cases as of today (best guess :).

And IMHO it would be OK to change the spec in a future release.

Regards,
-- 
NAGAYASU Satoshi 




Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Pavan Deolasee
On Tue, Sep 1, 2015 at 3:17 AM, Robert Haas  wrote:

>
>
> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries.   As far as (1) is concerned, we need declarative
> partitioning, which is being worked on by Amit Langote.  As far as (2)
> is concerned, I hope and expect BDR, or technology derived therefrom,
> to eventually fill that need.  As far as (3) is concerned, why
> wouldn't we use the foreign data wrapper interface, and specifically
> postgres_fdw?  That interface was designed for the explicit purpose of
> allowing access to remote data sources, and a lot of work has been put
> into it, so it would be highly surprising if we decided to throw that
> away and develop something completely new from the ground up.
>
> It's true that postgres_fdw doesn't do everything we need yet.  The
> new join pushdown hooks aren't used by postgres_fdw yet, and the API
> itself has some bugs with EvalPlanQual handling.  Aggregate pushdown
> is waiting on upper planner path-ification.   DML pushdown doesn't
> exist yet, and the hooks that would enable pushdown of ORDER BY
> clauses to the remote side aren't being used by postgres_fdw.  But all
> of these things have been worked on.  Patches for many of them have
> already been posted.  They have suffered from a certain amount of
> neglect by senior hackers, and perhaps also from a shortage of time on
> the part of the authors.  But an awful lot of the work that is needed
> here has already been done, if only we could get it committed.
> Aggregate pushdown is a notable exception, but abandoning the foreign
> data wrapper approach in favor of something else won't fix that.
>
> Postgres-XC developed a purpose-built system for talking to other
> nodes instead of using the FDW interface, for the very good reason
> that the FDW interface did not yet exist at the time that Postgres-XC
> was created.  But several people associated with the XC project have
> said, including one on this thread, that if it had existed, they
> probably would have used it.  And it's hard to see why you wouldn't:
> with XC's approach, the remote data source is presumed to be
> PostgreSQL (or Postgres-XC/XL/X2/whatever); and you can only use the
> facility as part of a sharding solution.  The FDW interface can talk
> to anything, and it can be used for stuff other than sharding, like
> making one remote table appear local because you just happen to want
> that for some reason.  This makes the XC approach look rather brittle
> by comparison.  I don't blame the XC folks for taking the shortest
> path between two points, but FDWs are better, and we ought to try to
> leverage that.
>
>
In my discussions on this topic with various folks including Robert, I've
conceded that if FDW was available when XC was first written, in all
likelihood we would have used and extended that interface. But that wasn't
the case and we did what we thought was the best solution at that time,
given the resources and the schedule. To be honest, when XC project was
started, I was quite skeptical about the whole thing given the goal was to
built something which can replace Oracle RAC with may be less than 1%
resources of what Oracle must have invested in building RAC. The lack of
resources at the start of the project keeps showing up in the quality
issues that users report from time to time. Having said that, I am quite
satisfied with what we have been able to build within the constraints.

But FDW is just one part of the story. There is the entire global
consistency problem, which would require something like GTM to give out XIDs
and snapshots; atomicity, which would require managing transactions across
multiple shards; join pushdown when all data is not available locally
(something XL attempts to solve with datanode-to-datanode exchange of
information); other global state such as sequences; replicating some of the
data to multiple shards for efficient operations; the ability to add/remove
shards with the least disruption; and globally consistent backups/restores.
XC/XL has attempted to solve each of them to some extent.
I don't claim that they are completely solved and there are no corner cases
left, but we have made fairly good progress on each of them.

My worry is that if we start implementing them again from scratch, it will
take a few years before we get them into a usable state. What XC/XL lacked
is probably a Robert Haas or a Tom Lane who could look at the work and
suggest major edits. If that had happened, the quality of the product could
have been much better today. I don't mean to underrate the developers who
worked on XC/XL, but there is no harm in accepting that if someone with a
much better understanding of the whole system had been part of the team,
it would have positively impacted the project. Is that an angle worth
exploring?

Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Satoshi Nagayasu



On 2015/09/01 12:36, Peter Geoghegan wrote:

On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu  wrote:

Why don't we use relation name (with looking up the catalog)
on query jumbling? For performance reason?


I think that there is a good case for preferring this behavior. While
it is a little confusing that pg_stat_statements does not change the
representative query string, renaming a table does not make it a
substantively different table.

There is, IIRC, one case where a string is jumbled directly (CTE
name). It's usually not the right thing, IMV.



Thanks for the comment. I've never considered that. Interesting.

From the user's point of view, IMHO, it would be better to avoid confusion
if queryid were determined only by visible values -- userid, dbid and the
query string itself.

BTW, I'm interested in improving queryid portability now because I'd like
to use it in other extensions. :)
That's the reason why I'm looking at query jumbling here.

Thoughts?

Regards,
--
NAGAYASU Satoshi 


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: About CMake v2

2015-08-31 Thread Tom Lane
Noah Misch  writes:
> On Fri, Aug 28, 2015 at 01:28:49PM -0300, Alvaro Herrera wrote:
>> If it allows us to get rid of our custom MSVC scripts, it's a huge
>> benefit, for sure -- that has been a huge pain in the neck since day
>> one.

> Moreover, I suggest beginning with a patch that replaces the src/tools/msvc
> build system with CMake.  One can test almost all Windows configurations using
> two machines, so author and reviewer can test most everything.  src/tools/msvc
> has more defects than the GNU make build system, so the minimum-quality
> acceptable replacement is easier to deliver.  With CMake driving all MSVC
> builds, it would be much easier to gradually extend to other platforms.

mmm ... dunno about that plan.  In the first place, you seem to be
recommending that YUriy incorporate all the deficiencies of src/tools/msvc
into his first draft and then hope somehow to fix them later.  In the
second place, I think that the core problem here involves being sure
we can still cover all the portability issues that have been laboriously
hacked into the autoconf/gmake build system over nigh two decades.
Starting from a build system that was designed to cover exactly one target
platform, and has never been asked to do anything more than that, doesn't
sound like a recipe for success there.

Maybe you're right about how to approach this, but I wouldn't bet on it.

regards, tom lane




[HACKERS] Speed up Clog Access by increasing CLOG buffers

2015-08-31 Thread Amit Kapila
After reducing ProcArrayLock contention in commit
(0e141c0fbb211bdd23783afa731e3eef95c9ad7a), the other lock
which seems to be contentious in read-write transactions is
CLogControlLock.  In my investigation, I found that the contention
is mainly due to two reasons. One is that while writing the transaction
status in CLOG (TransactionIdSetPageStatus()), EXCLUSIVE CLogControlLock
is acquired, which contends with every other transaction that tries to
access the CLOG to check transaction status; to reduce that, a patch [1]
has already been proposed by Simon. The second is that when a CLOG page
is not found in the CLOG buffers, CLogControlLock must be acquired in
Exclusive mode, which again contends with shared lockers trying to read
transaction status.

Increasing the number of CLOG buffers to 64 helps in reducing the contention
due to the second reason.  Experiments revealed that increasing CLOG buffers
only helps once the contention around ProcArrayLock is reduced.
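
For reference, the kind of change involved is tiny; a sketch (the exact cap
used in the attached patch is not restated here):

	/* clog.c: raise the upper limit on CLOG buffers from 32 to 64 */
	Size
	CLOGShmemBuffers(void)
	{
		return Min(64, Max(4, NBuffers / 512));
	}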

Performance Data
-
RAM - 500GB
8 sockets, 64 cores (hyperthreaded, 128 threads total)

Non-default parameters

max_connections = 300
shared_buffers=8GB
min_wal_size=10GB
max_wal_size=15GB
checkpoint_timeout=35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB

pgbench setup

scale factor - 300
Data is on magnetic disk and WAL on ssd.
pgbench -M prepared tpc-b

HEAD - commit 0e141c0f
Patch-1 - increase_clog_bufs_v1

Client Count    1     8     16     32     64     128    256
HEAD            911   5695  9886   18028  27851  28654  25714
Patch-1         954   5568  9898   18450  29313  31108  28213


This data shows an increase of ~5% at 64 clients and 8-10% at higher
client counts, without degradation at lower client counts. In the above
data there is some fluctuation at 8 clients, but I attribute that to
run-to-run variation; however, if anybody has doubts I can re-verify the
data at lower client counts.

Now if we try to further increase the number of CLOG buffers to 128,
no improvement is seen.

I have also verified that this improvement can be seen only after the
contention around ProcArrayLock is reduced.  Below is the data with
a commit before the ProcArrayLock reduction patch.  The setup and test
are the same as for the previous test.

HEAD - commit 253de7e1
Patch-1 - increase_clog_bufs_v1


Client Count    128    256
HEAD            16657  10512
Patch-1         16694  10477


I think the benefit of this patch would be more significant together with
the other patch to reduce CLogControlLock contention [1] (I have not tested
both patches together, as there are still a few issues left in the other
patch); however, it has its own independent value, so it can be considered
separately.

Thoughts?


[1] -
http://www.postgresql.org/message-id/canp8+j+imqfhxkchfyfnxdyi6k-arazrv+zg-v_ofxetjjo...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


increase_clog_bufs_v1.patch
Description: Binary data



Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Tom Lane
Satoshi Nagayasu  writes:
> On 2015/09/01 13:41, Peter Geoghegan wrote:
>> If you want to use the queryId field directly, which I recall you
>> mentioning before, then that's harder. There is simply no contract
>> among extensions for "owning" a queryId. But when the fingerprinting
>> code is moved into core, then I think at that point queryId may cease
>> to be even a thing that pg_stat_statements theoretically has the right
>> to write into. Rather, it just asks the core system to do the
>> fingerprinting, and finds it within queryId. At the same time, other
>> extensions may do the same, and don't need to care about each other.
>> 
>> Does that work for you?

> Yes. I think so.

> I need some query fingerprint to determine the query group. I want queryid
> to keep the same value when query strings are the same (except for literal
> values).

The problem I've got with this is the unquestioned assumption that every
application for query IDs will have exactly the same requirements for
what the ID should include or ignore.

regards, tom lane




Re: [HACKERS] [PATCH] SQL function to report log message

2015-08-31 Thread Jim Nasby

On 8/31/15 11:59 PM, Pavel Stehule wrote:

The transformation: text -> error level is common task - and PLpgSQL it
does in pl_gram.y. My idea is to add new function to error utils named
"parse_error_level" and use it from PLpgSQL and from your code.


Wouldn't it be better to create an ENUM of error levels instead of 
inventing more parsing code?


Though I guess ENUMs are case-sensitive; I'd rather solve that by creating 
a case-insensitive ENUM, which would be useful across the board...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Unicode mapping scripts cleanup

2015-08-31 Thread Peter Eisentraut
Here is a series of patches to clean up the Unicode mapping script
business in src/backend/utils/mb/Unicode/.  It overlaps with the
perlcritic work that I recently wrote about, except that these pieces
are not strictly related to Perl, but wrong comments, missing makefile
pieces, and such.

I discovered that some of the source files that one is supposed to
download don't exist anymore or are labeled obsolete.  Also, running the
scripts produces slight differences in the output.  So apparently, the
CJK to Unicode mappings are still evolving and should be updated
occasionally.  Next steps would be to commit some or all of these
differences after additional verification, and then update the scripts
to use whatever the non-obsolete mapping sources are supposed to be.
From 2c96b1154c300325654735984f4268df7cc6efcd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 31 Aug 2015 23:59:12 -0400
Subject: [PATCH 01/10] UCS_to_most.pl: Make executable, for consistency with
 other scripts

---
 src/backend/utils/mb/Unicode/UCS_to_most.pl | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 src/backend/utils/mb/Unicode/UCS_to_most.pl

diff --git a/src/backend/utils/mb/Unicode/UCS_to_most.pl b/src/backend/utils/mb/Unicode/UCS_to_most.pl
old mode 100644
new mode 100755
-- 
2.5.1

From 374d2954b31867e7ffe4ed183f9a9cd7a098cb9b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 31 Aug 2015 23:59:12 -0400
Subject: [PATCH 02/10] Fix comments

Some of these comments were copied and pasted without updating them,
some of them were duplicates.
---
 src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl | 6 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl | 4 
 src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl | 6 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl | 6 +-
 src/backend/utils/mb/Unicode/UCS_to_SJIS.pl   | 4 
 5 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
index bfc9912..643f02b 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
@@ -49,10 +49,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_CN table
-#
-
 $file = "utf8_to_euc_cn.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_CN[ $count ] = {\n";
@@ -75,7 +71,7 @@
 close(FILE);
 
 #
-# then generate EUC_JP --> UTF8 table
+# then generate EUC_CN --> UTF8 table
 #
 reset 'array';
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
index 79bc05b..687e668 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
@@ -130,10 +130,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_JP table
-#
-
 $file = "utf8_to_euc_jp.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_JP[ $count ] = {\n";
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
index fa553fd..82490a0 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
@@ -49,10 +49,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_KR table
-#
-
 $file = "utf8_to_euc_kr.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_KR[ $count ] = {\n";
@@ -75,7 +71,7 @@
 close(FILE);
 
 #
-# then generate EUC_JP --> UTF8 table
+# then generate EUC_KR --> UTF8 table
 #
 reset 'array';
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
index 02414ba..697b6e6 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
@@ -65,10 +65,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_TW table
-#
-
 $file = "utf8_to_euc_tw.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_TW[ $count ] = {\n";
@@ -91,7 +87,7 @@
 close(FILE);
 
 #
-# then generate EUC_JP --> UTF8 table
+# then generate EUC_TW --> UTF8 table
 #
 reset 'array';
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
index 74cd7ac..e607e91 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
@@ -66,10 +66,6 @@
 
 close(FILE);
 
-#
-# first, generate UTF8 --> SJIS table
-#
-
 $file = "utf8_to_sjis.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapSJIS[ $count ] = {\n";
-- 
2.5.1

From 8f19caa272880afc1a0857e6478ac3fac8203cd6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 31 Aug 2015 23:59:12 -0400
Subject: [PATCH 03/10] Remove manually added header 

Re: [HACKERS] [PATCH] SQL function to report log message

2015-08-31 Thread Pavel Stehule
2015-09-01 6:59 GMT+02:00 Pavel Stehule :

>
>
> 2015-08-31 20:43 GMT+02:00 dinesh kumar :
>
>> Hi,
>>
>> On Sat, Aug 29, 2015 at 4:22 PM, Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> I am starting to work review of this patch
>>>
>>> 2015-07-13 9:54 GMT+02:00 dinesh kumar :
>>>
 Hi All,

 Greetings for the day.

 Would like to discuss on below feature here.

 Feature:
 Having an SQL function to write messages to the log destination.

 Justification:
 As of now, we don't have an SQL function to write custom/application
 messages to the log destination. We have the "RAISE" clause, which is
 controlled by log_ parameters. If we had an SQL function that works
 irrespective of log settings, that would be good for many log parsers.
 What I mean is, from a DBA's point of view, if we route all our native OS
 stats to log files in a proper format, then our log reporting tools can
 give the most effective reports. Also, applications can log their own
 messages to postgres log files, which can be monitored by DBAs too.

 Implementation:
 Implemented a new function "pg_report_log" which takes one argument
 as text and returns void. I took the "LOG" prefix for all the reporting
 messages. I wasn't sure whether to go with a new prefix for this, since
 these are normal LOG messages. Let me know if I am wrong here.

 Here is the attached patch.

>>>
>>> This patch is not complex, but the implementation doesn't cover
>>> "ereport" well.
>>>
>>> Although this functionality could be replaced by a custom function in any
>>> PL (now or in the near future), I am not against having this function in
>>> core. There are a lot of companies with strong resistance to stored
>>> procedures - and sometimes this functionality can help with SQL debugging.
>>>
>>> Issues:
>>>
>>> 1. It supports only the MESSAGE field of the exception - I expect
>>> support for all fields: HINT, DETAIL, ...
>>>
>>
>> Added these functionalities too.
>>
>>
>>> 2. Missing regress tests
>>>
>>
>> Adding here.
>>
>>
>>> 3. the parsing of the ereport level should be a public function shared
>>> with PLpgSQL and other PLs
>>>
>>
>> Sorry Pavel. I am not getting your point here. Would you give me an
>> example?
>>
>
> The transformation text -> error level is a common task - PLpgSQL does it
> in pl_gram.y. My idea is to add a new function named "parse_error_level"
> to the error utilities and use it from PLpgSQL and from your code.
>
>
>>
>>
>>> 4. should hidestmt be a mandatory parameter?
>>>
>>
>> I changed this argument's default value as "true".
>>
>>
>>> 5. the function declaration is strange
>>>
>>> postgres=# \sf pg_report_log (text, anyelement, boolean)
>>> CREATE OR REPLACE FUNCTION pg_catalog.pg_report_log(text, anyelement,
>>> boolean)
>>>  RETURNS void
>>>  LANGUAGE sql
>>>  STABLE STRICT COST 1
>>> AS $function$SELECT pg_report_log($1::pg_catalog.text,
>>> $2::pg_catalog.text, $3::boolean)$function$
>>>
>>> Why polymorphic? It is useless on any modern release
>>>
>>>
>> I took quote_ident(anyelement) as reference code for implementing this.
>> Could you guide me if I am doing this wrong?
>>
>
> I was wrong - this is OK - it is an emulation of forced casting to text
>
>
>>
>>
>>> postgres=# \sf pg_report_log (text, text, boolean)
>>> CREATE OR REPLACE FUNCTION pg_catalog.pg_report_log(text, text, boolean)
>>>  RETURNS void
>>>  LANGUAGE internal
>>>  IMMUTABLE STRICT
>>> AS $function$pg_report_log$function$
>>>
>>> Why stable, why immutable? This function should be volatile.
>>>
>>> Fixed these to volatile.
>>
>>
>>> 6. using the elog level enum as an errcode is a wrong idea - errcodes
>>> are defined in the table
>>> http://www.postgresql.org/docs/9.4/static/errcodes-appendix.html
>>>
>>
>> You mean, if the elevel is 'ERROR', then we need to allow an errcode?
>> Let me know your inputs.
>>
>
> I was blind, but the code was not good. Yes, ERROR and higher need an
> error code. Per ANSI/SQL anything can have an error code: "00 is ok",
> "01 .. warnings" ...
>
> There are more possibilities - look at the PLpgSQL implementation - it
> can be an optional parameter - by default it can use ERRCODE_RAISE_EXCEPTION
>
>
>>
>> Adding new patch, with the above fixes.
>>
>
the code looks better

my objections:

1. I prefer the default values to be NULL
2. readability:
  * parsing of the error level should be done in its own loop
  * you don't need to use multiple ereport calls - one is good enough - look
at the implementation of stmt_raise in PLpgSQL (see the sketch below)
3. tests should be enhanced for the optional parameters
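
A minimal sketch of point 2, assuming a non-strict function with NULL
defaults (per point 1) and the hypothetical parse_error_level() helper
discussed above; an illustration loosely following PLpgSQL's
exec_stmt_raise(), not the submitted patch:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

/* hypothetical shared helper, see point 3 of the earlier review */
extern int parse_error_level(const char *levelname);

PG_FUNCTION_INFO_V1(pg_report_log);

Datum
pg_report_log(PG_FUNCTION_ARGS)
{
	int			elevel;
	char	   *message;
	char	   *detail;
	char	   *hint;
	bool		hide_stmt;

	if (PG_ARGISNULL(0) || PG_ARGISNULL(1))
		ereport(ERROR,
				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
				 errmsg("level and message must not be NULL")));

	elevel = parse_error_level(text_to_cstring(PG_GETARG_TEXT_PP(0)));
	message = text_to_cstring(PG_GETARG_TEXT_PP(1));
	detail = PG_ARGISNULL(2) ? NULL : text_to_cstring(PG_GETARG_TEXT_PP(2));
	hint = PG_ARGISNULL(3) ? NULL : text_to_cstring(PG_GETARG_TEXT_PP(3));
	hide_stmt = PG_ARGISNULL(4) ? true : PG_GETARG_BOOL(4);

	/* a single ereport call carries all fields; absent fields add nothing */
	ereport(elevel,
			(errmsg_internal("%s", message),
			 (detail != NULL) ? errdetail_internal("%s", detail) : 0,
			 (hint != NULL) ? errhint("%s", hint) : 0,
			 errhidestmt(hide_stmt)));

	PG_RETURN_VOID();
}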

Regards

Pavel


>
>> Thanks in advance.
>>
>> Regards,
>> Dinesh
>>
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>

 Regards,
 Dinesh
 manojadinesh.blogspot.com

>>>
>>>
>>
>


Re: [HACKERS] [PATCH] SQL function to report log message

2015-08-31 Thread Pavel Stehule
2015-09-01 7:20 GMT+02:00 Jim Nasby :

> On 8/31/15 11:59 PM, Pavel Stehule wrote:
>
>> The transformation text -> error level is a common task - PLpgSQL does
>> it in pl_gram.y. My idea is to add a new function named
>> "parse_error_level" to the error utilities and use it from PLpgSQL and
>> from your code.
>>
>
> Wouldn't it be better to create an ENUM of error levels instead of
> inventing more parsing code?
>

Do you mean an SQL ENUM? I am a little bit afraid of possible problems with
pg_upgrade.

It is not a simple question - the ENUM can be interesting from the
user-space perspective, but from our internal perspective the parsing
function is more practical - and faster. The error level is our internal
value, and users should not need to read it directly.
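
A minimal sketch of the parsing-function approach, assuming the
parse_error_level() name proposed earlier in the thread; the exact set of
accepted level spellings is an assumption:

#include "postgres.h"

int
parse_error_level(const char *levelname)
{
	/* case-insensitive mapping from a level name to an elog level */
	if (pg_strcasecmp(levelname, "debug") == 0)
		return DEBUG1;
	else if (pg_strcasecmp(levelname, "log") == 0)
		return LOG;
	else if (pg_strcasecmp(levelname, "info") == 0)
		return INFO;
	else if (pg_strcasecmp(levelname, "notice") == 0)
		return NOTICE;
	else if (pg_strcasecmp(levelname, "warning") == 0)
		return WARNING;
	else if (pg_strcasecmp(levelname, "error") == 0)
		return ERROR;

	ereport(ERROR,
			(errcode(ERRCODE_SYNTAX_ERROR),
			 errmsg("unrecognized message level: \"%s\"", levelname)));
	return 0;					/* keep the compiler quiet */
}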


>
> Though I guess ENUMs are case-sensitive; I'd rather solve that by
> creating a case-insensitive ENUM, which would be useful across the
> board...
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] Information of pg_stat_ssl visible to all users

2015-08-31 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
> On Sat, Aug 29, 2015 at 10:27 PM, Bruce Momjian  wrote:
> > I can see them having problems with a user being able to see the SSL
> > remote user names of all connected users.
> 
> I'm pretty sure Heroku don't use client certificates.
> 
> And if they did, I would assume the client certificate would be issued to
>  aafgrwewediiqz, or possibly aafgrwewedi...@customer.heroku.com or
> something along that line.
> 
> Client certificates don't show anything other than the username, unless you
> explicitly choose to put sensitive information in the CN. But we don't
> limit the view of the username in pg_stat_activity, even though people do
> put sensitive things in there (such as the customer name in the case of
> shared hosting - not everybody does what Heroku does).
> 
> So pg_stat_ssl doesn't show something that's not already visible.

I don't particularly disagree with any of the above, but would instead
reiterate my up-thread comment: we already get grief from various
people, rightly in my mind, that we give unprivileged users too much
information about what other unprivileged users are on the system, and
adding more information is going in the wrong direction, even if it's of
the same sensitivity level as what we already allow.

Perhaps it really isn't moving the bar all that much, but at least for a
number of our users it's increasing what they have to worry about
("well, we knew usernames were an issue, but now we also have to worry
about system usernames and the CN in the certificate and...").

The answer, in my view at least, isn't necessarily to separate the CN
from the username and give them different levels of access or
sensitivity, but rather to allow administrators to control access to
that collective set of information.

Thanks,

Stephen




Re: [HACKERS] Information of pg_stat_ssl visible to all users

2015-08-31 Thread Andres Freund
On 2015-08-31 09:06:27 -0400, Stephen Frost wrote:
> Perhaps it really isn't moving the bar all that much, but at least for a
> number of our users it's increasing what they have to worry about
> ("well, we knew usernames were an issue, but now we also have to worry
> about system usernames and the CN in the certificate and...").

And to the majority of users it makes this behave entirely incoherently…

Who would realistically have a randomized username that people log in
with, and then CNs with meaningful contents? That'd mean you'd have to
write complex user mappings between CNs and usernames.


> The answer, in my view at least, isn't necessairly to seperate the CN
> from the username and make them differently levels of access or
> sensitivity, but rather to allow administrators to control access to
> that collective set of information.

I don't think anybody argues against that.

I'm just saying that we should strive to behave at least somewhat
consistently, and change everything at once, not piecemeal. The
latter will not decrease the pain of migrating to a new model in any
relevant way, while making the system harder to understand.

Greetings,

Andres Freund




Re: [HACKERS] Adding since-version tags to the docs?

2015-08-31 Thread Shulgin, Oleksandr
On Mon, Aug 31, 2015 at 4:01 PM, Tom Lane  wrote:

> "Shulgin, Oleksandr"  writes:
> > I often find it a pity that our docs are missing any information on
> > since when a certain GUC setting, SQL-level command or function was
> > introduced. It would be nice if we could make a script that would parse
> > the sgml files and for every symbol it finds add a tag like "Since
> > version 9.x".
>
> TBH, I think this is a horrid idea.  We occasionally manually add remarks
> like "since version x.y, Postgres does this".  Inevitably, that just bulks
> up the documentation; and it starts to look seriously silly in a few years
> when x.y and all its predecessors are out of support.


Well, I wouldn't call it outright silly: what's so bad about knowing that
a certain feature has been there since 9.0, for example?  I think it can
actually help when sending a docs link to someone who can read the docs,
but not the code (or at least not that easily).

I would also find it more reassuring to read it stated in the document
than to track down the version where the feature appeared using
git log --grep or the mentioned click-through technique for older
versions.  Can't speak for the others, of course.

> It'll be a real mess if we do that for everything.

I share the fear that it could become messy, but it doesn't necessarily
*have to* be a mess.

--
Alex


Re: [HACKERS] Dependency between bgw_notify_pid and bgw_flags

2015-08-31 Thread Ashutosh Bapat
On Sat, Aug 8, 2015 at 7:46 PM, Robert Haas  wrote:

> On Wed, Aug 5, 2015 at 3:33 AM, Ashutosh Bapat
>  wrote:
> > This idea looks good.
>
> Thanks.  It needs testing though to see if it really works as
> intended.  Can you look into that?
>

PFA the patch containing your code changes + test module. See if that meets
your expectations.


>
> > Looking at the larger picture, we should also enable this feature to be
> > used by auxiliary processes. It's very hard to add a new auxiliary
> > process in the current code. One has to add code in many places to make
> > sure that the auxiliary processes die and are restarted correctly. It is
> > even tougher to add a parent auxiliary process which spawns multiple
> > worker processes. That would be a whole lot simpler if we could allow
> > the auxiliary processes to use the background worker infrastructure
> > (which is what they are, ultimately).
>
> That's a separate patch, but, sure, we could do that.  I agree with
> Alvaro's comments: the postmaster should start all children.  Other
> processes should just request that it do so.  We have two mechanisms
> for that right now: the one used by bgworkers, and the one used by the
> AV launcher.
>

By children I really meant workers that it requests the postmaster to
start, not the OS definition of a child.


>
> > BGWORKER_SHMEM_ACCESS has similar usage, except that it resets the
> > on-exit callbacks and detaches the shared memory segment from the
> > background worker. That avoids a full cluster restart when one of those
> > workers, which cannot corrupt shared memory, dies. But I do not see any
> > check to prevent such a backend from calling PGSharedMemoryReattach()
>
> There isn't, but you shouldn't do that.  :-)
>
> This is C code; you can't protect against actively malicious code.
>

We have taken pains to check whether the worker was started with the
BGWORKER_BACKEND_DATABASE_CONNECTION flag when it requests to connect to a
database. I think it makes sense to do that with the BGWORKER_SHMEM_ACCESS
flag as well. Otherwise, some buggy extension could connect to the shared
memory and exit without the postmaster restarting all the backends.
Obviously one can argue that memory corruption is possible even without
this flag, but we should try to protect exposed interfaces.
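
A minimal sketch of the registration interface under discussion, assuming
a hypothetical "demo_worker" library and entry point; the flag combination
mirrors the rule that a database connection requires shared-memory access:

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"

/* must be called from a regular backend; the postmaster starts the worker */
bool
register_demo_worker(void)
{
	BackgroundWorker worker;

	memset(&worker, 0, sizeof(worker));
	/* a database connection implies shared-memory access */
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "demo_worker");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "demo_worker_main");
	snprintf(worker.bgw_name, BGW_MAXLEN, "demo worker");
	worker.bgw_notify_pid = MyProcPid;	/* signal this backend on start/exit */

	return RegisterDynamicBackgroundWorker(&worker, NULL);
}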

>
> > So it looks like it suffices to assume that a background worker either
> > needs to access shared memory or doesn't. Any background worker having
> > shared memory access can also access the database and thus becomes part
> > of the backend list. Or maybe we just avoid these flags and treat every
> > background worker as if it passed both these flags. That will simplify
> > a lot of code.
>
> I think it's useful to support workers that don't have shared memory
> access at all, because those can crash without causing a system-wide
> reset.  But I don't see the point in distinguishing between workers
> with shared-memory access and those with a database connection.  I
> mean, obviously the worker needs to be able to initialize itself
> either way, but there seems to be no reason to force that to be
> signalled in bgw_flags.  It can just depend on whether
> BackgroundWorkerInitializeConnection gets called.
>

+1.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 000524d..1818f7c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -148,22 +148,21 @@
  * children we have and send them appropriate signals when necessary.
  *
  * "Special" children such as the startup, bgwriter and autovacuum launcher
  * tasks are not in this list.  Autovacuum worker and walsender are in it.
  * Also, "dead_end" children are in it: these are children launched just for
  * the purpose of sending a friendly rejection message to a would-be client.
  * We must track them because they are attached to shared memory, but we know
  * they will never become live backends.  dead_end children are not assigned a
  * PMChildSlot.
  *
- * Background workers that request shared memory access during registration are
- * in this list, too.
+ * Background workers are in this list, too.
  */
 typedef struct bkend
 {
 	pid_t		pid;			/* process id of backend */
 	long		cancel_key;		/* cancel key for cancels for this backend */
 	int			child_slot;		/* PMChildSlot for this backend, if any */
 
 	/*
 	 * Flavor of backend or auxiliary process.  Note that BACKEND_TYPE_WALSND
 	 * backends initially announce themselves as BACKEND_TYPE_NORMAL, so if
@@ -397,27 +396,25 @@ static int	ServerLoop(void);
 static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool SSLdone);
 static void processCancelRequest(Port *port, void *pkt);
 static int	initMasks(fd_set *rmask);
 static void report_fork_failure_to_client(Port *port, int errnum);
 static CAC_state 

Re: [HACKERS] Commitfest remaining "Needs Review" items

2015-08-31 Thread Magnus Hagander
On Sun, Aug 30, 2015 at 2:53 PM, Michael Paquier 
wrote:

>
>
> And the commit fest of 2015-07 is now closed with the following score:
> Committed: 58.
> Moved to next CF: 25.
> Rejected: 9.
> Returned with Feedback: 25.
> Total: 117.
> Thanks!
>
>
Ugh. Good to have it closed, but it seems we're still in
continuous-commitfest mode :(

Anyway - that CF is closed, but we seem to not have *any* CF open at this
point. Should we not make 2015-11 open?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Commitfest remaining "Needs Review" items

2015-08-31 Thread Andres Freund
On 2015-08-31 16:22:54 +0300, Alexander Korotkov wrote:
> Is it correct to switch 2015-09 commitfest to inprogress now?

Yea, isn't it only starting the 15th? Can we add an option to display
days in the CF app?

Greetings,

Andres Freund



