Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-12 Thread Andres Freund
On 2017-12-13 16:02:45 +0900, Masahiko Sawada wrote:
> When we add extra blocks on a relation do we access to the disk? I
> guess we just call lseek and write and don't access to the disk. If so
> the performance degradation regression might not be much.

Usually changes in the file size require the filesystem to perform
metadata operations, which in turn requires journaling on most
FSs. Which'll often result in synchronous disk writes.

Greetings,

Andres Freund



Top-N sorts versus parallelism

2017-12-12 Thread Thomas Munro
Hi hackers,

The start-up cost of bounded (top-N) sorts is sensitive at the small
end of N, and the
comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
seem to correspond to reality.  Here's a contrived example that comes
from a benchmark:

  create table foo as
select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
  analyze foo;
  select * from foo order by b limit N;

Estimated start-up costs for different N:

  limit 1 = 19425.00
  limit 2 = 24425.00
  limit 3 = 27349.81
  limit 4 = 29425.00
  limit 10 = 36034.64
  limit 50 = 47644.28
  limit 100 = 52644.28
  limit 133 = 54701.41
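
For reference, here's a back-of-the-envelope program (not planner code; it
assumes default cost settings and roughly 1,000,000 tuples in about 4425
pages) that reproduces these numbers from the formula above:

#include <math.h>
#include <stdio.h>

/*
 * Back-of-the-envelope check, not planner code: assumes default cost settings
 * (seq_page_cost = 1, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) and
 * a heap of ~1,000,000 tuples in ~4425 pages, so the bounded sort's startup
 * cost is seqscan_total + 2 * cpu_operator_cost * tuples * LOG2(2.0 * N).
 */
int
main(void)
{
	const double	pages = 4425.0;
	const double	tuples = 1000000.0;
	const double	seqscan_total = pages * 1.0 + tuples * 0.01;
	const int		limits[] = {1, 2, 3, 4, 10, 50, 100, 133};

	for (int i = 0; i < 8; i++)
	{
		double		startup = seqscan_total +
			2.0 * 0.0025 * tuples * log2(2.0 * limits[i]);

		printf("limit %d = %.2f\n", limits[i], startup);
	}
	return 0;
}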

But the actual time doesn't really change on this random development
system: it's around 300ms.  I stopped at limit 133 because for this
particular query, at 134 a Gather Merge plan kicked in, giving me degree 3
parallel sorting and my answer just shy of three times faster.  The speed
up definitely paid for the parallel_setup_cost, and only one tuple was
sent from each worker to the leader because the limit was pushed down.

I want to get my answer three times as fast when I say limit 1 too!
But I can't because we seem to think that limit 1 is dramatically
cheaper than limit 134, so parallelism isn't worth bothering with.

With wider tuples the same effect can be seen, though it gets much
more difficult to convince Gather Merge to be selected at all for
reasons I haven't looked into yet.  Is Gather Merge for top-N sorts a
missed opportunity due to underestimation of top-N for small values of
N?  Or is there some other way to think about this problem?  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

2017-12-12 Thread Kuntal Ghosh
On Tue, Dec 12, 2017 at 5:20 PM, Amit Kapila  wrote:
> On Tue, Dec 12, 2017 at 4:00 PM, Kuntal Ghosh
>  wrote:
>> On Mon, Dec 11, 2017 at 2:26 PM, Thomas Munro
>>  wrote:
>>> On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila  
>>> wrote:
>>>
 Thanks for looking into it.  I will see if we can write some test.  In
 the meantime if possible, can you please request Patrick Hemmer to
 verify the attached patch?
>>>
>>> Our discussion was on the #postgresql Freenode channel.  I pointed him
>>> at this thread, but I'm not sure if he'll see my message or be able to
>>> test.
>> After discussing with Amit, I'm able to reproduce the scenario in a
>> master-standby setup. The issue occurs when we perform parallel
>> index(-only) scan on a BTP_HALF_DEAD -marked page. (If a page is
>> marked as BTP_DELETED, it's already unlinked from the index).
>>
>> When a btree page is deleted during vacuum, it's first marked as
>> BTP_HALF_DEAD in _bt_mark_page_halfdead and then marked as BTP_DELETED
>> in _bt_unlink_halfdead_page without releasing cleanup lock on the
>> buffer. Hence, any scan node cannot lock the same buffer. So, the
>> issue can't be reproduced on master.
>>
>> However, after replaying XLOG_BTREE_MARK_PAGE_HALFDEAD record, standby
>> releases the lock on the same buffer. If we force parallelism, an
>> index scan on the same page will cause hang the standby server.
>> Following is a (unpleasant)way to reproduce the issue:
>>
>> In master (with autovacuum = off):
>> 1. create table t1(a int primary key);
>> 2. insert into t1 select * from generate_series(1,1000); --generates 3
>> leaf nodes (block no 1,2,4) and 1 root node (block no 3)
>> 3. delete from t1 where a>=367 and a<=735; --delete all tuples pointed by 
>> leaf 2
>> 4. analyze t1; --update the stats
>> 5. explain analyze select * from t1 where a>=1 and a<=1000; --ensures
>> that the next vacuum will consider leaf 2 for page deletion
>
> What do you mean by next vacuum, here autovacuum is off?  Are you
> missing any step which manually performs the vacuum?
>
Yeah, you have to manually vacuum the table.
6. vacuum t1.

>> Now, put a break point at _bt_unlink_halfdead_page, so that vacuum
>> can't unlink the page.
>>
>> In standby,
>> 1. force parallelism.
>> 2. explain analyze select * from t1 where a>=1 and a<=1000; and the
>> parallel workers hang at the above-discussed place!
>>



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] SERIALIZABLE with parallel query

2017-12-12 Thread Thomas Munro
On Wed, Dec 13, 2017 at 2:09 PM, Haribabu Kommi
 wrote:
> Thanks for explaining the problem in generating an isolation test to
> test the serialize parallel query.
>
> Committer can decide whether the existing test is fine to be part of the test
> suite or remove it; other than that everything is fine, so I am moving the
> patch into "ready for committer" state.

Thank you!  I will try to find a good benchmark that will really
exercise parallel query + SSI.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

2017-12-12 Thread Thomas Munro
On Wed, Dec 13, 2017 at 4:20 PM, Thomas Munro
 wrote:
> On Wed, Dec 13, 2017 at 3:41 PM, Amit Kapila  wrote:
>> Good.  I hope that the patch I have posted above is able to resolve
>> this problem.  I am asking as you haven't explicitly mentioned that.
>
> I can confirm that your patch fixes the problem for forward scans.
> That is, I can see it reaching the BTP_DELETED case via an extra LOG
> statement I added, and it worked correctly.  Good.
>
> I don't know how to make it hit the backwards scan case.  I can get a
> backward scan in a worker by changing the query to "select count(*)
> from (select * from jobs where id + 1 > id order by status desc) ss"
> but I suspect that _bt_walk_left() may be hiding deleted pages from us
> so the condition may not be reachable with this technique.

Hmm, no that's not right: clearly it can return half-dead or deleted
pages to the caller.  So I don't know why I never seem to encounter
any, despite concurrent vacuums producing them; maybe something to do
with the interlocking you get with vacuum when you traverse the btree
by walking left -- my btree-fu is not yet strong enough.

-- 
Thomas Munro
http://www.enterprisedb.com



pg_ctl on windows can't open postmaster.pid: Permission denied

2017-12-12 Thread Andres Freund
Hi,

Buildfarm animal thrips just failed with a curious error:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thrips&dt=2017-12-13%2002%3A27%3A27

== shutting down postmaster   ==
pg_ctl: could not open PID file 
"C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid":
 Permission denied

otherwise there were no failures.

I wonder if we're not opening the file with the right options to allow
us to open it while it's opened for writes in another backend? In the
backend we do so via FILE_SHARE_READ in pgwin32_open(), which backs
open() and fopen() in backend code.
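
For context, a minimal sketch (not pg_ctl's actual code) of the Win32 rule in
play: an open only succeeds if the requested access is compatible with the
share mode of every handle already open on the file, so a reader that wants to
coexist with a writer has to ask for generous sharing itself.

#include <windows.h>
#include <stdio.h>

/* Not pg_ctl's actual code, just a sketch of the Win32 sharing semantics:
 * the reader requests read access and allows all sharing, so it can open
 * the file even while another process still has it open for writing. */
int
main(void)
{
	HANDLE		h = CreateFileA("postmaster.pid",
								GENERIC_READ,
								FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
								NULL,	/* default security */
								OPEN_EXISTING,
								FILE_ATTRIBUTE_NORMAL,
								NULL);

	if (h == INVALID_HANDLE_VALUE)
		fprintf(stderr, "could not open PID file: error code %lu\n",
				GetLastError());
	else
		CloseHandle(h);
	return 0;
}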

I've not seen this error previously, so maybe this is just a random
one-off.

Greetings,

Andres Freund



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-12-12 Thread Haribabu Kommi
On Wed, Nov 29, 2017 at 4:25 PM, Michael Paquier 
wrote:

> On Wed, Nov 8, 2017 at 8:50 AM, Haribabu Kommi 
> wrote:
> > Ok. Removed the documentation changes that it cannot be used for normal
> > scenarios, and also added a Note section explaining the problem of using
> > the dump with pg_restore command with --clean and --create options.
>
> Hari, the documentation portion of the patch does not apply. Could you
> rebase? For now I am moving it to next CF as this did not get any
> reviews, and the status is switched to "waiting on author".
>

Rebased patch attached that fixes the documentation build problem.

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v11.patch
Description: Binary data


Re: Incorrect debug info printed in generate_partition_wise_join_paths

2017-12-12 Thread Etsuro Fujita

(2017/12/13 0:56), Robert Haas wrote:
> Committed.

Thanks!

Best regards,
Etsuro Fujita



Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-12-12 Thread Michael Paquier
On Wed, Dec 13, 2017 at 10:41 AM, Amit Langote
 wrote:
> On 2017/12/13 9:22, Ali Akbar wrote:
>> For me, it's better to prevent that from happening. So, attempts to
>> DROP NOT NULL on the child must be rejected. The attached patch does
>> that.
>>
>> Unfortunately, pg_class has no "has_parent" attribute, so in this
>> patch, it hits pg_inherits everytime DROP NOT NULL is attempted.
>>
>> Notes:
>> - It looks like we could remove the parent partition checking above?
>> Because the new check already covers what it does
>
> I noticed that too.
>
> So, if we're going to prevent DROP NOT NULL on *all* child table (not just
> partitions), we don't need the code that now sits there to prevent that
> only for partitions.

It is not the first time that this topic shows up. See for example
this thread of mine from last year:
https://www.postgresql.org/message-id/cab7npqtpxgx9hiyhhtagpw7jba1iskmcsoqxpeeb_kyxyy1...@mail.gmail.com
Or this one:
http://www.postgresql.org/message-id/21633.1448383...@sss.pgh.pa.us

And the conclusion is that we don't want to do what you are doing
here, and that it would be better to store NOT NULL constraints in a
way similar to CHECK constraints.

> How about: ".. if some parent has the same"
>
> +   heap_close(parent, AccessShareLock);
>
> Maybe, we shouldn't be dropping the lock so soon.

Yes, such things usually need to be kept until the end of the
transaction, and usually you need to be careful about potential lock
upgrades that could cause deadlocks. This patch looks wrong for both
of those things.
-- 
Michael



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread David G. Johnston
On Tue, Dec 12, 2017 at 6:26 PM, Craig Ringer  wrote:

>
>> There is also stackexchange question with 51000 views that asks how to
>> start the postgresql client. Should we rename psql to mysql to help
>> new users too?
>>
>
> Frankly, that's a legit question and I don't think it's one we should be
> mocking. The application is "PostgreSQL" and it's not at all surprising
> that people don't find it immediately obvious that the client is "psql".
> Yes, we have a manual, but if they have to refer to it for something that
> trivial then that's an opportunity for us to improve UX.
>
> Especially since "postgres" isn't on the PATH in most installs.
>
> I'd suggest having 'postgres' print a help msg suggesting "the client
> program is psql"... but that does no good if you get
>

This seems like a somewhat easy one to solve... add "postgresql" "postgres"
"postgre" "pgsql" "theelephantdb" "sql" and whatever other names we think
come to people's minds and place them in the client bin directory right
next to psql.  Have them do something useful - either the equivalent of
"exec psql" or "echo 'use psql to invoke the client'".  Make it a
compile-time option so package maintainers who disagree (or place postgres
server in the path) can opt-out (or opt-in) - but at least not every
packager would need to write their own.  That said, I consider this
particular aliasing problem to be a packaging problem and not the
responsibility of core.
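
A minimal sketch of the "equivalent of exec psql" variant (a hypothetical
shim, not an actual patch):

#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical "postgresql"/"pgsql" shim installed next to psql, sketching
 * the "equivalent of exec psql" idea: hand all arguments straight to psql.
 */
int
main(int argc, char **argv)
{
	(void) argc;
	argv[0] = "psql";
	execvp("psql", argv);
	perror("could not exec psql");
	return 1;
}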

David J.


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-12-12 Thread Masahiko Sawada
On Wed, Dec 13, 2017 at 12:03 AM, Robert Haas  wrote:
> On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada  
> wrote:
>>> The question I have is how would we deal with a foreign server that is
>>> not available for longer duration due to crash, longer network outage
>>> etc. Example is the foreign server crashed/got disconnected after
>>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>>> blocked for much longer duration without user having an idea of what's
>>> going on. May be we should add some timeout.
>>
>> After more thought, I agree with adding some timeout. I can image
>> there are users who want the timeout, for example, who cannot accept
>> even a few seconds latency. If the timeout occurs backend unlocks the
>> foreign transactions and breaks the loop. The resolver process will
>> keep to continue to resolve foreign transactions at certain interval.
>
> I don't think a timeout is a very good idea.  There is no timeout for
> synchronous replication and the issues here are similar.  I will not
> try to block a patch adding a timeout, but I think it had better be
> disabled by default and have very clear documentation explaining why
> it's really dangerous.  And this is why: with no timeout, you can
> count on being able to see the effects of your own previous
> transactions, unless at some point you sent a query cancel or got
> disconnected.  With a timeout, you may or may not see the effects of
> your own previous transactions depending on whether or not you hit the
> timeout, which you have no sure way of knowing.
>
 transactions after the coordinator server recovered. On the other
 hand, for the reading a consistent result on such situation by
 subsequent reads, for example, we can disallow backends to inquiry SQL
 to the foreign server if a foreign transaction of the foreign server
 is remained.
>>>
>>> +1 for the last sentence. If we do that, we don't need the backend to
>>> be blocked by resolver since a subsequent read accessing that foreign
>>> server would get an error and not inconsistent data.
>>
>> Yeah, however the disadvantage of this is that we manage foreign
>> transactions per foreign servers. If a transaction that modified even
>> one table is remained as a in-doubt transaction, we cannot issue any
>> SQL that touches that foreign server. Can we occur an error at
>> ExecInitForeignScan()?
>
> I really feel strongly we shouldn't complicate the initial patch with
> this kind of thing.  Let's make it enough for this patch to guarantee
> that either all parts of the transaction commit eventually or they all
> abort eventually.  Ensuring consistent visibility is a different and
> hard project, and if we try to do that now, this patch is not going to
> be done any time soon.
>

Thank you for the suggestion.

I was really wondering if we should add a timeout to this feature.
Wanting a timeout around a critical section is a common concern, but
currently we don't have such a timeout for either synchronous
replication or WAL writes. I can imagine there will be users who want a
timeout for such cases, but obviously it makes this feature more
complex. Anyway, even if we add a timeout to this feature, we can do it
as a separate patch and feature. So I'd like to keep it simple as a
first step. This patch guarantees that the transaction either commits
or rolls back on all foreign servers, unless the user cancels.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[PATCH] Possible NULL deref in str_time

2017-12-12 Thread Bill Moyers
I noticed that the output of pg_localtime() in str_time() is not checked,
even though pg_localtime() can sometimes return NULL. It's pretty unlikely
to occur in practice, I guess only if the time() function were acting
funny. For example, if I define this:


time_t
fake_time(void *blah)
{
return 0x0110;
}

and then call fake_time() instead of time() everywhere, then str_time() 
does the NULL dereference at startup.


* possible-null-deref-in-str_time.patch

Check the pg_localtime() output. This should not have a performance 
impact because this function is not called often.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0791404263..d94be48908 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5187,9 +5187,19 @@ str_time(pg_time_t tnow)
 {
 	static char buf[128];
 
+	const struct pg_tm *t = pg_localtime(&tnow, log_timezone);
+	if (t == NULL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+				 errmsg("failed to convert to localtime")));
+		buf[0] = '\0';
+		return buf;
+	}
+
 	pg_strftime(buf, sizeof(buf),
 				"%Y-%m-%d %H:%M:%S %Z",
-				pg_localtime(&tnow, log_timezone));
+				t);
 
 	return buf;
 }


Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-12-12 Thread Amit Langote
Hi.

On 2017/12/13 9:22, Ali Akbar wrote:
> For me, it's better to prevent that from happening. So, attempts to
> DROP NOT NULL on the child must be rejected. The attached patch does
> that.
> 
> Unfortunately, pg_class has no "has_parent" attribute, so in this
> patch, it hits pg_inherits everytime DROP NOT NULL is attempted.
> 
> Notes:
> - It looks like we could remove the parent partition checking above?
> Because the new check already covers what it does

I noticed that too.

So, if we're going to prevent DROP NOT NULL on *all* child table (not just
partitions), we don't need the code that now sits there to prevent that
only for partitions.

Minor comments on the patch:

+   /* If rel has parents, shouldn't drop NOT NULL if parent has the same */

How about: ".. if some parent has the same"

+   heap_close(parent, AccessShareLock);

Maybe, we shouldn't be dropping the lock so soon.

Thanks,
Amit




Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Craig Ringer
On 12 December 2017 at 23:28, Avinash Kumar 
wrote:

> It takes a few hours to learn PostgreSQL and 1 second to remember, how to
> quit from psql.
> Surprised to see the depth of discussion on this.
>
> It takes sometime for a person to explore or know about PostgreSQL and
> install it.
> Likewise, it takes him some more time to understand that there exists a
> "psql" to connect to it.
> He knows about psql because he searched it somewhere. Just a psql --help
> (anybody uses help), should help me know much more about psql.
> I think many people doesn't find it difficult to quit using \q.
>
> I think its good to talk about or help with more features that helps a
> production database.
>
>
It's easy to dismiss usability concerns, but new users are what keeps a
community healthy. Users who go "WTF is this @#$@" and bail aren't
necessarily the ones you want to lose; sometimes they're just busy or are
evaluating a few things at once. Early UX matters.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

2017-12-12 Thread Thomas Munro
Hi,

Here's a reproducer which enabled me to reach this stuck state:

  pid  |  wait_event   |                                    query
-------+---------------+------------------------------------------------------------------------------
 64617 |               | select pid, wait_event, query from pg_stat_activity where state = 'active';
 64619 | BufferPin     | VACUUM jobs
 64620 | ExecuteGather | SELECT COUNT(*) FROM jobs
 64621 | ExecuteGather | SELECT COUNT(*) FROM jobs
 64622 | ExecuteGather | SELECT COUNT(*) FROM jobs
 64623 | ExecuteGather | SELECT COUNT(*) FROM jobs
 84167 | BtreePage     | SELECT COUNT(*) FROM jobs
 84168 | BtreePage     | SELECT COUNT(*) FROM jobs
 96440 |               | SELECT COUNT(*) FROM jobs
 96438 |               | SELECT COUNT(*) FROM jobs
 96439 |               | SELECT COUNT(*) FROM jobs
(11 rows)

The main thread deletes stuff in the middle of the key range (not sure
if this is important) and vacuums in a loop, and meanwhile 4 threads
(probably not important, might as well be 1) run Parallel Index Scans
over the whole range, in the hope of hitting the interesting case.  In
the locked-up case I just saw now opaque->btpo_flags had the
BTP_DELETED bit set, not BTP_HALF_DEAD (I could tell because I added
logging).  Clearly pages are periodically being marked half-dead but I
haven't yet managed to get an index scan to hit one of those.

-- 
Thomas Munro
http://www.enterprisedb.com
/*
   Set up the schema like this:

drop table jobs;
create table jobs (id int, blah text, status text);
create index on jobs(status);
insert into jobs select generate_series(1, 10), 'hello world';
update jobs set status = 'A' where id % 100 = 0;
analyze jobs;
alter table jobs set (parallel_workers = 2);

  You should get a Parallel Index Only Scan like this:

set enable_seqscan = off;
set max_parallel_workers_per_gather = 2;
set parallel_leader_participation = off;
set min_parallel_index_scan_size = 0;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
EXPLAIN SELECT COUNT(*) FROM jobs;

 */

#include 

#include 
#include 
#include 
#include 

#define CONNINFO "dbname=postgres"
#define NTHREADS 4

static void *
thread_main(void *data)
{
	int			i;
	PGconn	   *conn;
	PGresult   *res;

	conn = PQconnectdb(CONNINFO);
	assert(PQstatus(conn) == CONNECTION_OK);

	res = PQexec(conn, "SET enable_seqscan = off");
	assert(PQresultStatus(res) == PGRES_COMMAND_OK);
	PQclear(res);
	res = PQexec(conn, "SET max_parallel_workers_per_gather = 2");
	assert(PQresultStatus(res) == PGRES_COMMAND_OK);
	PQclear(res);
	res = PQexec(conn, "SET parallel_leader_participation = off");
	assert(PQresultStatus(res) == PGRES_COMMAND_OK);
	PQclear(res);
	res = PQexec(conn, "SET min_parallel_index_scan_size = 0");
	assert(PQresultStatus(res) == PGRES_COMMAND_OK);
	PQclear(res);
	res = PQexec(conn, "SET parallel_setup_cost = 0");
	assert(PQresultStatus(res) == PGRES_COMMAND_OK);
	PQclear(res);
	res = PQexec(conn, "SET parallel_tuple_cost = 0");
	assert(PQresultStatus(res) == PGRES_COMMAND_OK);
	PQclear(res);

	for (i = 0; i < 100; ++i)
	{
		//res = PQexec(conn, "SELECT COUNT(*) FROM jobs WHERE status = 'B'");
		res = PQexec(conn, "SELECT COUNT(*) FROM jobs");

		assert(PQresultStatus(res) == PGRES_TUPLES_OK);
		assert(PQnfields(res) == 1);
		assert(PQntuples(res) == 1);
		PQclear(res);
	}
	PQfinish(conn);
	return 0;
}

int
main(int argc, char **argv)
{
	pthread_t threads[NTHREADS];
	pthread_attr_t attr;
	int i;
	void *r;
	PGconn	   *conn;
	PGresult   *res;
	conn = PQconnectdb(CONNINFO);
	assert(PQstatus(conn) == CONNECTION_OK);

	pthread_attr_init(&attr);

	for (i = 0; i < NTHREADS; ++i)
	{
		if (pthread_create(&threads[i], &attr, &thread_main, NULL) != 0)
		{
			fprintf(stderr, "failed to create thread\n");
			exit(1);
		}
	}

	for (i = 5; i < 10; ++i)
	{
		char buf[1024];

		printf("updating %d\n", i);

		res = PQexec(conn, "BEGIN");
		assert(PQresultStatus(res) == PGRES_COMMAND_OK);
		PQclear(res);

		snprintf(buf, sizeof(buf), "DELETE FROM jobs WHERE id = %d", i);
		res = PQexec(conn, buf);
		assert(PQresultStatus(res) == PGRES_COMMAND_OK);
		PQclear(res);

		snprintf(buf, sizeof(buf), "INSERT INTO jobs VALUES (%d, 'xxx', %s)", i + 5, (i % 100 == 0) ? "'A'" : (i % 100 == 1) ? "'B'" : (i % 100 == 2) ? "'C'" : "NULL");
		res = PQexec(conn, buf);
		assert(PQresultStatus(res) == PGRES_COMMAND_OK);
		PQclear(res);

		res = PQexec(conn, "COMMIT");
		assert(PQresultStatus(res) == PGRES_COMMAND_OK);
		PQclear(res);

		res = PQexec(conn, "VACUUM jobs");
		assert(PQresultStatus(res) == PGRES_COMMAND_OK);
		PQclear(res);
	}
	PQfinish(conn);

	for (i = 0; i < NTHREADS; ++i)
		pthread_cancel(threads[i]);

	for (i = 0; i < NTHREADS; ++i)
		pthread_join(threads[i], &r);
}


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Craig Ringer
On 12 December 2017 at 23:02, Geoff Winkless  wrote:

> I wouldn't exactly say -1 but it's a "meh" from me.
>
> On 8 December 2017 at 13:57, Daniel Vérité" 
> wrote:
> > "How to exit from PostgreSQL command line utility: psql"
> >
> > now at 430k views and 1368 upvotes.
>
> There is also stackexchange question with 51000 views that asks how to
> start the postgresql client. Should we rename psql to mysql to help
> new users too?
>

Frankly, that's a legit question and I don't think it's one we should be
mocking. The application is "PostgreSQL" and it's not at all surprising
that people don't find it immediately obvious that the client is "psql".
Yes, we have a manual, but if they have to refer to it for something that
trivial then that's an opportunity for us to improve UX.

Especially since "postgres" isn't on the PATH in most installs.

I'd suggest having 'postgres' print a help msg suggesting "the client
program is psql"... but that does no good if you get

$ postgres
bash: postgres: command not found...
Install package 'postgresql-server' to provide command 'postgres'? [N/y]

(then there's the Ubuntu versions that "helpfully" suggest you might want
to install postgres-xc. Seriously. WTF, people.)

> Ctrl-D works almost everywhere.

... if you are used to using a command line interface and know it even
exists.

> It takes 2 seconds to find the answer
> and there are going to be many other things you have to find out if
> you're coming at PostgreSQL from a different DB.
>

Yep. But that doesn't mean we should make the simple things harder than
they need to be. Why have \? at all!


> As others have pointed out, there are _far_ more frustrating things
> (like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
>

I cannot overemphasise the importance of that one. I don't care that much
about the spelling, tbh, but the fact that table metadata formatting and
display is done wholly client-side and cannot be used in applications is
absolutely horrid, and a real wart in the system. If we had the server-side
functionality then some special case HINTs in error messages for common
systems would be reasonable enough.

Frankly that's probably a good idea anyway, adding a specific errhint. But
it's really a separate topic.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] SERIALIZABLE with parallel query

2017-12-12 Thread Haribabu Kommi
On Fri, Dec 8, 2017 at 11:33 AM, Thomas Munro  wrote:

> On Thu, Nov 30, 2017 at 2:44 PM, Thomas Munro
>  wrote:
> > On Thu, Nov 30, 2017 at 2:32 PM, Michael Paquier
> >  wrote:
> >> Could this question be answered? The patch still applies so I am
> >> moving it to next CF.
>
> Rebased, 'cause it broke.


Thanks for explaining the problem in generating an isolation test to
test the serialize parallel query.

Committer can decide whether the existing test is fine to be part of the test
suite or remove it; other than that everything is fine, so I am moving the
patch into "ready for committer" state.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Custom compression methods

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 5:07 PM, Tomas Vondra
 wrote:
>> I definitely think there's a place for compression built right into
>> the data type.  I'm still happy about commit
>> 145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more
>> needs to be done there.  But that type of improvement and what is
>> proposed here are basically orthogonal.  Having either one is good;
>> having both is better.
>>
> Why orthogonal?

I mean, they are different things.  Data types are already free to
invent more compact representations, and that does not preclude
applying pglz to the result.

> For example, why couldn't (or shouldn't) the tsvector compression be
> done by tsvector code itself? Why should we be doing that at the varlena
> level (so that the tsvector code does not even know about it)?

We could do that, but then:

1. The compression algorithm would be hard-coded into the system
rather than changeable.  Pluggability has some value.

2. If several data types can benefit from a similar approach, it has
to be separately implemented for each one.

3. Compression is only applied to large-ish values.  If you are just
making the data type representation more compact, you probably want to
apply the new representation to all values.  If you are compressing in
the sense that the original data gets smaller but harder to interpret,
then you probably only want to apply the technique where the value is
already pretty wide, and maybe respect the user's configured storage
attributes.  TOAST knows about some of that kind of stuff.

> It seems to me the main reason is that tsvector actually does not allow
> us to do that, as there's no good way to distinguish the different
> internal format (e.g. by storing a flag or format version in some sort
> of header, etc.).

That is also a potential problem, although I suspect it is possible to
work around it somehow for most data types.  It might be annoying,
though.

>> I think there may also be a place for declaring that a particular data
>> type has a "privileged" type of TOAST compression; if you use that
>> kind of compression for that data type, the data type will do smart
>> things, and if not, it will have to decompress in more cases.  But I
>> think this infrastructure makes that kind of thing easier, not harder.
>
> I don't quite understand how that would be done. Isn't TOAST meant to be
> entirely transparent for the datatypes? I can imagine custom TOAST
> compression (which is pretty much what the patch does, after all), but I
> don't see how the datatype could do anything smart about it, because it
> has no idea which particular compression was used. And considering the
> OIDs of the compression methods do change, I'm not sure that's fixable.

I don't think TOAST needs to be entirely transparent for the
datatypes.  We've already dipped our toe in the water by allowing some
operations on "short" varlenas, and there's really nothing to prevent
a given datatype from going further.  The OID problem you mentioned
would presumably be solved by hard-coding the OIDs for any built-in,
privileged compression methods.

> Well, it wasn't my goal to suddenly widen the scope of the patch and
> require it adds all these pieces. My intent was more to point to pieces
> that need to be filled in the future.

Sure, that's fine.  I'm not worked up about this, just explaining why
it seems reasonably well-designed to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Leftover reference to replacement selection 1 run case

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 6:44 PM, Peter Geoghegan  wrote:
> Yes, you have that right. Per dumptuples(), even the zero tuple run edge
> case will still write a run marker, and will therefore still consume a tape.
> We must have at least two initial runs to merge. (though dummy runs for non
> final merges are a slightly different matter.)

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-12-12 Thread Ali Akbar
> For me, it's better to prevent that from happening. So, attempts to
> DROP NOT NULL on the child must be rejected. The attached patch does
> that.

I'm sorry. Accidentally I "--color"-ed the patch format. Attached is the
correct patch.

Best Regards,
Ali Akbar
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 1bd8a58b7f..74903a8f24 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -242,6 +242,40 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 }
 
 
+/*
+ * get_superclasses -
+ *		Returns a list of relation OIDs of direct parents
+ */
+List *
+get_superclasses(Oid relationId)
+{
+	List	   *list = NIL;
+	Relation	catalog;
+	SysScanDesc scan;
+	ScanKeyData skey;
+	HeapTuple	inheritsTuple;
+	Oid			inhparent;
+
+	catalog = heap_open(InheritsRelationId, AccessShareLock);
+	ScanKeyInit(&skey, Anum_pg_inherits_inhrelid, BTEqualStrategyNumber,
+				F_OIDEQ, ObjectIdGetDatum(relationId));
+	scan = systable_beginscan(catalog, InheritsRelidSeqnoIndexId, true,
+							  NULL, 1, &skey);
+
+	while ((inheritsTuple = systable_getnext(scan)) != NULL)
+	{
+		inhparent = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhparent;
+		list = lappend_oid(list, inhparent);
+	}
+
+	systable_endscan(scan);
+
+	heap_close(catalog, AccessShareLock);
+
+	return list;
+}
+
+
 /*
  * has_subclass - does this relation have any children?
  *
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d979ce266d..c76fc3715d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5683,6 +5683,8 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	Relation	attr_rel;
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
+	List	   *parentlist;
+	ListCell   *parentscan;
 	ObjectAddress address;
 
 	/*
@@ -5773,6 +5775,24 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 		heap_close(parent, AccessShareLock);
 	}
 
+	/* If rel has parents, shouldn't drop NOT NULL if parent has the same */
+	parentlist = get_superclasses(RelationGetRelid(rel));
+	foreach(parentscan, parentlist) {
+		Oid			parentId = lfirst_oid(parentscan);
+		Relation	parent = heap_open(parentId, AccessShareLock);
+		TupleDesc	tupDesc = RelationGetDescr(parent);
+		AttrNumber	parent_attnum;
+
+		parent_attnum = get_attnum(parentId, colName);
+		if (parent_attnum != InvalidAttrNumber &&
+			TupleDescAttr(tupDesc, parent_attnum - 1)->attnotnull)
+			ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+				 errmsg("column \"%s\" is marked NOT NULL in parent table",
+						colName)));
+		heap_close(parent, AccessShareLock);
+	}
+
 	/*
 	 * Okay, actually perform the catalog change ... if needed
 	 */
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index 7743388899..291861b846 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -20,6 +20,7 @@
 extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
 extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
 	List **parents);
+extern List *get_superclasses(Oid relationId);
 extern bool has_subclass(Oid relationId);
 extern bool has_superclass(Oid relationId);
 extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId);


Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Michael Paquier
On Wed, Dec 13, 2017 at 8:30 AM, Andres Freund  wrote:
> On 2017-12-13 08:27:42 +0900, Michael Paquier wrote:
>> >> (OTOH, I bet we could drop reltime/abstime without too many complaints.
>> >> Y2038 is coming.)
>> >
>> > I'm actually about to send a patch doing so, that code is one mess WRT
>> > overflow handling.
>>
>> Agreed. I think as well that those should be fixed. It does not seem
>> much complicated to fix them.
>
> I'm not following. I was trying to say that I'll send a patch removing
> the abstime/reltime/tinterval code.

My mistake here. I thought that you were specifically referring to the
money data type here. I did not parse your previous message correctly.
-- 
Michael



Re: Leftover reference to replacement selection 1 run case

2017-12-12 Thread Peter Geoghegan
Yes, you have that right. Per dumptuples(), even the zero tuple run edge
case will still write a run marker, and will therefore still consume a
tape. We must have at least two initial runs to merge. (though dummy runs
for non final merges are a slightly different matter.)

--
Peter Geoghegan
(Sent from my phone)


Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele

On 12/12/17 6:33 PM, Andres Freund wrote:


> On 2017-12-12 18:30:47 -0500, David Steele wrote:
>>> If we had a way to prevent relfilenode reuse across multiple checkpoints
>>> this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.
>>
>> Or error the backup if there is wraparound?
>
> That seems entirely unacceptable to me. On a machine with lots of
> toasting etc going on an oid wraparound doesn't take a long time. We've
> only one oid counter for all tables, and relfilenodes are inferred from
> that

Fair enough.  I'll think on it.

--
-David
da...@pgmasters.net



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread Andres Freund
Hi,

On 2017-12-12 18:30:47 -0500, David Steele wrote:
> > If we had a way to prevent relfilenode reuse across multiple checkpoints
> > this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.
> 
> Or error the backup if there is wraparound?

That seems entirely unacceptable to me. On a machine with lots of
toasting etc going on an oid wraparound doesn't take a long time. We've
only one oid counter for all tables, and relfilenodes are inferred from
that 

Greetings,

Andres Freund



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele

On 12/12/17 6:21 PM, Andres Freund wrote:

> On 2017-12-12 18:18:09 -0500, David Steele wrote:
>> On 12/12/17 6:07 PM, Andres Freund wrote:
>>> It's quite different - in the recovery case there's no other write
>>> activity going on. But on a normally running cluster the persistence of
>>> existing tables can get changed, and oids can get recycled.  What
>>> guarantees that between the time you checked for the init fork the table
>>> hasn't been dropped, the oid reused and now a permanent relation is in
>>> its place?
>>
>> Well, that's a good point!
>>
>> How about rechecking the presence of the init fork after a main/other fork
>> has been found?  Is it possible for an init fork to still be lying around
>> after an oid has been recycled? Seems like it could be...
>
> I don't see how that'd help. You could just have gone through this cycle
> multiple times by the time you get to rechecking. All not very likely,
> but I don't want us to rely on luck here...

Definitely not.

> If we had a way to prevent relfilenode reuse across multiple checkpoints
> this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.

Or error the backup if there is wraparound?

We already have an error if a standby is promoted during backup -- so
there is some precedent.

> I guess we could have the basebackup create placeholder files that
> prevent relfilenode reuse, but that seems darned ugly.

Yes, very ugly.

--
-David
da...@pgmasters.net



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Andres Freund
On 2017-12-13 08:27:42 +0900, Michael Paquier wrote:
> On Wed, Dec 13, 2017 at 6:50 AM, Andres Freund  wrote:
> > On 2017-12-12 16:47:17 -0500, Tom Lane wrote:
> >> Really?  We've got test cases that intentionally exercise overflow
> >> in the money code?  I think we could just drop such tests, until
> >> such time as someone fixes the issue.
> >
> > Some parts at least (input & output), I think it's easy enough to fix
> > those up.
> 
> There could be two ways to fix that:
> 1) Call the int8 equivalents with DirectFunctionCall2 and rely on the
> overflow there, but this has a performance impact.
> 2) Add similar checks as in int8.c, which feels like duplicating code
> but those are cheap.
> You are heading to 2) I guess?

I don't think 1) makes much sense. The error messages will be bad, and
the harder cases (e.g. cash_in()) can't share code anyway.


> >> (OTOH, I bet we could drop reltime/abstime without too many complaints.
> >> Y2038 is coming.)
> >
> > I'm actually about to send a patch doing so, that code is one mess WRT
> > overflow handling.
> 
> Agreed. I think as well that those should be fixed. It does not seem
> much complicated to fix them.

I'm not following. I was trying to say that I'll send a patch removing
the abstime/reltime/tinterval code.

Greetings,

Andres Freund



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Michael Paquier
On Wed, Dec 13, 2017 at 6:50 AM, Andres Freund  wrote:
> On 2017-12-12 16:47:17 -0500, Tom Lane wrote:
>> Really?  We've got test cases that intentionally exercise overflow
>> in the money code?  I think we could just drop such tests, until
>> such time as someone fixes the issue.
>
> Some parts at least (input & output), I think it's easy enough to fix
> those up.

There could be two ways to fix that:
1) Call the int8 equivalents with DirectFunctionCall2 and rely on the
overflow there, but this has a performance impact.
2) Add similar checks as in int8.c, which feels like duplicating code
but those are cheap.
You are heading to 2) I guess?
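
For illustration, a rough sketch of what 2) could look like (not an actual
patch; it just mirrors the SAMESIGN-style check in int8.c, applied to the
Cash/int64 type, and assumes the usual backend headers and -fwrapv build
flags):

/* hypothetical helper, in the style of int8.c's int8pl() overflow check */
#define SAMESIGN(a,b)	(((a) < 0) == ((b) < 0))

static Cash
cash_pl_checked(Cash c1, Cash c2)
{
	Cash		result = c1 + c2;

	/* overflow iff both inputs have the same sign and the result does not */
	if (SAMESIGN(c1, c2) && !SAMESIGN(result, c1))
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("money out of range")));
	return result;
}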

>> (OTOH, I bet we could drop reltime/abstime without too many complaints.
>> Y2038 is coming.)
>
> I'm actually about to send a patch doing so, that code is one mess WRT
> overflow handling.

Agreed. I think as well that those should be fixed. It does not seem
much complicated to fix them.
-- 
Michael



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele

Hi Michael,

On 12/12/17 6:08 PM, Michael Paquier wrote:
>> If the forks are written out of order (i.e. main before init), which is
>> definitely possible, then I think worst case is some files will be backed up
>> that don't need to be.  The main fork is unlikely to be very large at that
>> point so it doesn't seem like a big deal.
>
> As far as I recall the init forks are logged before the main forks. I
> don't think that we should rely on that assumption though to be always
> satisfied.

Indeed, nothing is sure until a checkpoint.  Until then we must assume
writes are random.

>> Well, I would be happy if you had a look!
>
> You can count me in. I think that this patch has value for some
> dedicated workloads.

Thanks!

> It is a waste to backup stuff that will be
> removed at recovery anyway.

It also causes confusion when the recovered database is smaller than the
backup.  I can't tell you how many times I have answered this question...


--
-David
da...@pgmasters.net



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread Andres Freund
On 2017-12-12 18:18:09 -0500, David Steele wrote:
> On 12/12/17 6:07 PM, Andres Freund wrote:
> > > 
> > > I don't see this as any different than what happens during recovery. The
> > > unlogged forks are cleaned / re-inited before replay starts which is the
> > > same thing we are doing here.
> > 
> > It's quite different - in the recovery case there's no other write
> > activity going on. But on a normally running cluster the persistence of
> > existing tables can get changed, and oids can get recycled.  What
> > guarantees that between the time you checked for the init fork the table
> > hasn't been dropped, the oid reused and now a permanent relation is in
> > its place?
> 
> Well, that's a good point!
> 
> How about rechecking the presence of the init fork after a main/other fork
> has been found?  Is it possible for an init fork to still be lying around
> after an oid has been recycled? Seems like it could be...

I don't see how that'd help. You could just have gone through this cycle
multiple times by the time you get to rechecking. All not very likely,
but I don't want us to rely on luck here...

If we had a way to prevent relfilenode reuse across multiple checkpoints
this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.
I guess we could have the basebackup create placeholder files that
prevent relfilenode reuse, but that seems darned ugly.

Greetings,

Andres Freund



Re: Learned Indexes in PostgreSQL?

2017-12-12 Thread Stefan Keller
2017-12-12 21:26 GMT+01:00 Robert Haas :
> Here are links to the other two threads:
>
> http://postgr.es/m/CANNMO+J1KeTSx5q5SYuwHf1v-gPRLrOZw1s7qOpqWx=3umm...@mail.gmail.com
> http://postgr.es/m/caaerrx8bhiw3rgaqpolqjhyhk7ghordtke5uc9dasv1w5fz...@mail.gmail.com

Thanks, Robert.
My fault for not having searched this list first - in fact I did, but inaccurately.

:Stefan

2017-12-12 21:26 GMT+01:00 Robert Haas :
> On Tue, Dec 12, 2017 at 6:38 AM, Stefan Keller  wrote:
>> This is an awesome paper about a new index called "Learned Index".
>> it's aka dense hash structure derived ("learned") from actual data.
>> Looks very promising for certain properties [*].
>>
>> Anyone already working on this in Postgres?
>> How could this be implemented in Postgres?
>
> You're the third person to start a thread on this topic in the last few days.
>
> Here are links to the other two threads:
>
> http://postgr.es/m/CANNMO+J1KeTSx5q5SYuwHf1v-gPRLrOZw1s7qOpqWx=3umm...@mail.gmail.com
> http://postgr.es/m/caaerrx8bhiw3rgaqpolqjhyhk7ghordtke5uc9dasv1w5fz...@mail.gmail.com
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread Andres Freund
Hi,

On 2017-12-12 18:04:44 -0500, David Steele wrote:
> On 12/12/17 5:52 PM, Andres Freund wrote:
> > On 2017-12-12 17:49:54 -0500, David Steele wrote:
> > > Including unlogged relations in base backups takes up space and is 
> > > wasteful
> > > since they are truncated during backup recovery.
> > > 
> > > The attached patches exclude unlogged relations from base backups except 
> > > for
> > > the init fork, which is required to recreate the main fork during 
> > > recovery.
> > 
> > How do you reliably identify unlogged relations while writes are going
> > on? Without locks that sounds, uh, nontrivial?
> 
> I don't think this is an issue.  If the init fork exists it should be OK if
> it is torn since it will be recreated from WAL.

I'm not worried about torn pages.


> If the forks are written out of order (i.e. main before init), which is
> definitely possible, then I think worst case is some files will be backed up
> that don't need to be.  The main fork is unlikely to be very large at that
> point so it doesn't seem like a big deal.
> 
> I don't see this as any different than what happens during recovery. The
> unlogged forks are cleaned / re-inited before replay starts which is the
> same thing we are doing here.

It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled.  What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?

Greetings,

Andres Freund



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele

Hi Andres,

On 12/12/17 5:52 PM, Andres Freund wrote:
> On 2017-12-12 17:49:54 -0500, David Steele wrote:
>> Including unlogged relations in base backups takes up space and is wasteful
>> since they are truncated during backup recovery.
>>
>> The attached patches exclude unlogged relations from base backups except for
>> the init fork, which is required to recreate the main fork during recovery.
>
> How do you reliably identify unlogged relations while writes are going
> on? Without locks that sounds, uh, nontrivial?

I don't think this is an issue.  If the init fork exists it should be OK
if it is torn since it will be recreated from WAL.

If the forks are written out of order (i.e. main before init), which is
definitely possible, then I think worst case is some files will be
backed up that don't need to be.  The main fork is unlikely to be very
large at that point so it doesn't seem like a big deal.

I don't see this as any different than what happens during recovery.
The unlogged forks are cleaned / re-inited before replay starts which is
the same thing we are doing here.

>> I decided not to try and document unlogged exclusions in the continuous
>> backup documentation yet (they are noted in the protocol docs).  I would
>> like to get some input on whether the community thinks this is a good idea.
>> It's a non-trivial procedure that would be easy to misunderstand and does
>> not affect the quality of the backup other than using less space. Thoughts?
>
> Think it's a good idea, I've serious concerns about practicability of a
> correct implementation though.

Well, I would be happy if you had a look!

Thanks.
--
-David
da...@pgmasters.net



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread Andres Freund
Hi,

On 2017-12-12 17:49:54 -0500, David Steele wrote:
> Including unlogged relations in base backups takes up space and is wasteful
> since they are truncated during backup recovery.
> 
> The attached patches exclude unlogged relations from base backups except for
> the init fork, which is required to recreate the main fork during recovery.

How do you reliably identify unlogged relations while writes are going
on? Without locks that sounds, uh, nontrivial?


> I decided not to try and document unlogged exclusions in the continuous
> backup documentation yet (they are noted in the protocol docs).  I would
> like to get some input on whether the community thinks this is a good idea.
> It's a non-trivial procedure that would be easy to misunderstand and does
> not affect the quality of the backup other than using less space. Thoughts?

Think it's a good idea, I've serious concerns about practicability of a
correct implementation though.

- Andres



PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele
Including unlogged relations in base backups takes up space and is 
wasteful since they are truncated during backup recovery.


The attached patches exclude unlogged relations from base backups except 
for the init fork, which is required to recreate the main fork during 
recovery.
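
To make the idea concrete, here is a minimal sketch (not the attached
patches) of the basic test a base backup can use to classify a relation as
unlogged, namely the presence of a matching init fork; the helper name and
paths are hypothetical:

#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Sketch only, not the attached patches: treat a relation file as unlogged
 * when a matching "_init" fork exists next to its main fork.  Real code also
 * has to deal with the races discussed elsewhere in this thread (OID reuse,
 * persistence changes, etc.).
 */
static bool
has_init_fork(const char *main_fork_path)
{
	char		initpath[1024];
	struct stat st;

	snprintf(initpath, sizeof(initpath), "%s_init", main_fork_path);
	return stat(initpath, &st) == 0;
}

int
main(int argc, char **argv)
{
	if (argc > 1)
		printf("%s: %s\n", argv[1],
			   has_init_fork(argv[1]) ? "unlogged" : "logged/permanent");
	return 0;
}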


* exclude-unlogged-v1-01.patch

Some refactoring of reinit.c was required to reduce code duplication but 
the coverage report showed that most of the interesting parts of 
reinit.c were not being tested.  This patch adds coverage for reinit.c.


* exclude-unlogged-v1-02.patch

Refactor reinit.c to allow other modules to identify and work with 
unlogged relation forks.


* exclude-unlogged-v1-03.patch

Exclude unlogged relation forks (except init) from pg_basebackup to save 
space (and time).


I decided not to try and document unlogged exclusions in the continuous 
backup documentation yet (they are noted in the protocol docs).  I would 
like to get some input on whether the community thinks this is a good 
idea.  It's a non-trivial procedure that would be easy to misunderstand 
and does not affect the quality of the backup other than using less 
space. Thoughts?


I'll add these patches to the next CF.

--
-David
da...@pgmasters.net
diff --git a/src/test/recovery/t/014_unlogged_reinit.pl b/src/test/recovery/t/014_unlogged_reinit.pl
new file mode 100644
index 00..35feba69a0
--- /dev/null
+++ b/src/test/recovery/t/014_unlogged_reinit.pl
@@ -0,0 +1,117 @@
+# Tests that unlogged tables are properly reinitialized after a crash.
+#
+# The behavior should be the same when restoring from a backup but that is not
+# tested here (yet).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Initialize node without replication settings
+my $node = get_new_node('main');
+
+$node->init;
+$node->start;
+my $pgdata = $node->data_dir;
+
+# Create an unlogged table to test that forks other than init are not copied
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+my $tablespaceDir = undef;
+my $ts1UnloggedPath = undef;
+
+SKIP:
+{
+   skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+# Create unlogged tables in a tablespace
+$tablespaceDir = TestLib::tempdir . "/ts1";
+
+mkdir($tablespaceDir)
+or die "unable to mkdir \"$tablespaceDir\"";
+
+$node->safe_psql('postgres',
+"CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'");
+$node->safe_psql('postgres',
+'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1');
+
+$ts1UnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('ts1_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+}
+
+# Crash the postmaster
+$node->stop('immediate');
+
+# Write forks to test that they are removed during recovery
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"],
+   'touch vm fork in base');
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"],
+   'touch fsm fork in base');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${baseUnloggedPath}")
+or die "unable to remove \"${baseUnloggedPath}\"";
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+   skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+# Write forks to test that they are removed by recovery
+$node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_vm"],
+   'touch vm fork in tablespace');
+$node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_fsm"],
+   'touch fsm fork in tablespace');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${ts1UnloggedPath}")
+or die "unable to remove \"${ts1UnloggedPath}\"";
+}
+
+# Start the postmaster
+$node->start;
+
+# Check unlogged table in base
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base');
+
+# Drop unlogged table
+$node->safe_psql('postgres', 'DROP TABLE base_unlogged');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+   skip "symlinks not supported on Windows", 4 if ($windows_os);
+
+# Check unlogged table in tablespace
+ok(-f 

Re: [HACKERS] Custom compression methods

2017-12-12 Thread Chapman Flack
On 12/12/2017 04:33 PM, Robert Haas wrote:

> you want to upgrade your existing tables to use lz4 rather than pglz,
> you can change the compression option for those columns to COMPRESS
> lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS

This is a thread I've only been following peripherally, so forgive
a question that's probably covered somewhere upthread: how will this
be done? Surely not with compression-type bits in each tuple? By
remembering a txid where the compression was changed, and the former
algorithm for older txids?

-Chap



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Tomas Vondra


On 12/12/2017 10:33 PM, Robert Haas wrote:
> On Mon, Dec 11, 2017 at 2:53 PM, Tomas Vondra
>  wrote:
>> But let me play the devil's advocate for a while and question the
>> usefulness of this approach to compression. Some of the questions were
>> mentioned in the thread before, but I don't think they got the attention
>> they deserve.
> 
> Sure, thanks for chiming in.  I think it is good to make sure we are
> discussing this stuff.
> 
>> But perhaps we should simply make it an initdb option (in which case the
>> whole cluster would simply use e.g. lz4 instead of pglz)?
>>
>> That seems like a much simpler approach - it would only require some
>> ./configure options to add --with-lz4 (and other compression libraries),
>> an initdb option to pick compression algorithm, and probably noting the
>> choice in cluster controldata.
>>
>> No dependencies tracking, no ALTER TABLE issues, etc.
>>
>> Of course, it would not allow using different compression algorithms for
>> different columns (although it might perhaps allow different compression
>> level, to some extent).
>>
>> Conclusion: If we want to offer a simple cluster-wide pglz alternative,
>> perhaps this patch is not the right way to do that.
> 
> I actually disagree with your conclusion here.   I mean, if you do it
> that way, then it has the same problem as checksums: changing
> compression algorithms requires a full dump-and-reload of the
> database, which makes it more or less a non-starter for large
> databases.  On the other hand, with the infrastructure provided by
> this patch, we can have a default_compression_method GUC that will be
> set to 'pglz' initially.  If the user changes it to 'lz4', or we ship
> a new release where the new default is 'lz4', then new tables created
> will use that new setting, but the existing stuff keeps working.  If
> you want to upgrade your existing tables to use lz4 rather than pglz,
> you can change the compression option for those columns to COMPRESS
> lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS
> lz4 to force a rewrite of an individual table.  That's really
> powerful, and I think users will like it a lot.
> 
> In short, your approach, while perhaps a little simpler to code, seems
> like it is fraught with operational problems which this design avoids.
> 

I agree the checksum-like limitations are annoying and make it
impossible to change the compression algorithm after the cluster is
initialized (although I recall a discussion about addressing that).

So yeah, if such flexibility is considered valuable/important, then the
patch is a better solution.

>> Custom datatype-aware compression (e.g. the tsvector)
>> --
>>
>> Exploiting knowledge of the internal data type structure is a promising
>> way to improve compression ratio and/or performance.
>>
>> The obvious question of course is why shouldn't this be done by the data
>> type code directly, which would also allow additional benefits like
>> operating directly on the compressed values.
>>
>> Another thing is that if the datatype representation changes in some
>> way, the compression method has to change too. So it's tightly coupled
>> to the datatype anyway.
>>
>> This does not really require any new infrastructure, all the pieces are
>> already there.
>>
>> In some cases that may not be quite possible - the datatype may not be
>> flexible enough to support alternative (compressed) representation, e.g.
>> because there are no bits available for "compressed" flag, etc.
>>
>> Conclusion: IMHO if we want to exploit the knowledge of the data type
>> internal structure, perhaps doing that in the datatype code directly
>> would be a better choice.
> 
> I definitely think there's a place for compression built right into
> the data type.  I'm still happy about commit
> 145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more
> needs to be done there.  But that type of improvement and what is
> proposed here are basically orthogonal.  Having either one is good;
> having both is better.
> 

Why orthogonal?

For example, why couldn't (or shouldn't) the tsvector compression be
done by tsvector code itself? Why should we be doing that at the varlena
level (so that the tsvector code does not even know about it)?

For example we could make the datatype EXTERNAL and do the compression
on our own, using a custom algorithm. Of course, that would require
datatype-specific implementation, but tsvector_compress does that too.

It seems to me the main reason is that tsvector actually does not allow
us to do that, as there's no good way to distinguish the different
internal formats (e.g. by storing a flag or format version in some sort
of header, etc.).

> I think there may also be a place for declaring that a particular data
> type has a "privileged" type of TOAST compression; if you use that
> kind of compression for that data type, the data type will do smart
> things, and if not, it will have to decompress in more cases.

Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Chapman Flack
On 12/12/2017 03:39 PM, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 10:02 AM, Geoff Winkless  
>> Ctrl-D works almost everywhere. It takes 2 seconds...
> 
> If it really took everyone 2 seconds, this question would not be as
> common as it is. ...

Some years ago, I took a job at a mostly-Unix shop, where the
default .profile supplied with every new account contained

  stty eof ^Z

... apparently, a sysadmin had liked the DOS convention better than
the Unix one, and then made it the default for everybody. It often
took people more than 2 seconds to learn to use ^Z where they
expected ^D to work (not to mention to learn why they couldn't
suspend).

I guess I only mention that to reinforce the point that some
environments have customizations you wouldn't expect, and assumptions
about how obvious something is can be thrown off by a variety of things.

-Chap



Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

2017-12-12 Thread Robert Haas
On Sat, Dec 9, 2017 at 5:30 AM, Amit Kapila  wrote:
> On Sat, Dec 9, 2017 at 1:24 AM, Robert Haas  wrote:
>> On Fri, Dec 8, 2017 at 5:11 AM, Amit Kapila  wrote:
 Okay, I have adjusted the patch accordingly.  I have also added a
 regression test which should produce the same result across different
 runs, see if that looks okay to you, then it is better to add such a
 test as well.
>>>
>>> The regression test added by patch needs cleanup at the end which I
>>> have added in the attached patch.
>>
>> Hmm.  If we're going this way, then shouldn't we revert the changes
>> commit 2c09a5c12a66087218c7f8cba269cd3de51b9b82 made to
>> ExecParallelRetrieveInstrumentation?
>>
>
> Yeah, it is better to revert it as ideally that is not required after
> this patch and that is what I have tried to convey above ("Ideally, it
> would have obviated the need for my previous patch which
> got committed as 778e78ae." (The commit id is for branch 10,
> otherwise, it is same as what you mention.)).  I have locally reverted
> that patch and then rebased it on top of that.

Uh, should I just revert that commit entirely first, and then we can
commit the new fix afterward?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Andres Freund
On 2017-12-12 16:47:17 -0500, Tom Lane wrote:
> Really?  We've got test cases that intentionally exercise overflow
> in the money code?  I think we could just drop such tests, until
> such time as someone fixes the issue.

Some parts at least (input & output), I think it's easy enough to fix
those up.


> (OTOH, I bet we could drop reltime/abstime without too many complaints.
> Y2038 is coming.)

I'm actually about to send a patch doing so, that code is one mess WRT
overflow handling.

Greetings,

Andres Freund



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Tom Lane
Andres Freund  writes:
>> Robert Haas  writes:
>>> Long story short, I don't think anyone cares about this enough to
>>> spend effort fixing it.  I suspect the money data type has very few
>>> users.

> I'm unfortunately not so sure :(.

There seem to be enough of them that we get pushback anytime somebody
suggests removing the type.

> The background is that I was working on committing the faster (&
> correct) overflow checks, and wanted to compile postgres with -ftrapv.
> Some rudimentary cash (as well as pgbench, rel/abstime) fixes were
> required to make the tests succeed...

Really?  We've got test cases that intentionally exercise overflow
in the money code?  I think we could just drop such tests, until
such time as someone fixes the issue.

(OTOH, I bet we could drop reltime/abstime without too many complaints.
Y2038 is coming.)

regards, tom lane



Re: plpgsql test layout

2017-12-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 12/11/17 19:29, Michael Paquier wrote:
>> If I read vcregress.pl correctly, it seems to me that you need to do
>> more with MSVC (see plcheck). The tests would kick in if sql/ and
>> expected/ are found, and the test list is fetched by looking at
>> REGRESSION in the test files. However, plpgsql code has an additional
>> src/ folder which would cause the tests to not execute. If the plpgsql
>> code was moved one folder down then the tests would execute properly.

> OK, I hacked something up for MSVC.  How about this?

Looks ok to me, though I'm not in a position to actually test the
msvc changes.

regards, tom lane



Re: CUBE seems a bit confused about ORDER BY

2017-12-12 Thread Tomas Vondra


On 12/12/2017 01:52 PM, Alexander Korotkov wrote:
> On Tue, Dec 12, 2017 at 3:49 PM, Teodor Sigaev wrote:
> 
> Yes, the thing is that we change behavior of existing ~>
> operator.  In general, this is not good idea because it could
> affect existing users whose already use this operator. 
> Typically in such situation, we could leave existing operator as
> is, and invent new operator with new behavior.  However, in this
> particular case, I think there are reasons to make an exception
> to the rules.  The reasons are following:
> 1) The ~> operator was designed especially for knn gist.
> 2) Knn gist support for the current behavior is broken by design and
> can't be fixed.  The most we can do while keeping the existing ~>
> operator behavior as-is is to drop knn gist support.  But then the ~>
> operator would be left useless.
> 3) It doesn't seem that the ~> operator has many users yet,
> because no error was reported during the whole release cycle.
> 
> I hope these reasons justify altering behavior of existing
> operator as an exception to the rules.  Another question is
> whether we should backpatch it.  But I think we could leave this
> decision to committer.
> 
>     I think that this patch is ready for committer.
> 
> I agree with changing the behavior of the existing ~> operator, but are
> there any objections here? The current implementation is not fixable, and
> I hope that users who use this operator will understand why we change it.
> Fortunately, the fix doesn't require changes in the system catalog.
> 
> The single remaining question here is about indexes over expressions with
> this operator; they will need to be reindexed, which should be noted in
> the release notes.
> 
> 
> Yes.  I bet only a few users have built indexes over the ~> operator, if
> any.  Asking them to reindex in the release notes seems OK to me.
> 

Is there a good way to detect such cases? Either in pg_upgrade, so that
we can print warnings, or at least manually (which would be suitable for
release notes).
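
One rough way that comes to mind (just a sketch - it greps the index
definitions for the operator's spelling, so it could in principle
over-match, but it might be good enough for a release-note recipe):

select n.nspname, c.relname
  from pg_class c
  join pg_namespace n on n.oid = c.relnamespace
 where c.relkind = 'i'
   and pg_get_indexdef(c.oid) like '%~>%';

Anything found that way would then need a REINDEX after the change.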

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 2:53 PM, Tomas Vondra
 wrote:
> But let me play the devil's advocate for a while and question the
> usefulness of this approach to compression. Some of the questions were
> mentioned in the thread before, but I don't think they got the attention
> they deserve.

Sure, thanks for chiming in.  I think it is good to make sure we are
discussing this stuff.

> But perhaps we should simply make it an initdb option (in which case the
> whole cluster would simply use e.g. lz4 instead of pglz)?
>
> That seems like a much simpler approach - it would only require some
> ./configure options to add --with-lz4 (and other compression libraries),
> an initdb option to pick compression algorithm, and probably noting the
> choice in cluster controldata.
>
> No dependencies tracking, no ALTER TABLE issues, etc.
>
> Of course, it would not allow using different compression algorithms for
> different columns (although it might perhaps allow different compression
> level, to some extent).
>
> Conclusion: If we want to offer a simple cluster-wide pglz alternative,
> perhaps this patch is not the right way to do that.

I actually disagree with your conclusion here.   I mean, if you do it
that way, then it has the same problem as checksums: changing
compression algorithms requires a full dump-and-reload of the
database, which makes it more or less a non-starter for large
databases.  On the other hand, with the infrastructure provided by
this patch, we can have a default_compression_method GUC that will be
set to 'pglz' initially.  If the user changes it to 'lz4', or we ship
a new release where the new default is 'lz4', then new tables created
will use that new setting, but the existing stuff keeps working.  If
you want to upgrade your existing tables to use lz4 rather than pglz,
you can change the compression option for those columns to COMPRESS
lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS
lz4 to force a rewrite of an individual table.  That's really
powerful, and I think users will like it a lot.

In short, your approach, while perhaps a little simpler to code, seems
like it is fraught with operational problems which this design avoids.
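
To make this concrete, the incremental path described above would look
roughly like this (a sketch only: the default_compression_method GUC, the
SET COMPRESSION ... PRESERVE syntax, and the table/column names are all
taken from the proposal under discussion or made up, nothing here is
committed syntax):

-- new tables pick up the new default
SET default_compression_method = 'lz4';
CREATE TABLE t (doc text);

-- upgrade an existing column incrementally: new values get lz4, old
-- pglz-compressed values stay readable, no table rewrite
ALTER TABLE old_t ALTER COLUMN doc SET COMPRESSION lz4 PRESERVE pglz;

-- or force a rewrite so that only lz4-compressed values remain
ALTER TABLE old_t ALTER COLUMN doc SET COMPRESSION lz4;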

> Custom datatype-aware compression (e.g. the tsvector)
> --
>
> Exploiting knowledge of the internal data type structure is a promising
> way to improve compression ratio and/or performance.
>
> The obvious question of course is why shouldn't this be done by the data
> type code directly, which would also allow additional benefits like
> operating directly on the compressed values.
>
> Another thing is that if the datatype representation changes in some
> way, the compression method has to change too. So it's tightly coupled
> to the datatype anyway.
>
> This does not really require any new infrastructure, all the pieces are
> already there.
>
> In some cases that may not be quite possible - the datatype may not be
> flexible enough to support alternative (compressed) representation, e.g.
> because there are no bits available for "compressed" flag, etc.
>
> Conclusion: IMHO if we want to exploit the knowledge of the data type
> internal structure, perhaps doing that in the datatype code directly
> would be a better choice.

I definitely think there's a place for compression built right into
the data type.  I'm still happy about commit
145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more
needs to be done there.  But that type of improvement and what is
proposed here are basically orthogonal.  Having either one is good;
having both is better.

I think there may also be a place for declaring that a particular data
type has a "privileged" type of TOAST compression; if you use that
kind of compression for that data type, the data type will do smart
things, and if not, it will have to decompress in more cases.  But I
think this infrastructure makes that kind of thing easier, not harder.

> Custom datatype-aware compression with additional column-specific
> metadata (e.g. the jsonb with external dictionary).
> --
>
> Exploiting redundancy in multiple values in the same column (instead of
> compressing them independently) is another attractive way to help the
> compression. It is inherently datatype-aware, but currently can't be
> implemented directly in datatype code as there's no concept of
> column-specific storage (e.g. to store dictionary shared by all values
> in a particular column).
>
> I believe any patch addressing this use case would have to introduce
> such column-specific storage, and any solution doing that would probably
> need to introduce the same catalogs, etc.
>
> The obvious disadvantage of course is that we need to decompress the
> varlena value before doing pretty much anything with it, because the
> datatype is not aware of the compression.
>
> So I wonder 

Re: Rethinking MemoryContext creation

2017-12-12 Thread Tom Lane
Andres Freund  writes:
> On 2017-12-12 14:50:37 -0500, Robert Haas wrote:
>> It strikes me that a way to optimize these cases even more would be to
>> postpone creating the context until it's actually needed.  That might
>> not always be a reasonable plan -- in particular, it occurs to me to
>> think that CurTransactionContext is probably so widely used that
>> changing anything about how it works would probably be really painful
>> -- but it might be possible in some cases.

> That's not generally easy without slowing things down though - e.g. we
> don't want to check for ExprContext's existence before every use, there
> can be billions of usages in a single analytics query. The branches (yea
> yea ;)) would show up as being noticeable.

Yeah.  Also, in most of these cases what we're doing with the context
is installing it as CurrentMemoryContext while we execute some random
code that might or might not need to palloc anything.  We can't just
set CurrentMemoryContext to null - for one thing, that would leave no
trace of how the context should get set up if it's needed.  You could
imagine installing some sort of placeholder, but really that's the
mechanism we've already got, in the case where we just make a context
header and no blocks.

>> Another idea I have is that perhaps we could arrange to reuse contexts
>> instead of destroying them and recreating them.

> I'm a bit doubtful that's going to help, maintaining that list isn't
> going to be free, and the lifetime and number of those contexts aren't
> always going to match up.

Actually, this seems like a really promising idea to me.  To the extent
that an empty context has standard parameters, they're all
interchangeable, so you could imagine that AllocSetDelete shoves it
onto a freelist after resetting it, instead of just giving it back to
libc.  I'm slightly worried about creating allocation islands that
way, but that problem is probably surmountable, if it's real at all.

However, that seems like a different patch from what I'm working on
here.

regards, tom lane



Re: Error generating coverage report

2017-12-12 Thread Peter Eisentraut
On 12/12/17 15:39, David Steele wrote:
> $ make -C test/build/src/bin/pg_basebackup coverage-html
> make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
> /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d 
> /home/vagrant/src/src/bin/pg_basebackup -o lcov_base.info
> geninfo: ERROR: no .gcno files found in 
> /home/vagrant/src/src/bin/pg_basebackup!
> make: *** [lcov_base.info] Error 255
> make: *** Deleting file `lcov_base.info'
> make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

I tried a setup like this locally and it works fine.  So we might have
to be more precise.  Can you tell me the vagrant box you are using (if
it's public) or the OS version, and the exact commands?  Then I can try
to reproduce it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 1:06 PM, Alexander Korotkov
 wrote:
> OK, but a NOTICE that a presumably unexpected table rewrite takes place
> could still be useful.

I'm not going to complain too much about that, but I think that's
mostly a failure of expectation rather than a real problem.  If the
documentation says what the user should expect, and they expect
something else, tough luck for them.

> Also, we probably should add some view that will expose the compression
> methods that are currently preserved for columns, so that users can
> correctly construct a SET COMPRESSION query that doesn't rewrite the table
> without digging into internals (like directly querying pg_depend).

Yes.  I wonder if \d or \d+ can show it somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 10:02 AM, Geoff Winkless  wrote:
> There is also stackexchange question with 51000 views that asks how to
> start the postgresql client. Should we rename psql to mysql to help
> new users too?

That's not a serious question, and I don't know what your point is in
suggesting it, unless it's that you want to just troll everyone, which
I don't find especially constructive.

> Ctrl-D works almost everywhere. It takes 2 seconds to find the answer
> and there are going to be many other things you have to find out if
> you're coming at PostgreSQL from a different DB.

If it really took everyone 2 seconds, this question would not be as
common as it is.  So, it probably takes some people quite a bit
longer.

> As others have pointed out, there are _far_ more frustrating things
> (like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
> for anyone coming from mysql that I would suggest should be addressed
> if you were serious about making life easier for those moving DBs.

That is neither a reason to make a change along the lines discussed
here nor a reason not to do so.  You can start separate threads about
those other topics if you wish.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 11:58 AM, Everaldo Canuto
 wrote:
> Sorry to be so sarcastic in my last comment, but I am a little frustrated.
> It is not like this patch is big and needs a lot of review; it is already
> done, and it took me 15 minutes even though it was the first time I looked
> at the Postgres source code. Really sorry.

It's not really done in any useful sense, because it has a -1 vote
from a prominent committer.  There is however an alternative proposal
which has several +1 votes, including from that same committer and
from me.  If you would like to produce a patch that implements that
counter-proposal, it might go somewhere.  Very little of significance
gets done around here without some give and take; I know that's
sometimes frustrating, but it's how it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Error generating coverage report

2017-12-12 Thread David Steele

Hi Peter,

On 12/12/17 3:20 PM, Peter Eisentraut wrote:


make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d
/postgres/src/bin/pg_basebackup -o lcov_base.info
geninfo: ERROR: no .gcno files found in /postgres/src/bin/pg_basebackup!
make: *** [lcov_base.info] Error 255
make: *** Deleting file `lcov_base.info'
make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'


The -d options should accumulate, so that it should look in both places.

Wild guess: If the /postgres directory is some kind of vboxsf mount,
there might be problems if symlinks are involved.


/postgres is a vboxsf mount.  However, I copied /postgres to 
/home/vagrant/src and still see the same error:


$ make -C test/build/src/bin/pg_basebackup coverage-html
make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d 
/home/vagrant/src/src/bin/pg_basebackup -o lcov_base.info
geninfo: ERROR: no .gcno files found in 
/home/vagrant/src/src/bin/pg_basebackup!

make: *** [lcov_base.info] Error 255
make: *** Deleting file `lcov_base.info'
make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

Thoughts?

--
-David
da...@pgmasters.net



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Geoff Winkless
On 12 December 2017 at 18:32, Chapman Flack  wrote:
> It seems to me that this is very close to the Robert Haas suggestion
> already discussed upthread, except that suggestion only prints the
> hint message when you might need it, which I'd find less intrusive.

Well "close" yes, but I like "simpler", and I'm not sure I like the
idea of waiting for someone to flail at the quit command before giving
them a hint.

Also, to be frank, if someone's typed "quit" and you're going to
perform an action based on it, it feels like you might as well quit -
a bit like google saying "did you mean \q" in its sneering
supercilious way.

Geoff



Re: Learned Indexes in PostgreSQL?

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 6:38 AM, Stefan Keller  wrote:
> This is an awesome paper about a new index called "Learned Index".
> It is essentially a dense hash structure derived ("learned") from the actual data.
> Looks very promising for certain properties [*].
>
> Anyone already working on this in Postgres?
> How could this be implemented in Postgres?

You're the third person to start a thread on this topic in the last few days.

Here are links to the other two threads:

http://postgr.es/m/CANNMO+J1KeTSx5q5SYuwHf1v-gPRLrOZw1s7qOpqWx=3umm...@mail.gmail.com
http://postgr.es/m/caaerrx8bhiw3rgaqpolqjhyhk7ghordtke5uc9dasv1w5fz...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Backfill bgworker Extension?

2017-12-12 Thread Peter Eisentraut
On 12/12/17 13:03, Jeremy Finzel wrote:
> To be clear, what I mean is batch updating a large set of data in small
> pieces so as to avoid things like lock contention and replication lags. 
> Sometimes these have a driving table that has the source data to update
> in a destination table based on a key column, but sometimes it is
> something like setting just a single specific value for a huge table.
> 
> I would love instead to have a Postgres extension that uses postgres
> background workers to accomplish this, especially if it were part of
> core.  Before I venture into exploring writing something like this as an
> extension, would this ever be considered something appropriate as an
> extension in Postgres core?  Would that be appropriate?

I don't see what the common ground between different variants of this
use case would be.  Aren't you basically just looking to execute a
use-case-specific stored procedure in the background?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Error generating coverage report

2017-12-12 Thread Peter Eisentraut
On 12/12/17 15:02, David Steele wrote:
> I'm working on improving coverage and would like to generate some
> reports (other than the text versions) to help me find uncovered code.
> However, my source path and build path are not the same and I'm running
> into problems.

Vpath builds were one of the things that was specifically tested for the
coverage tooling changes that went into PG11, so it should work in
principle, unless something has changed in the meantime.

> $ make -C test/build/src/bin/pg_basebackup coverage-html
> 
> make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
> /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d
> /postgres/src/bin/pg_basebackup -o lcov_base.info
> geninfo: ERROR: no .gcno files found in /postgres/src/bin/pg_basebackup!
> make: *** [lcov_base.info] Error 255
> make: *** Deleting file `lcov_base.info'
> make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

The -d options should accumulate, so that it should look in both places.

Wild guess: If the /postgres directory is some kind of vboxsf mount,
there might be problems if symlinks are involved.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Oleg Bartunov
On Tue, Dec 12, 2017 at 6:07 PM, Alexander Korotkov
 wrote:
> Hi!
>
> Let me add my two cents too.
>
> On Tue, Dec 12, 2017 at 2:41 PM, Ildus Kurbangaliev
>  wrote:
>>
>> On Mon, 11 Dec 2017 20:53:29 +0100 Tomas Vondra
>>  wrote:
>> > Replacing the algorithm used to compress all varlena values (in a way
>> > that makes it transparent for the data type code).
>> > --
>> >
>> > While pglz served us well over time, it was repeatedly mentioned that
>> > in some cases it becomes the bottleneck. So supporting other state of
>> > the art compression algorithms seems like a good idea, and this patch
>> > is one way to do that.
>> >
>> > But perhaps we should simply make it an initdb option (in which case
>> > the whole cluster would simply use e.g. lz4 instead of pglz)?
>> >
>> > That seems like a much simpler approach - it would only require some
>> > ./configure options to add --with-lz4 (and other compression
>> > libraries), an initdb option to pick compression algorithm, and
>> > probably noting the choice in cluster controldata.
>>
>> Replacing pglz for all varlena values wasn't the goal of the patch, but
>> it's possible to do with it and I think that's good. And as Robert
>> mentioned, pglz could appear as a builtin undroppable compression method
>> so that others could be added too. And in the future it can open the
>> way to specifying compression for a specific database or cluster.
>
>
> Yes, usage of custom compression methods to replace the generic non
> type-specific compression method is really not the primary goal of this
> patch.  However, I would consider that a useful side effect.  Even in this
> case I see some advantages of custom compression methods over an initdb
> option.
>
> 1) In order to support alternative compression methods in initdb, we have to
> provide builtin support for them.  Then we immediately run into
> dependencies/incompatible-licenses problem.  Also, we tie appearance of new
> compression methods to our release cycle.  In real life, flexibility means a
> lot.  Giving users freedom to experiment with various compression libraries
> without having to recompile PostgreSQL core is great.
> 2) It's not necessarily the case that users would be satisfied with applying
> a single compression method to the whole database cluster.  Various columns
> may have different data distributions with different workloads.  The optimal
> compression type for one column is not necessarily optimal for another
> column.
> 3) Possibility to change compression method on the fly without re-initdb is
> very good too.

I consider custom compression as the way towards custom TOAST. For example,
to optimally access parts of a very long document we need to compress
slices of the document. Currently, a long jsonb document gets compressed
and then sliced, and that kills all the benefits of binary jsonb.  Also,
we are thinking about "lazy" access to the parts of jsonb from PLs, which
is currently awfully inefficient.

>
>> > Custom datatype-aware compression (e.g. the tsvector)
>> > --
>> >
>> > Exploiting knowledge of the internal data type structure is a
>> > promising way to improve compression ratio and/or performance.
>> >
>> > The obvious question of course is why shouldn't this be done by the
>> > data type code directly, which would also allow additional benefits
>> > like operating directly on the compressed values.
>> >
>> > Another thing is that if the datatype representation changes in some
>> > way, the compression method has to change too. So it's tightly coupled
>> > to the datatype anyway.
>> >
>> > This does not really require any new infrastructure, all the pieces
>> > are already there.
>> >
>> > In some cases that may not be quite possible - the datatype may not be
>> > flexible enough to support alternative (compressed) representation,
>> > e.g. because there are no bits available for "compressed" flag, etc.
>> >
>> > Conclusion: IMHO if we want to exploit the knowledge of the data type
>> > internal structure, perhaps doing that in the datatype code directly
>> > would be a better choice.
>>
>> It could be, but let's imagine there will be internal compression for
>> tsvector. It means that tsvector now has two formats and one bit fewer
>> somewhere in the header. After a while we find a better compression method
>> but we can't add it because there is already one, and it's not good to
>> have three different formats for one type. Or, the compression methods
>> were implemented and we decided to use dictionaries for tsvector (if
>> the user is going to store a limited number of words). But that will mean
>> that tsvector goes through two compression stages (one for its internal
>> format and one for the dictionaries).
>
>
> I would like to add that even for a single datatype, various compression
> methods may have different tradeoffs.  For instance, 

Error generating coverage report

2017-12-12 Thread David Steele
I'm working on improving coverage and would like to generate some
reports (other than the text versions) to help me find uncovered code.
However, my source path and build path are not the same and I'm running
into problems.

This works fine and produces gcov output:

$ make -C test/build/src/bin/pg_basebackup check
$ ls -1 test/build/src/bin/pg_basebackup/*.gcno

test/build/src/bin/pg_basebackup/pg_basebackup.gcno
test/build/src/bin/pg_basebackup/pg_receivewal.gcno
test/build/src/bin/pg_basebackup/pg_recvlogical.gcno
test/build/src/bin/pg_basebackup/receivelog.gcno
test/build/src/bin/pg_basebackup/streamutil.gcno
test/build/src/bin/pg_basebackup/walmethods.gcno

But, when I try to generate the coverage report I get:

$ make -C test/build/src/bin/pg_basebackup coverage-html

make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d
/postgres/src/bin/pg_basebackup -o lcov_base.info
geninfo: ERROR: no .gcno files found in /postgres/src/bin/pg_basebackup!
make: *** [lcov_base.info] Error 255
make: *** Deleting file `lcov_base.info'
make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

It's looking at my source directory rather than my build directory.
Here's how I'm calling configure:

$ cd /home/vagrant/test/build
$ /postgres/configure --prefix=/home/vagrant/test/pg \
--docdir=/postgres-doc --enable-cassert --enable-tap-tests \
--enable-coverage --with-openssl --enable-depend

I read docs/10/static/regress-coverage.html but didn't see anything that
helps.  Hopefully I'm missing something?

Thanks,
-- 
-David
da...@pgmasters.net



Re: Rethinking MemoryContext creation

2017-12-12 Thread Andres Freund
On 2017-12-12 14:50:37 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 2:30 PM, Tom Lane  wrote:
> > 379 CurTransactionContext was never used
> >  24 CurTransactionContext was used
> >   66978 ExprContext was never used
> >   17364 ExprContext was used
> >   11139 SPI Exec was never used
> >2421 SPI Exec was used
> >   38386 ginPlaceToPage temporary context was never used
> >3348 ginPlaceToPage temporary context was used
> 
> It strikes me that a way to optimize these cases even more would be to
> postpone creating the context until it's actually needed.  That might
> not always be a reasonable plan -- in particular, it occurs to me to
> think that CurTransactionContext is probably so widely used that
> changing anything about how it works would probably be really painful
> -- but it might be possible in some cases.

That's not generally easy without slowing things down though - e.g. we
don't want to check for ExprContext's existence before every use, there
can be billions of usages in a single analytics query. The branches (yea
yea ;)) would show up as being noticeable.

There are a few places where could probably reliably *detect* that
they're not needed however. E.g. plenty executor nodes only need a
ExprContext when either a qual or projection is needed, both are pretty
cheap to detect at ExecInitNode() time.


> Another idea I have is that perhaps we could arrange to reuse contexts
> instead of destroying them and recreating them.  For example, when
> asked to delete a context, we could instead push it on a linked list
> of old contexts, or only if the list isn't too long already, and when
> asked to create one, we could pop from the list.  Or we could keep
> around an array of, say, 1024 contexts that are never freed and only
> allocated dynamically when we run out.

I'm a bit doubtful that's going to help, maintaining that list isn't
going to be free, and the lifetime and number of those contexts aren't
always going to match up.

I think you're somewhat on to something however: I do think that
especially executor startup would have a good chance of avoiding
noticeable overhead by batching the allocation of all these tiny
contexts together - I just don't quite know how given how our current
initialization works.  The best thing to do would probably be to do two
walks during executor initialization, one to compute memory sizes, and a
second to initialize pre-requested memory that's laid out
serially... But obviously that's not a small change.

Greetings,

Andres Freund



Re: Rethinking MemoryContext creation

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 2:30 PM, Tom Lane  wrote:
> 379 CurTransactionContext was never used
>  24 CurTransactionContext was used
>   66978 ExprContext was never used
>   17364 ExprContext was used
>   11139 SPI Exec was never used
>2421 SPI Exec was used
>   38386 ginPlaceToPage temporary context was never used
>3348 ginPlaceToPage temporary context was used

It strikes me that a way to optimize these cases even more would be to
postpone creating the context until it's actually needed.  That might
not always be a reasonable plan -- in particular, it occurs to me to
think that CurTransactionContext is probably so widely used that
changing anything about how it works would probably be really painful
-- but it might be possible in some cases.

Another idea I have is that perhaps we could arrange to reuse contexts
instead of destroying them and recreating them.  For example, when
asked to delete a context, we could instead push it on a linked list
of old contexts, or only if the list isn't too long already, and when
asked to create one, we could pop from the list.  Or we could keep
around an array of, say, 1024 contexts that are never freed and only
allocated dynamically when we run out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rethinking MemoryContext creation

2017-12-12 Thread Tom Lane
I wrote:
> I've not done any benchmarking on this yet, just confirmed that it
> compiles and passes check-world.

So once I'd done some benchmarking, I was a bit disappointed: I could
not measure any difference anymore in "pgbench -S", and poking into
other scenarios found cases that were actually slower, like

do $$
begin
  for i in 1..1000 loop
declare x int;
begin
  x := 42;
exception when others then null;
end;
  end loop;
end$$;

which went from about 12.4 seconds in HEAD to about 13.6 with my
v2 patch.  When I looked into the reason, I found that my earlier
blithe pronouncement that "optimizing for the unused-context case
seems like the wrong thing" was too simple.  In this example we
create and delete two memory contexts per loop (a subtransaction
context and an ExprContext) and neither of them receives any
palloc requests.  Some of the contexts created in a "pgbench -S"
scenario are the same way.  So in these examples, we were on the
losing side of the replace-a-palloc-with-a-malloc trade.

However, if we can predict which contexts are more likely not to
receive traffic, we can fix this by doing things the old way for
those contexts.  I instrumented AllocSetDelete to log whether
the target context had received any requests in its lifespan,
and aggregated the reports over a run of the core regression tests.
I found these cases where there were significantly more reports
of "context was never used" than "context was used":

379 CurTransactionContext was never used
 24 CurTransactionContext was used
  66978 ExprContext was never used
  17364 ExprContext was used
993 HashTableContext was never used
 25 HashTableContext was used
 88 Logical Replication Launcher sublist was never used
  2 Logical Replication Launcher sublist was used
  11139 SPI Exec was never used
   2421 SPI Exec was used
 36 SetOp was never used
  2 SetOp was used
185 Statistics snapshot was never used
104 Subplan HashTable Context was never used
 44 Subplan HashTable Context was used
148 Subplan HashTable Temp Context was never used
   1481 Table function arguments was never used
 45 Table function arguments was used
 22 TableFunc per value context was never used
 52 Unique was never used
  4 Unique was used
137 WindowAgg Aggregates was never used
  2 WindowAgg Aggregates was used
127 WindowAgg Partition was never used
 12 WindowAgg Partition was used
288 _bt_pagedel was never used
 14 _bt_pagedel was used
 35 event trigger context was never used
  3 event trigger context was used
  38386 ginPlaceToPage temporary context was never used
   3348 ginPlaceToPage temporary context was used
454 ident parser context was never used
229 tSRF function arguments was never used
 46 tSRF function arguments was used

A lot of these probably aren't worth bothering to optimize, because they just
don't occur often enough to move the needle.  (And in workloads where that
wasn't true, maybe the usage statistics would be different anyway.)  But
it looked to me like CurTransactionContext, ExprContext, HashTableContext,
SPI Exec, and ginPlaceToPage temporary context would be worth doing it
the old way for.

Accordingly, attached is a v3 patch that adds another flag to
AllocSetContextCreateExtended telling it to do things the old way
with the context header in TopMemoryContext.  (We can still optimize
context name copying as before.)  This fixes the subtransaction case
I showed above, bringing it to 10.7 sec which is noticeably better
than HEAD.  I now see maybe a 1 or 2 percent improvement in "pgbench -S",
which isn't much, but it turns out that that test case only involves
half a dozen context creations/deletions per transaction.  So probably
we aren't going to get any noticeable movement on that benchmark, and
it'd be smarter to look for test cases where more contexts are involved.

regards, tom lane

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 868c14e..adbbc44 100644
*** a/contrib/amcheck/verify_nbtree.c
--- b/contrib/amcheck/verify_nbtree.c
*** bt_check_every_level(Relation rel, bool 
*** 295,303 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_MINSIZE,
!  ALLOCSET_DEFAULT_INITSIZE,
!  ALLOCSET_DEFAULT_MAXSIZE);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
--- 295,301 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_SIZES);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:16:59 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund  wrote:
> > I agree that it's more reliable - I hope there's no meaningful safety
> > difference.  I think you overestimate users a bit however - far from
> > most of them are going to be able to extract a very long log entry from
> > a busy log file. There's no generally available easy way to copy a few
> > pages of text from a logfile that's a few gigabytes large...
> 
> Well, a lot of users will send us the whole logfile rather than just
> the relevant bits, but that doesn't bother me.

That doesn't really work on some busy servers, and also often in cases
where the log potentially contains sensitive (e.g. HIPAA) data.

I think a function returning the dump would be great, a function "just"
dumping to the server log still pretty good.

Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund  wrote:
> I agree that it's more reliable - I hope there's no meaningful safety
> difference.  I think you overestimate users a bit however - far from
> most of them are going to be able to extract a very long log entry from
> a busy log file. There's no generally available easy way to copy a few
> pages of text from a logfile that's a few gigabytes large...

Well, a lot of users will send us the whole logfile rather than just
the relevant bits, but that doesn't bother me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ALTER TABLE ADD COLUMN fast default

2017-12-12 Thread Andrew Dunstan


On 12/06/2017 11:26 AM, Andrew Dunstan wrote:
>> In general, you need to be thinking about this in terms of making sure
>> that it's impossible to accidentally not update code that needs to be
>> updated.  Another example is that I noticed that some places the patch
>> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
>> the former will fail to copy the missing-attribute support data.
>> I think that's an astonishingly bad idea.  There is basically no situation
>> in which CreateTupleDescCopy defined in that way will ever be safe to use.
>> The missing-attribute info is now a fundamental part of a tupdesc and it
>> has to be copied always, just as much as e.g. atttypid.
>>
>> I would opine that it's a mistake to design TupleDesc as though the
>> missing-attribute data had anything to do with default values.  It may
>> have originated from the same place but it's a distinct thing.  Better
>> to store it in a separate sub-structure.
>
> OK, will work on all that. Thanks again.
>



Looking closer at this. First, there is exactly one place where the
patch does  s/CreateTupleDescCopy/CreateTupleDescCopyConstr/.

Making this like atttypid suggests that it belongs right outside the
TupleConstr structure. But then it seems to me that the change could
well end up being a darn sight more invasive. For example, should we be
passing the number of missing values to CreateTemplateTupleDesc and
CreateTupleDesc? I'm happy to do whatever work is required, but I want
to have a firm understanding of the design before I spend lots of time
cutting code. Once I understand how tupdesc.h should look I should be good.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:04:55 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund  wrote:
> > I do wonder if the right thing here wouldn't be to put the result into a
> > dsm segment, and then return that to the UDF on the requesting
> > side. Logging to the server log and then have the requestor dig that out
> > doesn't seem particularly user friendly.
> 
> I think that dumping it to the server log will be fine for most
> people, and it's *significantly* safer.

I agree that it's more reliable - I hope there's no meaningful safety
difference.  I think you overestimate users a bit however - far from
most of them are going to be able to extract a very long log entry from
a busy log file. There's no generally available easy way to copy a few
pages of text from a logfile that's a few gigabytes large...

I'd suggest adding two functions.


> Creating a DSM segment could fail, and the last thing we want is for
> interrogating a long-running backend to make it crap out.

It'd need to handle memory alloc failures gracefully, I agree. That
doesn't seem impossible.  Also don't think that's meaningfully different
in the non-dsm case, it'd be good to handle malloc failures gracefully
in that case as well.

The requesting side should create the dsm segment and preallocate a
reasonable amount of memory, if we go for it.


> +1 for this whole concept, just BTW.  As I've said before, I grow
> weary of asking customers to run gdb.

Indeed.

Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund  wrote:
> I do wonder if the right thing here wouldn't be to put the result into a
> dsm segment, and then return that to the UDF on the requesting
> side. Logging to the server log and then have the requestor dig that out
> doesn't seem particularly user friendly.

I think that dumping it to the server log will be fine for most
people, and it's *significantly* safer.  Creating a DSM segment could
fail, and the last thing we want is for interrogating a long-running
backend to make it crap out.

+1 for this whole concept, just BTW.  As I've said before, I grow
weary of asking customers to run gdb.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 10:41 PM, Amit Kapila  wrote:
> In particular, if the worker crashed (say somebody forcefully killed
> the worker), then I think it will lead to a restart of the server.  So
> not sure, adding anything about that in the comment will make much
> sense.

I think it's dangerous to rely on that assumption.  If somebody stuck
proc_exit(1) into the very beginning of the worker startup sequence,
the worker would exit but the server would not restart.

As I started experimenting with this, I discovered that your patch is
actually unsafe.   Here's a test case that shows the breakage:

set force_parallel_mode = on;
select 1;

The expected outcome is:

 ?column?
--
1
(1 row)

If I hack do_start_bgworker() to make worker startup fail, like this:

--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5701,7 +5701,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 #ifdef EXEC_BACKEND
 switch ((worker_pid = bgworker_forkexec(rw->rw_shmem_slot)))
 #else
-switch ((worker_pid = fork_process()))
+switch ((worker_pid = -1))
 #endif
 {
 case -1:

...then it hangs forever.  With your patch it no longer hangs forever,
which is good, but it also returns the wrong answer, which is bad:

 ?column?
--
(0 rows)

The problem is that Gather sees nworkers_launched > 0 and assumes that
it is therefore entitled not to perform a local scan.  I'm pretty sure
that there could also be cases where this patch makes us miss an error
reported by a worker; suppose the worker error is reported and the
worker exits after WaitForParallelWorkersToFinish does
CHECK_FOR_INTERRUPTS() but before we reach GetBackgroundWorkerPid().
The code will simply conclude that the worker is dead and stop
waiting, never mind the fact that there's a pending error in the
queue.  This is exactly why I made parallel workers send a Terminate
message to the leader instead of just disappearing -- the leader needs
to see that message to know that it's reached the end of what the
worker is going to send.

So, I think we need to regard fork failure and related failure modes
as ERROR conditions.  It's not equivalent to failing to register the
background worker.  When we fail to register the background worker,
the parallel.c machinery knows at LaunchParallelWorkers() time that
the worker will never show up and lets the calling code know by setting
nworkers_launched, and the calling code can then choose to proceed
with that number of workers or ERROR out as it chooses.  But once we
decide on the value of nworkers_launched to report to callers, it's
really not OK for workers to just silently not ever start, as the test
case above shows.  We could make it the job of code that uses the
ParallelContext machinery to cope with that situation, but I think
that's a bad plan, because every user would have to worry about this
obscure case.  Right now, the only in-core use of the ParallelContext
machinery is Gather/Gather Merge, but there are pending patches for
parallel index scan and parallel VACUUM, so we'll just end up adding
this handling to more and more places.  And for what?  If you've got
max_connections and/or max_parallel_workers set so high that your
system can't fork(), you have a big problem, and forcing things that
would have run using parallel query to run serially instead of
erroring out is not going to fix it.  I think that deciding we're
going to handle fork() failure by reporting an ERROR is just fine.

So, here's a patch.  This patch keeps track of whether we've ever
received a message from each worker via the error queue.  If there are
no remaining workers which have neither exited cleanly nor sent us at
least one message, then it calls GetBackgroundWorkerPid() for each
one.  If any worker reports that it is BGWH_STOPPED, then it
double-checks whether the worker has attached to the error queue.  If
it did, then it assumes that we'll detect clean shutdown or an error
on the next iteration and ignores the problem.  If not, then the
worker died without ever attaching the error queue, so it throws an
ERROR.  I tested that this causes the test case above to throw a
proper ERROR when fork_process() returns -1.  Please review...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


parallel-worker-fork-failure.patch
Description: Binary data


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:25:48 +0800, Craig Ringer wrote:
> If we were willing to wrap variadic calls in a callback and rely on
> vfprintf, we could:

I don't think there's problems with relying on variadic calls, we do
that in plenty places.


> and let the ProcSignal based caller pass an elog wrapper instead, or form a
> StringInfo with appendStringInfoVA then elog it in one go after the
> MemoryContextStatsDetail call returns.

I don't think we want a simple elog wrapper - outputting the context
stats as hundreds of log messages doesn't seem right. So at the least it
seems we should bunch it up in stringinfo - which seems to at least
require expanding the API to pass around a void *callback_data or such.

I do wonder if the right thing here wouldn't be to put the result into a
dsm segment, and then return that to the UDF on the requesting
side. Logging to the server log and then have the requestor dig that out
doesn't seem particularly user friendly.

Greetings,

Andres Freund



Re: pgbench's expression parsing & negative numbers

2017-12-12 Thread Andres Freund
On 2017-12-12 07:21:21 +0100, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> > working on overflow correctness in pg I noticed that pgbench isn't quite
> > there.  I assume it won't matter terribly often, but the way it parses
> > integers makes it incorrect for, at least, the negativemost number.
> > 
> > It directly parses the integer input using:
> > {digit}+{
> > yylval->ival = strtoint64(yytext);
> > return INTEGER_CONST;
> > }
> > 
> > but that unfortunately means that the sign is not included in the call to
> > strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,
> 
> Indeed. For a benchmarking script which generates a pseudo-random load I do
> not see it as a big issue, but maybe I'm missing some use case.

IDK, behaving correctly seems like a good idea. Also far from impossible
that that code gets propagated further.  Doesn't seem like an entirely
crazy idea that somebody actually wants to benchmark boundary cases,
which might be slower and such.

> > because that's not representable as a positive number on a two's
> > complement machine (i.e. everywhere).  Preliminary testing shows that
> > that can relatively easily be fixed by just prefixing [-+]? to the relevant
> > exprscan.l rules.
> 
> Beware that it must handle the unary/binary plus/minus case consistently:
> 
>   "-123" vs "a -123" vs "a + -123"

Which is a lot easier to solve on the parser side of things...


> > But it might also not be a bad idea to simply defer parsing of the
> > number until exprparse.y has its hand on the number?
> 
> It is usually simpler to let the lexer handle constants.


> > There's plenty other things wrong with overflow handling too, especially
> > evalFunc() basically just doesn't care.
> 
> Sure.
> 
> There is some overflow checking for div and double-to-int casts, which was
> added because of previous complaints, but which is not very useful to me.

I think handling it inconsistently is the worst of all worlds.


> ISTM that it is a voluntary feature, which handles int64 operations with a
> modulo overflow like C. Generating exceptions on such overflows does not
> look very attractive to me.

No, it's not really voluntary. Signed overflow isn't actually defined
behaviour in C. We kinda make it so on some platforms, but that's not
really a good idea.


> >  I'll come up with a patch for that sometime soon.
> 
> I wish you could provide some motivation: why does it matter much to a
> benchmarking script to handle overflows with more care?

a) I want to be able to compile without -fwrapv in the near
   future. Which means you can't have signed overflows, because they're
   undefined behaviour in C.
b) I want to be able to compile with -ftrapv /
   -fsanitize=signed-integer-overflow, to be sure code is actually
   correct. Currently pgbench crashes with that.
c) Randomly handling things in some but not all places is a bad idea.


> > A third complaint I have is that the tap tests are pretty hard to parse
> > for humans.
> 
> Probably.
> 
> Perl is pretty hard to humans to start with:-)

Meh.


> There is a lot of code factorization to cram many tests together, so that
> one test is more or less one line and there are hundreds of them. Not sure
> it would help if it was expanded.

I don't think expanding it is really a problem, I think they're just
largely not well documented, formatted. With some effort the tests could
be a lot easier to read.


Greetings,

Andres Freund



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Chapman Flack
On 12/12/2017 12:00 PM, Geoff Winkless wrote:

> I quite like this idea, although I would rather a line printed after
> each command is run: something like
> 
> mydbname=# select ;
> --
> (1 row)
> 
> \h for help, \q for quit, \expert to turn off this message
> osprey=#

It seems to me that this is very close to the Robert Haas suggestion
already discussed upthread, except that suggestion only prints the
hint message when you might need it, which I'd find less intrusive.

-Chap



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Andres Freund
Hi,

On 2017-12-12 11:01:04 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > Long story short, I don't think anyone cares about this enough to
> > spend effort fixing it.  I suspect the money data type has very few
> > users.

I'm unfortunately not so sure :(.


> Somebody did contribute the effort not too long ago to install that
> overflow check into cash_in.  So maybe somebody will step up and fix
> the other money functions.  I can't get too excited about it in the
> meantime.

I can't really either. But I think that kinda suggests we ought to rip
that code out in the not-too-distant future.

The background is that I was working on committing the faster (&
correct) overflow checks, and wanted to compile postgres with -ftrapv.
Some rudimentary cash (as well as pgbench, rel/abstime) fixes were
required to make the tests succeed...

Greetings,

Andres Freund



Backfill bgworker Extension?

2017-12-12 Thread Jeremy Finzel
One of the challenges we have is that our engineers have written frameworks
to backfill data in several different DSLs, and every time they adopt a new
language, they may need to write another one.

To be clear, what I mean is batch updating a large set of data in small
pieces so as to avoid things like lock contention and replication lag.
Sometimes these have a driving table that has the source data to update in
a destination table based on a key column, but sometimes it is something
like setting just a single specific value for a huge table.
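
To make that concrete, here is a minimal sketch of the kind of batched
backfill I mean, as a plpgsql DO block (table, column, and batch size are
invented for illustration); the point of a background-worker extension would
be to drive loops like this, one batch per transaction, with throttling:

-- hypothetical table big_table(id, flag); update it in batches of 10000
DO $$
DECLARE
    batch_rows integer;
BEGIN
    LOOP
        UPDATE big_table
           SET flag = true
         WHERE id IN (SELECT id
                        FROM big_table
                       WHERE flag IS DISTINCT FROM true
                       LIMIT 10000);
        GET DIAGNOSTICS batch_rows = ROW_COUNT;
        EXIT WHEN batch_rows = 0;
        -- a real backfill would commit here and pause between batches to
        -- limit lock contention and replication lag; a DO block cannot
        -- commit, so this only illustrates the shape of the loop
    END LOOP;
END
$$;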

I would love instead to have a Postgres extension that uses Postgres
background workers to accomplish this, especially if it were part of core.
Before I venture into exploring writing something like this as an
extension, would this ever be considered appropriate to include in Postgres
core?

Thanks,
Jeremy


Re: Learned Index

2017-12-12 Thread Oleg Ivanov

On 12/12/2017 12:16 PM, Laurenz Albe wrote:

> I have read into the paper.
>
> This may be interesting or not, but the paper is very vague about
> its concepts and algorithms, so it's hard to tell.
>
> I'd say that the paper does not meet publication standards.
>
> For example, they say that their results were generated by comparing
> a B-tree implementation with "learned indexes using a 2-stage RMI model
> and different second-stage sizes (i.e., 10k, 50k, 100k, and 200k)",
> but they don't say exactly what the neural network in these stages is
> (at least it is not obvious to me).  Their "Learning Index Framework" (LIF)
> is described with a few vague sentences and a reference to the literature
> saying that is where they got some ideas from.

That is not the answer, but it gives us an idea of which kind of neural
networks was used: "For this paper, we only focused on 2 types of
models, simple neural nets with zero to two fully-connected hidden
layers and ReLU activation functions and a layer width of up to 32
neurons and B-Trees (a.k.a. decision trees)".

> There is also no clear concept of how these indexes should handle
> data modifications, so I think that there are some loose ends to be
> tied up before it is ready for implementation.
>
> Finally, I don't see any clear statement as to the error guarantees
> that the neural network prediction can give, and if it is possible that
> it may degrade to scanning relevant parts of the table in some cases.

No guarantees are provided (I don't think it is even possible), besides
the guarantee that if the error of the neural network prediction is more
than the error of the B-tree prediction, the B-tree will be used: "Note, that
hybrid indexes allow us to bound the worst case performance of learned
indexes to the performance of B-Trees".


Oleg Ivanov
Postgres Professional
The Russian PostgreSQL Company



Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2017-12-12 Thread Oleg Ivanov

On 12/12/2017 04:33 PM, Oleg Bartunov wrote:

> On Mon, Dec 11, 2017 at 11:11 PM, Nikolay Samokhvalov wrote:
>> Very interesting read: https://arxiv.org/abs/1712.01208
>>
>> HN discussion: https://news.ycombinator.com/item?id=15894896
>>
>> Some of the comments (from Twitter
>> https://twitter.com/schrockn/status/940037656494317568): "Jeff Dean and co
>> at GOOG just released a paper showing how machine-learned indexes can
>> replace B-Trees, Hash Indexes, and Bloom Filters. Execute 3x faster than
>> B-Trees, 10-100x less space. Executes on GPU, which are getting faster
>> unlike CPU. Amazing."
>>
>> Can those ideas be applied to Postgres in its current state? Or it's not
>> really down-to-earth?
>
> Oleg made some analysis of the paper.
If the keys are comparable and the data distribution is simple enough
(which seems to happen rather often) and the data does not change, we
can learn the distribution, and so perform the search faster and also save
memory. That is what the authors wrote, and that is what should work
in my opinion.


The limitations of the approach, which the authors mention in their work:
+ The data does not change. The proposed method can be extended more or
less straightforwardly to nearly append-only workloads and to workloads in
which the data distribution does not change or changes very slowly (the
data itself or its amount may change, nevertheless). Otherwise it is
challenging, because the key positions cannot be considered as
independent values anymore, but it still looks possible. The other
option proposed by the authors is to store new objects in a buffer and
retrain the model in the background, but that does not look like a nice
solution.
+ GPUs are fast, but the latency of doing operations on the GPU almost
certainly cancels out all speed benefits for such small models. The only
case in which it is reasonable to use a GPU is training the models (i.e.
building such indexes).


The following was left unclear for me from the paper:
+ How effectively the approach can be implemented taking into account
the limitations above.
+ For hash functions the authors propose to replace the hash function with
the learned CDF, which can decrease the number of collisions, which decreases
the memory consumption. That is reasonable, but this hash function
implicitly uses the information about the order on the keys. I suppose
that using the ordering on the data, with access to the data
histogram, one could achieve a similar decrease in memory consumption.
+ The proposed hierarchical learning looks natural for the problem, but
it is not clear that it is the best option. For example, one might use
an end-to-end learning procedure with weight sparsity regularizers as
well. I wonder whether the latter approach can obtain competitive
results with the first one, and if not, then why. Anyway, I have a
feeling that the proposed method has room for improvement.


In general, the approach still needs some additional research (mostly
about changing data), and there are some points to think through about how
to implement it properly (paging, for example), but it looks promising
enough nevertheless.




Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Geoff Winkless
On 12 December 2017 at 16:36, Arthur Nascimento  wrote:
> Nano keeps a cheatsheet of commands on the bottom; and it's regarded
> as a fairly good editor for newcomers. Some shells have a right-hand
> side status bar.

I quite like this idea, although I would rather have a line printed after
each command is run: something like

mydbname=# select ;
--
(1 row)

\h for help, \q for quit, \expert to turn off this message
osprey=#

This has the advantage that it doesn't need curses support, is pretty
simple and actually has the "off" information inline.

When you type \expert you could print a line that says

put \expert in ~/.psqlrc to make this change permanent

or something.

Geoff



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Everaldo Canuto
On Tue, Dec 12, 2017 at 2:39 PM, Everaldo Canuto 
wrote:

>
> As others have pointed out, there are _far_ more frustrating things
>> (like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
>> for anyone coming from mysql that I would suggest should be addressed
>> if you were serious about making life easier for those moving DBs.
>>
>
> Well, there are lots of hungry people in the world; maybe we should stop
> improving software tools until everyone has food.
>

Sorry to be so sarcastic in my last comment, but I am a little frustrated. It
is not like this patch is big and needs a lot of review; it is already done,
it took me 15 minutes, and it was the first time I looked at the
Postgres source code. Really sorry.


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Geoff Winkless
On 12 December 2017 at 16:39, Everaldo Canuto  wrote:
> On Tue, Dec 12, 2017 at 1:02 PM, Geoff Winkless  wrote:
>> There is also stackexchange question with 51000 views that asks how to
>> start the postgresql client. Should we rename psql to mysql to help
>> new users too?

> No, we shouldn't, but it would help a lot if the postgres client had names like
> "postgres", "postgres-client" or "postgres-sql". Even pgsql is better than
> psql. With mysql you normally type mysql then tab and bash shows all the mysql
> tools.

If you're going down _that_ route I'd vote for pg_sql since we have
pg_dump and pg_ctl etc.

But in order to know how to call those things you would still need to
find out what they are. You think that typing mysql is "natural"
but you only know how to do it because someone told you to type
"mysql" to run the client.

> Not everybody knows CTRL-D.

And it takes two seconds to find the answer if you don't. Adding
"quit" and/or "exit" would save each person a maximum of one web
search per lifetime and would just confuse the paradigm that
backslashed commands are handled by the client.

> Well, there are lots of hungry people in the world; maybe we should stop
> improving software tools until everyone has food.

That's a nonsense metaphor. If you said "there's lots of hungry people
in the world, maybe we should stop bakers making croissants until
everyone has bread" it might be slightly more appropriate.

Geoff



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Everaldo Canuto
On Tue, Dec 12, 2017 at 1:02 PM, Geoff Winkless  wrote:
>
> There is also stackexchange question with 51000 views that asks how to
> start the postgresql client. Should we rename psql to mysql to help
> new users too?
>

No, we shouldn't, but it would help a lot if the postgres client had names like
"postgres", "postgres-client" or "postgres-sql". Even pgsql is better than
psql. With mysql you normally type mysql then tab and bash shows all the mysql
tools.


>
> Ctrl-D works almost everywhere. It takes 2 seconds to find the answer
> and there are going to be many other things you have to find out if
> you're coming at PostgreSQL from a different DB.
>

Not everybody knows CTRL-D.


>
> As others have pointed out, there are _far_ more frustrating things
> (like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
> for anyone coming from mysql that I would suggest should be addressed
> if you were serious about making life easier for those moving DBs.
>

Well, there are lots of hungry people in the world; maybe we should stop
improving software tools until everyone has food.


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Arthur Nascimento
On 12 December 2017 at 13:04, Jeremy Finzel  wrote:
> FWIW I am +1 in favor of not overcomplicating the special psql commands and
> keep to the "\" standard.

I find the general rule of \ starts a client command and anything else
starts a server command to be very good to begin with. The 'help' is
an ugly exception to the rule and I wouldn't like to see it expanding.
If it were up to me, I'd rename 'help' to '\help' and add '\q' to the
initial welcome message.

Of course, that wouldn't be accepted and it wouldn't solve the SO
statistics or the OP's complaint. It's already been established that
we're now trying to cater to a special kind of user - not the one who
doesn't want/have time to read the docs - but the one who didn't even
read the start message or who read it and forgot what it was after a
screen full of content has passed, then went to SO instead of the
docs. (I'm not criticizing - just trying to paint a clear picture of
the goal here.) And since I view a lone exit|quit as an uglier
solution still, how about this idea:

Nano keeps a cheatsheet of commands on the bottom; and it's regarded
as a fairly good editor for newcomers. Some shells have a right-hand
side status bar. And even vim these days shows several lines of
explanation of :q if you open it alone. If psql had a right-hand side
status bar that defaulted to a short cheatsheet-like message like the
initial "type 'help' for help and '\q' to quit" (perhaps empty if
started with quiet), then it would follow new users around on every
new command, so they would always know how to quit or ask for help. I
think the message mentioning '\q' instead of 'quit' is important
anyway since it works even on a dirty buffer. And experienced users
already set their status lines anyway, so they would have an extra
status space on the right for general things; perhaps RPROMPT, as zsh
calls it. I'd rather have SO fill with "how to change the right prompt
in psql" than "how to quit psql" since then we're talking about
different user profiles.


Tureba - Arthur Nascimento



Re: Boolean partitions syntax

2017-12-12 Thread Peter Eisentraut
On 12/12/17 04:12, Ashutosh Bapat wrote:
> May be you should use opt_boolean_or_string instead of TRUE_P and
> FALSE_P. It also supports ON and OFF, which will be bonus.

But ON and OFF are not valid boolean literals in SQL.
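
For context, this is roughly the statement shape under discussion (relation
names invented); the question is which literals the grammar should accept in
the FOR VALUES clause:

-- desired syntax: TRUE/FALSE as partition bound literals
CREATE TABLE parted (b boolean) PARTITION BY LIST (b);
CREATE TABLE parted_true  PARTITION OF parted FOR VALUES IN (TRUE);
CREATE TABLE parted_false PARTITION OF parted FOR VALUES IN (FALSE);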

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: plpgsql test layout

2017-12-12 Thread Peter Eisentraut
On 12/11/17 19:29, Michael Paquier wrote:
> If I read vcregress.pl correctly, it seems to me that you need to do
> more with MSVC (see plcheck). The tests would kick in if sql/ and
> expected/ are found, and the test list is fetched by looking at
> REGRESSION in the test files. However plpgsql code has an additional
> src/ folder which would cause the tests to not execute. If the plpgsql
> code was moved one folder down then the tests would execute properly.

OK, I hacked something up for MSVC.  How about this?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 90383a92d457ac0cc23faf72b68e87a12f000def Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 7 Dec 2017 14:03:29 -0500
Subject: [PATCH v2] Start a separate test suite for plpgsql

The plpgsql.sql test file in the main regression tests is now by far the
largest after numeric_big, making editing and managing the test cases
very cumbersome.  The other PLs have their own test suites split up into
smaller files by topic.  It would be nice to have that for plpgsql as
well.  So, to get that started, set up test infrastructure in
src/pl/plpgsql/src/ and split out the recently added procedure test
cases into a new file there.  That file now mirrors the test cases added
to the other PLs, making managing those matching tests a bit easier too.
---
 src/pl/plpgsql/src/.gitignore|  3 ++
 src/pl/plpgsql/src/Makefile  | 14 
 src/pl/plpgsql/src/expected/plpgsql_call.out | 41 +++
 src/pl/plpgsql/src/sql/plpgsql_call.sql  | 47 ++
 src/test/regress/expected/plpgsql.out| 41 ---
 src/test/regress/sql/plpgsql.sql | 49 
 src/tools/msvc/vcregress.pl  | 18 +++---
 7 files changed, 119 insertions(+), 94 deletions(-)
 create mode 100644 src/pl/plpgsql/src/expected/plpgsql_call.out
 create mode 100644 src/pl/plpgsql/src/sql/plpgsql_call.sql

diff --git a/src/pl/plpgsql/src/.gitignore b/src/pl/plpgsql/src/.gitignore
index 92387fa3cb..ff6ac965fd 100644
--- a/src/pl/plpgsql/src/.gitignore
+++ b/src/pl/plpgsql/src/.gitignore
@@ -1,3 +1,6 @@
 /pl_gram.c
 /pl_gram.h
 /plerrcodes.h
+/log/
+/results/
+/tmp_check/
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 95348179ac..64991c3115 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -24,6 +24,8 @@ OBJS = pl_gram.o pl_handler.o pl_comp.o pl_exec.o \
 
 DATA = plpgsql.control plpgsql--1.0.sql plpgsql--unpackaged--1.0.sql
 
+REGRESS = plpgsql_call
+
 all: all-lib
 
 # Shared library stuff
@@ -65,6 +67,18 @@ pl_gram.c: BISONFLAGS += -d
 plerrcodes.h: $(top_srcdir)/src/backend/utils/errcodes.txt 
generate-plerrcodes.pl
$(PERL) $(srcdir)/generate-plerrcodes.pl $< > $@
 
+
+check: submake
+   $(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)
+
+installcheck: submake
+   $(pg_regress_installcheck) $(REGRESS_OPTS) $(REGRESS)
+
+.PHONY: submake
+submake:
+   $(MAKE) -C $(top_builddir)/src/test/regress pg_regress$(X)
+
+
 distprep: pl_gram.h pl_gram.c plerrcodes.h
 
 # pl_gram.c, pl_gram.h and plerrcodes.h are in the distribution tarball,
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out 
b/src/pl/plpgsql/src/expected/plpgsql_call.out
new file mode 100644
index 00..d0f35163bc
--- /dev/null
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -0,0 +1,41 @@
+--
+-- Tests for procedures / CALL syntax
+--
+CREATE PROCEDURE test_proc1()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+NULL;
+END;
+$$;
+CALL test_proc1();
+-- error: can't return non-NULL
+CREATE PROCEDURE test_proc2()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+RETURN 5;
+END;
+$$;
+CALL test_proc2();
+ERROR:  cannot return a value from a procedure
+CONTEXT:  PL/pgSQL function test_proc2() while casting return value to 
function's return type
+CREATE TABLE test1 (a int);
+CREATE PROCEDURE test_proc3(x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+INSERT INTO test1 VALUES (x);
+END;
+$$;
+CALL test_proc3(55);
+SELECT * FROM test1;
+ a  
+
+ 55
+(1 row)
+
+DROP PROCEDURE test_proc1;
+DROP PROCEDURE test_proc2;
+DROP PROCEDURE test_proc3;
+DROP TABLE test1;
diff --git a/src/pl/plpgsql/src/sql/plpgsql_call.sql 
b/src/pl/plpgsql/src/sql/plpgsql_call.sql
new file mode 100644
index 00..38fd220e8f
--- /dev/null
+++ b/src/pl/plpgsql/src/sql/plpgsql_call.sql
@@ -0,0 +1,47 @@
+--
+-- Tests for procedures / CALL syntax
+--
+
+CREATE PROCEDURE test_proc1()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+NULL;
+END;
+$$;
+
+CALL test_proc1();
+
+
+-- error: can't return non-NULL
+CREATE PROCEDURE test_proc2()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+RETURN 5;
+END;
+$$;
+
+CALL test_proc2();
+
+
+CREATE TABLE test1 (a int);
+
+CREATE PROCEDURE test_proc3(x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+INSERT INTO test1 VALUES (x);
+END;
+$$;
+
+CALL test_proc3(55);

Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Tom Lane
Robert Haas  writes:
> Long story short, I don't think anyone cares about this enough to
> spend effort fixing it.  I suspect the money data type has very few
> users.

Somebody did contribute the effort not too long ago to install that
overflow check into cash_in.  So maybe somebody will step up and fix
the other money functions.  I can't get too excited about it in the
meantime.

regards, tom lane



Re: Incorrect debug info printed in generate_partition_wise_join_paths

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 9:05 PM, Etsuro Fujita
 wrote:
> (2017/12/11 17:24), Ashutosh Bapat wrote:
>> Yes, that's the correct fix. We should be printing debug information
>> about the child and not the parent.
>
> Thanks for reviewing!

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 5:50 PM, Andres Freund  wrote:
> This leads to fun like:
>
> postgres[2002][1]=# SELECT '92233720368547758.07'::money+'0.1';
> ┌─┐
> │  ?column?   │
> ├─┤
> │ -$92,233,720,368,547,757.99 │
> └─┘

It seems like I could have a lot MORE fun if I did it the other way --
i.e. spent myself so deeply into debt that I went positive again.

Seriously, though, this same issue was noted in a discussion back in 2011:

https://www.postgresql.org/message-id/flat/AANLkTi%3Dzbyy2%3Dcq8Wa3K3%2B%3Dn2ynkR1kdTHECnoruWS_G%40mail.gmail.com#AANLkTi=zbyy2=cq8Wa3K3+=n2ynkr1kdthecnoruw...@mail.gmail.com

Long story short, I don't think anyone cares about this enough to
spend effort fixing it.  I suspect the money data type has very few
users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 4:10 PM, Masahiko Sawada  wrote:
> Thank you for updating the patch. Here are two minor comments.
>
> + * we acquire the same relation extension lock repeatedly.  nLocks is 0 is 
> the
> + * number of times we've acquired that lock;
>
> Should it be "nLocks is the number of times we've acquired that lock:"?

Yes.

> +/* Remember lock held by this backend */
> +held_relextlock.relid = relid;
> +held_relextlock.lock = relextlock;
> +held_relextlock.nLocks = 1;
>
> We set held_relextlock.relid and held_relextlock.lock again. Can we remove 
> them?

Yes.

Can you also try the experiment Andres mentions: "Measure two COPYs to
relations on different filesystems, reduce N_RELEXTLOCK_ENTS to 1, and
measure performance. Then increase the concurrency of the copies to
each relation."  We want to see whether and how much this regresses
performance in that case.  It simulates the case of a hash collision.
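
For what it's worth, a rough sketch of that experiment in SQL (tablespace
locations, table definitions, and data files are placeholders;
N_RELEXTLOCK_ENTS itself has to be changed in the source and the server
rebuilt):

-- two relations on different filesystems, then concurrent COPYs into each
CREATE TABLESPACE ts1 LOCATION '/mnt/disk1/pgts';
CREATE TABLESPACE ts2 LOCATION '/mnt/disk2/pgts';
CREATE TABLE copy_target1 (a int, b text) TABLESPACE ts1;
CREATE TABLE copy_target2 (a int, b text) TABLESPACE ts2;
-- run these from separate sessions, then again with more clients per table:
COPY copy_target1 FROM '/path/to/data1.csv' (FORMAT csv);
COPY copy_target2 FROM '/path/to/data2.csv' (FORMAT csv);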

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



portal pinning

2017-12-12 Thread Peter Eisentraut
While working on transaction control in procedures, I noticed some
inconsistencies in how portal pinning is used.

This mechanism was introduced in
eb81b6509f4c9109ecf8839d8c482cc597270687 to prevent user code from
closing cursors that PL/pgSQL has created internally, mainly for FOR
loops.  Otherwise, user code could just write CLOSE <internal cursor name>
to mess with the language internals.

It seems to me that PL/Perl and PL/Python should also use that
mechanism, because the same problem could happen there.  (PL/Tcl does
not expose any cursor functionality AFAICT.)  Currently, in PL/Perl, if
an internally generated cursor is closed, PL/Perl just thinks the cursor
has been exhausted and silently does nothing.  PL/Python comes back with
a slightly bizarre error message "closing a cursor in an aborted
subtransaction", which might apply in some situations but not in all.

Attached is a sample patch that adds portal pinning to PL/Perl and
PL/Python.

But I also wonder whether we shouldn't automatically pin/unpin portals
in SPI_cursor_open() and SPI_cursor_close().  This makes sense if you
consider "pinned" to mean "internally generated".  I don't think there
is a scenario in which user code should directly operate on a portal
created by SPI.

Comments?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ee052fc03d51d35617935679c30041127b635e21 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 12 Dec 2017 10:26:47 -0500
Subject: [PATCH] Use portal pinning in PL/Perl and PL/Python

---
 src/backend/utils/mmgr/portalmem.c  | 14 ++
 src/pl/plperl/plperl.c  |  8 
 src/pl/plpython/plpy_cursorobject.c |  8 
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/mmgr/portalmem.c 
b/src/backend/utils/mmgr/portalmem.c
index d03b779407..e1872b8fda 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -464,11 +464,17 @@ PortalDrop(Portal portal, bool isTopCommit)
 
/*
 * Don't allow dropping a pinned portal, it's still needed by whoever
-* pinned it. Not sure if the PORTAL_ACTIVE case can validly happen or
-* not...
+* pinned it.
 */
-   if (portal->portalPinned ||
-   portal->status == PORTAL_ACTIVE)
+   if (portal->portalPinned)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_CURSOR_STATE),
+errmsg("cannot drop pinned portal \"%s\"", 
portal->name)));
+
+   /*
+* Not sure if the PORTAL_ACTIVE case can validly happen or not...
+*/
+   if (portal->status == PORTAL_ACTIVE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_CURSOR_STATE),
 errmsg("cannot drop active portal \"%s\"", 
portal->name)));
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 9f5313235f..5dd3026dc2 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -3405,6 +3405,8 @@ plperl_spi_query(char *query)
 SPI_result_code_string(SPI_result));
cursor = cstr2sv(portal->name);
 
+   PinPortal(portal);
+
/* Commit the inner transaction, return to outer xact context */
ReleaseCurrentSubTransaction();
MemoryContextSwitchTo(oldcontext);
@@ -3468,6 +3470,7 @@ plperl_spi_fetchrow(char *cursor)
SPI_cursor_fetch(p, true, 1);
if (SPI_processed == 0)
{
+   UnpinPortal(p);
SPI_cursor_close(p);
row = _sv_undef;
}
@@ -3519,7 +3522,10 @@ plperl_spi_cursor_close(char *cursor)
p = SPI_cursor_find(cursor);
 
if (p)
+   {
+   UnpinPortal(p);
SPI_cursor_close(p);
+   }
 }
 
 SV *
@@ -3883,6 +3889,8 @@ plperl_spi_query_prepared(char *query, int argc, SV 
**argv)
 
cursor = cstr2sv(portal->name);
 
+   PinPortal(portal);
+
/* Commit the inner transaction, return to outer xact context */
ReleaseCurrentSubTransaction();
MemoryContextSwitchTo(oldcontext);
diff --git a/src/pl/plpython/plpy_cursorobject.c 
b/src/pl/plpython/plpy_cursorobject.c
index 9467f64808..a527585b81 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -151,6 +151,8 @@ PLy_cursor_query(const char *query)
 
cursor->portalname = MemoryContextStrdup(cursor->mcxt, 
portal->name);
 
+   PinPortal(portal);
+
PLy_spi_subtransaction_commit(oldcontext, oldowner);
}
PG_CATCH();
@@ -266,6 +268,8 @@ PLy_cursor_plan(PyObject *ob, PyObject *args)
 

Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Avinash Kumar
It takes a few hours to learn PostgreSQL and 1 second to remember how to
quit psql.
Surprised to see the depth of discussion on this.

It takes some time for a person to explore or know about PostgreSQL and
install it.
Likewise, it takes some more time to understand that there exists a
"psql" to connect to it.
They know about psql because they searched for it somewhere. Just a psql --help
(does anybody use help?) should tell them much more about psql.
I think many people don't find it difficult to quit using \q.

I think it's good to talk about or help with more features that help a
production database.

Hope this discussion gets to a final conclusion soon :)


On Tue, Dec 12, 2017 at 11:19 AM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Dec 12, 2017 at 8:04 AM, Jeremy Finzel  wrote:
>
>> I don't think this is a significant stumbling block for users adopting
>> postgres.
>>
>
> The OP complained, and I tend to agree, that 90% of the world uses the
> command "exit" to end a program and that it would be nice if we did as
> well.  This is not just a new-user consideration - and I'd say that making
> it so that help/quit/exit/\q on a line of its own is helpfully reported to the
> user as having been ineffective when in the middle of a string or
> statement continuation would benefit more than just new users as well.
>
> David J.
>
>


-- 
9000799060


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread David G. Johnston
On Tue, Dec 12, 2017 at 8:04 AM, Jeremy Finzel  wrote:

> I don't think this is a significant stumbling block for users adopting
> postgres.
>

The OP complained, and I tend to agree, that 90% of the world uses the
command "exit" to end a program and that it would be nice if we did as
well.  This is not just a new-user consideration - and I'd say that making
it so that help/quit/exit/\q on a line of its own is helpfully reported to the
user as having been ineffective when in the middle of a string or
statement continuation would benefit more than just new users as well.

David J.


Re: [HACKERS] Custom compression methods

2017-12-12 Thread Alexander Korotkov
Hi!

Let me add my two cents too.

On Tue, Dec 12, 2017 at 2:41 PM, Ildus Kurbangaliev <
i.kurbangal...@postgrespro.ru> wrote:

> On Mon, 11 Dec 2017 20:53:29 +0100 Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
> > Replacing the algorithm used to compress all varlena values (in a way
> > that makes it transparent for the data type code).
> > --
> >
> > While pglz served us well over time, it was repeatedly mentioned that
> > in some cases it becomes the bottleneck. So supporting other state of
> > the art compression algorithms seems like a good idea, and this patch
> > is one way to do that.
> >
> > But perhaps we should simply make it an initdb option (in which case
> > the whole cluster would simply use e.g. lz4 instead of pglz)?
> >
> > That seems like a much simpler approach - it would only require some
> > ./configure options to add --with-lz4 (and other compression
> > libraries), an initdb option to pick compression algorithm, and
> > probably noting the choice in cluster controldata.
>
> Replacing pglz for all varlena values wasn't the goal of the patch, but
> it's possible to do with it and I think that's good. And as Robert
> mentioned pglz could appear as builtin undroppable compresssion method
> so the others could be added too. And in the future it can open the
> ways to specify compression for specific database or cluster.
>

Yes, usage of custom compression methods to replace the generic non
type-specific compression method is really not the primary goal of this
patch.  However, I would consider that a useful side effect.  And even
in this case I see some advantages of custom compression methods over an
initdb option.

1) In order to support alternative compression methods in initdb, we have
to provide builtin support for them.  Then we immediately run into the
dependencies/incompatible-licenses problem.  Also, we tie the appearance of
new compression methods to our release cycle.  In real life, flexibility means
a lot.  Giving users the freedom to experiment with various compression
libraries without having to recompile PostgreSQL core is great.
2) It's not necessarily the case that users would be satisfied with applying a
single compression method to the whole database cluster.  Various columns may
have different data distributions with different workloads.  The optimal
compression type for one column is not necessarily optimal for another column.
3) Possibility to change compression method on the fly without re-initdb is
very good too.

> Custom datatype-aware compression (e.g. the tsvector)
> > --
> >
> > Exploiting knowledge of the internal data type structure is a
> > promising way to improve compression ratio and/or performance.
> >
> > The obvious question of course is why shouldn't this be done by the
> > data type code directly, which would also allow additional benefits
> > like operating directly on the compressed values.
> >
> > Another thing is that if the datatype representation changes in some
> > way, the compression method has to change too. So it's tightly coupled
> > to the datatype anyway.
> >
> > This does not really require any new infrastructure, all the pieces
> > are already there.
> >
> > In some cases that may not be quite possible - the datatype may not be
> > flexible enough to support alternative (compressed) representation,
> > e.g. because there are no bits available for "compressed" flag, etc.
> >
> > Conclusion: IMHO if we want to exploit the knowledge of the data type
> > internal structure, perhaps doing that in the datatype code directly
> > would be a better choice.
>
> It could be, but let's imagine there will be internal compression for
> tsvector. It means that tsvector has two formats now and one bit less
> somewhere in the header. After a while we find a better compression
> but we can't add it because there is already one and it's not good to
> have three different formats for one type. Or, the compression methods
> were implemented and we decided to use dictionaries for tsvector (if
> the user is going to store a limited number of words). But it will mean that
> tsvector will go through two compression stages (one for its internal format
> and one for dictionaries).


I would like to add that even for a single datatype various compression
methods may have different tradeoffs.  For instance, one compression method
can have a better compression ratio, but another one may have faster
decompression.  And it's OK for a user to choose different compression
methods for different columns.
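
To make the per-column idea concrete, the user-visible shape could be
something like the following; this is only an illustrative sketch with an
invented handler name, not necessarily the exact grammar the patch
implements:

-- hypothetical: a compression method provided by an extension,
-- applied to a single column only
CREATE COMPRESSION METHOD lz4 HANDLER lz4_compression_handler;
ALTER TABLE documents ALTER COLUMN body SET COMPRESSION lz4;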

Making extensions depend on a datatype's internal representation doesn't seem
evil to me.  We already have a bunch of extensions depending on much deeper
guts of PostgreSQL.  On a major release of PostgreSQL, extensions must adopt
the changes, that is the rule.  And note, the datatype internal
representation alters relatively rarely in comparison with other internals,
because it's related to on-disk 

Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Jeremy Finzel
Someone mentioned needing to read the documentation of vim to learn
how to exit the program.  I personally think exactly the same applies here,
and am a bit surprised at the depth of discussion over this.

When I was first new to CLI programs, the only "standard" way to exit a
program I found across the board was CTRL+D.  I have never even thought
that psql, even though different, was odd in its choice of how to exit the
program.  And the choice of \q as the way to quit a SQL CLI program has
very good reasons, as has been discussed.

So what if that Stack Overflow question is the most-viewed psql question?
How many of you have learned what you need within 5 seconds on Google?  I
don't think this is a significant stumbling block for users adopting
Postgres.  psql is amazing, and anyone who likes CLI programs will be fine.

FWIW I am +1 in favor of not overcomplicating the special psql commands and
keeping to the "\" standard.

On Tue, Dec 12, 2017 at 8:35 AM, Gasper Zejn  wrote:

> On 12. 12. 2017 06:58, Tom Lane wrote:
> > Craig Ringer  writes:
> >> I doubt I've ever written just "exit" or "quit" without indentation. I
> >> think if it requires them to be a bareword with no indentation, strictly
> >> ^(exit|quit)\n when isatty, then that's probably a safe and helpful
> choice.
> > FWIW, I think that the special behavior (whatever it winds up being
> > exactly) ought to trigger on an input line matching
> >
> >   [whitespace]*help[whitespace]*(;[whitespace]*)?
> >
> > and similarly for exit/quit.  I think that novices might have
> internalized
> > enough SQL syntax to think that they need to terminate the command with a
> > semicolon --- in fact, we regularly see examples in which seasoned users
> > think they need to terminate backslash commands with a semicolon, so
> > that's hardly far-fetched.  And we might as well allow as much whitespace
> > as we can, because nobody but Guido Rossum thinks that having whitespace
> > be semantically significant is a good idea.
> >
> >   regards, tom lane
> >
> If tabs are considered whitespace, psql can sometimes treat them as
> semantically significant, since
> "create materialized<tab>test11 as select 1;" (with a literal tab) will be
> autocompleted to correct syntax if you paste the line into an interactive
> psql session.
>
> I have seen psql error out with invalid syntax when a similar query was
> pasted and psql autocompleted it.
>
> kind regards, Gasper
>
>
>


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada  wrote:
>> The question I have is how would we deal with a foreign server that is
>> not available for longer duration due to crash, longer network outage
>> etc. Example is the foreign server crashed/got disconnected after
>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>> blocked for much longer duration without user having an idea of what's
>> going on. May be we should add some timeout.
>
> After more thought, I agree with adding some timeout. I can imagine
> there are users who want the timeout, for example, who cannot accept
> even a few seconds of latency. If the timeout occurs, the backend unlocks the
> foreign transactions and breaks the loop. The resolver process will
> continue to resolve foreign transactions at a certain interval.

I don't think a timeout is a very good idea.  There is no timeout for
synchronous replication and the issues here are similar.  I will not
try to block a patch adding a timeout, but I think it had better be
disabled by default and have very clear documentation explaining why
it's really dangerous.  And this is why: with no timeout, you can
count on being able to see the effects of your own previous
transactions, unless at some point you sent a query cancel or got
disconnected.  With a timeout, you may or may not see the effects of
your own previous transactions depending on whether or not you hit the
timeout, which you have no sure way of knowing.

>>> transactions after the coordinator server recovered. On the other
>>> hand, for the reading a consistent result on such situation by
>>> subsequent reads, for example, we can disallow backends to inquiry SQL
>>> to the foreign server if a foreign transaction of the foreign server
>>> is remained.
>>
>> +1 for the last sentence. If we do that, we don't need the backend to
>> be blocked by resolver since a subsequent read accessing that foreign
>> server would get an error and not inconsistent data.
>
> Yeah, however the disadvantage of this is that we manage foreign
> transactions per foreign server. If a transaction that modified even
> one table remains as an in-doubt transaction, we cannot issue any
> SQL that touches that foreign server. Can we raise an error at
> ExecInitForeignScan()?

I really feel strongly we shouldn't complicate the initial patch with
this kind of thing.  Let's make it enough for this patch to guarantee
that either all parts of the transaction commit eventually or they all
abort eventually.  Ensuring consistent visibility is a different and
hard project, and if we try to do that now, this patch is not going to
be done any time soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Testing Extension Upgrade Paths

2017-12-12 Thread Jeremy Finzel
On Mon, Dec 11, 2017 at 7:55 PM, Craig Ringer  wrote:

> On 12 December 2017 at 07:25, Michael Paquier 
> wrote:
>
>> On Tue, Dec 12, 2017 at 2:55 AM, Jeremy Finzel  wrote:
>> > I understand how to setup a regression suite for a postgres extension,
>> but
>> > what I'm not clear on from the docs is if there is a pattern that
>> exists for
>> > testing not only the latest version of an extension, but also an
>> upgraded
>> > previous version.
>> >
>> > Is there any straightforward way to do this that doesn't involve
>> manually
>> > copying tests?
>>
>> It is perfectly possible to do tests on a specific version and test
>> upgrade paths using CREATE EXTENSION and ALTER EXTENSION which support
>> versioning in a SQL regression script, say:
>> CREATE EXTENSION foo VERSION "0.1";
>> -- Do stuff
>> ALTER EXTENSION foo UPDATE TO "0.2";
>> -- Do other stuff
>>
>
> This works well when you want to test 1.1 binaries with a 1.0 SQL
> extension definition.
>
> It's not effective if you need to test 1.0 binaries+extension, then an
> upgrade to 1.1 binaries and extension. You have no way to install and run
> the 1.0 binaries without going outside pg_regress / TAP and using an
> external test harness/framework/script. I've been bitten by this multiple
> times before. For example, if 1.1 always assigns a value to some column
> that was nullable in 1.0, you're never going to exercise 1.1's handling of
> nulls in that column.
>
> It also doesn't lend its self to complex multi-path upgrades, where for
> example you could upgrade 1.0.0 -> 1.0.1 -> 1.1.0 or  upgrade directly from
> 1.0.0 -> 1.1.0. This rapidly becomes an issue when you release 1.0.0,
> release 1.1.0, then release a maintenance 1.0.1  version. Now you have
> toadd the 1.0.1 -> 1.1.0 upgrade path, but you still have the 1.0.0 ->
> 1.1.0 path.
>
> You can handle that with TAP if you're willing to write something to do
> the various combinations of steps and exercise them. It's not practical
> with pg_regress.
>
> More thorough testing benefits from an external harness.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Thanks both of you for the feedback.

I understood that I can do CREATE then ALTER EXTENSION UPDATE, but my goal
was to actually be able to run this workflow multiple ways without having
to hardcode that or duplicate steps.  To clarify, my intention was to only
test the binary for 1.1, but to then run the same test steps (without
having to duplicate things) under 2 separate scenarios:

   1. Create the extension right away at version 1.1, run the suite
   2. Create the extension at 1.0, populate extension config tables as if I
   have actually been using it, then upgrade to 1.1, and run the suite.

Here is how I ended up implementing this, and I am very open to feedback:

The first file is like this:

-- Allow running regression suite with upgrade paths
\set v `echo ${FROMVERSION:-1.1}`
SET client_min_messages = warning;
CREATE EXTENSION pglogical;
CREATE EXTENSION pgl_ddl_deploy VERSION :'v';

So if you run the suite without any environment variable, it will start at
1.1.  But if you add FROMVERSION=1.0, it will start at 1.0.

Then in test steps 2 and 3, I assume test results are identical between 1.0
and 1.1.  But in step 4, I run ALTER EXTENSION UPDATE, which is either a
no-op, or actually upgrades from 1.0.  The remaining tests in the whole
suite are designed to yield identical results in either path.
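
In other words, step 4 is essentially just this (a no-op when the suite
already created 1.1 directly):

-- upgrades a 1.0 install created in step 1, or does nothing on 1.1
ALTER EXTENSION pgl_ddl_deploy UPDATE;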

I am fairly happy with this because what I really wanted to test is
upgrading from 1.0 to 1.1 as opposed to a bare 1.1 install.

With yet later versions, I would of course need to modify this as
necessary, and I would then want to test yet more of the upgrade paths as
is feasible.

Thanks,
Jeremy


Re: Out of date comment in cached_plan_cost

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 5:00 AM, David Rowley
 wrote:
> On 11 December 2017 at 21:39, Ashutosh Bapat
>  wrote:
>> I don't see much difference in the old and new wording. The word
>> "generally" confuses more than clarifying the cases when the planning
>> cost curves do not change with the number of relations i.e.
>> partitions.
>
> I added that to remove the false claim that inheritance children don't
> make the join problem more complex. This was only true before we had
> partition-wise joins.
>
> I've re-read my original patch and I don't really see the problem with
> it. The comment is talking about inheritance child relations, which
> you could either interpret to mean INHERITS (sometable), or some table
> listed in pg_inherits. The statement that I added forces me into
> thinking of the former rather than the latter, so I don't really see
> any issue.
>
> I'm going to leave it here. I don't want to spend too much effort
> rewording a comment. My vote is for the original patch I sent. I only
> changed it because Robert complained that technically an inheritance
> child could actually be a partition.

I basically feel like we're not really solving any problem by changing
this.  I mean, partition-wise join makes this statement less true, but
adding the word "generally" doesn't really help anybody understand the
situation better.  If we're going to add anything here, I think it
should be something like:

(This might need to be rethought in light of partition-wise join.)

If that's more specific than we want to get, then let's just leave it
alone.  Partition-wise join isn't even enabled by default at this
point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inconsistency in plpgsql's error context reports

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 3:52 PM, Tom Lane  wrote:
> Here's a quick hack at that.  I guess the main question that needs to be
> asked is whether we're happy with plpgsql getting so much chattier
> (as per all the regression test changes).
>
> If we're not, the alternative that could be considered is to make SPI
> expose some way to suppress its use of a context callback, and change
> plpgsql to invoke that when dealing with an expression.  That would
> be rather more invasive code-wise, but would largely eliminate the
> behavioral change as seen by users.
>
> Another angle, if we do keep it like this, is that maybe SPI should
> export _SPI_error_callback so that plpgsql can use it directly,
> rather than having a copy that needs to be kept in sync.

I confess to never having really grokked, even in the pre-patch state,
why we sometimes get an "SQL statement" context line and sometimes
not.  However, what strikes me about this is that the SQL statement is
a completely fabricated one that the user never entered.  Consider
this bit from the existing regression test output, unpatched:

create function namedparmcursor_test7() returns void as $$
declare
  c1 cursor (p1 int, p2 int) for
select * from tenk1 where thousand = p1 and tenthous = p2;
begin
  open c1 (p2 := 77, p1 := 42/0);
end $$ language plpgsql;
select namedparmcursor_test7();
ERROR:  division by zero
CONTEXT:  SQL statement "SELECT 42/0 AS p1, 77 AS p2;"
PL/pgSQL function namedparmcursor_test7() line 6 at OPEN

Quite obviously, nothing like "SELECT 42/0 AS p1, 77 AS p2;" is
present in there anywhere.  When people see an SQL statement in the
context, or at least when I see it, my inclination is to go grep for
where that SQL statement is to be found, and to be unhappy when the
answer is nowhere.  It would really be a lot better if we could say
something like

CONTEXT: SQL expression 42/0

...but I realize that's probably hard to do.  However, the current
situation is that plpgsql.out contains 5 "SQL statement" context
lines, of which only 1 is an SQL statement that actually appears in
the procedure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 10:04 PM, Alvaro Herrera
 wrote:
> David Rowley wrote:
>
>> ATTACH/REPLACE sounds fine. My objection was more about the
>> DETACH/ATTACH method to replace an index.
>
> So what happens if you do ALTER INDEX .. ATTACH and you already have
> another index in that partition that is attached to the same parent in
> the index?

I think that should be an ERROR.  You can use REPLACE if you want to
switch which index is attached, but you shouldn't be able to attach
two indexes from the same partition at the same time.
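
To illustrate the rule being discussed (names invented, and the exact syntax
is still being settled in this thread):

-- idx_parent is the index on the partitioned table; p1_idx_a and p1_idx_b
-- are two matching indexes on the same partition p1
ALTER INDEX idx_parent ATTACH p1_idx_a;   -- ok
ALTER INDEX idx_parent ATTACH p1_idx_b;   -- error: p1 already has an attached index
-- switching which index is attached would use the REPLACE form instead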

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2017-12-12 Thread Oleg Bartunov
On Mon, Dec 11, 2017 at 11:11 PM, Nikolay Samokhvalov
 wrote:
> Very interesting read: https://arxiv.org/abs/1712.01208
>
> HN discussion: https://news.ycombinator.com/item?id=15894896
>
> Some of the comments (from Twitter
> https://twitter.com/schrockn/status/940037656494317568): "Jeff Dean and co
> at GOOG just released a paper showing how machine-learned indexes can
> replace B-Trees, Hash Indexes, and Bloom Filters. Execute 3x faster than
> B-Trees, 10-100x less space. Executes on GPU, which are getting faster
> unlike CPU. Amazing."
>
> Can those ideas be applied to Postgres in its current state? Or it's not
> really down-to-earth?

Oleg made some analysis of the paper.



Re: CUBE seems a bit confused about ORDER BY

2017-12-12 Thread Alexander Korotkov
On Tue, Dec 12, 2017 at 3:49 PM, Teodor Sigaev  wrote:

> Yes, the thing is that we change behavior of existing ~> operator.  In
>> general, this is not good idea because it could affect existing users whose
>> already use this operator.  Typically in such situation, we could leave
>> existing operator as is, and invent new operator with new behavior.
>> However, in this particular case, I think there are reasons to make an
>> exception to the rules.  The reasons are following:
>> 1) The ~> operator was designed especially for knn gist.
>> 2) Knn gist support for current behavior is broken by design and can't be
>> fixed.  Most we can do to fix existing ~> operator behavior as is to drop
>> knn gist support.  But then ~> operator would be left useless.
>> 3) It doesn't seems that ~> operator have many users yet, because an
>> error wasn't reported during whole release cycle.
>>
>> I hope these reasons justify altering behavior of existing operator as an
>> exception to the rules.  Another question is whether we should backpatch
>> it.  But I think we could leave this decision to committer.
>>
>> I think that this patch is ready for committer.
>>
> I'm agree with changing behavior of existing ~> operator, but is any
> objection here? Current implementation is not fixable and I hope that users
> which use this operator will understand why we change it. Fortunately, the
> fix doesn't require changes in system catalog.
>
> The single question here is about index over expression with this
> operator, they will need to reindex, which should be noted in release notes.


Yes.  I bet only a few users have built indexes over the ~> operator, if any.
Asking them to reindex in the release notes seems OK to me.
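
For reference, this is the sort of usage affected (table and data invented);
an expression index over ~> is the case that would need a REINDEX after the
change:

CREATE EXTENSION cube;
CREATE TABLE boxes (c cube);
CREATE INDEX boxes_gist_idx ON boxes USING gist (c);
-- KNN-style ordering by a coordinate selected with ~>
SELECT c FROM boxes ORDER BY c ~> 1 LIMIT 10;
-- an expression index like this one would need REINDEX after the change
CREATE INDEX boxes_coord_idx ON boxes ((c ~> 1));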

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: CUBE seems a bit confused about ORDER BY

2017-12-12 Thread Teodor Sigaev
> Yes, the thing is that we change behavior of existing ~> operator.  In general,
> this is not good idea because it could affect existing users whose already use
> this operator.  Typically in such situation, we could leave existing operator as
> is, and invent new operator with new behavior.  However, in this particular
> case, I think there are reasons to make an exception to the rules.  The reasons
> are following:
>
> 1) The ~> operator was designed especially for knn gist.
> 2) Knn gist support for current behavior is broken by design and can't be
> fixed.  Most we can do to fix existing ~> operator behavior as is to drop knn
> gist support.  But then ~> operator would be left useless.
> 3) It doesn't seems that ~> operator have many users yet, because an error
> wasn't reported during whole release cycle.
>
> I hope these reasons justify altering behavior of existing operator as an
> exception to the rules.  Another question is whether we should backpatch it.
> But I think we could leave this decision to committer.
>
> I think that this patch is ready for committer.

I agree with changing the behavior of the existing ~> operator, but are there
any objections here? The current implementation is not fixable and I hope that
users who use this operator will understand why we changed it. Fortunately, the
fix doesn't require changes in the system catalog.

The single question here is about indexes over expressions with this operator;
they will need a reindex, which should be noted in the release notes.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] logical decoding of two-phase transactions

2017-12-12 Thread Simon Riggs
On 12 December 2017 at 12:04, Nikhil Sontakke  wrote:

> This is an issue because of the way we are handling invalidations. We
> don't allow ReorderBufferAddInvalidations() at COMMIT PREPARE time
> since we assume that handling them at PREPARE time is enough.
> Apparently, it's not enough.

Not sure what that means.

I think we would need to fire invalidations at COMMIT PREPARED, yet
logically decode them at PREPARE.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-12 Thread Teodor Sigaev
> 0001-pg-trgm-word-similarity-docs-improvement.patch – contains improvement to
> documentation of word_similarity() and related operators.  I decided to give
> formal definition first (what exactly it internally does), and then example and
> some more human-understandable description.  This patch also adjusts two
> comments where lower and upper bounds mess up.
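
For readers unfamiliar with the function, a minimal usage sketch (strings
invented, output omitted):

CREATE EXTENSION pg_trgm;
-- similarity between the first string and the most similar
-- continuous extent of the second string
SELECT word_similarity('word', 'two words');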


I'm ready to commit that, but I'd like a native English speaker to
check it first. Thank you.


And, I suppose, this patch should be backpatched to 9.6.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-12 Thread Etsuro Fujita

Hi Maksim,

(2017/12/12 17:57), Maksim Milyutin wrote:

Your patch no longer applies on master. Please rebase it.


Done.  Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***
*** 176,182  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
--- 176,188 
  SELECT tableoid::regclass, * FROM pt;
  SELECT tableoid::regclass, * FROM p1;
  SELECT tableoid::regclass, * FROM p2;
+ \t on
+ EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (1, 'xyzzy');
+ \t off
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
+ \t on
+ EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (2, 'xyzzy');
+ \t off
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
  SELECT tableoid::regclass, * FROM p1;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***
*** 315,321  SELECT tableoid::regclass, * FROM p2;
  (0 rows)
  
  COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
! ERROR:  cannot route inserted tuples to a foreign table
  CONTEXT:  COPY pt, line 2: "1,qux"
  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
  SELECT tableoid::regclass, * FROM pt;
--- 315,321 
  (0 rows)
  
  COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
! ERROR:  cannot route copied tuples to a foreign table
  CONTEXT:  COPY pt, line 2: "1,qux"
  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
  SELECT tableoid::regclass, * FROM pt;
***
*** 341,348  SELECT tableoid::regclass, * FROM p2;
   p2   | 2 | qux
  (2 rows)
  
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
! ERROR:  cannot route inserted tuples to a foreign table
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
   tableoid | a |   b   
--- 341,366 
   p2   | 2 | qux
  (2 rows)
  
+ \t on
+ EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (1, 'xyzzy');
+  Insert on public.pt
+Foreign Insert on public.p1
+Insert on public.p2
+->  Result
+  Output: 1, 'xyzzy'::text
+ 
+ \t off
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
! ERROR:  cannot route inserted tuples to foreign table "p1"
! \t on
! EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (2, 'xyzzy');
!  Insert on public.pt
!Foreign Insert on public.p1
!Insert on public.p2
!->  Result
!  Output: 2, 'xyzzy'::text
! 
! \t off
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
   tableoid | a |   b   
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 7088,7093  NOTICE:  drop cascades to foreign table bar2
--- 7088,7295 
  drop table loct1;
  drop table loct2;
  -- ===
+ -- test tuple routing for foreign-table partitions
+ -- ===
+ create table pt (a int, b int) partition by list (a);
+ create table locp partition of pt for values in (1);
+ create table locfoo (a int check (a in (2)), b int);
+ create foreign table remp partition of pt for values in (2) server loopback options (table_name 'locfoo');
+ explain (verbose, costs off)
+ insert into pt values (1, 1), (2, 1);
+QUERY PLAN
+ -
+  Insert on public.pt
+Insert on public.locp
+Foreign Insert on public.remp
+  Remote SQL: INSERT INTO public.locfoo(a, b) VALUES ($1, $2)
+->  Values Scan on "*VALUES*"
+  Output: "*VALUES*".column1, "*VALUES*".column2
+ (6 rows)
+ 
+ insert into pt values (1, 1), (2, 1);
+ select tableoid::regclass, * FROM pt;
+  tableoid | a | b 
+ --+---+---
+  locp | 1 | 1
+  remp | 2 | 1
+ (2 rows)
+ 
+ select tableoid::regclass, * FROM locp;
+  tableoid | a | b 
+ --+---+---
+  locp | 1 | 1
+ (1 row)
+ 
+ select tableoid::regclass, * FROM remp;
+  tableoid | a | b 
+ --+---+---
+  remp | 2 | 1
+ (1 row)
+ 
+ explain (verbose, costs off)
+ insert into pt values (1, 2), (2, 2) returning *;
+QUERY PLAN   
+ 
+  Insert on public.pt
+Output: pt.a, pt.b
+Insert on public.locp
+Foreign Insert on public.remp
+  Remote SQL: INSERT INTO public.locfoo(a, b) VALUES ($1, $2) RETURNING a, b
+->  Values Scan on "*VALUES*"
+  Output: "*VALUES*".column1, "*VALUES*".column2
+ (7 rows)
+ 
+ insert into pt values (1, 2), (2, 2) returning *;
+  a | b 
+ ---+---
+  1 | 2
+  2 | 2
+ (2 rows)
+ 
+ select 

Re: [HACKERS] logical decoding of two-phase transactions

2017-12-12 Thread Nikhil Sontakke
Hi,

Thanks for the warning fix; I will also look at the cassert case soon.

I have been adding more test cases to this patch. I added a TAP test
which now allows us to do a concurrent ROLLBACK PREPARED when the
walsender is in the midst of decoding this very prepared transaction.

I have added a "decode-delay" parameter to test_decoding, via which
each apply call sleeps for a configurable number of seconds, allowing
us to perform the rollback deterministically in parallel. This logic
seems to work OK.
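
Roughly, the shape of the scenario is as follows. The TAP test drives
it through a walsender, but the SQL interface shows the idea; the slot
name and delay value below are just placeholders, the table is the
test_prepared1 table from the test case further down, and the exact
option syntax may differ slightly from what the attached patch
implements:

-- one-time setup
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot',
'test_decoding');

-- session 1
BEGIN;
INSERT INTO test_prepared1 VALUES (42);
PREPARE TRANSACTION 'test_prepared#42';

-- session 2: each apply call sleeps for 2 seconds, so decoding of the
-- prepared transaction is still in progress when session 1 rolls it back
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
'decode-delay', '2');

-- session 1, while session 2 is still inside the above call
ROLLBACK PREPARED 'test_prepared#42';

With each change delayed, the ROLLBACK PREPARED lands while the
decoding call is still working through the prepared transaction, which
is exactly the race the TAP test is meant to exercise.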

However, I am battling an issue with invalidations now. Consider the
below test case:

CREATE TABLE test_prepared1(id integer primary key);
-- test prepared xact containing ddl
BEGIN; INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
COMMIT PREPARED 'test_prepared#3';
SELECT data FROM pg_logical_slot_get_changes(..) <-- this shows the
2PC being decoded appropriately
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
SELECT data FROM pg_logical_slot_get_changes(..)

The last pg_logical_slot_get_changes call shows:

table public.test_prepared1: INSERT: id[integer]:8

whereas, since the 2PC transaction committed, it should have shown:

table public.test_prepared1: INSERT: id[integer]:8 data[text]:null

This is an issue caused by the way we are handling invalidations. We
don't allow ReorderBufferAddInvalidations() at COMMIT PREPARED time,
since we assume that handling the invalidations at PREPARE time is
enough. Apparently, it's not. I am trying to allow invalidations at
COMMIT PREPARED time as well, but maybe calling
ReorderBufferAddInvalidations() blindly a second time is not a good
idea. Also, if I do that, I get some restart_lsn inconsistencies, which
cause subsequent pg_logical_slot_get_changes() calls to re-decode older
records. I continue to investigate.

I am attaching the latest WIP patch. This contains the additional TAP
test changes.

Regards,
Nikhils

On 8 December 2017 at 01:15, Peter Eisentraut
 wrote:
> On 12/7/17 08:31, Peter Eisentraut wrote:
>> On 12/4/17 10:15, Nikhil Sontakke wrote:
>>> PFA, latest patch for this functionality.
>>
>> This probably needs documentation updates for the logical decoding chapter.
>
> You need the attached patch to be able to compile without warnings.
>
> Also, the regression tests crash randomly for me at
>
> frame #4: 0x00010a6febdb
> postgres`heap_prune_record_prunable(prstate=0x7ffee5578990, xid=0)
> at pruneheap.c:625
>622   * This should exactly match the PageSetPrunable macro.  We
> can't store
>623   * directly into the page header yet, so we update working 
> state.
>624   */
> -> 625  Assert(TransactionIdIsNormal(xid));
>626  if (!TransactionIdIsValid(prstate->new_prune_xid) ||
>627  TransactionIdPrecedes(xid, prstate->new_prune_xid))
>628  prstate->new_prune_xid = xid;
>
> Did you build with --enable-cassert?
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


2pc_logical_12_12_17_wip.patch
Description: Binary data


Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

2017-12-12 Thread Amit Kapila
On Tue, Dec 12, 2017 at 4:00 PM, Kuntal Ghosh
 wrote:
> On Mon, Dec 11, 2017 at 2:26 PM, Thomas Munro
>  wrote:
>> On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila  wrote:
>>
>>> Thanks for looking into it.  I will see if we can write some test.  In
>>> the meantime if possible, can you please request Patrick Hemmer to
>>> verify the attached patch?
>>
>> Our discussion was on the #postgresql Freenode channel.  I pointed him
>> at this thread, but I'm not sure if he'll see my message or be able to
>> test.
> After discussing with Amit, I'm able to reproduce the scenario in a
> master-standby setup. The issue occurs when we perform parallel
> index(-only) scan on a BTP_HALF_DEAD -marked page. (If a page is
> marked as BTP_DELETED, it's already unlinked from the index).
>
> When a btree page is deleted during vacuum, it's first marked as
> BTP_HALF_DEAD in _bt_mark_page_halfdead and then marked as BTP_DELETED
> in _bt_unlink_halfdead_page without releasing cleanup lock on the
> buffer. Hence, any scan node cannot lock the same buffer. So, the
> issue can't be reproduced on master.
>
> However, after replaying XLOG_BTREE_MARK_PAGE_HALFDEAD record, standby
> releases the lock on the same buffer. If we force parallelism, an
> index scan on the same page will cause hang the standby server.
> Following is a (unpleasant)way to reproduce the issue:
>
> In master (with autovacuum = off):
> 1. create table t1(a int primary key);
> 2. insert into t1 select * from generate_series(1,1000); --generates 3
> leaf nodes (block no 1,2,4) and 1 root node (block no 3)
> 3. delete from t1 where a>=367 and a<=735; --delete all tuples pointed by 
> leaf 2
> 4. analyze t1; --update the stats
> 5. explain analyze select * from t1 where a>=1 and a<=1000; --ensures
> that the next vacuum will consider leaf 2 for page deletion

What do you mean by "next vacuum"?  Autovacuum is off here.  Are you
missing a step that manually performs the vacuum?

> Now, put a break point at _bt_unlink_halfdead_page, so that vacuum
> can't unlink the page.
>
> In standby,
> 1. force parallelism.
> 2. explain analyze select * from t1 where a>=1 and a<=1000; and the
> parallel workers hang at the above-discussed place!
>
> The attached patch fixes the issue. One comment on the same:
> + else if (scan->parallel_scan != NULL)
> + {
> + /* allow next page be processed by parallel worker */
> + _bt_parallel_release(scan, opaque->btpo_next);
> + }
>
>   /* nope, keep going */
>   if (scan->parallel_scan != NULL)
>
> IMHO, There is no need to add an extra if condition.
> _bt_parallel_release can be included in the next one.
>

I don't think so because it is quite possible that _bt_readpage has
already released it (consider the case where _bt_readpage returns
false).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


