Re: [CORE] [HACKERS] back-branch multixact fixes 9.5 alpha/beta: schedule

2015-06-09 Thread David Gould
On Mon, 8 Jun 2015 13:53:42 -0300
Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 * people with the wrong oldestMulti setting in pg_control (which would
 be due to a buggy pg_upgrade being used long ago) will be unable to
 start if they upgrade to 9.3.7 or 9.3.8.  A solution for them would be
 to downgrade to 9.3.6.  We had reports of this problem starting just a
 couple of days after we released 9.4.2, I think.

Does this mean that people with the wrong oldestMulti setting in pg_control
due to a buggy pg_upgrade being used long ago can fix this by updating to
9.3.9 when it is released? Asking for a friend...

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] [CORE] Restore-reliability mode

2015-06-08 Thread David Gould
On Mon, 8 Jun 2015 13:03:56 -0300
Claudio Freire klaussfre...@gmail.com wrote:

  Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive
  hardship.
 
 It's not about the 5 minutes of compile time, it's about the signalling.
 
 Just *when* is git ready for testing? You don't know from the outside.
 
 I do lurk here a lot and still am unsure quite often.
 
 Even simply releasing an alpha *tarball* would be useful enough. What
 is needed is the signal to test, rather than a fully-built package.

This. The clients I referred to earlier don't even use the rpm packages,
they build from sources. They need to know when it is worthwhile to take a
new set of sources and test. Some sort of labeling about what the contents
are would enable them to do this.

I don't think a monthly snapshot would work as well, since the requirement is
knowing that grouping sets are in, not that it is July now.

-dg

-- 
David Gould   da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] [CORE] Restore-reliability mode

2015-06-08 Thread David Gould

I think Alphas are valuable and useful and even more so if they have release
notes. For example, some of my clients are capable of fetching sources and
building from scratch and filing bug reports and are often interested in
particular new features. They even have staging infrastructure that could
test new postgres releases with real applications. But they don't do it.
They also don't follow -hackers, they don't track git, and they don't have
any easy way to tell if the new feature they are interested in is
actually complete and ready to test at any particular time. A lot of
features are developed in multiple commits over a period of time and they
see no point in testing until at least most of the feature is complete and
expected to work. But it is not obvious from outside when that happens for
any given feature. For my clients the value of Alpha releases would
mainly be the release notes, or some other mark in the sand that says "As of
Alpha-3, feature X is included and expected to mostly work."

-dg

-- 
David Gould   da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Vitesse DB call for testing

2014-10-17 Thread David Gould
On Fri, 17 Oct 2014 13:12:27 -0400
Tom Lane t...@sss.pgh.pa.us wrote:

 CK Tan ck...@vitessedata.com writes:
  The bigint sum,avg,count case in the example you tried has some 
  optimization. We use int128 to accumulate the bigint instead of numeric in 
  pg. Hence the big speed up. Try the same query on int4 for the improvement 
  where both pg and vitessedb are using int4 in the execution.
 
 Well, that's pretty much cheating: it's too hard to disentangle what's
 coming from JIT vs what's coming from using a different accumulator
 datatype.  If we wanted to depend on having int128 available we could
 get that speedup with a couple hours' work.
 
 But what exactly are you compiling here?  I trust not the actual data
 accesses; that seems far too complicated to try to inline.
 
   regards, tom lane
 
 

I don't have any inside knowledge, but from the presentation given at the
recent SFPUG followed by a bit of google-fu I think these papers are
relevant:

  http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
  http://sites.computer.org/debull/A14mar/p3.pdf
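
For a rough illustration of the int128-accumulator trick described above, here
is a minimal sketch (my own, not Vitesse's or core's code), assuming a compiler
that provides __int128, as gcc and clang do on 64-bit targets:

    /* Sketch only: accumulate int64 values in a 128-bit integer instead of
     * numeric, avoiding per-row numeric arithmetic and allocation. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct
    {
        __int128    sum;
        int64_t     count;
    } Int128AggState;           /* name invented for this example */

    static void
    int8_accum_128(Int128AggState *state, int64_t value)
    {
        state->sum += value;    /* cannot overflow for any realistic row count */
        state->count++;
    }

    int
    main(void)
    {
        Int128AggState st = {0, 0};
        int64_t     vals[] = {9000000000000000000LL, 9000000000000000000LL, 123};

        for (int i = 0; i < 3; i++)
            int8_accum_128(&st, vals[i]);

        /* the sum exceeds INT64_MAX but fits easily in the 128-bit accumulator;
         * print it as high/low halves since printf has no int128 format */
        printf("count=%lld sum_hi=%lld sum_lo=%llu\n",
               (long long) st.count,
               (long long) (int64_t) (st.sum >> 64),
               (unsigned long long) (uint64_t) st.sum);
        return 0;
    }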

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Spin Lock sleep resolution

2013-06-18 Thread David Gould
On Tue, 18 Jun 2013 10:09:55 +0300
Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 02.04.2013 22:58, David Gould wrote:

  I'll give the patch a try, I have a workload that is impacted by spinlocks
  fairly heavily sometimes and this might help or at least give me more
  information. Thanks!
 
 Did you ever get around to test this?
 
 I repeated these pgbench tests I did earlier:
 
 http://www.postgresql.org/message-id/5190e17b.9060...@vmware.com
 
 I concluded in that thread that on this platform, the TAS_SPIN macro 
 really needs a non-locked test before the locked one. That fixes the big 
 fall in performance with more than 28 clients.
... 
 I wasn't expecting much of a gain from this, just wanted to verify that 
 it's not making things worse. So looks good to me.
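
(As an aside, the "non-locked test before the locked one" idea is roughly the
following; this is a hedged sketch using C11 atomics, not the actual TAS/TAS_SPIN
macros from s_lock.h.)

    #include <stdatomic.h>

    typedef atomic_int slock_t;     /* illustrative, not PostgreSQL's slock_t */

    static inline void
    spin_lock(slock_t *lock)
    {
        /* atomic_exchange is the "locked" operation (xchg on x86) */
        while (atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0)
        {
            /*
             * Non-locked spin: a plain load keeps the cache line in shared
             * state instead of bouncing it between cores on every iteration.
             * Only retry the expensive locked operation once the lock looks
             * free.  Real code would also insert a pause/delay here.
             */
            while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
                ;
        }
    }

    static inline void
    spin_unlock(slock_t *lock)
    {
        atomic_store_explicit(lock, 0, memory_order_release);
    }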

Thanks for the followup, and I really like your graph; it looks exactly
like what we were hitting. My client ended up configuring around it
and adding more hosts, so the urgency to run more tests sort of declined,
although I think we still hit it from time to time.

If you would like to point me at or send me the latest flavor of the patch,
it may be timely for me to test again. Especially if this is a more or less
finished version: we are about to roll out a new build to all these hosts,
and I'd be happy to try to incorporate this patch and get some production
experience with it on 80-core hosts.

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Spin Lock sleep resolution

2013-06-18 Thread David Gould
On Tue, 18 Jun 2013 11:41:06 +0300
Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Oh, interesting. What kind of hardware are you running on? To be honest, 
 I'm not sure what my test hardware is, it's managed by another team 
 across the world, but /proc/cpuinfo says:
 
 model name: Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz

It claims to have 80 of these:

  model name : Intel(R) Xeon(R) CPU E7-L8867  @2.13GHz

Postgres is on ramfs on these with unlogged tables.


 And it's running in a virtual machine on VMware; that might also be a 
 factor.

I'm not a fan of virtualization. It makes performance even harder to
reason about.

 
 It would be good to test the TAS_SPIN nonlocked patch on a variety of 
 systems. The comments in s_lock.h say that on Opteron, the non-locked 
 test is a huge loss. In particular, would be good to re-test that on a 
 modern AMD system.

I'll see what I can do. However, I don't have access to any large modern AMD
systems.

-dg


-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Spin Lock sleep resolution

2013-04-02 Thread David Gould
On Tue, 2 Apr 2013 09:01:36 -0700
Jeff Janes jeff.ja...@gmail.com wrote:

 Sorry.  I triple checked that the patch was there, but it seems like if you
 save a draft with an attachment, when you come back later to finish and
 send it, the attachment may not be there anymore.  The Gmail Offline teams
 still has a ways to go.  Hopefully it is actually there this time.

I'll give the patch a try, I have a workload that is impacted by spinlocks
fairly heavily sometimes and this might help or at least give me more
information. Thanks!

-dg

-- 
David Gould   da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-14 Thread David Gould
On Fri, 14 Dec 2012 15:39:44 -0500
Robert Haas robertmh...@gmail.com wrote:

 On Wed, Dec 12, 2012 at 8:29 AM, David Gould da...@sonic.net wrote:
  We lose noticeable performance when we raise fill-factor above 10. Even 20 is
  slower.

 Whoa.

Any interest in a fill-factor patch to place exactly one row per page? That
would be the least contended. There are applications where it might help.

  During busy times these hosts sometimes fall into a stable state
  with very high cpu use mostly in s_lock() and LWLockAcquire() and I think
  PinBuffer plus very high system cpu in the scheduler (I don't have the perf
  trace in front of me so take this with a grain of salt). In this mode they
  fall from the normal 7000 queries per second to below 3000.
 
 I have seen signs of something similar to this when running pgbench -S
 tests at high concurrency.  I've never been able to track down where

I think I may have seen that with pgbench -S too. I did not have time to
investigate more, but out of a sequence of three-minute runs I got most
runs at 300k+ qps, but a couple were around 200k qps.

 the problem is happening.  My belief is that once a spinlock starts to
 be contended, there's some kind of death spiral that can't be arrested
 until the workload eases up.  But I haven't had much luck identifying
 exactly which spinlock is the problem or if it even is just one...

I agree about the death spiral. I think what happens is that all the backends
get synchronized by waiting and then they are more likely to contend again.
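
(For what it's worth, the usual way to soften that kind of spiral is to back off
with growing, jittered sleeps instead of hammering the lock; the sketch below is
generic and only loosely in the spirit of s_lock.c's delay logic, with all names
invented for illustration.)

    #include <stdlib.h>
    #include <unistd.h>

    #define MIN_DELAY_USEC   1000       /* 1 ms */
    #define MAX_DELAY_USEC   1000000    /* 1 s  */

    typedef struct
    {
        int     spins;                  /* cheap spins so far */
        int     cur_delay_usec;         /* 0 until we start sleeping */
    } BackoffState;

    /* Call this each time the lock is found busy; zero-initialize the state. */
    static void
    backoff_delay(BackoffState *state, int spins_before_sleep)
    {
        if (++state->spins < spins_before_sleep)
            return;                     /* keep spinning cheaply for a while */

        if (state->cur_delay_usec == 0)
            state->cur_delay_usec = MIN_DELAY_USEC;

        usleep(state->cur_delay_usec);

        /*
         * Grow the delay by a random factor between 1x and 2x, capped, so
         * waiters don't wake in lockstep and immediately re-contend.
         */
        state->cur_delay_usec += (int) (state->cur_delay_usec *
                                        ((double) rand() / RAND_MAX));
        if (state->cur_delay_usec > MAX_DELAY_USEC)
            state->cur_delay_usec = MAX_DELAY_USEC;
    }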

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




[HACKERS] bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould

COPY IN loops in heap_multi_insert() extending the table until it fills the
disk when trying to insert a wide row into a table with a low fill-factor.
Internally fill-factor is implemented by reserving some space on a
page. For large enough rows and a small enough fill-factor bulk_multi_insert()
can't fit the row even on a new empty page, so it keeps allocating new pages
but is never able to place the row. It should always put at least one row on
an empty page.

In the excerpt below saveFreeSpace is the reserved space for the fill-factor.

while (ndone < ntuples)
{ ...
/*
 * Find buffer where at least the next tuple will fit.  If the page is
 * all-visible, this will also pin the requisite visibility map page.
 */
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
...
/* Put as many tuples as fit on this page */
for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
{
HeapTuple   heaptup = heaptuples[ndone + nthispage];

if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
break;

RelationPutHeapTuple(relation, buffer, heaptup);
}
...Do a bunch of dirtying and logging etc ...
 }
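
Back-of-envelope numbers for why an empty page can never pass that check with a
wide row (assuming the default 8K BLCKSZ and ignoring line pointers):

    saveFreeSpace          = BLCKSZ * (100 - fillfactor) / 100
                           = 8192 * 90 / 100               = 7372 bytes
    free on an empty page ~= BLCKSZ - page header (24)    ~= 8168 bytes

So any tuple wider than roughly 8168 - 7372 = 796 bytes fails
PageGetHeapFreeSpace(page) >= MAXALIGN(t_len) + saveFreeSpace even on a freshly
extended page, and the loop never places the row.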

This was introduced in 9.2 as part of the bulk insert speedup.

One more point: in the case where we don't insert any rows, we still do all the
dirtying and logging work even though we did not modify the page. I have tried
skipping all this if no rows are added (nthispage == 0), but my access method foo
is sadly out of date, so someone should take a skeptical look at that.

A test case and patch against 9.2.2 are attached. It fixes the problem and passes
make check. Most of the diff is just indentation changes. Whoever tries this will
want to test it on a small partition by itself.

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
\set ECHO all
\timing on
drop table if exists big_sparse;
CREATE TABLE big_sparse (
u_id integer NOT NULL,
c_id integer DEFAULT 0 NOT NULL,
a_id integer DEFAULT 0 NOT NULL,
cr_id integer DEFAULT 0 NOT NULL,
b integer,
c_t_u text,
l_date abstime DEFAULT now(),
c_b double precision DEFAULT 0.0,
d_b double precision,
c_s text,
c_s_u abstime,
c_e_u abstime,
co_b_r_ts abstime,
e_a_s integer,
c_c_i_l text,
c_c_e_l text,
ag_c_i_l text,
ag_c_e_l text,
c_c_t_l text,
c_r_t_l text,
c_m_t_l text,
c_ci_t_l text,
ag_c_t_l text,
ag_r_t_l text,
ag_m_t_l text,
ag_ci_t_l text,
c_l_t_l text,
ag_l_t_l text,
r_s_id smallint DEFAULT 1 NOT NULL,
r_s text NOT NULL,
i_u_l text,
e_u_l text,
f_l text,
f smallint,
e_s_l_e text,
r_h_f_n text,
cr_h smallint,
cr_w smallint,
n_b integer,
cr_a_l text,
e_id bigint NOT NULL,
a_ex text,
ca_b_a real[],
su smallint DEFAULT 0 NOT NULL,
h_b_f real[],
p_id integer,
ca_b_aj real[],
ca_b_aj_i integer[]
)
WITH (fillfactor=10);

COPY big_sparse FROM stdin;
1688	700032834	700580073	704810483	25	foobar.grotxbarb/xdexyzzyen.xyzzyv=xxixsrc=201002RAA9	2011-11-12 05:00:01+00	0	500	i	2010-10-31 04:00:00+00	\N	2011-11-12 05:00:01+00	113	\N	\N	\N	\N	US	\N	\N	\N	\N	\N	\N	\N	en	\N	5	Rxyzzyn	\N	foobar./|foobar.xyzzyatxyzzy.xyzzyk/|foobar.grotxyzzyroupbarb/|grotxyzzxyzzybarb|foobar.grotxyxyzzybarb/|foobar.xyzzyxyzzyily.xyzzy/|grotxyzzyancebarb|grotxyzzydog|foobar.grotxyzzyxyzzysbarbdog/|foobar.grotxyzzyaobarb/|grotxyzzxyzzywsbarb|foobar.xyzzyun.xyzzy/|foobar.grotogejbarb/|foobar.grotxyzzxyzzybarb/|foobar.grotogcdbarbdog/|grotxyzzyetbarb|foobar.grotxyzzyipobarb/|foobar.grotxyzzyaobarb/|grotxyzzyxyzzyiabarb|grotxyzzybarbdog	xyzzyign	2	1:1291|0|0 ABC DEF 36|0|0 OR 1290|0|0 ABC DEF 36|0|0 OR 13|0|0 ABC DEF 36|0|0 OR 84|2592000|0 ABC DEF 36|0|0 OR 83|2592000|0 ABC DEF 36|0|0 OR 82|2592000|0 ABC DEF 36|0|0 OR 12|0|0 ABC DEF 36|0|0	\N	0	0	25	\N	0	\N	\N	0	{1,1,1,1,1,0.20003,0.40006,0.60024,0.80012,1,1,1,1,1,1,1!
 ,1,1,1,1,1,1,1,1}	\N	\N	\N
\.

*** postgresql-9.2.2/src/backend/access/heap/heapam.c	2012-12-03 12:16:10.0 -0800
--- postgresql-9.2.2dg/src/backend/access/heap/heapam.c	2012-12-12 01:55:58.174653706 -0800
***
*** 2158,2163 
--- 2158,2164 
  		Buffer		buffer;
  		Buffer		vmbuffer = InvalidBuffer;
  		bool		all_visible_cleared = false;
+ 		bool		page_is_empty;
  		int			nthispage;
  
  		/*
***
*** 2173,2299 
  		START_CRIT_SECTION();
  
  		/* Put as many tuples as fit on this page */
  		for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
  		{
  			HeapTuple	heaptup = heaptuples[ndone + nthispage];
  
! 			if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
  break;
! 
  			RelationPutHeapTuple(relation, buffer, heaptup

Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould
On Wed, 12 Dec 2012 12:27:11 +0100
Andres Freund and...@2ndquadrant.com wrote:

 On 2012-12-12 03:04:19 -0800, David Gould wrote:
 
  COPY IN loops in heap_multi_insert() extending the table until it fills the

 Heh. Nice one. Did you hit that in practice?

Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy
happens late in the initial setup script for new hosts. The first batch of
new hosts to be set up with 9.2 filled the ramdisk, oomed and fell over
within a minute. Since the script sets up a lot of stuff we had no idea
at first who oomed.

 ISTM this would be fixed with a smaller footprint by just making
 
 if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
 
 if (!PageIsEmpty(page) &&
     PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
 
 I think that should work?

I like PageIsEmpty() better (and would have used it if I'd known about it), but
I'm not so crazy about the negation.

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould
On Wed, 12 Dec 2012 13:56:08 +0200
Heikki Linnakangas hlinnakan...@vmware.com wrote:

 However, RelationGetBufferForTuple() won't return such 
 a page, it guarantees that the first tuple does indeed fit on the page 
 it returns. For the same reason, the later check that at least one tuple 
 was actually placed on the page is not necessary.
 
 I committed a slightly different version, which unconditionally puts the 
 first tuple on the page, and only applies the freespace check to the 
 subsequent tuples. Since RelationGetBufferForTuple() guarantees that the 
 first tuple fits, we can trust that, like heap_insert does.
 
 --- a/src/backend/access/heap/heapam.c
 +++ b/src/backend/access/heap/heapam.c
 @@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
   /* NO EREPORT(ERROR) from here till changes are logged */
   START_CRIT_SECTION();
 
 - /* Put as many tuples as fit on this page */
 - for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
 + /*
 +  * RelationGetBufferForTuple() has ensured that the first tuple fits.
 +  * Put that on the page, and then as many other tuples as fit.
 +  */
 + RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
 + for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
   {
   HeapTuple   heaptup = heaptuples[ndone + nthispage];

I don't know if this is the same thing. At least in the comments I was
reading while trying to figure this out, there was some concern that someone
else could change the free space on the page. Does RelationGetBufferForTuple()
guarantee against this too?

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors

2012-12-12 Thread David Gould
On Wed, 12 Dec 2012 14:23:12 +0200
Heikki Linnakangas hlinnakan...@vmware.com wrote:
 
 The bug's been fixed now, but note that huge tuples like this will 
 always cause the table to be extended. Even if there are completely 
 empty pages in the table, after a vacuum. Even a completely empty 
 existing page is not considered spacious enough in this case, because 
 it's still too small when you take fillfactor into account, so the 
 insertion will always extend the table. If you regularly run into this 
 situation, you might want to raise your fillfactor..

Actually, we'd like it lower. Ideally, one row per page.

We lose noticeable performance when we raise fill-factor above 10. Even 20 is
slower.

During busy times these hosts sometimes fall into a stable state
with very high CPU use, mostly in s_lock() and LWLockAcquire() (and I think
PinBuffer), plus very high system CPU in the scheduler (I don't have the perf
trace in front of me, so take this with a grain of salt). In this mode they
fall from the normal 7000 queries per second to below 3000. Once in this
state they tend to stay that way. If we turn down the number of incoming
requests they go back to normal. Our conjecture is that most requests are
for only a few keys, so we have multiple sessions contending for a few
pages and convoying in the buffer manager. The table is under 20k rows, but
the hot items are probably only a couple hundred different rows. The busy
processes are doing reads only, but there is some update activity on this
table too.

Ah, found an email with the significant part of the perf output:

 ... set number of client threads = number of postgres backends = 70. That way
 all my threads have constant access to a backend and they just spin in a tight
 loop running the same query over and over (with different values). ... this
 seems to have tapped into 9.2's resonant frequency, right now we're spending
 almost all our time spin locking.
...
762377.00 71.0% s_lock          /usr/local/bin/postgres
 22279.00  2.1% LWLockAcquire   /usr/local/bin/postgres
 18916.00  1.8% LWLockRelease   /usr/local/bin/postgres

I was trying to resurrect the pthread s_lock() patch to see if that helps,
but it did not apply at all and I have not had time to pursue it.

We have tried lots of different process counts and get the best results with
about ten fewer active postgresql backends than HT cores. The system is 128GB
with:
 
processor   : 79
vendor_id   : GenuineIntel
cpu family  : 6
model   : 47
model name  : Intel(R) Xeon(R) CPU E7-L8867  @ 2.13GHz
stepping: 2
cpu MHz : 2128.478
cache size  : 30720 KB

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




[HACKERS] Strange errors from 9.2.1 and 9.2.2 (I hope I'm missing something obvious)

2012-12-11 Thread David Gould

I'm sure I've had a stroke or something in the middle of the night and just
didn't notice, but I'm able to reproduce the following on three different
hosts on both 9.2.1 and 9.2.2. As far as I know the only difference between
these queries is whitespace since I just up-arrowed them in psql and
deleted a space or lf. And as far as I can tell none of these errors are
correct.

Complete transcript, freshly started 9.2.2.

dg@jekyl:~$ psql
psql (9.2.2)
Type "help" for help.

dg=# CREATE TABLE t (
 i INTEGER,
 PRIMARY KEY (i)
);
ERROR:  type "key" does not exist
LINE 3:  PRIMARY KEY (i)
   ^
dg=# CREATE TABLE t (
 i INTEGER,
   PRIMARY KEY (i)
);
ERROR:  syntax error at or near "PRIMARY"
LINE 3:    PRIMARY KEY (i)
 ^
dg=# CREATE TABLE t (
 i INTEGER, PRIMARY KEY (i)
);
ERROR:  column "i" named in key does not exist
LINE 2:  i INTEGER, PRIMARY KEY (i)
 ^

Someone please set me straight, and tell me I've had a brain injury because
I am not comfortable with computers just fucking with me which is the other
explanation.

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] Strange errors from 9.2.1 and 9.2.2 (I hope I'm missing something obvious)

2012-12-11 Thread David Gould
On Tue, 11 Dec 2012 18:58:58 -0700
Josh Kupershmidt schmi...@gmail.com wrote:

 On Tue, Dec 11, 2012 at 6:01 PM, David Gould da...@sonic.net wrote:
 
  I'm sure I've had a stroke or something in the middle of the night and just
  didn't notice, but I'm able to reproduce the following on three different
  hosts on both 9.2.1 and 9.2.2. As far as I know the only difference between
  these queries is whitespace since I just up-arrowed them in psql and
  deleted a space or lf. And as far as I can tell none of these errors are
  correct.
 
  Complete transcript, freshly started 9.2.2.
 
  dg@jekyl:~$ psql
  psql (9.2.2)
  Type "help" for help.
 
  dg=# CREATE TABLE t (
   i INTEGER,
   PRIMARY KEY (i)
  );
  ERROR:  type "key" does not exist
  LINE 3:  PRIMARY KEY (i)
 
 Hrm, although I didn't see such characters in your above text, perhaps
 you have some odd Unicode characters in your input. For example, the
 attached superficially similar input file will generate the same error
 message for me. (The odd character in my input is U+2060, 'Word
 Joiner', encoded 0xE2 0x81 0xA0.)

Thank you. I got the example via cut and paste from email and pasted it
into psql on different hosts. od tells me it ends each line with:

  \n followed by 0xC2 0xA0 and then normal spaces. The C2A0 thing is
  apparently NO-BREAK SPACE. Invisible, silent, odorless but still deadly. 

Which will teach me not to accept text files from the sort of people who
write code in Word I guess.
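
For anyone who wants to catch this sort of thing before feeding a script to psql,
here is a trivial sketch (not part of any PostgreSQL tooling) of a filter that
flags non-ASCII and unexpected control bytes with their position:

    #include <stdio.h>

    int
    main(void)
    {
        int     c;
        long    line = 1, col = 0;

        while ((c = getchar()) != EOF)
        {
            col++;
            if (c == '\n')
            {
                line++;
                col = 0;
            }
            else if (c > 0x7f || (c < 0x20 && c != '\t'))
                printf("line %ld, col %ld: suspicious byte 0x%02X\n",
                       line, col, c);
        }
        return 0;
    }

Running the CREATE TABLE script through this would have flagged the 0xC2 0xA0
pair immediately.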

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] huge tlb support

2012-08-21 Thread David Gould
On Tue, 21 Aug 2012 18:06:38 +0200
Andres Freund and...@2ndquadrant.com wrote:

 On Tuesday, August 21, 2012 05:56:58 PM Robert Haas wrote:
  On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund
  and...@2ndquadrant.com 
 wrote:
   On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
   On Thu, Aug 16, 2012 at 10:53 PM, David Gould da...@sonic.net
   wrote:
A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we
have had horrible problems caused by transparent_hugepages
running postgres on largish systems (128GB to 512GB memory, 32
cores). The system sometimes goes 99% system time and is very
slow and unresponsive to the point of not successfully
completing new tcp connections. Turning off
transparent_hugepages fixes it.
   
   Yikes!  Any idea WHY that happens?
 Afair there were several bugs that could cause that in earlier version
 of the hugepage feature. The prominent was something around never
 really stopping to search for mergeable pages even though the
 probability was small or such.

This is what I think was going on. We did see a lot (99%) of time in some
routine in the VM (I forget exactly which), and my interpretation was
that it was trying to create hugepages from scattered fragments.

   I'm inclined to think this torpedos any idea we might have of
   enabling hugepages automatically whenever possible.  I think we
   should just add a GUC for this and call it good.  If the state of
   the world improves sufficiently in the future, we can adjust, but
   I think for right now we should just do this in the simplest way
   possible and move on.
   
   He is talking about transparent hugepages not hugepages afaics.
  
  Hmm.  I guess you're right.  But why would it be different?
 Because in this case explicit hugepage usage reduces the pain instead
 of increasing it. And we cannot do much against transparent hugepages
 being enabled by default.
 Unless I misremember how things work the problem is/was independent of 
 anonymous mmap or sysv shmem.

Explicit hugepages work because the pages can be created early, before all
of memory is fragmented, and you either succeed or fail. Transparent
hugepages use a daemon that looks for processes that might benefit from
hugepages and tries to create hugepages on the fly. On a system that has
been up for some time memory may be so fragmented that this is just a
waste of time.

Real as opposed to transparent hugepages would be a huge win for
applications that try to use high connection counts. Each backend
attached to the postgresql shared memory uses its own set of page table
entries at the rate of 2KB per MB of mapped shared memory. At 8GB of
shared buffers and 1000 connections this uses 16GB just for page tables.
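
To spell out that arithmetic (assuming 4KB pages and 8-byte page table entries):

    8 GB shared memory / 4 KB per page  = 2,097,152 pages
    2,097,152 pages * 8 bytes per PTE   = 16 MB of page tables per backend
                                          (i.e. ~2 KB of PTEs per MB mapped)
    16 MB * 1000 backends               = ~16 GB of page tables in total

With 2 MB hugepages the same 8 GB mapping needs only 4096 entries, roughly 32 KB
per backend.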

-dg

-- 
David Gould  510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] huge tlb support

2012-08-16 Thread David Gould
On Mon, 9 Jul 2012 12:30:23 +0200
Andres Freund and...@2ndquadrant.com wrote:

 On Monday, July 09, 2012 08:11:00 AM Tom Lane wrote:
  y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
   Also, I was under the impression that recent Linux kernels use
   hugepages automatically if they can, so I wonder exactly what
   Andres was testing on ...
   
   if you mean the trasparent hugepage feature, iirc it doesn't
   affect MAP_SHARED mappings like this.
  
  Oh!  That would explain some things.  It seems like a pretty nasty
  restriction though ... do you know why they did that?
 Looking a bit deeper they explicitly only work on private memory. The
 reason apparently being that its too hard to update the page table
 entries in multiple processes at once without introducing locking
 problems/scalability issues.
 
 To be sure one can check /proc/$pid_of_pg_proccess/smaps and look for
 the mapping to /dev/zero or the biggest mapping ;). Its not counted as
 Anonymous memory and it doesn't have transparent hugepages. I was
 confused before because there is quite some (400mb here) huge pages
 allocated for postgres during a pgbench run but thats just all the
 local memory...

A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had
horrible problems caused by transparent_hugepages running postgres on
largish systems (128GB to 512GB memory, 32 cores). The system sometimes
goes 99% system time and is very slow and unresponsive to the point of
not successfully completing new tcp connections. Turning off
transparent_hugepages fixes it. 

That said, explicit hugepage support for the buffer cache would be a big
win especially for high connection counts.

-dg


-- 
David Gould   da...@sonic.net
If simplicity worked, the world would be overrun with insects.

