Re: [HACKERS] logical changeset generation v3

2012-11-21 Thread Michael Paquier
On Wed, Nov 21, 2012 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2012-11-21 14:57:08 +0900, Michael Paquier wrote:

 Ah, I see. Could you try the following diff?

 diff --git a/src/backend/replication/logical/snapbuild.c
 b/src/backend/replication/logical/snapbuild.c
 index df24b33..797a126 100644
 --- a/src/backend/replication/logical/snapbuild.c
 +++ b/src/backend/replication/logical/snapbuild.c
 @@ -471,6 +471,7 @@ SnapBuildDecodeCallback(ReorderBuffer *reorder,
 Snapstate *snapstate,
  */
 snapstate->transactions_after = buf->origptr;

 +   snapstate->nrrunning = running->xcnt;
 snapstate->xmin_running = InvalidTransactionId;
 snapstate->xmax_running = InvalidTransactionId;

I am still getting the same assertion failure even with this diff included.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] FDW for PostgreSQL

2012-11-21 Thread Kohei KaiGai
2012/11/21 Shigeru Hanada shigeru.han...@gmail.com:
 Thanks for the comment!

 On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 I also think the new use_remote_explain option is good. It works fine
 when we try to use this fdw over the network with more or less latency.
 It seems to me its default is false, thus GetForeignRelSize will return
 the estimated cost according to ANALYZE, instead of remote EXPLAIN.
 Even though you mentioned that the default behavior was not changed, is that
 the expected one? My preference is the current one, as is.


 Oops, I must have focused on only cost factors.
 I too think that using local statistics is better as default behavior,
 because foreign tables would be used for relatively stable tables.
 If target tables are updated often, it would cause problems about
 consistency, unless we provide full-fledged transaction mapping.


 The deparseFuncExpr() still has case handling for whether it is an explicit
 cast, an implicit cast or a regular function. If my previous proposition has
 no flaw, could you fix it up to use the regular function invocation manner?
 In case the remote node has an incompatible implicit-cast definition, this
 logic can cause a problem.


 Sorry, I overlooked this issue. Fixed to use function call notation
 for all of explicit function calls, explicit casts, and implicit casts.

 At InitPostgresFdwOptions(), the source comment says we don't use
 malloc() here for simplification of code. Hmm. I'm not sure why it is
 simpler. It seems to me we have no reason to avoid malloc here, even
 though libpq options are acquired using malloc().


 I used "simple" because using palloc avoids null-checks and error handling.
 However, much backend code uses malloc to allocate memory which lives
 as long as the backend process itself, so I fixed it.


 Regarding to the regression test.

  [snip]

 I guess this test tries to check a case where the remote column has an
 incompatible data type with the local side. Please check timestamp_out().
 Its output format follows the datestyle GUC setting, which is affected by
 the OS configuration at initdb time. (Please grep for datestyle in
 initdb.c!) I'd recommend using another data type for this regression test
 to avoid false-positive detection.


 Good catch.  :)
 I fixed the test to use another data type, user defined enum.

One other thing I noticed.

At execute_query(), it stores all the retrieved rows into the tuplestore of
festate->tuples at once. Doesn't that cause problems when the remote
table has a very large number of rows?

IIRC, the previous code used the cursor feature to fetch a set of rows
to avoid over-consumption of local memory. Is there some restriction on
fetching a certain number of rows with FETCH?
It seems to me we could fetch 1000 rows, for example, and tentatively
store them in the tuplestore within one PG_TRY() block (so, no
need to worry about PQclear() timing), then fetch remote
rows again when IterateForeignScan reaches the end of the tuplestore.
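To make that concrete, a rough sketch of the batched approach (purely
illustrative; the connection handle, cursor name, batch size and error
handling are assumptions, not taken from the patch):

    PGresult   *res;
    int         ntuples;

    res = PQexec(conn, "DECLARE fdw_cursor CURSOR FOR SELECT * FROM remote_table");
    PQclear(res);

    do
    {
        int     i;

        res = PQexec(conn, "FETCH 1000 FROM fdw_cursor");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            PQclear(res);
            elog(ERROR, "could not fetch rows from remote server");
        }

        ntuples = PQntuples(res);
        for (i = 0; i < ntuples; i++)
        {
            /* convert PQgetvalue(res, i, ...) into a tuple and
             * tuplestore_puttuple() it, inside one PG_TRY() block */
        }
        PQclear(res);
    } while (ntuples == 1000);      /* fewer than 1000 means we hit the end */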

Please point out anything if I missed something.

Anyway, I'll check this v4 patch simultaneously.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-21 Thread Boszormenyi Zoltan

On 2012-11-20 20:32, Boszormenyi Zoltan wrote:

On 2012-11-20 17:03, Boszormenyi Zoltan wrote:

On 2012-11-18 17:20, Magnus Hagander wrote:


Much of the tar stuff is very similar (I haven't looked to see if it's
identical) to the stuff in backend/replication/basebackup.c. Perhaps
we should have a src/port/tarutil.c?


I will implement it as a separate patch.


I implemented it; all 3 patches are attached now. Use this patchset:
the previously sent 1st patch had a bug, it called conninit_storeval()
with value == NULL and crashed on strdup(NULL).




escape_string() - already exists as escape_quotes() in initdb, AFAICT.
We should either move it to src/port/, or at least copy it so it's
exactly the same.


A copy of escape_quotes() is now in pg_basebackup.c and is used.

I will also unify the copies of it in a separate patch.


This one is not done yet, but a suggestion on which existing file
it can fit into or for a new src/port/filename is welcome.


I briefly experimented with unifying escape_quotes().

The problem is that it calls pg_malloc() which is an executable-specific
function. Some of the bin/* executables define it as calling exit(1)
when malloc() returns NULL, some call it with exit(EXIT_FAILURE)
which happens to be 1 but still can be different from the constant 1.
Some executables only define pg_malloc0() but not pg_malloc().

pg_basebackup needs pg_malloc() to call disconnect_and_exit(1)
instead to quit cleanly and not leave an "unexpected EOF from client"
message in the server log. Which is a macro at the moment, but
has to be turned into a real function for the reasons below.

To unify escape_quotes(), pg_malloc() needs to be unified as well.
But the diverse requirements for pg_malloc() from different
executables mean that both the escape_quotes() and the pg_malloc()
interfaces need to be extended with the exit function they have to call
and the argument to pass to that function. Unless we don't care
about the bug reports about "unexpected EOF from client" messages.
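For illustration only, a sketch of the kind of widened interface being
described (the names pg_malloc_ext and failure_exit_fn are hypothetical):

    #include <stdio.h>
    #include <stdlib.h>

    typedef void (*failure_exit_fn) (int code);

    void *
    pg_malloc_ext(size_t size, failure_exit_fn on_failure, int exit_arg)
    {
        void       *result = malloc(size);

        if (result == NULL)
        {
            fprintf(stderr, "out of memory\n");
            on_failure(exit_arg);   /* e.g. exit(1), or disconnect_and_exit(1) */
        }
        return result;
    }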

Frankly, it isn't worth the effort. Let's just keep the two separate
copies of escape_quotes().

BTW, it seems to me that this "unify even the least used functions"
mentality was not considered to be a great feature during the
introduction of pg_malloc(), which is used in far more code than
the TAR functions.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-11-21 Thread Karl O. Pinc
Hi Josh,

On 11/20/2012 11:53:23 PM, Josh Kupershmidt wrote:
 Hi Karl,
 I signed on to review this patch for the current CF.

I noticed.  Thanks very much.


 On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc k...@meme.com wrote:
  On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

  First, the problem:
 
  Begin with the following structure:
 
  CREATE TABLE schemaA.foo (id INT PRIMARY KEY, data INT);
 
  CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;
 
  Then, by accident, somebody does:
 
  UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;
 
  So, you want to restore the data into schemaA.foo.
  But schemaA.foo has (bad) data in it that must first
  be removed. It would seem that using
 
pg_restore --clean -n schemaA -t foo my_pg_dump_backup
 
  would solve the problem, it would drop schemaA.foo,
  recreate it, and then restore the data.  But this does
  not work.  schemaA.foo does not drop because it's
  got a dependent database object, schemaB.bar.

 TBH, I didn't find the example above particularly compelling for
 demonstrating the need for this feature. If you've just got one table
 with dependent views which needs to be restored, it's pretty easy to
 manually TRUNCATE and have pg_restore --data-only reload the table.
 (And easy enough to combine the truncate and restore into a single
 transaction in case anything goes wrong, if need be.)

I was not clear in stating the problem.  (But see below
regarding foreign keys.)  The dependent view
was an example, but there can also be REFERENCES constraints and
trigger-related constraints involving other tables in other schemas.
The easiest work-around I can think of here is destroying all the
triggers and constraints (either everything in the whole db,
or doing some work to be more selective), truncating all the schema's
tables, doing a data-only restore of the
schema with pg_restore --data-only, and then re-creating
all the triggers and constraints.

 
 But I'm willing to grant that this proposed feature is potentially as
 useful as existing restore-jiggering options like --disable-triggers.
 And I guess I could see that if you're really stuck having to perform
 a --data-only restore of many tables, this feature could come in
 handy.

I think so.  See above.

 
  So, the idea here is to be able to do a data-only
  restore, first truncating the data in the tables
  being restored to remove the existing corrupted data.
 
  The proposal is to add a --truncate-tables option
  to pg_restore.
 
  

  (I tested pg_restore with 9.1 and when --data-only is
  used --clean is ignored, it does not even produce a warning.
  This is arguably a bug.)
 
 +1 for having pg_restore bail out with an error if the user specifies
 --data-only --clean. By the same token, --clean and --truncate-tables
 together should also raise a not allowed error.

OT:
After looking at the code I found that a number of conflicting
option combinations are not tested for or rejected.  I can't
recall what they are now.  Is the right way to fix these
to send in a separate patch for each conflict, all attached
to one email/commitfest item?  Or one patch that just gets
adjusted until it's right?

 
  
 
  More serious objections were raised regarding semantics.
 
  What if, instead, the initial structure looked like:
 
  CREATE TABLE schemaA.foo
  (id INT PRIMARY KEY, data INT);
 
  CREATE TABLE schemaB.bar
(id INT CONSTRAINT bar_on_foo REFERENCES foo
   , moredata INT);
 
  With a case like this, in most real-world situations, you'd
  have to use pg_restore with --disable-triggers if you wanted
  to use --data-only and --truncate-tables.  The possibility of
  foreign key referential integrity corruption is obvious.
 
 Why would --disable-triggers be used in this example? I don't think
 you could use --truncate-tables to restore only table foo, because
 you would get:
 
   ERROR:  cannot truncate a table referenced in a foreign key
 constraint
 
 (At least, I think so, not having tried with the actual patch.) You
 could use --disable-triggers when restoring bar. 

I tried it and you're quite right.  (I thought I'd tried this
before, but clearly I did not -- proving the utility of the review
process.  :-)  My assumption was that since triggers
were turned off, constraints, being triggers, would be off as
well and therefore tables with foreign keys could be truncated.
Obviously not, since I get the above error.

I just tried it.  --disable-triggers does not disable constraints.

 
   Recovering some data and being left with referential
  integrity problems is better than having no data.
 
 Well, this is really a judgment call, and I have a hunch that many
 admins would actually choose none of the above. And I think this
 point gets to the crux of whether the --truncate-tables option will 
 be
 useful as-is.

Well, yes.  None of the above is best.  :)  In my case I had munged
the data in the one important schema and wanted to restore.  The 
setting is an academic one 

Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Andres Freund
On 2012-11-20 22:55:52 -0500, Tom Lane wrote:
 Abhijit Menon-Sen a...@2ndquadrant.com writes:
  While I'm thinking about it, why are the fields of a binaryheap_node
  called key and value?  That implies a semantics not actually used
  here.  Maybe value1 and value2 instead?

  Yes, I discussed this with Andres earlier (and considered ptr and value
  or p1 and p2), but decided to wait for others to comment on the naming.
  I have no problem with value1 and value2, though I'd very slightly
  prefer d1 and d2 (d for Datum) now.

 d1/d2 would be fine with me.

 BTW, I probably missed some context upthread, but why do we have two
 fields at all?  At least for the purposes of nodeMergeAppend, it
 would make a lot more sense for the binaryheap elements to be just
 single Datums, without all the redundant pointers to the MergeAppend
 node.  The need to get at the comparison support info could be handled
 much more elegantly than that by providing a void * passthrough arg.
 That is, the heap creation function becomes

 binaryheap *binaryheap_allocate(size_t capacity,
 binaryheap_comparator compare,
 void *passthrough);

 and the comparator signature becomes

 typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough);

 For nodeMergeAppend, the Datums would be the slot indexes and the
 passthrough pointer could point to the MergeAppend node.

 This would make equal sense in a lot of other contexts, such as if you
 wanted a heap composed of tuples -- the info to compare tuples could be
 handled via the passthrough argument, rather than doubling the size of
 the heap in order to put that pointer into each heap element.  If you
 have no use for the passthrough arg in a particular usage, just pass
 NULL.
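(For concreteness, a hypothetical sketch of what such a comparator might
look like under this proposal; the field names and the heap member are
assumptions, not taken from any patch:)

    static int
    heap_compare_slots(Datum a, Datum b, void *arg)
    {
        MergeAppendState *node = (MergeAppendState *) arg;
        int         slot1 = DatumGetInt32(a);
        int         slot2 = DatumGetInt32(b);

        /*
         * Compare node->ms_slots[slot1] against node->ms_slots[slot2] using
         * the node's sort keys, returning <0, 0 or >0.
         */
        return 0;               /* placeholder */
    }

    ...
    node->ms_heap = binaryheap_allocate(node->ms_nplans,
                                        heap_compare_slots,
                                        (void *) node);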

The passthrough argument definitely makes sense, I am all for adding it.

The reasons I originally wanted to have two values was that it made it
easy/possible for the comparison function to work without any pointer
dereferences which was somewhat beneficial to performance. Completely
without dereferences only worked on 64bit systems anyway though...
With just one value I need an extra struct, but that's not too bad.

So if you all prefer just having one value, I am ok with that. Who is
updating the patch?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: ERROR: volatile EquivalenceClass has no sortref

2012-11-21 Thread Ranjeet Dhumal
Hi All ,

When I am trying to query the table temp_table1(sms_type varchar(20), sms_info
varchar(100), sms_id integer) with the query:
select sms_type, count(*) from temp_table1 group by 1 order by 2 desc;
I get the following error, and I don't know what's wrong with it:
ERROR: volatile EquivalenceClass has no sortref


-- 
--Regards
  Ranjeet  R. Dhumal


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-21 Thread Alvaro Herrera
Boszormenyi Zoltan wrote:

 The problem is that it calls pg_malloc() which is an executable-specific
 function. Some of the bin/* executables define it as calling exit(1)
 when malloc() returns NULL, some call it with exit(EXIT_FAILURE)
 which happens to be 1 but still can be different from the constant 1.
 Some executables only define pg_malloc0() but not pg_malloc().

It seems simpler to have the escape_quotes utility function in port just
not use pg_malloc at all, have it return NULL or similar failure
indicator when malloc() fails, and then the caller decides what to do.
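Something along those lines, presumably (a rough sketch only, not proposed
code):

    #include <stdlib.h>
    #include <string.h>

    /* Double single quotes and backslashes; return NULL if out of memory. */
    char *
    escape_quotes(const char *src)
    {
        size_t      len = strlen(src);
        size_t      i,
                    j;
        char       *result = malloc(len * 2 + 1);

        if (result == NULL)
            return NULL;            /* caller decides how to fail */

        for (i = 0, j = 0; i < len; i++)
        {
            if (src[i] == '\'' || src[i] == '\\')
                result[j++] = src[i];
            result[j++] = src[i];
        }
        result[j] = '\0';
        return result;
    }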

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-21 Thread Boszormenyi Zoltan

On 2012-11-21 14:19, Alvaro Herrera wrote:

Boszormenyi Zoltan wrote:


The problem is that it calls pg_malloc() which is an executable-specific
function. Some of the bin/* executables define it as calling exit(1)
when malloc() returns NULL, some call it with exit(EXIT_FAILURE)
which happens to be 1 but still can be different from the constant 1.
Some executables only define pg_malloc0() but not pg_malloc().

It seems simpler to have the escape_quotes utility function in port just
not use pg_malloc at all, have it return NULL or similar failure
indicator when malloc() fails, and then the caller decides what to do.


$ find . -name "*.c" | xargs grep pg_malloc | grep -v "^pg_malloc" | wc -l
277

Too much work for little gain.

Also:
- pg_upgrade calls pg_log(PG_FATAL, ...)
- pgbench logs a line then calls exit(1)

What I can imagine is to introduce a new src/port/ function,
InitPostgresFrontend() or something like that. Every executable
then calls this function first inside its main() if it wants to use
pg_malloc() from libpgport.a. This function would accept a pointer
to a structure, and the struct contains the pointer to the exit
function and the argument to call it with. Other data for different
use cases can be added later if needed.

This way, the pg_malloc() and friends can be unified and their call
interfaces don't have to change. InitPostgresFrontend() needs
to be added to only 9 places instead of changing 277 callers of
pg_malloc() or pg_malloc0().

BTW, the unified pg_malloc() and friends must be inside
#ifdef FRONTEND ... #endif to not leak into the backend code.
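Very roughly, something like this (a hypothetical sketch with invented
names, assuming the usual stdio/stdlib includes):

    typedef struct FrontendInitData
    {
        void        (*exit_fn) (int code);  /* e.g. exit, or a wrapper around
                                             * disconnect_and_exit */
        int         exit_arg;               /* argument to pass to exit_fn */
    } FrontendInitData;

    static FrontendInitData *frontend_init;

    void
    InitPostgresFrontend(FrontendInitData *data)
    {
        frontend_init = data;
    }

    #ifdef FRONTEND
    void *
    pg_malloc(size_t size)
    {
        void       *result = malloc(size);

        if (result == NULL)
        {
            fprintf(stderr, "out of memory\n");
            frontend_init->exit_fn(frontend_init->exit_arg);
        }
        return result;
    }
    #endif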

Opinions?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Reduce palloc's in numeric operations

2012-11-21 Thread Heikki Linnakangas

On 20.11.2012 21:44, Tom Lane wrote:

init_var_from_num's header comment could use some more work.  The
statement that one must not modify the returned var is false in some
sense, since for instance numeric_ceil() does that.  The key is that you
have to replace the digit buffer not modify it in-place, but most
computational routines do that anyway.  Also it might be worth pointing
out explicitly that free_var() is not required unless the var is
modified subsequently.  (There are instances of both cases, and it might
not be clear to the reader why.)


Added those points to the comment.


* get_str_from_var() no longer scribbles on its input. I noticed that
it's always called with a dscale that comes from the input var itself.
In other words, the rounding was unnecessary to begin with.


Hmm, it may have been necessary once upon a time, but I agree getting
rid of the rounding step is a win now.  Suggest adding a comment though
that the var is displayed to the number of digits indicated by its dscale.

Looks good otherwise.


Committed, thanks to everyone involved.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-21 Thread Alvaro Herrera
Boszormenyi Zoltan wrote:
 On 2012-11-21 14:19, Alvaro Herrera wrote:
 Boszormenyi Zoltan wrote:
 
 The problem is that it calls pg_malloc() which is an executable-specific
 function. Some of the bin/* executables define it as calling exit(1)
 when malloc() returns NULL, some call it with exit(EXIT_FAILURE)
 which happens to be 1 but still can be different from the constant 1.
 Some executables only define pg_malloc0() but not pg_malloc().
 It seems simpler to have the escape_quotes utility function in port just
 not use pg_malloc at all, have it return NULL or similar failure
 indicator when malloc() fails, and then the caller decides what to do.
 
 $ find . -name *.c | xargs grep pg_malloc | grep -v ^pg_malloc | wc -l
 277
 
 Too much work for little gain.

I probably wrote the above in a confusing way.  I am not suggesting that
pg_malloc is changed in any way.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-21 Thread Tom Lane
Boszormenyi Zoltan z...@cybertec.at writes:
 pg_basebackup needs pg_malloc() to call disconnect_and_exit(1)
 instead to quit cleanly and not leave an "unexpected EOF from client"
 message in the server log. Which is a macro at the moment, but
 has to be turned into a real function for the reasons below.

man 2 atexit
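That is, the cleanup can be registered once and plain exit() then covers it.
A minimal sketch (conn stands in here for pg_basebackup's global connection):

    static void
    disconnect_atexit(void)
    {
        if (conn != NULL)
            PQfinish(conn);
    }

    /* once, early in main(): */
    atexit(disconnect_atexit);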

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-21 Thread Boszormenyi Zoltan

On 2012-11-21 15:29, Tom Lane wrote:

Boszormenyi Zoltan z...@cybertec.at writes:

pg_basebackup needs pg_malloc() to call disconnect_and_exit(1)
instead to quit cleanly and not leave an "unexpected EOF from client"
message in the server log. Which is a macro at the moment, but
has to be turned into a real function for the reasons below.

man 2 atexit


Aww, crap. I knew I forgot about something. :-)
Thanks.



regards, tom lane




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-11-21 Thread Robert Haas
On Mon, Nov 19, 2012 at 6:19 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote:
 But, with the attached patch:

 rhaas=# create function xyz(smallint) returns smallint as $$select
 $1$$ language sql;
 CREATE FUNCTION
 rhaas=# select xyz(5);
  xyz
 -
5
 (1 row)

 rhaas=# create table abc (a int);
 CREATE TABLE
 rhaas=# select lpad(a, 5, '0') from abc;
  lpad
 --
 (0 rows)

 I continue to be of the opinion that allowing this second case to work
 is not desirable.

1. Why?

2. What's your counter-proposal?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Robert Haas
On Wed, Nov 21, 2012 at 6:08 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2012-11-20 22:55:52 -0500, Tom Lane wrote:
 Abhijit Menon-Sen a...@2ndquadrant.com writes:
  While I'm thinking about it, why are the fields of a binaryheap_node
  called key and value?  That implies a semantics not actually used
  here.  Maybe value1 and value2 instead?

  Yes, I discussed this with Andres earlier (and considered ptr and value
  or p1 and p2), but decided to wait for others to comment on the naming.
  I have no problem with value1 and value2, though I'd very slightly
  prefer d1 and d2 (d for Datum) now.

 d1/d2 would be fine with me.

 BTW, I probably missed some context upthread, but why do we have two
 fields at all?  At least for the purposes of nodeMergeAppend, it
 would make a lot more sense for the binaryheap elements to be just
 single Datums, without all the redundant pointers to the MergeAppend
 node.  The need to get at the comparison support info could be handled
 much more elegantly than that by providing a void * passthrough arg.
 That is, the heap creation function becomes

 binaryheap *binaryheap_allocate(size_t capacity,
 binaryheap_comparator compare,
 void *passthrough);

 and the comparator signature becomes

 typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough);

 For nodeMergeAppend, the Datums would be the slot indexes and the
 passthrough pointer could point to the MergeAppend node.

 This would make equal sense in a lot of other contexts, such as if you
 wanted a heap composed of tuples -- the info to compare tuples could be
 handled via the passthrough argument, rather than doubling the size of
 the heap in order to put that pointer into each heap element.  If you
 have no use for the passthrough arg in a particular usage, just pass
 NULL.

 The passthrough argument definitely makes sense, I am all for adding it.

 The reasons I originally wanted to have two values was that it made it
 easy/possible for the comparison function to work without any pointer
 dereferences which was somewhat beneficial to performance. Completely
 without dereferences only worked on 64bit systems anyway though...
 With just one value I need an extra struct, but that's not too bad.

 So if you all prefer just having one value, I am ok with that. Who is
 updating the patch?

I guess I'll take another whack at it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] C-function, don't load external dll file

2012-11-21 Thread Dimitri Fontaine
Hi,

Przemek Lisowski prze...@lisnet.info writes:
 HOW LOAD DLL AND USE MY EXTERNAL
 FUNCTION ? 

You need to declare it in SQL, maybe like this:

  create function public.transform(text) returns text
  as '$libdir/fservice', 'transform' language C;

See also the LOAD command and the CREATE EXTENSION documentation for how
to organise testing and shipping of your code.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-11-21 Thread Simon Riggs
On 29 August 2012 23:39, Tom Lane t...@sss.pgh.pa.us wrote:

 The main downside I can see is that code that used to work is likely
 to stop working as soon as someone creates a potential overloading
 situation.  Worse, the error message could be pretty confusing, since
 if you had been successfully calling f(smallint) with f(42), you'd get
 "f(integer) does not exist", not something like "f() is ambiguous",
 after adding f(float8) to the mix.  This seems related to the confusing
 changes in regression test cases that I got in my experiments yesterday.
 This may be sufficient reason to reject the idea, since the very last
 thing we need in this area is any degradation in the relevance of the
 error messages.

It would be useful if we issued a NOTICE when an ambiguity is
introduced, rather than when using it.

Like Bison's reporting of reduce conflicts.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows

2012-11-21 Thread Andrew Dunstan


Given we're calling to_lower() on a single byte in the code referred 
to, should we even be doing that when we have a multi-byte encoding 
and the high bit is set?






Nobody responded to this, but I'm rather inclined to say we should not.

Here's a simple patch to avoid this case.

Comments?

cheers

andrew


diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index b8e2f71..1d76ce3 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -132,8 +132,10 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
 	char	   *result;
 	int			i;
 	bool		enc_is_single_byte;
 
 	result = palloc(len + 1);
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
@@ -141,8 +143,8 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
 	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, and use an ASCII-only downcasing for 7-bit
-	 * characters.
+	 * the high bit set, as long as they aren't part of a multi-byte character, 
+	 * and use an ASCII-only downcasing for 7-bit characters.
 	 */
 	for (i = 0; i < len; i++)
 	{
@@ -150,7 +152,7 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
-		else if (IS_HIGHBIT_SET(ch) && isupper(ch))
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
 			ch = tolower(ch);
 		result[i] = (char) ch;
 	}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-11-21 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 It would be useful if we issued a NOTICE when an ambiguity is
 introduced, rather than when using it.

I think that's pie in the sky, since whether there is an ambiguity
will depend not only on what set of functions exists, but what the
caller's search_path is.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows

2012-11-21 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Here's a simple patch to avoid this case.
 Comments?

I'm not sure that's the only place we're doing this ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 11:11 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

Here's a simple patch to avoid this case.
Comments?

I'm not sure that's the only place we're doing this ...




Oh, Hmm, darn. Where else do you think we might?

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PANIC: could not write to log file

2012-11-21 Thread Cyril VELTER

   After upgrading a pretty big database (several hundred gig) from 8.2 to 9.2 
I'm getting some PANIC:  could not write to log file messages. Actually I got 
two of them. One yesterday and one today.

   * postgres 9.2.1 is running on a windows 2003 server (downloaded the zip 
archive binaries)
   * Data is on an iscsi san (the same postgres 8.2 had been running on it
perfectly for several years)
   * Full messages are :

PANIC:  could not write to log file 118, segment 74 at offset 12189696, length 
475136: Invalid argument
STATEMENT:  COMMIT

PANIC:  could not write to log file 117, segment 117 at offset 5660672, length 
4096000: Invalid argument
STATEMENT:  COMMIT

   * I have no other messages from the san nor from windows

   * Here are the non default parameters in postgresql.conf

listen_addresses = '*'  # what IP address(es) to listen on;
max_connections = 200 # (change requires restart)
shared_buffers = 320MB # min 128kB
work_mem = 50MB# min 64kB
maintenance_work_mem = 700MB  # min 1MB
checkpoint_segments = 15  # in logfile segments, min 1, 16MB each


   Any idea on what might be going on ?

   Cyril VELTER



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows

2012-11-21 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 11/21/2012 11:11 AM, Tom Lane wrote:
 I'm not sure that's the only place we're doing this ...

 Oh, Hmm, darn. Where else do you think we might?

Dunno, but grepping for isupper and/or tolower should find any such
places.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PANIC: could not write to log file

2012-11-21 Thread Jeff Janes
On Wed, Nov 21, 2012 at 8:51 AM, Cyril VELTER cyril.vel...@metadys.com wrote:

After upgrading a pretty big database (several hundred gig) from 8.2 to 
 9.2 I'm getting some PANIC:  could not write to log file messages. Actually 
 I got two of them. One yesterday and one today.

How was the upgrade done?

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Robert Haas
On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas robertmh...@gmail.com wrote:
 I guess I'll take another whack at it.

New version attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


binaryheap-rmh2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum stress-testing our system

2012-11-21 Thread Robert Haas
On Sun, Nov 18, 2012 at 5:49 PM, Tomas Vondra t...@fuzzy.cz wrote:
 The two main changes are these:

 (1) The stats file is split into a common db file, containing all the
 DB Entries, and per-database files with tables/functions. The common
 file is still called pgstat.stat, the per-db files have the
 database OID appended, so for example pgstat.stat.12345 etc.

 This was a trivial hack of the pgstat_read_statsfile/pgstat_write_statsfile
  functions, introducing two new functions:

pgstat_read_db_statsfile
pgstat_write_db_statsfile

 that do the trick of reading/writing stat file for one database.

 (2) The pgstat_read_statsfile has an additional parameter onlydbs that
 says that you don't need table/func stats - just the list of db
 entries. This is used for autovacuum launcher, which does not need
 to read the table/stats (if I'm reading the code in autovacuum.c
 correctly - it seems to be working as expected).

I'm not an expert on the stats system, but this seems like a promising
approach to me.

 (a) It does not solve the many-schema scenario at all - that'll need
 a completely new approach I guess :-(

We don't need to solve every problem in the first patch.  I've got no
problem kicking this one down the road.

 (b) It does not solve the writing part at all - the current code uses a
 single timestamp (last_statwrite) to decide if a new file needs to
 be written.

 That clearly is not enough for multiple files - there should be one
 timestamp for each database/file. I'm thinking about how to solve
 this and how to integrate it with pgstat_send_inquiry etc.

Presumably you need a last_statwrite for each file, in a hash table or
something, and requests need to specify which file is needed.
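For instance, the bookkeeping could be shaped something like this (a
hypothetical sketch, not code from the patch): an inquiry would carry the
target database OID, and only files older than the requested cutoff would
be rewritten.

    /* one entry per database, kept in a hash table keyed by databaseid */
    typedef struct DBWriteRequest
    {
        Oid             databaseid;        /* which per-db stats file */
        TimestampTz     last_statwrite;    /* when that file was last written */
        TimestampTz     request_time;      /* cutoff asked for by the last inquiry */
    } DBWriteRequest;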

 And yet another one I'm thinking about is using a fixed-length
 array of timestamps (e.g. 256), indexed by mod(dboid,256). That
 would mean stats for all databases with the same mod(oid,256) would
 be written at the same time. Seems like an over-engineering though.

That seems like an unnecessary kludge.

 (c) I'm a bit worried about the number of files - right now there's one
 for each database and I'm thinking about splitting them by type
 (one for tables, one for functions) which might make it even faster
 for some apps with a lot of stored procedures etc.

 But is the large number of files actually a problem? After all,
 we're using one file per relation fork in the base directory, so
 this seems like a minor issue.

I don't see why one file per database would be a problem.  After all,
we already have one directory per database inside base/.  If the user
has so many databases that dirent lookups in a directory of that size
are a problem, they're already hosed, and this will probably still
work out to a net win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Andres Freund
On 2012-11-21 12:54:30 -0500, Robert Haas wrote:
 On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas robertmh...@gmail.com wrote:
  I guess I'll take another whack at it.
 
 New version attached.

I think the assert in replace_first should be
Assert(!binaryheap_empty(heap) && heap->has_heap_property);
instead of
Assert(heap->has_heap_property);

looks good otherwise.

Thanks,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Switching timeline over streaming replication

2012-11-21 Thread Heikki Linnakangas

On 20.11.2012 15:33, Amit Kapila wrote:

Defect-2:
 1. start primary A
 2. start standby B following A
 3. start cascade standby C following B.
 4. Start another standby D following C.
 5. Execute the following commands in the primary A.
create table tbl(f int);
insert into tbl values(generate_series(1,1000));
 6. Promote standby B.
 7. Execute the following commands in the primary B.
insert into tbl values(generate_series(1001,2000));
insert into tbl values(generate_series(2001,3000));

 The following logs are observed on standby C:

 LOG:  restarted WAL streaming at position 0/700 on tli 2
 ERROR:  requested WAL segment 00020007 has already been
removed
 LOG:  record with zero length at 0/7028190
 LOG:  record with zero length at 0/7048540
 LOG:  out-of-sequence timeline ID 1 (after 2) in log segment
00020007, offset 0


Hmm, this one is actually a pre-existing bug. There's a sanity check 
that the sequence of timeline IDs that are seen in the XLOG page headers 
doesn't go backwards. In other words, if the last XLOG page that was 
read had timeline id X, the next page must have a tli >= X. The startup 
process keeps track of the last seen timeline id in lastPageTLI. In 
standby_mode, when the startup process is reading from a pre-existing 
file in pg_xlog (typically put there by streaming replication) and it 
reaches the end of valid WAL (marked by an error in decoding it, ie. 
record with zero length in your case), it sleeps for five seconds and 
retries. At retry, the WAL file is re-opened, and as part of sanity 
checking it, the first page header in the file is validated.


Now, if there was a timeline change in the current WAL segment, and 
we've already replayed past that point, lastPageTLI will already be set 
to the new TLI, but the first page on the file contains the old TLI. 
When the file is re-opened, and the first page is validated, you get the 
error.


The fix is quite straightforward: we should refrain from checking the 
TLI when we re-open a WAL file. Or better yet, compare it against the 
TLI we saw at the beginning of the last WAL segment, not the last WAL page.


I propose the attached patch (against 9.2) to fix that. This should be 
backpatched to 9.0, where standby_mode was introduced. The code was the 
same in 8.4, too, but AFAICS there was no problem there because 8.4 
never tried to re-open the same WAL segment after replaying some of it.


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8614907..045d21d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -572,6 +572,7 @@ static uint32 readRecordBufSize = 0;
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 static TimeLineID lastPageTLI = 0;
+static TimeLineID lastSegmentTLI = 0;
 
 static XLogRecPtr minRecoveryPoint;		/* local copy of
  		 * ControlFile->minRecoveryPoint */
@@ -655,7 +656,7 @@ static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
-static bool ValidXLOGHeader(XLogPageHeader hdr, int emode);
+static bool ValidXLOGHeader(XLogPageHeader hdr, int emode, bool segmentonly);
 static XLogRecord *ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt);
 static List *readTimeLineHistory(TimeLineID targetTLI);
 static bool existsTimeLineHistory(TimeLineID probeTLI);
@@ -3927,7 +3928,7 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 		 * to go backwards (but we can't reset that variable right here, since
 		 * we might not change files at all).
 		 */
-		lastPageTLI = 0;		/* see comment in ValidXLOGHeader */
+		lastPageTLI = lastSegmentTLI = 0;	/* see comment in ValidXLOGHeader */
 		randAccess = true;		/* allow curFileTLI to go backwards too */
 	}
 
@@ -4190,7 +4191,7 @@ next_record_is_invalid:
  * ReadRecord.	It's not intended for use from anywhere else.
  */
 static bool
-ValidXLOGHeader(XLogPageHeader hdr, int emode)
+ValidXLOGHeader(XLogPageHeader hdr, int emode, bool segmentonly)
 {
 	XLogRecPtr	recaddr;
 
@@ -4285,18 +4286,31 @@ ValidXLOGHeader(XLogPageHeader hdr, int emode)
 	 * successive pages of a consistent WAL sequence.
 	 *
 	 * Of course this check should only be applied when advancing sequentially
-	 * across pages; therefore ReadRecord resets lastPageTLI to zero when
-	 * going to a random page.
+	 * across pages; therefore ReadRecord resets lastPageTLI and
+	 * lastSegmentTLI to zero when going to a random page.
+	 *
+	 * Sometimes we re-open a segment that's already been partially replayed.
+	 * In that case we cannot perform the normal TLI check: if there is a
+	 * timeline switch within the segment, the first 

[HACKERS] PQconninfo function for libpq

2012-11-21 Thread Magnus Hagander
I'm breaking this out into its own thread, for my own sanity if
nothing else :) And it's an isolated feature after all.

I still agree with the previous review at
http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net
about keeping the data in more than one place.

Based on that, I've created a different version of this patch,
attached. This way we keep all the data in one struct.

At this point, the patch is untested beyond compiling and a trivial
psql check, because I ran out of time :) But I figured I'd throw it
out there for comments on which version people prefer. (And yes, it's
quite possible I've made a big think-mistake in it somewhere, but
again, better to get some eyes on it early)

My version also contains a fixed version of the docs that should be
moved back into Zoltan's version if that's the one we end up
preferring.

Also, a question was buried in the other review which is - are we OK
to remove the requiressl parameter. Both these patches do so, because
the code becomes much simpler if we can do that. It has been
deprecated since 7.2. Is it OK to remove it, or do we need to put back
in the more complex code to deal with both?

Attached are both Zoltan's latest patch (v16) and my slightly different approach.

Comments on which approach is best?

Test results from somebody who has the time to look at it? :)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


01-PQconninfo-v16.patch
Description: Binary data


PQconninfo-mha.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas robertmh...@gmail.com wrote:
  I guess I'll take another whack at it.
 
 New version attached.

The following comments still talk about "key" and "value", thus need
an update:

+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes

+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while

+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and


This comment needs updated (s/comparator/compare/, and also add
has_heap_property and arg):

+/*
+ * binaryheap
+ *
+ * size			how many nodes are currently in "nodes"
+ * space		how many nodes can be stored in "nodes"
+ * comparator	comparator to define the heap property
+ * nodes		the first of a list of "space" nodes
+ */
+typedef struct binaryheap
+{
+   int size;
+   int space;
+   bool		has_heap_property;  /* debugging cross-check */
+   binaryheap_comparator compare;
+   void   *arg;
+   Datum   nodes[FLEXIBLE_ARRAY_MEMBER];
+}  binaryheap;


I would suggest to add prefixes to struct members, so bh_size, bh_space
and so on.  This makes it easier to grep for stuff later.

Do we really need the struct definition be public?  Couldn't it just
live in binaryheap.c?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[Re] Re: [HACKERS] PANIC: could not write to log file

2012-11-21 Thread Cyril VELTER
On 21/11/12 18:10:12, jeff.ja...@gmail.com wrote:
 On Wed, Nov 21, 2012 at 8:51 AM, Cyril VELTER cyril.vel...@metadys.com 
 wrote:
 
 After upgrading a pretty big database (several hundred gig) from 8.2 to 
  9.2 I'm getting some PANIC:  could not write to log file messages. 
  Actually I
  got two of them. One yesterday and one today.
 
 How was the upgrade done?

   The upgrade was done with the following steps :

   * create a new cluster using 9.2.1 initdb
   * run pg_dump | psql using the 9.2.1 binaries on a linux machine (both
servers, the old 8.2 and the new 9.2.1, running on windows machines).

   Thanks,

   cyril



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] User control over psql error stream

2012-11-21 Thread Peter Eisentraut
On 11/15/12 3:53 PM, Karl O. Pinc wrote:
 This patch gives the end user control over psql's
 error stream.  This allows a single psql session
 to use \o to send both errors and table output
 to multiple files.  Useful when capturing test output, etc.

What does this do that cannot in practice be achieved using shell
redirection operators?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add -Wlogical-op to standard compiler options?

2012-11-21 Thread Jeff Janes
On Thu, Nov 15, 2012 at 1:46 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 11/15/12 9:40 AM, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
 I think it might be worth adding -Wlogical-op to the standard warning
 options (for supported compilers, determined by configure test).

 Does that add any new warnings with the current source code, and if
 so what?

 none

Using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4),  I get dozens of
warnings, all apparently coming from somewhere in the MemSet macro.

example:

pl_handler.c:301: warning: logical '&&' with non-zero constant will
always evaluate as true

Probably has something to do with:

			/* \
			 *	If MEMSET_LOOP_LIMIT == 0, optimizer should find \
			 *	the whole "if" false at compile time. \
			 */ \
			MEMSET_LOOP_LIMIT != 0) \

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Robert Haas
On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 The following comments still talk about "key" and "value", thus need
 an update:

Oops.

 This comment needs updated (s/comparator/compare/, and also add
 has_heap_property and arg):

Fixed.

 I would suggest to add prefixes to struct members, so bh_size, bh_space
 and so on.  This makes it easier to grep for stuff later.

OK.

 Do we really need the struct definition be public?  Couldn't it just
 live in binaryheap.c?

Well, that would preclude defining any macros, like
binaryheap_empty().  Since efficiency is among the reasons for
inventing this in the first place, that doesn't seem prudent to me.
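For example, with the struct visible in the header this can stay a trivial
macro (a sketch; the exact definition in the patch may differ):

    #define binaryheap_empty(h)     ((h)->bh_size == 0)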

Another new version attached.

P.S. to Abhijit: Please, please tell me the secret to getting five
people to review your patches.  I'm lucky when I can get one!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


binaryheap-rmh3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doc patch: Document names of automatically created constraints and indexes

2012-11-21 Thread Robert Haas
On Sat, Nov 17, 2012 at 1:22 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Mon, 2012-11-12 at 11:42 -0600, Karl O. Pinc wrote:
 Could ALTER TABLE use an option to drop the
 primary key constraint?  I needed to do that,
 found it was not obvious, and this lead me to
 try to improve things.

 That could be useful, I think.  But it might open a can of worms.

Would the new option be syntactic sugar around ALTER TABLE ... DROP
CONSTRAINT put_the_name_of_the_primary_key_here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP json generation enhancements

2012-11-21 Thread Andrew Dunstan

Here is a WIP patch for enhancements to json generation.

First, there is the much-requested json_agg, which will aggregate rows 
directly to json. So the following will now work:


select json_agg(my_table) from my_table;
select json_agg(q) from (myquery here) q;

One open question regarding this feature is whether this should return 
NULL or '[]' for 0 rows. Currently it returns NULL but I could be 
convinced to return '[]', and the change would be very small.


Next is to_json(), which will turn any value into json, so we're no 
longer restricted to rows and arrays.


Non-builtin types are now searched for a cast to json, and if it exists 
it is used instead of the type's text representation. I didn't add a 
special type to look for a cast to, as was discussed before, as it 
seemed a bit funky and unnecessary. It can easily be added, but I'm 
still not convinced it's a good idea. Note that this is only done for 
types that aren't builtin - we know how to turn all of those into json 
without needing to look for a cast.


Along with this there is an hstore_to_json() function added to the 
hstore module, and a cast from hstore to json that uses it. This 
function treats every value in the hstore as a string. There is also a 
function with the working title of hstore_to_json_loose() that does a 
heuristic conversion that treats values of 't' and 'f' as booleans, and 
strings that look like numbers as numbers unless they start with a 
leading 0 followed by another digit (could be zip codes, phone numbers 
etc.) The difference between these is illustrated here (notice that 
quoted 't' becomes unquoted true and quoted '1' becomes unquoted 1):


   andrew=# select json_agg(q) from foo q;
                                json_agg
   -----------------------------------------------------------------
    [{"a":"a","b":1,"h":{"c": "t", "d": null, "q": "1", "x": "y"}}]
   (1 row)

   andrew=# select json_agg(q) from (select a, b, hstore_to_json_loose(h) as h
   from foo) q;
                                json_agg
   -----------------------------------------------------------------
    [{"a":"a","b":1,"h":{"c": true, "d": null, "q": 1, "x": "y"}}]
   (1 row)

Note: this patch will need a change in the oids used for the new functions if 
applied against git tip, as they have been overtaken by time.


Comments welcome.


cheers

andrew



*** a/contrib/hstore/hstore--1.1.sql
--- b/contrib/hstore/hstore--1.1.sql
***
*** 234,239  LANGUAGE C IMMUTABLE STRICT;
--- 234,252 
  CREATE CAST (text[] AS hstore)
WITH FUNCTION hstore(text[]);
  
+ CREATE FUNCTION hstore_to_json(hstore)
+ RETURNS json
+ AS 'MODULE_PATHNAME', 'hstore_to_json'
+ LANGUAGE C IMMUTABLE STRICT;
+ 
+ CREATE CAST (hstore AS json)
+   WITH FUNCTION hstore_to_json(hstore);
+ 
+ CREATE FUNCTION hstore_to_json_loose(hstore)
+ RETURNS json
+ AS 'MODULE_PATHNAME', 'hstore_to_json_loose'
+ LANGUAGE C IMMUTABLE STRICT;
+ 
  CREATE FUNCTION hstore(record)
  RETURNS hstore
  AS 'MODULE_PATHNAME', 'hstore_from_record'
*** a/contrib/hstore/hstore_io.c
--- b/contrib/hstore/hstore_io.c
***
*** 8,14 
--- 8,17 
  #include access/htup_details.h
  #include catalog/pg_type.h
  #include funcapi.h
+ #include lib/stringinfo.h
  #include libpq/pqformat.h
+ #include utils/builtins.h
+ #include utils/json.h
  #include utils/lsyscache.h
  #include utils/typcache.h
  
***
*** 1209,1211  hstore_send(PG_FUNCTION_ARGS)
--- 1212,1425 
  
  	PG_RETURN_BYTEA_P(pq_endtypsend(buf));
  }
+ 
+ 
+ /*
+  * hstore_to_json_loose
+  *
+  * This is a heuristic conversion to json which treats
+  * 't' and 'f' as booleans and strings that look like numbers as numbers,
+  * as long as they don't start with a leading zero followed by another digit
+  * (think zip codes or phone numbers starting with 0).
+  */
+ PG_FUNCTION_INFO_V1(hstore_to_json_loose);
+ Datum		hstore_to_json_loose(PG_FUNCTION_ARGS);
+ Datum
+ hstore_to_json_loose(PG_FUNCTION_ARGS)
+ {
+ 	HStore	   *in = PG_GETARG_HS(0);
+ 	int			buflen,
+ i;
+ 	int			count = HS_COUNT(in);
+ 	char	   *out,
+ 			   *ptr;
+ 	char	   *base = STRPTR(in);
+ 	HEntry	   *entries = ARRPTR(in);
+ 	bool		is_number;
+ 	StringInfo  src, dst;
+ 
+ 	if (count == 0)
+ 	{
+ 		out = palloc(1);
+ 		*out = '\0';
+ 		PG_RETURN_TEXT_P(cstring_to_text(out));
+ 	}
+ 
+ 	buflen = 3;
+ 
+ 	/*
+ 	 * Formula adjusted slightly from the logic in hstore_out.
+ 	 * We have to take account of our treatment of booleans
+ 	 * to be a bit more pessimistic about the length of values.
+ 	 */
+ 
+ 	for (i = 0; i < count; i++)
+ 	{
+ 		/* include "" and colon-space and comma-space */
+ 		buflen += 6 + 2 * HS_KEYLEN(entries, i);
+ 		/* include "" only if nonnull */
+ 		buflen += 3 + (HS_VALISNULL(entries, i)
+ 	   ? 1
+ 	   :  2 * HS_VALLEN(entries, i));
+ 	}
+ 
+ 	out = ptr = palloc(buflen);
+ 
+ 	src = makeStringInfo();
+ 	dst = makeStringInfo();
+ 
+ 	*ptr++ = '{';
+ 
+ 	for (i = 0; i < count; i++)
+ 	{
+ 	resetStringInfo(src);
+ 		

Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Andres Freund
On 2012-11-21 15:09:12 -0500, Robert Haas wrote:
 On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  The following comments still talk about key and value, thus need
  an update:

 Oops.

  This comment needs updated (s/comparator/compare/, and also add
  has_heap_property and arg):

 Fixed.

  I would suggest to add prefixes to struct members, so bh_size, bh_space
  and so on.  This makes it easier to grep for stuff later.

 OK.

  Do we really need the struct definition be public?  Couldn't it just
  live in binaryheap.c?

 Well, that would preclude defining any macros, like
 binaryheap_empty().  Since efficiency is among the reasons for
 inventing this in the first place, that doesn't seem prudent to me.

 Another new version attached.

One very minor nitpick I unfortunately just found now, not sure when
that changed:
binaryheap_replace_first() hardcodes the indices for the left/right node
of the root node. I would rather have it use (left|right)_offset(0).
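
For illustration, a rough sketch of the kind of change suggested here (the
macro names are assumed for the example and may not match the actual patch):

    /* hypothetical offset helpers for a 0-based binary heap */
    #define left_offset(i)   (2 * (i) + 1)
    #define right_offset(i)  (2 * (i) + 2)

    /*
     * binaryheap_replace_first() would then compare the new root against
     * its children via left_offset(0) and right_offset(0) rather than the
     * literals 1 and 2 before sifting it down.
     */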

 P.S. to Abhijit: Please, please tell me the secret to getting five
 people to review your patches.  I'm lucky when I can get one!

Heh. This was rather surprising, yes. Its a rather easy patch to review
in a way, you don't need much pg internals knowledge for it...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] array exclusion constraint

2012-11-21 Thread Robert Haas
On Sat, Nov 17, 2012 at 1:05 PM, Philip Taylor philiptaylo...@yahoo.com wrote:
 CREATE TABLE foo (
x CHAR(32) PRIMARY KEY,
y CHAR(32) NOT NULL,
EXCLUDE USING gist ((ARRAY[x, y]) WITH &&)
 );

My first thought was you were going to have better luck with text
rather than char(n), but a little bit of experimentation suggests to
me that that doesn't work either.  It seems that GIN doesn't support
exclusion constraints and there's no gist opclass for text[],
varchar[], or anyarray.  Bummer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PQconninfo function for libpq

2012-11-21 Thread Boszormenyi Zoltan

Hi,

2012-11-21 19:19 keltezéssel, Magnus Hagander írta:

I'm breaking this out into it's own thread, for my own sanity if
nothing else :) And it's an isolated feature after all.

I still agree with the previous review at
http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net
about keeping the data in more than one place.


OK, it seems I completely missed that comment.
(Or forgot about it if I happened to answer it.)


Based on that, I've created a different version of this patch,
attached. This way we keep all the data in one struct.


I like this single structure but not the way you handle the
options' classes. In your patch, each option belongs to only
one class. These classes are:

PG_CONNINFO_REPLICATION = replication only
PG_CONNINFO_PASSWORD = password only
PG_CONNINFO_NORMAL = everything else

How does it help pg_basebackup to filter out e.g. dbname and replication?
These are added by the walreceiver module anyway and adding them
to the primary_conninfo line should even be discouraged by the documentation.

In my view, the classes should be inclusive:

PG_CONNINFO_NORMAL = Everything that's usable for a regular client
connection. This mean everything, maybe including password but
excluding replication.

PG_CONNINFO_PASSWORD = password only.

PG_CONNINFO_REPLICATION = Everything usable for a replication client
not added by walreceiver. Maybe including/excluding password.

Maybe there should be two flags for replication usage:

PG_CONNINFO_WALRECEIVER = everything except those not added
by walreceiver (and maybe password too)

PG_CONNINFO_REPLICATION = replication only

And every option can belong to more than one class, just as in my patch.
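
To make that concrete, a minimal sketch of inclusive classes as bit flags
(the values and the struct name are illustrative only, not taken from
either patch):

    /* hypothetical class bits; an option may carry several of them */
    #define PG_CONNINFO_NORMAL      0x01
    #define PG_CONNINFO_PASSWORD    0x02
    #define PG_CONNINFO_REPLICATION 0x04

    typedef struct
    {
        const char *keyword;
        int         classes;        /* bitwise OR of the class bits above */
    } ConninfoClassedOption;

    /* a caller filters with a mask, e.g. for a regular client plus password: */
    /* if (opt->classes & (PG_CONNINFO_NORMAL | PG_CONNINFO_PASSWORD)) ... */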



At this point, the patch is untested beyond compiling and a trivial
psql check, because I ran out of time :) But I figured I'd throw it
out there for comments on which version people prefer. (And yes, it's
quite possible I've made a big think-mistake in it somewhere, but
again, better to get some eyes on it early)

My version also contains a fixed version of the docs that should be
moved back into Zoltans version if that's the one we end up
preferring.


I also liked your version of the documentation better,
I am not too good at writing docs.


Also, a question was buried in the other review which is - are we OK
to remove the requiressl parameter. Both these patches do so, because
the code becomes much simpler if we can do that. It has been
deprecated since 7.2. Is it OK to remove it, or do we need to put back
in the more complex code to deal with both?

Attached is both Zoltans latest patch (v16) and my slightly different approach.

Comments on which approach is best?

Test results from somebody who has the time to look at it? :)

--
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/


Thanks for your work on this.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] PQconninfo function for libpq

2012-11-21 Thread Magnus Hagander
On Wed, Nov 21, 2012 at 10:01 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 Hi,

 2012-11-21 19:19 keltezéssel, Magnus Hagander írta:

 I'm breaking this out into it's own thread, for my own sanity if
 nothing else :) And it's an isolated feature after all.

 I still agree with the previous review at

 http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net
 about keeping the data in more than one place.


 OK, it seems I completely missed that comment.
 (Or forgot about it if I happened to answer it.)


 Based on that, I've created a different version of this patch,
 attached. This way we keep all the data in one struct.


 I like this single structure but not the way you handle the
 options' classes. In your patch, each option belongs to only
 one class. These classes are:

 PG_CONNINFO_REPLICATION = replication only
 PG_CONNINFO_PASSWORD = password only
 PG_CONNINFO_NORMAL = everything else

 How does it help pg_basebackup to filter out e.g. dbname and replication?

PG_CONNINFO_ALL should give pg_basebackup all it needs, no? Or
actually, it shouldn't have the replication=1 part, right? So it
should just use PG_CONNINFO_NORMAL|PG_CONNINFO_PASSWORD?


 These are added by the walreceiver module anyway and adding them
 to the primary_conninfo line should even be discouraged by the
 documentation.

Hmm. I wasn't actually thinking about the dbname part here, I admit that.


 In my view, the classes should be inclusive:

 PG_CONNINFO_NORMAL = Everything that's usable for a regular client
 connection. This mean everything, maybe including password but
 excluding replication.

 PG_CONNINFO_PASSWORD = password only.

 PG_CONNINFO_REPLICATION = Everything usable for a replication client
 not added by walreceiver. Maybe including/excluding password.

 Maybe there should be two flags for replication usage:

 PG_CONNINFO_WALRECEIVER = everything except those not added
 by walreceiver (and maybe password too)

 PG_CONNINFO_REPLICATION = replication only

 And every option can belong to more than one class, just as in my patch.

Hmm. I kind of liked having each option in just one class, but I see
the problem. Looking at the ones you suggest, all the non-password
ones *have* to be without password, otherwise there is no way to get
the conninfo without password - which is the whole reason for that
parameter to exist. So the PASSWORD one has to be additional - which
means that not making the other ones additional makes them
inconsistent. But maybe we don't really have a choice there.



 At this point, the patch is untested beyond compiling and a trivial
 psql check, because I ran out of time :) But I figured I'd throw it
 out there for comments on which version people prefer. (And yes, it's
 quite possible I've made a big think-mistake in it somewhere, but
 again, better to get some eyes on it early)

 My version also contains a fixed version of the docs that should be
 moved back into Zoltans version if that's the one we end up
 preferring.


 I also liked your version of the documentation better,
 I am not too good at writing docs.

np.


 Also, a question was buried in the other review which is - are we OK
 to remove the requiressl parameter. Both these patches do so, because
 the code becomes much simpler if we can do that. It has been
 deprecated since 7.2. Is it OK to remove it, or do we need to put back
 in the more complex code to deal with both?

Just going to highlight that we're looking for at least one third
party to comment on this :)


 Attached is both Zoltans latest patch (v16) and my slightly different
 approach.

 Comments on which approach is best?

 Test results from somebody who has the time to look at it? :)

Do you happen to have a set of tests you've been running on your
patches? Can you try them against this one?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-11-21 Thread Greg Smith

On 11/16/12 9:03 AM, Merlin Moncure wrote:

Atri ran some quick n dirty tests to see if there
were any regressions.  He benched a large scan followed by vacuum.  So
far, results are inconclusive so better testing methodologies will
definitely be greatly appreciated.  One of the challenges with working
in this part of the code is that it's quite difficult to make changes
without impacting at least some workloads.


I'm adding something to pgbench-tools to test for some types of vacuum 
regressions that I've seen before.  And the checksum benchmarking I've 
already signed up for overlaps with this one quite a bit.  I'd suggest 
reviewers here focus on code quality, correctness, and targeted 
optimization testing.  I'm working heavily on a larger benchmarking 
framework that both this and checksums will fit into right now.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] StrategyGetBuffer questions

2012-11-21 Thread Merlin Moncure
On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure mmonc...@gmail.com wrote:
 In this sprawling thread on scaling issues [1], the topic meandered
 into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
 wondering:

 *) If there shouldn't be a bound in terms of how many candidate
 buffers you're allowed to skip for having a non-zero usage count.
 Whenever an unpinned usage_count>0 buffer is found, trycounter is
 reset (!) so that the code operates from point of view as it had just
 entered the loop.  There is an implicit assumption that this is rare,
 but how rare is it?

 How often is that the trycounter would hit zero if it were not reset?
 I've instrumented something like that in the past, and could only get
 it to fire under pathologically small shared_buffers and workloads
 that caused most of them to be pinned simultaneously.

well, it's basically impossible -- and that's what I find odd.

 *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
 itself examining too many unpinned buffers per sweep?

 It is a self correcting problem.  If it is examining a lot of unpinned
 buffers, it is also decrementing a lot of them.

sure.  but it's entirely plausible that some backends are marking up
usage_count rapidly and not allocating buffers while others are doing
a lot of allocations.  point being: all it takes is one backend to get
scheduled out while holding the freelist lock to effectively freeze
the database for many operations.

it's been documented [1] that particular buffers can become spinlock
contention hot spots due to reference counting of the pins.   if a lot
of allocation is happening concurrently it's only a matter of time
before the clock sweep rolls around to one of them, hits the spinlock,
and (in the worst case) schedules out.  this could in turn shut down
the clock sweep for some time and non allocating backends might then
beat on established buffers and pumping up usage counts.

The reference counting problem might be alleviated in some fashion for
example via Robert's idea to disable reference counting under
contention [2].  Even if you do that. you're still in for a world of
hurt if you get scheduled out of a buffer allocation.   Your patch
fixes that AFAICT.  The buffer pin check is outside the wider lock,
making my suggestion to be less rigorous about usage_count a lot less
useful (but perhaps not completely useless!).

Another innovation might be to implement a 'trylock' variant of
LockBufHdr that does a TAS but doesn't spin -- if someone else has the
header locked, why bother waiting for it? just skip to the next and
move on.  ..
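
To make the idea concrete, a rough sketch (illustrative and untested, not
proposed code; field and macro names follow the current buffer headers):

    /*
     * Hypothetical non-blocking variant of LockBufHdr: attempt the TAS once
     * and give up instead of spinning.  TAS() returns zero on success.
     */
    #define TryLockBufHdr(bufHdr) \
        (TAS(&(bufHdr)->buf_hdr_lock) == 0)

    /* the clock sweep could then do, roughly: */
    /*     if (!TryLockBufHdr(buf))                               */
    /*         continue;           -- skip to the next buffer     */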

merlin


[1] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01557.php

[2] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01571.php




Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-11-21 Thread Peter Eisentraut
On 11/21/12 9:42 AM, Robert Haas wrote:
 On Mon, Nov 19, 2012 at 6:19 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote:
 But, with the attached patch:

 rhaas=# create function xyz(smallint) returns smallint as $$select
 $1$$ language sql;
 CREATE FUNCTION
 rhaas=# select xyz(5);
  xyz
 -
5
 (1 row)

 rhaas=# create table abc (a int);
 CREATE TABLE
 rhaas=# select lpad(a, 5, '0') from abc;
  lpad
 --
 (0 rows)

 I continue to be of the opinion that allowing this second case to work
 is not desirable.
 
 1. Why?

Because a strongly-typed system should not cast numbers to strings
implicitly.  Does the equivalent of the lpad case work in any other
strongly-typed programming language?

 2. What's your counter-proposal?

Leave things as they are.





Re: [HACKERS] Do we need so many hint bits?

2012-11-21 Thread Jeff Davis
On Mon, 2012-11-19 at 14:46 -0500, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  As I said elsewhere in the thread, I'm not planning to introduce any
  additional locking. There is already precedent in IndexOnlyNext.
 
 Of course, that just begs the question of whether the code in
 IndexOnlyNext is correct.  And after looking at it, it seems pretty
 broken, or at least the argument for its correctness is broken.
 That comment seems to consider only the case where the target tuple has
 just been inserted --- but what of the case where the target tuple is in
 process of being deleted?  The deleter will not make a new index entry
 for that.  Of course, there's a pretty long code path from marking a
 tuple deleted to committing the deletion, and it seems safe to assume
 that the deleter will hit a write barrier in that path.  But I don't
 believe the argument here that the reader must hit a read barrier in
 between.

After digging in a little, this is bothering me now, too. I think it's
correct, and I agree with your analysis, but it seems a little complex
to explain why it works.

It also seems like a fortunate coincidence that we can detect clearing
the bit due to an insert, which we need to know about immediately; but
not detect a delete, which we don't care about until it commits.

I will try to update the comment along with my submission.

Regards,
Jeff Davis





Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-11-21 Thread Jeff Janes
On Thu, Nov 15, 2012 at 11:16 AM, Robert Haas robertmh...@gmail.com wrote:

 So what's next here?  Do you want to work on these issue some more?
 Or does Jeff?

This has been rewritten enough that I no longer feel much ownership of it.

I'd prefer to leave it to Peter or Greg S., if they are willing to do it.

Cheers,

Jeff




Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool

2012-11-21 Thread Jeff Janes
On Thu, Nov 15, 2012 at 9:13 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2012-11-15 09:06:23 -0800, Jeff Janes wrote:
 On Wed, Nov 14, 2012 at 5:17 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  ---
   src/bin/Makefile|   2 +-
   src/bin/xlogdump/Makefile   |  25 +++
   src/bin/xlogdump/xlogdump.c | 468 
  
   3 files changed, 494 insertions(+), 1 deletion(-)
   create mode 100644 src/bin/xlogdump/Makefile
   create mode 100644 src/bin/xlogdump/xlogdump.c

 Is this intended to be the successor of
 https://github.com/snaga/xlogdump which will then be deprecated?

 As-is this is just a development tool which was sorely needed for the
 development of this patchset. But yes I think that once ready
 (xlogreader infrastructure, *_desc routines splitted) it should
 definitely be able to do most of what the above xlogdump can do and it
 should live in bin/. I think mostly some filtering is missing.

 That doesn't really deprecate the above though.

 Does that answer your question?

Yes, I think so.  Thanks.

(I've just recently gotten the original xlogdump to work for me in
9.2, and I had been wonder if back-porting yours to 9.2 would have
been an easier way to go.)

Cheers,

Jeff




Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool

2012-11-21 Thread Andres Freund
On 2012-11-21 14:57:14 -0800, Jeff Janes wrote:
 On Thu, Nov 15, 2012 at 9:13 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2012-11-15 09:06:23 -0800, Jeff Janes wrote:
  On Wed, Nov 14, 2012 at 5:17 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
   ---
src/bin/Makefile|   2 +-
src/bin/xlogdump/Makefile   |  25 +++
src/bin/xlogdump/xlogdump.c | 468 
   
3 files changed, 494 insertions(+), 1 deletion(-)
create mode 100644 src/bin/xlogdump/Makefile
create mode 100644 src/bin/xlogdump/xlogdump.c
 
  Is this intended to be the successor of
  https://github.com/snaga/xlogdump which will then be deprecated?
 
  As-is this is just a development tool which was sorely needed for the
  development of this patchset. But yes I think that once ready
  (xlogreader infrastructure, *_desc routines splitted) it should
  definitely be able to do most of what the above xlogdump can do and it
  should live in bin/. I think mostly some filtering is missing.
 
  That doesn't really deprecate the above though.
 
  Does that answer your question?
 
 Yes, I think so.  Thanks.
 
 (I've just recently gotten the original xlogdump to work for me in
 9.2, and I had been wonder if back-porting yours to 9.2 would have
 been an easier way to go.)

I don't think you would have much fun doing so - the WAL format changes
between 9.2 and 9.3 make this larger than one might think. I had a
version that worked with the previous format but there have been some
interface changes since then...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] auto_explain WAS: RFC: Timing Events

2012-11-21 Thread Greg Smith

On 11/8/12 2:16 PM, Josh Berkus wrote:


Also, logging only the long-running queries is less useful than people
on this list seem to think.  When I'm doing real performance analysis, I
need to see *everything* which was run, not just the slow stuff.  Often
the real problem is a query which used to take 1.1ms, now takes 1.8ms,
and gets run 400 times/second. Looking just at the slow queries won't
tell you that.


No argument here.  I've tried to be clear that some of these high-speed 
but lossy things I'm hacking on are not going to be useful for everyone. 
 A solution that found most of these 1.8ms queries, but dropped some 
percentage under the highest peak load conditions, would still be very 
useful to me.


An anecdote on this topic seems relevant.  I have a troublesome 
production server that has moved log_min_duration_statement from 100ms 
to 500ms to 2000ms as the system grew.  Even the original setting wasn't 
short enough to catch everything we would like to watch *now*, but 
seeing sub-second data is a dream at this point.  The increases have 
been forced by logging contention becoming unmanageable when every client 
has to fight for the log to write to disk.  I can see the buggers stack 
up as waiting for locks if I try to log shorter statements, stalling 
enough that it drags the whole server down under peak load.


If I could just turn off logging just during those periods--basically, 
throwing them away only when some output rate throttled component hit 
its limit--I could still find them in the data collected the rest of the 
time.  There are some types of problems that also only occur under peak 
load that this idea would miss, but you'd still be likely to get *some* 
of them, statistically.


There's a really hard trade-off here:

-Sometimes you must save data on every query to capture fine details
-Making every query wait for disk I/O is impractical

The sort of ideas you threw out for making things like auto-explain 
logging per-session can work.  The main limitation there though is that 
it presumes you even know what area the problem is in the first place. 
I am not optimistic that covers a whole lot of ground either.


Upthread I talked about a background process that persists shared memory 
queues as a future consumer of timing events--one that might consume 
slow query data too.  That is effectively acting as the ideal component 
I described above, one that only loses things when it exceeds the 
system's write capacity for saving them.  I wouldn't want to try and 
retrofit the existing PostgreSQL logging facility for such a thing 
though.  Log parsing as the way to collect data is filled with headaches 
anyway.


I don't see any other good way to resolve this trade-off.  To help with 
the real-world situation you describe, ideally you dump all the query 
data somewhere, fast, and have the client move on.  You can't make 
queries wait for storage, something else (with a buffer!) needs to take 
over that job.


I can't find the URL right now, but at PG.EU someone was showing me a 
module that grabbed the new 9.2 logging hook and shipped the result to 
another server.  Clients burn a little CPU and network time and they 
move on, and magically disk I/O goes out of their concerns.  How much 
overhead persisting the data takes isn't the database server's job at 
all then.  That's the sort of flexibility people need to have with 
logging eventually.  Something so efficient that every client can afford 
to do it; it is capable of saving all events under ideal conditions; but 
under adverse ones, you have to keep going and accept the loss.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] logical changeset generation v3

2012-11-21 Thread Andres Freund
On 2012-11-21 18:35:34 +0900, Michael Paquier wrote:
 On Wed, Nov 21, 2012 at 4:34 PM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2012-11-21 14:57:08 +0900, Michael Paquier wrote:
 
  Ah, I see. Could you try the following diff?
 
  diff --git a/src/backend/replication/logical/snapbuild.c
  b/src/backend/replication/logical/snapbuild.c
  index df24b33..797a126 100644
  --- a/src/backend/replication/logical/snapbuild.c
  +++ b/src/backend/replication/logical/snapbuild.c
  @@ -471,6 +471,7 @@ SnapBuildDecodeCallback(ReorderBuffer *reorder,
  Snapstate *snapstate,
   */
  snapstate->transactions_after = buf->origptr;
 
  +   snapstate->nrrunning = running->xcnt;
  snapstate->xmin_running = InvalidTransactionId;
  snapstate->xmax_running = InvalidTransactionId;
 
 I am still getting the same assertion failure even with this diff included.

I really don't understand whats going on here then. Youve said you made
sure that there is a catalog snapshot. Which means you would need
something like:
WARNING:  connecting to postgres
WARNING:  Initiating logical rep
LOG:  computed new xmin: 16566894
LOG:  start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG:  found initial snapshot (via running xacts). Done: 1
WARNING:  reached consistent point, stopping!
WARNING:  Starting logical replication
LOG:  start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG:  found initial snapshot (via running xacts). Done: 1

in the log *and* it means that snapbuild-state has to be
CONSISTENT. But the backtrace youve posted:

#3  0x0070c409 in SnapBuildTxnIsRunning (snapstate=0x19e4f10,xid=0) at 
snapbuild.c:877
#4  0x0070b8e4 in SnapBuildProcessChange 
(reorder=0x19e4e80,snapstate=0x19e4f10, xid=0, buf=0x1a0a368, 
relfilenode=0x1a0a450) at snapbuild.c:388
#5  0x0070c088 in SnapBuildDecodeCallback 
(reorder=0x19e4e80,snapstate=0x19e4f10, buf=0x1a0a368) at snapbuild.c:732

shows pretty clearly that snapstate *can't* be consistent because line 387ff is:
  else if (snapstate->state < SNAPBUILD_CONSISTENT &&
           SnapBuildTxnIsRunning(snapstate, xid))
      ;
so #3 #4 can't happen at those line numbers with state == CONSISTENT.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] auto_explain WAS: RFC: Timing Events

2012-11-21 Thread Gavin Flower

On 22/11/12 12:15, Greg Smith wrote:

On 11/8/12 2:16 PM, Josh Berkus wrote:


Also, logging only the long-running queries is less useful than people
on this list seem to think.  When I'm doing real performance analysis, I
need to see *everything* which was run, not just the slow stuff.  Often
the real problem is a query which used to take 1.1ms, now takes 1.8ms,
and gets run 400 times/second. Looking just at the slow queries won't
tell you that.


No argument here.  I've tried to be clear that some of these 
high-speed but lossy things I'm hacking on are not going to be useful 
for everyone.  A solution that found most of these 1.8ms queries, but 
dropped some percentage under the highest peak load conditions, would 
still be very useful to me.


An anecdote on this topic seems relevant.  I have a troublesome 
production server that has moved log_min_duration_statement from 100ms 
to 500ms to 2000ms as the system grew.  Even the original setting 
wasn't short enough to catch everything we would like to watch *now*, 
but seeing sub-second data is a dream at this point. The increases 
have been forced by logging contention becoming unmanageable when every 
client has to fight for the log to write to disk.  I can see the 
buggers stack up as waiting for locks if I try to log shorter 
statements, stalling enough that it drags the whole server down under 
peak load.


If I could just turn off logging just during those periods--basically, 
throwing them away only when some output rate throttled component hit 
its limit--I could still find them in the data collected the rest of 
the time.  There are some types of problems that also only occur under 
peak load that this idea would miss, but you'd still be likely to get 
*some* of them, statistically.


There's a really hard trade-off here:

-Sometimes you must save data on every query to capture fine details
-Making every query wait for disk I/O is impractical

The sort of ideas you threw out for making things like auto-explain 
logging per-session can work.  The main limitation there though is 
that it presumes you even know what area the problem is in the first 
place. I am not optimistic that covers a whole lot of ground either.


Upthread I talked about a background process that persists shared 
memory queues as a future consumer of timing events--one that might 
consume slow query data too.  That is effectively acting as the ideal 
component I described above, one that only loses things when it 
exceeds the system's write capacity for saving them.  I wouldn't want 
to try and retrofit the existing PostgreSQL logging facility for such 
a thing though.  Log parsing as the way to collect data is filled with 
headaches anyway.


I don't see any other good way to resolve this trade-off.  To help 
with the real-world situation you describe, ideally you dump all the 
query data somewhere, fast, and have the client move on.  You can't 
make queries wait for storage, something else (with a buffer!) needs 
to take over that job.


I can't find the URL right now, but at PG.EU someone was showing me a 
module that grabbed the new 9.2 logging hook and shipped the result to 
another server.  Clients burn a little CPU and network time and they 
move on, and magically disk I/O goes out of their concerns.  How much 
overhead persisting the data takes isn't the database server's job at 
all then.  That's the sort of flexibility people need to have with 
logging eventually.  Something so efficient that every client can 
afford to do it; it is capable of saving all events under ideal 
conditions; but under adverse ones, you have to keep going and accept 
the loss.


Would it be useful to be able to specify a fixed sampling rate, say 1 in 
100.  That way you could get a well defined statistical sample, yet not 
cause excessive I/O?



Cheers,
Gavin




Re: [HACKERS] logical changeset generation v3

2012-11-21 Thread Michael Paquier
On Thu, Nov 22, 2012 at 8:25 AM, Andres Freund and...@2ndquadrant.comwrote:

 I really don't understand whats going on here then. Youve said you made
 sure that there is a catalog snapshot. Which means you would need
 something like:
 WARNING:  connecting to postgres
 WARNING:  Initiating logical rep
 LOG:  computed new xmin: 16566894
 LOG:  start reading from 3/E62457C0, scrolled back to 3/E6244000
 LOG:  found initial snapshot (via running xacts). Done: 1
 WARNING:  reached consistent point, stopping!
 WARNING:  Starting logical replication
 LOG:  start reading from 3/E62457C0, scrolled back to 3/E6244000
 LOG:  found initial snapshot (via running xacts). Done: 1

 in the log *and* it means that snapbuild-state has to be
 CONSISTENT. But the backtrace youve posted:

 #3  0x0070c409 in SnapBuildTxnIsRunning
 (snapstate=0x19e4f10,xid=0) at snapbuild.c:877
 #4  0x0070b8e4 in SnapBuildProcessChange
 (reorder=0x19e4e80,snapstate=0x19e4f10, xid=0, buf=0x1a0a368,
 relfilenode=0x1a0a450) at snapbuild.c:388
 #5  0x0070c088 in SnapBuildDecodeCallback
 (reorder=0x19e4e80,snapstate=0x19e4f10, buf=0x1a0a368) at snapbuild.c:732

 shows pretty clearly that snapstate *can't* be consistent because line
 387ff is:
   else if (snapstate->state < SNAPBUILD_CONSISTENT &&
            SnapBuildTxnIsRunning(snapstate, xid))
        ;
 so #3 #4 can't happen at those line numbers with state == CONSISTENT.

Still this *impossible* thing happens.
Here are some more information on the logs I get on server side:

Yes I have the logical replication correctly initialized:
[629 0] LOG:  database system was shut down at 2012-11-22 09:02:42 JST
[628 0] LOG:  database system is ready to accept connections
[633 0] LOG:  autovacuum launcher started
[648 0] WARNING:  connecting to postgres
[648 0] WARNING:  Initiating logical rep
[648 0] LOG:  computed new xmin: 684
[648 0] LOG:  start reading from 0/178C1B8, scrolled back to 0/178C000

And I am also getting logs of this type with pg_receivellog:
BEGIN 698
table pgbench_accounts: UPDATE: aid[int4]:759559 bid[int4]:8
abalance[int4]:-3641 filler[bpchar]:
table pgbench_tellers: UPDATE: tid[int4]:93 bid[int4]:10
tbalance[int4]:-3641 filler[bpchar]:(null)
table pgbench_branches: UPDATE: bid[int4]:10 bbalance[int4]:-3641
filler[bpchar]:(null)
table pgbench_history: INSERT: tid[int4]:93 bid[int4]:10 aid[int4]:759559
delta[int4]:-3641 mtime[timestamp]:2012-11-22 09:05:34.535651
filler[bpchar]:(null)
COMMIT 698

Until the assertion failure:
TRAP: FailedAssertion(!(((xid) >= ((TransactionId) 3)) &&
((snapstate->xmin_running) >= ((TransactionId) 3))), File: snapbuild.c,
Line: 878)
I still have the core file and its binary at hand if you want, so can send
them at will.
I have not been able to read your code yet, but there should be something
you are missing.

Thanks,
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] logical changeset generation v3

2012-11-21 Thread Andres Freund
On 2012-11-22 09:13:30 +0900, Michael Paquier wrote:
 On Thu, Nov 22, 2012 at 8:25 AM, Andres Freund and...@2ndquadrant.comwrote:

  I really don't understand whats going on here then. Youve said you made
  sure that there is a catalog snapshot. Which means you would need
  something like:
  WARNING:  connecting to postgres
  WARNING:  Initiating logical rep
  LOG:  computed new xmin: 16566894
  LOG:  start reading from 3/E62457C0, scrolled back to 3/E6244000
  LOG:  found initial snapshot (via running xacts). Done: 1
  WARNING:  reached consistent point, stopping!
  WARNING:  Starting logical replication
  LOG:  start reading from 3/E62457C0, scrolled back to 3/E6244000
  LOG:  found initial snapshot (via running xacts). Done: 1
 
  in the log *and* it means that snapbuild-state has to be
  CONSISTENT. But the backtrace youve posted:
 
  #3  0x0070c409 in SnapBuildTxnIsRunning
  (snapstate=0x19e4f10,xid=0) at snapbuild.c:877
  #4  0x0070b8e4 in SnapBuildProcessChange
  (reorder=0x19e4e80,snapstate=0x19e4f10, xid=0, buf=0x1a0a368,
  relfilenode=0x1a0a450) at snapbuild.c:388
  #5  0x0070c088 in SnapBuildDecodeCallback
  (reorder=0x19e4e80,snapstate=0x19e4f10, buf=0x1a0a368) at snapbuild.c:732
 
  shows pretty clearly that snapstate *can't* be consistent because line
  387ff is:
    else if (snapstate->state < SNAPBUILD_CONSISTENT &&
             SnapBuildTxnIsRunning(snapstate, xid))
         ;
  so #3 #4 can't happen at those line numbers with state == CONSISTENT.
 
 Still this *impossible* thing happens.
 Here are some more information on the logs I get on server side:

 Yes I have the logical replication correctly initialized:
 [629 0] LOG:  database system was shut down at 2012-11-22 09:02:42 JST
 [628 0] LOG:  database system is ready to accept connections
 [633 0] LOG:  autovacuum launcher started
 [648 0] WARNING:  connecting to postgres
 [648 0] WARNING:  Initiating logical rep
 [648 0] LOG:  computed new xmin: 684
 [648 0] LOG:  start reading from 0/178C1B8, scrolled back to 0/178C000

Ok, so youve not yet reached a consistent point.

Which means this shouldn't yet be written out:

 And I am also getting logs of this type with pg_receivellog:
 BEGIN 698
 table pgbench_accounts: UPDATE: aid[int4]:759559 bid[int4]:8
 abalance[int4]:-3641 filler[bpchar]:
 table pgbench_tellers: UPDATE: tid[int4]:93 bid[int4]:10
 tbalance[int4]:-3641 filler[bpchar]:(null)
 table pgbench_branches: UPDATE: bid[int4]:10 bbalance[int4]:-3641
 filler[bpchar]:(null)
 table pgbench_history: INSERT: tid[int4]:93 bid[int4]:10 aid[int4]:759559
 delta[int4]:-3641 mtime[timestamp]:2012-11-22 09:05:34.535651
 filler[bpchar]:(null)
 COMMIT 698

that could already be good enough of a hint, let me check tomorrow.

 I still have the core file and its binary at hand if you want, so can send
 them at will.

If those aren't too big, its worth a try...

Greetings,

Andres

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Removing PD_ALL_VISIBLE

2012-11-21 Thread Jeff Davis
Follow up to discussion:
http://archives.postgresql.org/pgsql-hackers/2012-11/msg00817.php

I worked out a patch that replaces PD_ALL_VISIBLE with calls to
visibilitymap_test. It rips out a lot of complexity, with a net drop of
about 300 lines (not a lot, but some of that code is pretty complex).

The patch is quite rough, and I haven't yet added the code to keep the
VM buffer pins around longer, but I still don't see any major problems.
I'm fairly certain that scans will not be adversely affected, and I
think that inserts/updates/deletes should be fine as well.

The main worry I have with inserts/updates/deletes is that there will be
less spatial locality for allocating new buffers or for modifying
existing buffers. So keeping a pinned VM buffer might not help as much,
because it might need to pin a different one, anyway.

Adding to January commitfest, as it's past 11/15.

Regards,
Jeff Davis


rm-pd-all-visible-20121121.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] binary heap implementation

2012-11-21 Thread Abhijit Menon-Sen
At 2012-11-21 15:09:12 -0500, robertmh...@gmail.com wrote:

  The following comments still talk about key and value, thus need
  an update:
 
 Oops.

In the same vein, Returns NULL if the heap is empty no longer applies
to binaryheap_first and binaryheap_remove_first. Plus there is an extra
asterisk in the middle of the comment about binaryheap_replace_first.

But it looks good otherwise.

I agree with Tom that we should support resizing the heap, but let's not
bother with that now. I'll write the few extra lines of code the first
time something actually needs to add more elements to a heap.

-- Abhijit




Re: [HACKERS] StrategyGetBuffer questions

2012-11-21 Thread Amit Kapila
On Thursday, November 22, 2012 3:26 AM Merlin Moncure wrote:
 On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes jeff.ja...@gmail.com
 wrote:
  On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure mmonc...@gmail.com
 wrote:
  In this sprawling thread on scaling issues [1], the topic meandered
  into StrategyGetBuffer() -- in particular the clock sweep loop.  I'm
  wondering:
 
  *) If there shouldn't be a bound in terms of how many candidate
  buffers you're allowed to skip for having a non-zero usage count.
  Whenever an unpinned usage_count>0 buffer is found, trycounter is
  reset (!) so that the code operates from point of view as it had just
  entered the loop.  There is an implicit assumption that this is rare,
  but how rare is it?
 
  How often is that the trycounter would hit zero if it were not reset?
  I've instrumented something like that in the past, and could only get
  it to fire under pathologically small shared_buffers and workloads
  that caused most of them to be pinned simultaneously.
 
 well, it's basically impossible -- and that's what I find odd.
 
  *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds
  itself examining too many unpinned buffers per sweep?
 
  It is a self correcting problem.  If it is examining a lot of unpinned
  buffers, it is also decrementing a lot of them.
 
 sure.  but it's entirely plausible that some backends are marking up
 usage_count rapidly and not allocating buffers while others are doing
 a lot of allocations.  point being: all it takes is one backend to get
 scheduled out while holding the freelist lock to effectively freeze
 the database for many operations.

True, I have observed this as well. The case I tried was one where 50% of
the buffers are always used for a hot table (a table in high demand) and the
remaining 50% are used for normal operations. In such cases contention can be
observed on BufFreelistLock. The same can be seen in the report I attached in
the mail below:
  http://archives.postgresql.org/pgsql-hackers/2012-11/msg01147.php

 
 it's been documented [1] that particular buffers can become spinlock
 contention hot spots due to reference counting of the pins.   if a lot
 of allocation is happening concurrently it's only a matter of time
 before the clock sweep rolls around to one of them, hits the spinlock,
 and (in the worst case) schedules out.  this could in turn shut down
 the clock sweep for some time and non allocating backends might then
 beat on established buffers and pumping up usage counts.
 
 The reference counting problem might be alleviated in some fashion for
 example via Robert's idea to disable reference counting under
 contention [2].  Even if you do that. you're still in for a world of
 hurt if you get scheduled out of a buffer allocation.   Your patch
 fixes that AFAICT.  The buffer pin check is outside the wider lock,
 making my suggestion to be less rigorous about usage_count a lot less
 useful (but perhaps not completely useless!).
 
 Another innovation might be to implement a 'trylock' variant of
 LockBufHdr that does a TAS but doesn't spin -- if someone else has the
 header locked, why bother waiting for it? just skip to the next and
 move on.  ..

I think this is a reasonable idea.
How about having a hot end and a cold end in the buffer list, so that if some
of the buffers are heavily used, other backends need not pay the penalty of
traversing the hot buffers?

With Regards,
Amit Kapila.





Re: [HACKERS] User control over psql error stream

2012-11-21 Thread Karl O. Pinc
On 11/21/2012 01:41:56 PM, Peter Eisentraut wrote:
 On 11/15/12 3:53 PM, Karl O. Pinc wrote:
  This patch gives the end user control over psql's
  error stream.  This allows a single psql session
  to use \o to send both errors and table output
  to multiple files.  Useful when capturing test output, etc.
 
 What does this do that cannot in practice be achieved using shell
 redirection operators?

I've 3 answers to this.  The short one, the long one,
and the practical one.  Sorry to go on about it.

The brief generic answer is:

Easily capture error output associated with disparate
input streams, interspersed with the data output associated with
the disparate input streams, in separate files
correlated with the input, in the case where transactions
span multiple input streams.   That latter
bit about transactions being the important part.

Of course I could be wrong/not clever enough.


The long generic answer is:

Shell redirection works fine if the transactions do not span 
input streams; a separate connection/psql instance can
be used to redirect stderr intermixed with stdout as desired.  
However when transactions span input streams then a single
psql instance/db connection is required to maintain
transaction state.  In such a case it is difficult
to re-redirect stderr so that it's output appears
intermixed with table data output (stdout) in
time wise order while at the same time redirecting
all output into files that correlate with the
various input streams.  

It's doable with something like:

  psql 2>&1 < myfifo | mydemultiplexer &

where myfifo is a fifo fed the various input streams
and mydemultiplexer is a some process that
splits stdin into various output files as
various bits of input are sent to psql.  But there
must be a way to communicate with the mydemultiplexer,
and you need to worry about synchronization of
input with output.  And it's not clear to me how to
write mydemultiplexer in shell either.

Note that I've not actually tried the example above.


The practical answer is that I want my dependency
based test framework to work:

http://gombemi.ccas.gwu.edu/cgi-bin/darcsweb.cgi?r=gombemi;a=tree;f=/db/test

For an intro see:

http://gombemi.ccas.gwu.edu/cgi-bin/darcsweb.cgi?r=gombemi;a=headblob;f=/db/test/test.mk

I'm using this patch in my test framework.
If I had to I think I could get away
without it, but it would involve a bit of
work, and more than a bit of additional
complication.

Although I've got (much) further work to do 
if I'm to follow through on my notion to
release this as a generic test framework,
what's documented in test.mk works.  Probably.
(Not bad for under 300 lines of actual code,
comments excluded.  Perhaps under 200 if you
eliminate blank lines and configuration for
my real-world use.)

Comments/suggestions are welcome.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein





Re: [HACKERS] Doc patch: Document names of automatically created constraints and indexes

2012-11-21 Thread Karl O. Pinc
On 11/21/2012 02:12:26 PM, Robert Haas wrote:
 On Sat, Nov 17, 2012 at 1:22 AM, Peter Eisentraut pete...@gmx.net
 wrote:
  On Mon, 2012-11-12 at 11:42 -0600, Karl O. Pinc wrote:
  Could ALTER TABLE use an option to drop the
  primary key constraint?  I needed to do that,
  found it was not obvious, and this lead me to
  try to improve things.
 
  That could be useful, I think.  But it might open a can of worms.
 
 Would the new option be syntactic sugar around ALTER TABLE ... DROP
 CONSTRAINT put_the_name_of_the_primary_key_here?

This sounds nice to me, but there's worms left over because
the unique index created when PRIMARY KEY is specified would 
then remain.  This is the right behavior IMHO, and if everything
is spelled out in the documentation no problems should arise.

But the user deserves to know how to get rid of the unique
index too, so the index's name would need to be documented.
Since this is something of an internal matter (?) there
might be another worm here.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein





Re: [HACKERS] WIP json generation enhancements

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 03:16 PM, Andrew Dunstan wrote:

Here is a WIP patch for enhancements to json generation.

First, there is the much_requested json_agg, which will aggregate rows 
directly to json. So the following will now work:


select json_agg(my_table) from mytable;
select json_agg(q) from (myquery here) q;

One open question regarding this feature is whether this should return 
NULL or '[]' for 0 rows. Currently it returns NULL but I could be 
convinced to return '[]', and the change would be very small.


Next is to_json(), which will turn any value into json, so we're no 
longer restricted to rows and arrays.


Non-builtin types are now searched for a cast to json, and if it 
exists it is used instead of the type's text representation. I didn't 
add a special type to look for a cast to, as was discussed before, as 
it seemed a bit funky and unnecessary. It can easily be added, but I'm 
still not convinced it's a good idea. Note that this is only done for 
types that aren't builtin - we know how to turn all of those into json 
without needing to look for a cast.


Along with this there is an hstore_to_json() function added to the 
hstore module, and a cast from hstore to json that uses it. This 
function treats every value in the hstore as a string. There is also a 
function with the working title of hstore_to_json_loose() that does a 
heuristic conversion that treats values of 't' and 'f' as booleans, 
and strings that look like numbers as numbers unless they start with a 
leading 0 followed by another digit (could be zip codes, phone numbers 
etc.) The difference between these is illustrated here (notice that 
quoted 't' becomes unquoted 'true' and quoted '1' becomes '1'):


   andrew=# select json_agg(q) from foo q;
json_agg
-
 [{a:a,b:1,h:{c: t, d: null, q: 1, x: y}}]
   (1 row)

   andrew=# select json_agg(q) from (select a, b, 
hstore_to_json_loose(h) as h from foo) q;

json_agg

 [{a:a,b:1,h:{c: true, d: null, q: 1, x: y}}]
   (1 row)
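
The number test itself can be small; a minimal sketch of the kind of check
described above (illustrative only -- the patch's actual logic differs in
detail):

    #include <ctype.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /* hypothetical helper: would this hstore value read as a JSON number? */
    static bool
    looks_like_json_number(const char *s)
    {
        char   *end;

        /* a leading zero followed by another digit stays a string
         * (zip codes, phone numbers, ...) */
        if (s[0] == '0' && isdigit((unsigned char) s[1]))
            return false;
        (void) strtod(s, &end);
        return end != s && *end == '\0';
    }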

Note: this patch will need a change in the oids used for the new 
functions if applied against git tip, as they have been overtaken by 
time.



Comments welcome.





Updated patch that works with git tip and has regression tests.

cheers

andrew



*** a/contrib/hstore/expected/hstore.out
--- b/contrib/hstore/expected/hstore.out
***
*** 1453,1455  select count(*) from testhstore where h = 'pos=98, line=371, node=CBA, indexe
--- 1453,1491 
   1
  (1 row)
  
+ -- json
+ select hstore_to_json('a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4');
+  hstore_to_json  
+ -
+  {b: t, c: null, d: 12345, e: 012345, f: 1.234, g: 2.345e+4, a key: 1}
+ (1 row)
+ 
+ select cast( hstore  'a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4' as json);
+   json   
+ -
+  {b: t, c: null, d: 12345, e: 012345, f: 1.234, g: 2.345e+4, a key: 1}
+ (1 row)
+ 
+ select hstore_to_json_loose('a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4');
+hstore_to_json_loose   
+ --
+  {b: true, c: null, d: 12345, e: 012345, f: 1.234, g: 2.345e+4, a key: 1}
+ (1 row)
+ 
+ create table test_json_agg (f1 text, f2 hstore);
+ insert into test_json_agg values ('rec1','a key =1, b = t, c = null, d= 12345, e = 012345, f= 1.234, g= 2.345e+4'),
+('rec2','a key =2, b = f, c = null, d= -12345, e = 012345.6, f= -1.234, g= 0.345e-4');
+ select json_agg(q) from test_json_agg q;
+   json_agg  
+ 
+  [{f1:rec1,f2:{b: t, c: null, d: 12345, e: 012345, f: 1.234, g: 2.345e+4, a key: 1}},  +
+   {f1:rec2,f2:{b: f, c: null, d: -12345, e: 012345.6, f: -1.234, g: 0.345e-4, a key: 2}}]
+ (1 row)
+ 
+ select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_json_agg) q;
+json_agg   
+ --
+  [{f1:rec1,f2:{b: true, c: null, d: 12345, e: 012345, 

Re: [HACKERS] User control over psql error stream

2012-11-21 Thread Karl O. Pinc
On 11/21/2012 01:41:56 PM, Peter Eisentraut wrote:
 On 11/15/12 3:53 PM, Karl O. Pinc wrote:
  This patch gives the end user control over psql's
  error stream.  This allows a single psql session
  to use \o to send both errors and table output
  to multiple files.  Useful when capturing test output, etc.
 
 What does this do that cannot in practice be achieved using shell
 redirection operators?

To look at it from another angle, you need it for the
same reason you need \o -- to redirect output
to multiple places from within a single psql process.
\o redirects stdout, but I'm interested in
redirecting stderr.

Which makes me think that instead of a \pset
option a, say, \e, command is called for.
This would do the same thing \o does only
for stderr instead of stdout.  

The problem comes
when you want \e to send to the same place \o
sends to, or vice versa.  You can't just compare textual paths
because there's more than one way to write
a path to an object in the filesystem.
You need either a special argument
to \e and \o -- very ungood as it breaks
backwards compatibility with \o --
or you need, perhaps, yet another \
command to send stderr to where \o
is going or, conversely, send \o
to where stderr is going.  Perhaps \oe
could send \o output to where stderr
is going at the moment and \eo could
send stderr output to where \o output
is going at the moment?

Sorry to ramble on.  I'm tired.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein





Re: [HACKERS] FDW for PostgreSQL

2012-11-21 Thread Shigeru Hanada
On Wed, Nov 21, 2012 at 7:31 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 At execute_query(), it stores the retrieved rows onto tuplestore of
 festate-tuples at once. Doesn't it make problems when remote-
 table has very big number of rows?


No.  postgres_fdw uses single-row processing mode of libpq when
retrieving query results in execute_query, so memory usage will
be stable at a certain level.
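
For readers unfamiliar with that mode, the flow looks roughly like this
(a generic libpq sketch, not the actual postgres_fdw code):

    #include <libpq-fe.h>

    /* fetch rows one at a time using libpq's single-row mode (9.2+) */
    static void
    fetch_in_single_row_mode(PGconn *conn, const char *sql)
    {
        PGresult   *res;

        if (!PQsendQuery(conn, sql))
            return;                 /* see PQerrorMessage(conn) */
        PQsetSingleRowMode(conn);   /* if this fails, rows arrive as one batch */

        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            {
                /* exactly one row in res: process it, e.g. add it to a tuplestore */
            }
            /* PGRES_TUPLES_OK marks the empty result that terminates the set */
            PQclear(res);
        }
    }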


 IIRC, the previous code used cursor feature to fetch a set of rows
 to avoid over-consumption of local memory. Do we have some
 restriction if we fetch a certain number of rows with FETCH?
 It seems to me, we can fetch 1000 rows for example, and tentatively
 store them onto the tuplestore within one PG_TRY() block (so, no
 need to worry about PQclear() timing), then we can fetch remote
 rows again when IterateForeignScan reached end of tuplestore.


As you say, postgres_fdw had used cursor to avoid possible memory
exhaust on large result set.  I switched to single-row processing mode
(it could be said protocol-level cursor), which was added in 9.2,
because it accomplish same task with less SQL calls than cursor.

Regards,
-- 
Shigeru HANADA


Re: [HACKERS] Doc patch: Document names of automatically created constraints and indexes

2012-11-21 Thread Karl O. Pinc
On 11/21/2012 10:00:11 PM, Karl O. Pinc wrote:
 On 11/21/2012 02:12:26 PM, Robert Haas wrote:
  On Sat, Nov 17, 2012 at 1:22 AM, Peter Eisentraut pete...@gmx.net
  wrote:
   On Mon, 2012-11-12 at 11:42 -0600, Karl O. Pinc wrote:
   Could ALTER TABLE use an option to drop the
   primary key constraint?  I needed to do that,
   found it was not obvious, and this lead me to
   try to improve things.
  
   That could be useful, I think.  But it might open a can of worms.
  
  Would the new option be syntactic sugar around ALTER TABLE ... DROP
  CONSTRAINT put_the_name_of_the_primary_key_here?
 
 This sounds nice to me,

No, wait.   If the constraint name_of_primary_key is an internal
matter and may change over time, how do you deal with dropping,
now, a primary key constraint that was created, then, before
some change to the internal name?  And you wouldn't want
to accidentally remove a user-created constraint
that just happened to have the same name as the internal
primary key constraint name, especially in the case
where there's a real primary key constraint created
under an old naming convention.

Changes to the primary key constraint naming convention
would have to require a db dump/restore on pg upgrade
to the new version, or something else that changes
the primary key constraint names.  
(And then I'm not sure what would
happen if a user was, before upgrading, using a constraint name
that was the new default.)  And the changing of
the internal constraint name would have had to have
always previously caused a name change in existing
pg dbs or else dbs created long ago could today
have primary key constraint names following old
conventions, raising the concern at top.

I'm sorry to waste your time if these are obvious
issues.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein





Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-21 Thread Pavan Deolasee
On Mon, Nov 19, 2012 at 8:52 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Monday, November 19, 2012 5:53 AM Jeff Janes wrote:
 On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila amit.kap...@huawei.com
 wrote:
  On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote:
 
 Run the modes in reciprocating order?
  Sorry, I didn't understood this, What do you mean by modes in
 reciprocating order?

  Sorry for the long delay.  In your scripts, it looks like you always
  run the unpatched first, and then the patched second.

Yes, thats true.

  By reciprocating, I mean to run them in the reverse order, or in random
 order.

 Today, for some configurations, I ran them in reciprocated order.
 Below are the readings:
 Configuration:
 16GB (Database) - 7GB (Shared Buffers)

 Here I ran them in the following order:
 1. Run perf report with patch for 32 clients
 2. Run perf report without patch for 32 clients
 3. Run perf report with patch for 16 clients
 4. Run perf report without patch for 16 clients

 Each execution is 5 minutes,
               16 client / 16 thread  |   32 client / 32 thread
            @mv-free-lst   @9.3devl   | @mv-free-lst   @9.3devl
 ---------------------------------------------------------------
                3669          4056    |      5356         5258
                3987          4121    |      4625         5185
                4840          4574    |      4502         6796
                6465          6932    |      4558         8233
                6966          7222    |      4955         8237
                7551          7219    |      9115         8269
                8315          7168    |     43171         8340
                9102          7136    |     57920         8349
 ---------------------------------------------------------------
 Average:       6362          6054    |     16775         7333


Sorry, I haven't followed this thread at all, but the numbers (43171 and
57920) in the last two runs of @mv-free-list for 32 clients look like
aberrations, no?  I wonder if that's skewing the average.
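
(As a rough check, assuming the per-run numbers above: the 32-client
@mv-free-lst average is 16775 with those two runs included, but dropping
them leaves (5356 + 4625 + 4502 + 4558 + 4955 + 9115) / 6, roughly 5519,
which would put the patched numbers below the unpatched average of 7333
rather than above it.)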

I also looked at the Results.htm file downthread.  There seems to be a
steep degradation when shared buffers are increased from 5GB to 10GB,
both with and without the patch.  Is that expected?  If so, isn't that worth
investigating and possibly even fixing before we do anything else?

Thanks,
Pavan


Re: [HACKERS] PQconninfo function for libpq

2012-11-21 Thread Boszormenyi Zoltan

On 2012-11-21 22:15, Magnus Hagander wrote:

On Wed, Nov 21, 2012 at 10:01 PM, Boszormenyi Zoltan z...@cybertec.at wrote:

Hi,

On 2012-11-21 19:19, Magnus Hagander wrote:


I'm breaking this out into its own thread, for my own sanity if
nothing else :) And it's an isolated feature after all.

I still agree with the previous review at

http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net
about keeping the data in more than one place.


OK, it seems I completely missed that comment.
(Or forgot about it if I happened to answer it.)



Based on that, I've created a different version of this patch,
attached. This way we keep all the data in one struct.


I like this single structure but not the way you handle the
options' classes. In your patch, each option belongs to only
one class. These classes are:

PG_CONNINFO_REPLICATION = replication only
PG_CONNINFO_PASSWORD = password only
PG_CONNINFO_NORMAL = everything else

How does it help pg_basebackup to filter out e.g. dbname and replication?

PG_CONNINFO_ALL should give pg_basebackup all it needs, no? Or
actually, it shouldn't have the replication=1 part, right? So it
should just use PG_CONNINFO_NORMAL|PG_CONNINFO_PASSWORD?



These are added by the walreceiver module anyway and adding them
to the primary_conninfo line should even be discouraged by the
documentation.

Hmm. I wasn't actually thinking about the dbname part here, I admit that.


And not only the dbname, libpqwalreceiver adds these three:

[zozo@localhost libpqwalreceiver]$ grep dbname *
libpqwalreceiver.c: "%s dbname=replication replication=true fallback_application_name=walreceiver",


I also excluded application_name from PG_CONNINFO_REPLICATION
by this reasoning:

- for async replication or single standby, it doesn't matter,
  the connection will show up as walreceiver
- for sync replication, the administrator has to add the node name
  manually via application_name.




In my view, the classes should be inclusive:

PG_CONNINFO_NORMAL = Everything that's usable for a regular client
connection. This means everything, maybe including password but
excluding replication.

PG_CONNINFO_PASSWORD = password only.

PG_CONNINFO_REPLICATION = Everything usable for a replication client
not added by walreceiver. Maybe including/excluding password.

Maybe there should be two flags for replication usage:

PG_CONNINFO_WALRECEIVER = everything except those added
by walreceiver itself (and maybe password too)

PG_CONNINFO_REPLICATION = replication only

And every option can belong to more than one class, just as in my patch.

Hmm. I kind of liked having each option in just one class, but I see
the problem. Looking at the ones you suggest, all the non-password
ones *have* to be without password, otherwise there is no way to get
the conninfo without password - which is the whole reason for that
parameter to exist. So the PASSWORD one has to be additional - which
means that not making the other ones additional makes them
inconsistent. But maybe we don't really have a choice there.


Yes, the PASSWORD part can be on its own; this is what I meant above,
but I wanted a second opinion on whether having it completely separate
is better or not.

But the NORMAL and REPLICATION (or WALRECEIVER) classes
need to overlap.
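
To make the inclusive-class idea concrete, a minimal sketch could look
like the code below; the flag values, the struct, and the filter helper
are made up for illustration and are neither the patch nor the libpq API:

/*
 * Hypothetical option classes as bit flags, so one option can belong
 * to several classes at once.
 */
#define PG_CONNINFO_NORMAL      0x01    /* usable for a regular client connection */
#define PG_CONNINFO_PASSWORD    0x02    /* the password option only */
#define PG_CONNINFO_REPLICATION 0x04    /* usable for a replication client */

typedef struct ConnInfoClassEntry
{
    const char *keyword;
    int         classes;        /* bitwise OR of the classes above */
} ConnInfoClassEntry;

/* Example classification; an option may carry more than one flag. */
static const ConnInfoClassEntry example_options[] = {
    {"host",     PG_CONNINFO_NORMAL | PG_CONNINFO_REPLICATION},
    {"port",     PG_CONNINFO_NORMAL | PG_CONNINFO_REPLICATION},
    {"dbname",   PG_CONNINFO_NORMAL},   /* walreceiver adds its own dbname */
    {"password", PG_CONNINFO_PASSWORD},
    {NULL, 0}
};

/*
 * A caller such as pg_basebackup could then ask for
 * PG_CONNINFO_REPLICATION | PG_CONNINFO_PASSWORD and keep an option
 * whenever its class mask intersects the requested mask.
 */
static int
option_wanted(const ConnInfoClassEntry *opt, int mask)
{
    return (opt->classes & mask) != 0;
}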


At this point, the patch is untested beyond compiling and a trivial
psql check, because I ran out of time :) But I figured I'd throw it
out there for comments on which version people prefer. (And yes, it's
quite possible I've made a big think-mistake in it somewhere, but
again, better to get some eyes on it early)

My version also contains a fixed version of the docs that should be
moved back into Zoltan's version if that's the one we end up
preferring.


I also liked your version of the documentation better;
I am not too good at writing docs.

np.



Also, a question was buried in the other review which is - are we OK
to remove the requiressl parameter. Both these patches do so, because
the code becomes much simpler if we can do that. It has been
deprecated since 7.2. Is it OK to remove it, or do we need to put back
in the more complex code to deal with both?

Just going to highlight that we're looking for at least one third
party to comment on this :)


Yes, me too. A +1 for removing wouldn't count from me. ;-)




Attached are both Zoltan's latest patch (v16) and my slightly different
approach.

Comments on which approach is best?

Test results from somebody who has the time to look at it? :)

Do you happen to have a set of tests you've been running on your
patches? Can you try them again with this one?


My set of tests is:

1. initdb the master
2. pg_basebackup -R the first standby from the master
3. pg_basebackup -R the second standby from the master
4. pg_basebackup -R the third standby from the first standby

and diff -durpN the different data directories while there is no load on any of them.

I will test it today after fixing the classes in your patch. ;-)