Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Pavel Stehule
Hi

plpgsql has zero optimization for this kind of function. It is the best glue
for SQL statements but relatively bad for expensive numeric
calculations. It is only a very simple AST interpreter.

Try to use PL/Perl, PL/Python, or PL/Lua instead for these purposes.
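
For example, a rough PL/Python equivalent of the loop discussed below, assuming
the plpythonu extension is installed (the loop bound here is only illustrative):

DO LANGUAGE plpythonu $$
import math
n = 0.0
for f in range(1, 1000000):
    n = math.sqrt(f)
plpy.notice("Result = %s" % n)
$$;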

Pavel


2014-08-04 22:48 GMT+02:00 testman1316 danilo.rami...@hmhco.com:

 We are trying to get an idea of the raw performance of Oracle vs PostgreSQL.
 We have extensive Oracle experience but are new to PostgreSQL. We are going
 to run lots of queries with our data, etc. But first we wanted to see just
 how they perform on basic kernel tasks, i.e. math and branching, since SQL
 is built on that.

 In AWS RDS we created two db.m3.2xlarge instances one with oracle
 11.2.0.4.v1 license included, the other with PostgreSQL (9.3.3)

 In both we ran code that did 1 million square roots (from 1 to 1 mill).
 Then did the same but within an If..Then statement.

 The results were a bit troubling:

 Oracle  4.8 seconds

 PostgreSQL  21.803 seconds

 adding an if statement:

 Oracle  4.78 seconds

 PostgreSQL  24.4 seconds

 code Oracle square root

 SET SERVEROUTPUT ON
 SET TIMING ON

 DECLARE
   n NUMBER := 0;
 BEGIN
   FOR f IN 1..10000000
 LOOP
 n := SQRT (f);
   END LOOP;
 END;

 PostgreSQL

 DO LANGUAGE plpgsql $$ DECLARE n real;
 DECLARE f integer;
 BEGIN
 FOR f IN 1..10000000 LOOP
 n = SQRT (f);
 END LOOP;
 RAISE NOTICE 'Result = %',n;
 END $$;

 oracle adding if

 SET SERVEROUTPUT ON
 SET TIMING ON

 DECLARE
   n NUMBER := 0;
 BEGIN
   FOR f IN 1..10000000
 LOOP
   if 0 =0 then
 n := SQRT (f);
 end if;
   END LOOP;

 postgres adding if

 DO LANGUAGE plpgsql $$ DECLARE n real;
 DECLARE f integer;
 BEGIN
 FOR f IN 1..10000000 LOOP
 if 0=0 then
 n = SQRT (f);
 end if;
 END LOOP;
 RAISE NOTICE 'Result = %',n;
 END $$;

 I used an anonymous block for PostgreSQL. I also did it as a function and
 got identical results

 CREATE OR REPLACE FUNCTION testpostgrescpu()
   RETURNS real AS
 $BODY$
 declare
  n real;
  f integer;

 BEGIN
    FOR f IN 1..10000000 LOOP
 n = SQRT (f);
END LOOP;


RETURN n;
 END;
 $BODY$
   LANGUAGE plpgsql VOLATILE
   COST 100;
 ALTER FUNCTION testpostgrescpu()
   OWNER TO xxx

 Based on what we had heard of PostgreSQL and how it is comparable to Oracle
 in many ways, we were taken aback by the results. Did we code PostgreSQL
 incorrectly? What are we missing, or is this just the way it is?

 Note: once we started running queries on the exact same data in Oracle and
 PostgreSQL we saw a similar pattern. On basic queries little difference,
 but as they started to get more and more complex Oracle was around 3-5x faster.

 Again, this was run on identical AWS RDS instances; we ran them many times
 during the day on different days and the results were always the same.




 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/PostrgeSQL-vs-oracle-doing-1-million-sqrts-am-I-doing-it-wrong-tp5813732.html
 Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Mark Kirkwood

On 05/08/14 17:56, Mark Kirkwood wrote:



Adding in the 'if' in the float8 case increases run time to 4s. So looks
like plpgsql might have a slightly higher cost for handling added
conditionals. Be interesting to dig a bit more and see what is taking
the time.



Thinking about this a bit more, I wonder if the 'big O' has added some 
optimizations in PL/SQL for trivial conditionals - i.e. you are adding:


IF (0 = 0) THEN

END IF;

...it may be going...'Ah yes, always true...so remove'!

So it might be interesting to try some (hopefully not so easily 
removable) non-trivial ones like:



DO LANGUAGE plpgsql $$ DECLARE
DECLARE i integer;
BEGIN
FOR i IN 1..10000000 LOOP
  IF (i%100 = 0) THEN
NULL;
  END IF;
END LOOP;
END $$;

Now I guess there is the chance that PL/SQL might understand that NULL
inside a loop means it can remove it...so you may need to experiment
further. The point to take away here is that for interesting loops and
conditions there may not be such a significant difference!


Regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Gabriele Bartolini
Hi Claudio,

  I think there has been a misunderstanding. I agree with you (and I
think also Marco) that LSN is definitely a component to consider in
this process. We will come up with an alternate proposal which
considers LSNs either today or tomorrow. ;)

Thanks,
Gabriele
--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it


2014-08-04 20:30 GMT+02:00 Claudio Freire klaussfre...@gmail.com:
 On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
I really like the proposal of working on a block level incremental
 backup feature and the idea of considering LSN. However, I'd suggest
 to see block level as a second step and a goal to keep in mind while
 working on the first step. I believe that file-level incremental
 backup will bring a lot of benefits to our community and users anyway.

 Thing is, I don't see how the LSN method is that much harder than an
 on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

 Keeping a last-updated-LSN for each segment (or group of blocks) is
 just as easy as keeping a bitmap, and far more flexible and robust.

 The complexity and cost of safely keeping the map up-to-date is what's
 in question here, but as was pointed out before, there's no really safe
 alternative. Neither modification times nor checksums (nor in-memory
 bitmaps IMV) are really safe enough for backups, so you really want to
 use something like the LSN. It's extra work, but opens up a world of
 possibilities.
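
For reference, the LSNs being discussed here are easy to look at from SQL; a
small sketch, assuming the pageinspect extension is available and some_table
is a placeholder for any table of interest:

SELECT pg_current_xlog_location();   -- current WAL insert location
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lsn FROM page_header(get_raw_page('some_table', 0));   -- LSN of block 0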


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Marti Raudsepp
On Mon, Aug 4, 2014 at 11:48 PM, testman1316 danilo.rami...@hmhco.com wrote:
 In both we ran code that did 1 million square roots (from 1 to 1 mill). Then
 did the same but within an If..Then statement.

 Note: once we started running queries on the exact same data in Oracle and
 PostgreSQL we saw a similar pattern. On basic queries little difference, but
 as they started to get more and more complex Oracle was around 3-5 faster.

From the test cases you posted, it looks like you're not actually
benchmarking any *queries*; you're comparing the speeds of the
procedural languages. And yes, PL/pgSQL is known to be a fairly slow
language.

If you want fair benchmark results, you should instead concentrate on
what databases are supposed to do: storing and retrieving data, and
finding the optimal way to execute complicated SQL queries. In most
setups, that's where the majority of database processor time is spent,
not in procedural code.
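
For example, a more database-centric micro-benchmark in that spirit could look
like the following (only a sketch; the table name and row counts are arbitrary):

CREATE TABLE bench AS
    SELECT g AS id, random() AS val FROM generate_series(1, 1000000) g;
CREATE INDEX ON bench (id);
EXPLAIN ANALYZE SELECT sum(val) FROM bench WHERE id BETWEEN 1000 AND 200000;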

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch to support SEMI and ANTI join removal

2014-08-05 Thread David Rowley
I've been working away at allowing semi and anti joins to be added to the
list of join types that our join removal code supports.

The basic idea is that we can remove a semi or anti join if the left hand
relation references the relation that's being semi/anti joined and the join
condition matches a foreign key definition on the left hand relation.

To give an example:

Given the 2 tables:
create table t2 (t1_id int primary key);
create table t1 (value int references t2);

The join to t2 would not be required in:

select * from t1 where value in(select t1_id from t2);

Neither would it be here:

select * from t1 where not exists(select 1 from t2 where t1_id=value);

To give a bit of background, I initially proposed the idea here:

http://www.postgresql.org/message-id/caaphdvq0nai8ceqtnndqg6mhfh__7_a6tn9xu4v0cut9wab...@mail.gmail.com

And some issues were raised around the fact that updates to the referenced
relation would only flush out changes to the referencing tables on
completion of the command, and if we happened to be planning a query that
was located inside a volatile function then we wouldn't know that the
parent query hadn't updated some of these referenced tables.

Noah raised this concern here:
http://www.postgresql.org/message-id/20140603235053.ga351...@tornado.leadboat.com
But proposed a solution here:
http://www.postgresql.org/message-id/20140605000407.ga390...@tornado.leadboat.com

In the attached I've used Noah's solution to the problem, and it seems to
work just fine. (See regression test in the attached patch)

Tom raised a point here:
http://www.postgresql.org/message-id/19326.1401891...@sss.pgh.pa.us

Where he mentioned that it may be possible that the foreign key trigger
queue gets added to after planning has taken place.
I've spent some time looking into this and I've not yet managed to find a
case where this matters as it seems that updates made in 1 command are not
visible to that same command. I've tested various different test cases in
all transaction isolation levels and also tested update commands which call
volatile functions that perform updates in the same table that the outer
update will reach later in the command.

The patch (attached) is also now able to detect when a NOT EXISTS clause
cannot produce any records at all.

If I make a simple change to the tables I defined above:

ALTER TABLE t1 ALTER COLUMN value SET NOT NULL;

Then the following will be produced:

explain (costs off) select * from t1 where not exists(select 1 from t2
where t1_id=value);
QUERY PLAN
--
 Result
   One-Time Filter: false
   ->  Seq Scan on t1

A small note on my intentions with this patch:

I'm not seeing the use case for all of this as massive; I'm more
interested in this patch as a stepping stone towards implementing
INNER JOIN removals, which would use foreign keys in a similar way to
attempt to prove that the join is not required. I decided to tackle semi
and anti joins first as these are a far more simple case, and it also adds
quite a bit of the infrastructure that would be required for inner join
removal, plus if nobody manages to poke holes in my ideas with this then I
should have good grounds to begin the work on the inner join removal code.
I also think if we're bothering to load foreign key constraints at planning
time, then only using them for inner join removals wouldn't be making full
use of them, so likely this patch would be a good idea anyway.
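
To illustrate that inner join case with the same tables (hypothetical here,
since this patch does not implement it): with the foreign key from t1.value to
t2 and value marked NOT NULL, a query such as

select t1.value from t1 inner join t2 on t2.t1_id = t1.value;

returns exactly the rows of t1, so the join to t2 could in principle be removed
whenever no columns of t2 are referenced.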

Currently most of my changes are in analyzejoin.c, but I did also have to
make changes to load the foreign key constraints so that they were
available to the planner. One thing that is currently lacking, which would
likely be needed, before the finished patch is ready, would be a
relhasfkeys column in pg_class. Such a column would mean that it would be
possible to skip scanning pg_constraint for foreign keys when there's none
to find. I'll delay implementing that until I get a bit more feedback on
whether this patch would be a welcome addition to the existing join removal
code or not.

I'm submitting this (a little early) for the August commitfest, but if
anyone has any time to glance at it before then, that would be a great
help.

Regards

David Rowley


semianti_join_removal_2410c7c_2014-08-05.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-08-05 Thread Ashutosh Bapat
Hi Kyotaro,
I looked at the patches and felt that the approach taken here is too
intrusive, considering that the feature is only for foreign scans.

There are quite a few members added to the generic Path, Plan structures,
whose use is induced only through foreign scans. Each path now stores
two sets of costs, one with parallelism and one without. The parallel
values will make sense only when there is a foreign scan, which uses
parallelism, in the plan tree. So, those costs are maintained unnecessarily
or the memory for those members is wasted in most of the cases, where the
tables involved are not foreign. Also, not many foreign tables will be able
to use the parallelism, e.g. file_fdw. That's my opinion, though; I would
like to hear from others.

Instead, an FDW which can use parallelism can add two paths one with and
one without parallelism with appropriate costs and let the logic choosing
the cheapest path take care of the actual choice. In fact, I thought
parallelism would always be faster than non-parallel execution, except when
the foreign server is too heavily loaded. But we won't be able to check that
anyway. Can you point out a case where the parallelism may not win over
serial execution?

BTW, the name parallelism seems to be misleading here. All it will be able
to do is fire the queries (or data fetch requests) asynchronously. So, we
might want to change the naming appropriately.


On Fri, Aug 1, 2014 at 2:48 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hello,

  Hello, this is the new version which is complete to some extent
  of parallelism based on postgres_fdw.
 
  This compares the costs for parallel and non-parallel execution
  and choose parallel one if it is faster by some extent specified
  by GUCs. The attached files are,
 
   0001_parallel_exec_planning_v0.patch:
 - PostgreSQL body stuff for parallel execution planning.
 
   0002_enable_postgres_fdw_to_run_in_parallel_v0.patch:
 - postgres_fdw parallelization.
 
   0003_file_fdw_changes_to_avoid_error.patch:
  - error avoiding stuff for file_fdw (not necessary for this patch)
 
   env.sql:
 - simple test script to try this patch.
 
  =
 
   - planner stuff to handle cost of parallel execution. Including
 indication of parallel execution.
 
   - GUCs to control how easy to go parallel.
 
 parallel_cost_threshold is the threshold of path total cost
 where to enable parallel execution.
 
  parallel_ratio_threshold is the threshold of the ratio of
 parallel cost to non-parallel cost where to choose the
 parallel path.
 
   - postgres_fdw which can run in multiple sessions using snapshot
 export and fetches in parallel for foreign scans on dedicated
 connections.

 But now the effect of async execution of FETCH'es is omitted
 during planning.

 foreign server has a new option 'max_aux_connections', which
 limits the number of connections for parallel execution per
 (server, user) pairs.
 
   - change file_fdw to follow the changes of planner stuff.
 
 
  Whth the patch attached, the attached sql script shows the
  following result (after some line breaks are added).
 
  postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c
 FROM fvs1 a join fvs1_2 b on (a.a = b.a);
  QUERY PLAN
 
 
  Hash Join  (cost=9573392.96..9573393.34 rows=1 width=40 parallel)
 (actual time=2213.400..2213.407 rows=12 loops=1)
   Hash Cond: (a.a = b.a)
   ->  Foreign Scan on fvs1 a
 (cost=9573392.96..9573393.29 rows=10 width=8 parallel)
 (actual time=2199.992..2199.993 rows=10 loops=1)
   ->  Hash  (cost=9573393.29..9573393.29 rows=10 width=36)
 (actual time=13.388..13.388 rows=10 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 6kB
 ->  Foreign Scan on fvs1_2 b
 (cost=9573392.96..9573393.29 rows=10 width=36
 parallel)
 (actual time=13.376..13.379 rows=10 loops=1)
   Planning time: 4.761 ms
   Execution time: 2227.462 ms
  (8 rows)
  postgres=# SET parallel_ratio_threshold to 0.0;
  postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c
 FROM fvs1 a join fvs1 b on (a.a = b.a);
  QUERY PLAN
 
 --
   Hash Join  (cost=318084.32..318084.69 rows=1 width=40)
  (actual time=4302.913..4302.928 rows=12 loops=1)
 Hash Cond: (a.a = b.a)
 ->  Foreign Scan on fvs1 a  (cost=159041.93..159042.26 rows=10
 width=8)
 (actual time=2122.989..2122.992 rows=10
 loops=1)
 ->  Hash  (cost=159042.26..159042.26 rows=10 width=500)
   (actual time=2179.900..2179.900 rows=10 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 6kB
   ->  Foreign Scan on fvs1 b
 (cost=159041.93..159042.26 rows=10 

Re: [HACKERS] psql: show only failed queries

2014-08-05 Thread Fujii Masao
On Wed, Jul 16, 2014 at 4:34 AM, Pavel Stehule pavel.steh...@gmail.com wrote:



 2014-07-15 12:07 GMT+02:00 Fujii Masao masao.fu...@gmail.com:

 On Tue, Jul 15, 2014 at 7:01 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
 
 
 
  2014-07-15 11:29 GMT+02:00 Fujii Masao masao.fu...@gmail.com:
 
  On Thu, Jul 10, 2014 at 9:56 PM, Pavel Stehule
  pavel.steh...@gmail.com
  wrote:
   Hello
  
   here is a proposed patch - autocomplete for known psql variable
   content
 
  Even after applying the patch, the following psql variables were not
  displayed
  on the tab-completion of \set command.
 
  HISTFILE
  HISTSIZE
  HOST
  IGNOREEOF
  LASTOID
 
  I'm not sure if it's worth displaying all of them on the tab-completion
  of
  \set
  because it seems strange to change some of them by \set command, for
  example
  HOST, though.
 
 
   These variables are not set by default - so they don't exist and cannot be
   completed by default.
 
  there are two fix:
 
  a) fix autocomplete source - preferred

 +1


 here is the patch

Thanks for the patch!

I got the following compiler warnings.

tab-complete.c:4155: warning: assignment discards qualifiers from
pointer target type
tab-complete.c:4166: warning: assignment discards qualifiers from
pointer target type

+"FETCH_COUNT", "HISTCONTROL", "HISTFILE", "HISTSIZE", "HOST",
"IGNOREOFF",

Typo: IGNOREOFF -> IGNOREEOF

+/* all psql known variables are included in list by default */
+for (known_varname = known_varnames; *known_varname; known_varname++)
+varnames[nvars++] = pg_strdup(*known_varname);

Don't we need to append both prefix and suffix to even known variables?

+else if (strcmp(prev2_wd, "\\set") == 0)
+{
+if (strcmp(prev_wd, "AUTOCOMMIT") == 0)

ISTM that some psql variables like IGNOREEOF are not there. Why not?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-05 Thread Fujii Masao
On Tue, Jul 29, 2014 at 7:07 PM,  furu...@pm.nttdata.co.jp wrote:
 I have improved the patch by making the following changes:

 1. Since stream_stop() was redundant, the stream_stop() call at the time of
 WAL file closing was deleted.

 2. Changed the flush judging timing for the readability of the source code.
 I have changed the flush judging timing from right after receiving the
 continuous message to before the feedback-message decision of the continue
 statement after execution.

Thanks for the updated version of the patch!

While reviewing the patch, I found that HandleCopyStream() is still
long, which decreases the readability of the source code.
So I feel inclined to refactor HandleCopyStream() further for better
readability. What about the attached refactoring patch?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-05 Thread Fujii Masao
On Tue, Aug 5, 2014 at 9:04 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jul 29, 2014 at 7:07 PM,  furu...@pm.nttdata.co.jp wrote:
 I have improved the patch  by making following changes:

 1. Since stream_stop() was redundant, stream_stop() at the time of WAL file 
 closing was deleted.

 2. Change the Flash judging timing for the readability of source code.
I have changed the Flash judging timing , from the continuous message 
 after receiving to
before the feedbackmassege decision of continue statement after execution.

 Thanks for the updated version of the patch!

 While reviewing the patch, I found that HandleCopyStream() is still
 long and which decreases the readability of the source code.
 So I feel inclined to refactor the HandleCopyStream() more for better
 readability. What about the attached refactoring patch?

Sorry, I forgot to attached the patch in previous email. So attached.

Regards,

-- 
Fujii Masao
*** a/src/bin/pg_basebackup/receivelog.c
--- b/src/bin/pg_basebackup/receivelog.c
***
*** 31,42  static char current_walfile_name[MAXPGPATH] = ;
--- 31,53 
  static bool reportFlushPosition = false;
  static XLogRecPtr lastFlushPosition = InvalidXLogRecPtr;
  
+ static bool still_sending = true;		/* feedback still needs to be sent? */
+ 
  static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos,
   uint32 timeline, char *basedir,
  			   stream_stop_callback stream_stop, int standby_message_timeout,
   char *partial_suffix, XLogRecPtr *stoppos);
  static int CopyStreamPoll(PGconn *conn, long timeout_ms);
  static int CopyStreamReceive(PGconn *conn, long timeout, char **buffer);
+ static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
+ XLogRecPtr blockpos, int64 *last_status);
+ static bool ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
+ 			   XLogRecPtr *blockpos, uint32 timeline,
+ 			   char *basedir, stream_stop_callback stream_stop,
+ 			   char *partial_suffix);
+ static PGresult *HandleEndOfCopyStream(PGconn *conn, char *copybuf,
+ 	   XLogRecPtr blockpos, char *basedir, char *partial_suffix,
+ 	   XLogRecPtr *stoppos);
  
  static bool ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos,
  		 uint32 *timeline);
***
*** 740,755  HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  	char	   *copybuf = NULL;
  	int64		last_status = -1;
  	XLogRecPtr	blockpos = startpos;
! 	bool		still_sending = true;
  
  	while (1)
  	{
  		int			r;
- 		int			xlogoff;
- 		int			bytes_left;
- 		int			bytes_written;
  		int64		now;
- 		int			hdr_len;
  		long		sleeptime;
  
  		/*
--- 751,763 
  	char	   *copybuf = NULL;
  	int64		last_status = -1;
  	XLogRecPtr	blockpos = startpos;
! 
! 	still_sending = true;
  
  	while (1)
  	{
  		int			r;
  		int64		now;
  		long		sleeptime;
  
  		/*
***
*** 818,1015  HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  			goto error;
  		if (r == -2)
  		{
! 			PGresult   *res = PQgetResult(conn);
! 
! 			/*
! 			 * The server closed its end of the copy stream.  If we haven't
! 			 * closed ours already, we need to do so now, unless the server
! 			 * threw an error, in which case we don't.
! 			 */
! 			if (still_sending)
! 			{
! if (!close_walfile(basedir, partial_suffix, blockpos))
! {
! 	/* Error message written in close_walfile() */
! 	PQclear(res);
! 	goto error;
! }
! if (PQresultStatus(res) == PGRES_COPY_IN)
! {
! 	if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
! 	{
! 		fprintf(stderr,
! _("%s: could not send copy-end packet: %s"),
! progname, PQerrorMessage(conn));
! 		PQclear(res);
! 		goto error;
! 	}
! 	PQclear(res);
! 	res = PQgetResult(conn);
! }
! still_sending = false;
! 			}
! 			if (copybuf != NULL)
! PQfreemem(copybuf);
! 			copybuf = NULL;
! 			*stoppos = blockpos;
! 			return res;
  		}
  
  		/* Check the message type. */
  		if (copybuf[0] == 'k')
  		{
! 			int			pos;
! 			bool		replyRequested;
! 
! 			/*
! 			 * Parse the keepalive message, enclosed in the CopyData message.
! 			 * We just check if the server requested a reply, and ignore the
! 			 * rest.
! 			 */
! 			pos = 1;			/* skip msgtype 'k' */
! 			pos += 8;			/* skip walEnd */
! 			pos += 8;			/* skip sendTime */
! 
! 			if (r < pos + 1)
! 			{
! fprintf(stderr, _("%s: streaming header too small: %d\n"),
! 		progname, r);
  goto error;
- 			}
- 			replyRequested = copybuf[pos];
- 
- 			/* If the server requested an immediate reply, send one. */
- 			if (replyRequested && still_sending)
- 			{
- now = feGetCurrentTimestamp();
- if (!sendFeedback(conn, blockpos, now, false))
- 	goto error;
- last_status = now;
- 			}
  		}
  		else if (copybuf[0] == 'w')
  		{
! 			/*
! 			 * Once we've decided we don't want to receive any more, just
! 			 * ignore any subsequent 

Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Roberto Mello
On Monday, August 4, 2014, testman1316 danilo.rami...@hmhco.com
wrote:


 SET SERVEROUTPUT ON
 SET TIMING ON

 DECLARE
   n NUMBER := 0;
 BEGIN
   FOR f IN 1..1000
 LOOP
 n := SQRT (f);
   END LOOP;


In addition to the other suggestions that have been posted (using a
procedural language more suitable to mathematical ops, etc) I noticed that
you are using a RAISE in the PostgreSQL version that you are not in Oracle.

I am curious as to what the difference is if you use the RAISE in both or
neither cases.

Roberto


Re: [HACKERS] WAL format and API changes (9.5)

2014-08-05 Thread Michael Paquier
On Sat, Aug 2, 2014 at 1:40 AM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:
 I ran this through my WAL page comparison tool to verify that all the WAL
 record types are replayed correctly (although I did some small cleanup
after
 that, so it's not impossible that I broke it again; will re-test before
 committing).
I have also run some tests on it with a master and a standby to check WAL
construction and replay:
- regression tests as well as tests from contrib, isolation
- WAL page comparison tool
- Some tests with pgbench
- When testing pg_xlogdump I found an assertion failure for record
XLOG_GIN_INSERT:
Assertion failed: (((bool) (((const void*)(&insertData->newitem.key) !=
((void*)0)) && ((&insertData->newitem.key)->ip_posid != 0))), function
gin_desc, file gindesc.c, line 127.

Then, I had a look at the code, and here are some comments:
- This comment was introduced with 96ef3b8, which added page
checksums. Perhaps the authors can tell what the original purpose of
this comment was (Jeff, Simon). For now, it may be better to remove this FIXME
and to leave the comment as is.
+* FIXME: I didn't understand below comment. Is it still
relevant?
+*
 * This also means there is no corresponding API call for
this, so an
 * smgr implementation has no need to implement anything.
Which means
 * nothing is needed in md.c etc
- Would it be a win to add an assertion check for (CritSectionCount == 0)
in XLogEnsureRecordSpace?
- There is no mention of REGBUF_NO_IMAGE in transam/README.
- This patch adds a lot of blocks like that in the redo code:
+ if (BufferIsValid(buffer))
+   UnlockReleaseBuffer(buffer);
What do you think about adding a new macro like UnlockBufferIfValid(buffer)?
- Don't we need a call to XLogBeginInsert() in XLogSaveBufferForHint():
+   int flags = REGBUF_FORCE_IMAGE;
+   if (buffer_std)
+   flags |= REGBUF_STANDARD;
+   XLogRegisterBuffer(0, buffer, flags);
[...]
-   recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI, rdata);
+   recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI);
- There is a typo in the header of xlogrecord.h, the to is not needed:
+ * Definitions for to the WAL record
format.

- There is still a TODO item in ValidXLogRecordHeader to check if block
data header is valid or not. Just mentioning.
- Just came across that: s/reference refes to/reference refers to
- XLogRecGetBlockRefIds needing an already-allocated array for *out is not
user-friendly. Cannot we just do all the work inside this function?
- All the functions in xlogconstruct.c could be defined in a separate
header xlogconstruct.h. What they do is rather independent from xlog.h. The
same applies to all the structures and functions of xlogconstruct.c in
xlog_internal.h: XLogRecordAssemble, XLogRecordAssemble,
InitXLogRecordConstruct. (worth noticing as well that the only reason
XLogRecData is defined externally of xlogconstruct.c is that it is being
used in XLogInsert()).

The patch is more solid and better structured than the previous versions.
It is in good shape.
Regards,
-- 
Michael


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-05 Thread Fujii Masao
On Wed, Jul 23, 2014 at 5:21 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 1. Need for compressing full page backups:
 There are good number of benchmarks done by various people on this list
 which clearly shows the need of the feature. Many people have already voiced
 their agreement on having this in core, even as a configurable parameter.

Yes!

 Having said that, IMHO we should go one step at a time. We are using pglz
 for compressing toast data for long, so we can continue to use the same for
 compressing full page images. We can simultaneously work on adding more
 algorithms to core and choose the right candidate for different scenarios
 such as toast or FPW based on test evidences. But that work can happen
 independent of this patch.

This gradual approach looks good to me. And, if an additional compression
algorithm like lz4 is always better than pglz in every scenario, we can just
change the code so that the additional algorithm is always used. Which would
make the code simpler.

 3. Compressing one block vs all blocks:
 Andres suggested that compressing all backup blocks in one go may give us
 better compression ratio. This is worth trying. I'm wondering what would the
 best way to do so without minimal changes to the xlog insertion code. Today,
 we add more rdata items for backup block header(s) and backup blocks
 themselves (if there is a hole then 2 per backup block) beyond what the
 caller has supplied. If we have to compress all the backup blocks together,
 then one approach is to copy the backup block headers and the blocks to a
 temp buffer, compress that and replace the rdata entries added previously
 with a single rdata.

Basically sounds reasonable. But, how does this logic work if there are
multiple rdata and only some of them are backup blocks?

If a hole is not copied to that temp buffer, ISTM that we should
change the backup block header so that it contains the info for the
hole, e.g., the location where the hole starts. No?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Kevin Grittner
Roberto Mello roberto.me...@gmail.com wrote:

 In addition to the other suggestions that have been posted (using
 a procedural language more suitable to mathematical ops, etc) I
 noticed that you are using a RAISE in the PostgreSQL version that
 you are not in Oracle.

 I am curious as to what the difference is if you use the RAISE in
 both or neither cases.

Since that is outside the loop, the difference should be nominal;
and in a quick test it was.  On the other hand, reducing the
procedural code made a big difference.

test=# \timing on
Timing is on.
test=# DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
n = SQRT (f);
END LOOP;
RAISE NOTICE 'Result = %',n;
END $$;
NOTICE:  Result = 3162.28
DO
Time: 23687.914 ms

test=# DO LANGUAGE plpgsql $$ DECLARE n real;
BEGIN
  PERFORM SQRT(f) FROM generate_series(1, 10000000) x(f);
END $$;
DO
Time: 3916.815 ms

Eliminating the plpgsql function entirely shaved a little more off:

test=# SELECT SQRT(f) FROM generate_series(1, 10000000) x(f);
Time: 3762.886 ms

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Roberto Mello
On Tue, Aug 5, 2014 at 9:50 AM, Kevin Grittner kgri...@ymail.com wrote:

 Since that is outside the loop, the difference should be nominal;

Apologies. I misread on my phone and thought it was within the loop.

 and in a quick test it was.  On the other hand, reducing the
 procedural code made a big difference.

snip

 test=# DO LANGUAGE plpgsql $$ DECLARE n real;
 BEGIN
   PERFORM SQRT(f) FROM generate_series(1, 1000) x(f);
 END $$;
 DO
 Time: 3916.815 ms

That is a big difference. Are you porting a lot of code from PL/SQL,
and therefore evaluating the performance difference of running this
code? Or is this just a general test where you wish to assess the
performance difference?

PL/pgSQL could definitely use some loving, as far as optimization
goes, but my feeling is that it hasn't happened because there are
other suitable backends that give the necessary flexibility for the
different use cases.

Roberto


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] how to debug into InitPostgres() and InitCatalogCache()?

2014-08-05 Thread 土卜皿
hi, all
  I can already debug a general postgres backend using ddd and select
pg_backend_pid();. Now I want to study the details of the system
catalog cache and system cache management, so I need to debug the functions
InitPostgres() and InitCatalogCache(), and I tried the following steps:

[refer: How to debug postgresql during initdb ]
http://www.phacai.com/how-to-debug-postgresql-during-initdb

(1) in first terminal:

ddd initdb

and set a breakpoint on the next executable line after “PG_CMD_OPEN” in
function bootstrap_template1() ,

Then within gdb, use “r $myfolder/mydb”,

and found the cursor stop at the breakpoint

(2) in the second terminal:

ps aux | grep bin\/postgres

after getting the postgres server pid, start a new ddd without args, and within
gdb, use attach with that pid

and set a breakpoint at InitCatalogCache()

(3) back in the first ddd window, hit the cont button. I hoped it would stop
at the second ddd's breakpoint, but nothing happened.

I must have made a mistake in some step, or my understanding is completely wrong;
please give me some advice, thanks a lot!

Dillon


Re: [HACKERS] Audit of logout

2014-08-05 Thread Fujii Masao
On Tue, Jul 15, 2014 at 10:45 PM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jul 2, 2014 at 10:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 In short, maybe we ought to invent a new category PGC_SU_BACKEND (not
 wedded to this spelling), which is to PGC_BACKEND as PGC_SUSET is to
 PGC_USERSET, ie same when-it-can-be-changed behavior but only superusers
 are allowed to change it.  I don't have any objection to making these two
 settings only adjustable by superusers --- I just don't want to give up
 the existing timing restrictions for them.

 Another idea would be to get rid of PGC_SUSET as a separate category, and
 instead have a superuser-only bit in the GUC flags, which would apply to
 all categories.  This would be a bit more orthogonal, though likely a
 much more invasive change.

 That could become interesting in the future now that we have ALTER
 SYSTEM SET. It could allow a non-superuser to make persistent
 configuration changes. Now, I'm not sure we actually *want* that
 though... But having it as a separate bit would make it possible for
 ALTER SYSTEM SET to say that for example regular users would be able
 to change work_mem persistently. But if we want to go down that route,
 we might need a more fine-grained permissions model than just
 superuser vs non-superuser...

 I think going with the PGC_SU_BACKEND is the right choice at this
 time, until we have an actual use case for the other :)

Yep, the attached patch introduces PGC_SU_BACKEND and
changes the contexts of log_connections and log_disconnections
to PGC_SU_BACKEND. Review?

Regards,

-- 
Fujii Masao
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***
*** 3258,3264  get_stats_option_name(const char *arg)
   * argv[0] is ignored in either case (it's assumed to be the program name).
   *
   * ctx is PGC_POSTMASTER for secure options, PGC_BACKEND for insecure options
!  * coming from the client, or PGC_SUSET for insecure options coming from
   * a superuser client.
   *
   * If a database name is present in the command line arguments, it's
--- 3258,3264 
   * argv[0] is ignored in either case (it's assumed to be the program name).
   *
   * ctx is PGC_POSTMASTER for secure options, PGC_BACKEND for insecure options
!  * coming from the client, or PGC_SU_BACKEND for insecure options coming from
   * a superuser client.
   *
   * If a database name is present in the command line arguments, it's
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
***
*** 957,963  process_startup_options(Port *port, bool am_superuser)
  	GucContext	gucctx;
  	ListCell   *gucopts;
  
! 	gucctx = am_superuser ? PGC_SUSET : PGC_BACKEND;
  
  	/*
  	 * First process any command-line switches that were included in the
--- 957,963 
  	GucContext	gucctx;
  	ListCell   *gucopts;
  
! 	gucctx = am_superuser ? PGC_SU_BACKEND : PGC_BACKEND;
  
  	/*
  	 * First process any command-line switches that were included in the
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***
*** 512,517  const char *const GucContext_Names[] =
--- 512,518 
  	 /* PGC_INTERNAL */ "internal",
  	 /* PGC_POSTMASTER */ "postmaster",
  	 /* PGC_SIGHUP */ "sighup",
+ 	 /* PGC_SU_BACKEND */ "superuser-backend",
  	 /* PGC_BACKEND */ "backend",
  	 /* PGC_SUSET */ "superuser",
  	 /* PGC_USERSET */ "user"
***
*** 910,916  static struct config_bool ConfigureNamesBool[] =
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_connections", PGC_BACKEND, LOGGING_WHAT,
! 			gettext_noop("Logs each successful connection."),
  			NULL
  		},
--- 911,917 
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_connections", PGC_SU_BACKEND, LOGGING_WHAT,
! 			gettext_noop("Logs each successful connection."),
  			NULL
  		},
***
*** 919,925  static struct config_bool ConfigureNamesBool[] =
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_disconnections", PGC_BACKEND, LOGGING_WHAT,
! 			gettext_noop("Logs end of a session, including duration."),
  			NULL
  		},
--- 920,926 
  		NULL, NULL, NULL
  	},
  	{
! 		{"log_disconnections", PGC_SU_BACKEND, LOGGING_WHAT,
! 			gettext_noop("Logs end of a session, including duration."),
  			NULL
  		},
***
*** 5681,5696  set_config_option(const char *name, const char *value,
  			 * signals to individual backends only.
  			 */
  			break;
  		case PGC_BACKEND:
  			if (context == PGC_SIGHUP)
  			{
  /*
!  * If a PGC_BACKEND parameter is changed in the config file,
!  * we want to accept the new value in the postmaster (whence
!  * it will propagate to subsequently-started backends), but
!  * ignore it in existing backends.  This is a tad klugy, but
!  * necessary because we don't re-read the config file during
!  * backend start.
   *
   * In EXEC_BACKEND builds, this works differently: we load all
   * nondefault settings from the CONFIG_EXEC_PARAMS file during
--- 5682,5706 
  			 * 

Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread Shaun Thomas

On 08/05/2014 12:56 AM, Mark Kirkwood wrote:


The moral of the story for this case is that mapping Oracle to Postgres
datatypes can require some careful thought. Using 'native' types (like
integer, float8 etc) will generally give vastly quicker performance.


We've seen a lot of this ourselves. Oracle's NUMERIC is a native type, 
whereas ours is emulated. From the performance, it would appear that 
REAL is another calculated type.


At least you used INT though. I've seen too many Oracle shops using 
NUMERIC in PostgreSQL because it's there, and suffering major 
performance hits because of it.
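
A quick way to see that cost for yourself (just a sketch; run with \timing on,
and the loop bound is arbitrary):

DO LANGUAGE plpgsql $$ DECLARE n numeric; BEGIN FOR f IN 1..1000000 LOOP n := sqrt(f); END LOOP; END $$;
DO LANGUAGE plpgsql $$ DECLARE n float8;  BEGIN FOR f IN 1..1000000 LOOP n := sqrt(f); END LOOP; END $$;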


That said, the documentation here says FLOAT4 is an alias for REAL, so 
it's somewhat nonintuitive for FLOAT4 to be so much slower than FLOAT8, 
which is an alias for DOUBLE PRECISION.


http://www.postgresql.org/docs/9.3/static/datatype.html

Not sure why that would be.

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

2014-08-05 Thread testman1316
You are correct sir, 4.1 seconds.  Are you a consultant?  We are looking for a
PostgreSQL guru for advice.  We are doing a proof of concept of PostgreSQL on
AWS.

From: Mark Kirkwood-2 [via PostgreSQL] 
[mailto:ml-node+s1045698n5813763...@n5.nabble.com]
Sent: Tuesday, August 05, 2014 12:58 AM
To: Ramirez, Danilo
Subject: Re: PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?

On 05/08/14 08:48, testman1316 wrote:

 We are trying to get an idea of the raw performance of Oracle vs PostgreSQL.
 We have extensive oracle experience but are new to PostgreSQL. We are going
 to run lots of queries with our data, etc. But first we wanted to see just
 how they perform on basic kernel tasks, i.e. math and branching since SQL is
 built on that.

 In AWS RDS we created two db.m3.2xlarge instances one with oracle
 11.2.0.4.v1 license included, the other with PostgreSQL (9.3.3)

 In both we ran code that did 1 million square roots (from 1 to 1 mill). Then
 did the same but within an If..Then statement.

 The results were a bit troubling:

 Oracle  4.8 seconds

 PostgreSQL  21.803 seconds

 adding an if statement:

 Oracle  4.78 seconds

 PostgreSQL  24.4 seconds

 code Oracle square root

 SET SERVEROUTPUT ON
 SET TIMING ON

 DECLARE
n NUMBER := 0;
 BEGIN
    FOR f IN 1..10000000
 LOOP
  n := SQRT (f);
END LOOP;
 END;

 PostgreSQL

 DO LANGUAGE plpgsql $$ DECLARE n real;
 DECLARE f integer;
 BEGIN
 FOR f IN 1..10000000 LOOP
 n = SQRT (f);
 END LOOP;
 RAISE NOTICE 'Result = %',n;
 END $$;

 oracle adding if

 SET SERVEROUTPUT ON
 SET TIMING ON

 DECLARE
n NUMBER := 0;
 BEGIN
    FOR f IN 1..10000000
 LOOP
if 0 =0 then
  n := SQRT (f);
  end if;
END LOOP;

 postgres adding if

 DO LANGUAGE plpgsql $$ DECLARE n real;
 DECLARE f integer;
 BEGIN
 FOR f IN 1..10000000 LOOP
 if 0=0 then
 n = SQRT (f);
 end if;
 END LOOP;
 RAISE NOTICE 'Result = %',n;
 END $$;

 I used an anonymous block for PostgreSQL. I also did it as a function and
 got identical results

 CREATE OR REPLACE FUNCTION testpostgrescpu()
RETURNS real AS
 $BODY$
 declare
   n real;
   f integer;

 BEGIN
  FOR f IN 1..10000000 LOOP
  n = SQRT (f);
 END LOOP;


 RETURN n;
 END;
 $BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
 ALTER FUNCTION testpostgrescpu()
OWNER TO xxx

 Based on what we had heard of PostgreSQL and how it is comparable to Oracle
 in many ways, we were taken aback by the results. Did we code PostgreSQL
 incorrectly? What are we missing or is this the way it is.

 Note: once we started running queries on the exact same data in Oracle and
 PostgreSQL we saw a similar pattern. On basic queries little difference, but
 as they started to get more and more complex Oracle was around 3-5 faster.

 Again, this was run on identical AWS RDS instances, we ran them many times
 during the day on different days and results were always the same







Looking at this guy:

DO LANGUAGE plpgsql $$ DECLARE n real;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
   n = SQRT (f);
END LOOP;
RAISE NOTICE 'Result = %',n;
END $$;

Takes about 12s with with Postgres 9.4 running on Ubuntu 14.04 hosted on
real HW (Intel i7).

Changing n to be float8 rather than real, i.e:

DO LANGUAGE plpgsql $$ DECLARE n float8;
DECLARE f integer;
BEGIN
FOR f IN 1..10000000 LOOP
   n = SQRT (f);
END LOOP;
RAISE NOTICE 'Result = %',n;
END $$;

...time drops to about 2s (which I'm guessing would get it to about
Oracle speed on your EC2 setup).

The moral of the story for this case is that mapping Oracle to Postgres
datatypes can require some careful thought. Using 'native' types (like
integer, float8 etc) will generally give vastly quicker performance.


Adding in the 'if' in the float8 case increases run time to 4s. So looks
like plpgsql might have a slightly higher cost for handling added
conditionals. Be interesting to dig a bit more and see what is taking
the time.

Regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Scaling shared buffer eviction

2014-08-05 Thread Robert Haas
On Thu, Jun 5, 2014 at 4:43 AM, Amit Kapila amit.kapil...@gmail.com wrote:

 I have improved the patch  by making following changes:
 a. Improved the bgwriter logic to log for xl_running_xacts info and
 removed the hibernate logic as bgwriter will now work only when
 there is scarcity of buffer's in free list. Basic idea is when the
 number of buffers on freelist drops below the low threshold, the
 allocating backend sets the latch and bgwriter wakesup and begin

 adding buffer's to freelist until it reaches high threshold and then

 again goes back to sleep.


This essentially removes BgWriterDelay, but it's still mentioned in
BgBufferSync().  Looking further, I see that with the patch applied,
BgBufferSync() is still present in the source code but is no longer called
from anywhere.  Please don't submit patches that render things unused
without actually removing them; it makes it much harder to see what you've
changed.  I realize you probably left it that way for testing purposes, but
you need to clean such things up before submitting.  Likewise, if you've
rendered GUCs or statistics counters removed, you need to rip them out, so
that the scope of the changes you've made is clear to reviewers.

A comparison of BgBufferSync() with BgBufferSyncAndMoveBuffersToFreelist()
reveals that you've removed at least one behavior that some people (at
least, me) will care about, which is the guarantee that the background
writer will scan the entire buffer pool at least every couple of minutes.
This is important because it guarantees that dirty data doesn't sit in
memory forever.  When the system becomes busy again after a long idle
period, users will expect the system to have used the idle time to flush
dirty buffers to disk.  This also improves data recovery prospects if, for
example, somebody loses their pg_xlog directory - there may be dirty
buffers whose contents are lost, of course, but they won't be months old.

b.  New stats for number of buffers on freelist has been added, some
  old one's like maxwritten_clean can be removed as new logic for
  syncing buffers and moving them to free list doesn't use them.
  However I think it's better to remove them once the new logic is
  accepted.  Added some new logs for info related to free list under
  BGW_DEBUG


If I'm reading this right, the new statistic is an incrementing counter
where, every time you update it, you add the number of buffers currently on
the freelist.  That makes no sense.  I think what you should be counting is
the number of allocations that are being satisfied from the free-list.
Then, by comparing the rate at which that value is incrementing to the rate
at which buffers_alloc is incrementing, somebody can figure out what
percentage of allocations are requiring a clock-sweep run.  Actually, I
think it's better to flip it around: count the number of allocations that
require an individual backend to run the clock sweep (vs. being satisfied
from the free-list); call it, say, buffers_backend_clocksweep.  We can then
try to tune the patch to make that number as small as possible under
varying workloads.
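
For illustration, with a hypothetical buffers_backend_clocksweep counter added
to pg_stat_bgwriter, that ratio could be watched with something like:

SELECT buffers_backend_clocksweep::float8 / NULLIF(buffers_alloc, 0)
       AS clocksweep_fraction
FROM pg_stat_bgwriter;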

c.  Used the already existing bgwriterLatch in BufferStrategyControl to
  wake bgwriter when number of buffer's in freelist drops below
  threshold.


Seems like a good idea.


 d.  Autotune the low and high threshold for freelist for various
  configurations.  Generally if keep small number (200~2000) of buffers
  always available on freelist, then even for high shared buffers
  like 15GB, it appears to be sufficient.  However when the value
  of shared buffer's is less, then we need much smaller number.  I
  think we can provide these as config knobs for user as well, but for
  now based on LWLOCK_STATS result, I have chosen some hard
  coded values for low and high threshold values for freelist.
  Values for low and high threshold have been decided based on total
  number of shared buffers, basically I have divided them into 5
  categories (16~100, 100~1000,  1000~1, 1~10,
  10 and above) and then ran tests(read-only pgbench) for various
  configurations falling under these categories.  The reason for keeping
  lesser categories for larger shared buffers is that if there are small
  number (200~2000) of buffers available on free list, then it seems to
  be sufficient for quite high loads, however as the total number of
 shared
  buffer's decreases we need to be more careful as if we keep the
 number as
  too low then it will lead to more clock sweep by backends (which means
  freelist lock contention) and if we keep number higher bgwriter will
 evict
  many useful buffers.  Results based on LWLOCK_STATS is at end of mail.


I think we need to come up with some kind of formula here rather than just
a list of hard-coded constants.  And it definitely needs some comments
explaining the logic behind the choices.


Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-08-05 Thread Robert Haas
On Sun, Jul 6, 2014 at 3:12 PM, Andres Freund and...@2ndquadrant.com wrote:
  If you want to do that, it's fine with me.  What I would do is:
 
  - Back-patch the addition of the sparcv8+ stuff all the way.  If
  anyone's running anything older, let them complain...
  - Remove the special case for MIPS without gcc intrinsics only in
  master, leaving the back-branches broken.  If anyone cares, let them
  complain...
  - Nothing else.

 I've gone ahead and done the second of these things.

 Thanks.

 Andres, do you want to go take a stab at fixing the SPARC stuff?

 Will do, will probably take me till thursday to come up with the brain
 cycles.

Ping?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-05 Thread Robert Haas
On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
daniele.varra...@gmail.com wrote:
 I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
 "argument 1: something is wrong": the string is likely to end up in
 sentences with other colons so I'd rather use "something is wrong at
 argument 1".

 Is the patch attached better?

Looks OK to me.  I thought someone else might comment, but since no
one has, committed.

(As a note for the future, you forgot to update the regression test
outputs; I took care of that before committing.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Jerry Sievers
Using the 'include' keyword in postgresql.conf lets us do groovy things
like pasting together pieces of general-purpose configs into the working file.

-- But since all we can do is set/reset the parameters, the possibility
   of concatenating fragments that use some of the list parameters won't
   work.

To wit: suppose we had a fragment for enabling auto_explain and another
for enabling pg_stat_statements.

Both of these require us to put something in shared_preload_libraries
and both make use of custom_variable_classes.

Just FWIW...  Something like...

shared_preload_libraries += auto_explain

Would do the trick.
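
For now the closest workaround is a single fragment that sets the whole list in
one place, e.g.:

shared_preload_libraries = 'auto_explain,pg_stat_statements'

which rather defeats the point of keeping the fragments independent.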

I've never heard this mentioned before, so I presume not many have
contemplated this.

Comments?

Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 4 August 2014 19:30, Claudio Freire klaussfre...@gmail.com wrote:
 On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
I really like the proposal of working on a block level incremental
 backup feature and the idea of considering LSN. However, I'd suggest
 to see block level as a second step and a goal to keep in mind while
 working on the first step. I believe that file-level incremental
 backup will bring a lot of benefits to our community and users anyway.

 Thing is, I don't see how the LSN method is that much harder than an
 on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

 Keeping a last-updated-LSN for each segment (or group of blocks) is
 just as easy as keeping a bitmap, and far more flexible and robust.

 The complexity and cost of safely keeping the map up-to-date is what's
 in question here, but as was pointed before, there's no really safe
 alternative. Nor modification times nor checksums (nor in-memory
 bitmaps IMV) are really safe enough for backups, so you really want to
 use something like the LSN. It's extra work, but opens up a world of
 possibilities.

OK, some comments on all of this.

* Wikipedia thinks the style of backup envisaged should be called Incremental
https://en.wikipedia.org/wiki/Differential_backup

* Base backups are worthless without WAL right up to the *last* LSN
seen during the backup, which is why pg_stop_backup() returns an LSN.
This is the LSN that is the effective viewpoint of the whole base
backup. So if we wish to get all changes since the last backup, we
must re-quote this LSN. (Or put another way - file level LSNs don't
make sense - we just need one LSN for the whole backup).
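
For reference, those endpoint LSNs are already visible from SQL today; a
minimal sketch, with an arbitrary label:

SELECT pg_start_backup('incr-demo');   -- returns the start LSN
SELECT pg_stop_backup();               -- returns the stop LSN the WAL must cover

An incremental run would then re-quote the earlier backup's stop LSN, as
described above.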

* When we take an incremental backup we need the WAL from the backup
start LSN through to the backup stop LSN. We do not need the WAL
between the last backup stop LSN and the new incremental start LSN.
That is a huge amount of WAL in many cases and we'd like to avoid
that, I would imagine. (So the space savings aren't just the delta
from the main data files, we should also look at WAL savings).

* For me, file based incremental is a useful and robust feature.
Block-level incremental is possible, but requires either significant
persistent metadata (1 MB per GB file) or access to the original
backup. One important objective here is to make sure we do NOT have to
re-read the last backup when taking the next backup; this helps us to
optimize the storage costs for backups. Plus, block-level recovery
requires us to have a program that correctly re-writes data into the
correct locations in a file, which seems likely to be a slow and bug
ridden process to me. Nice, safe, solid file-level incremental backup
first please. Fancy, bug prone, block-level stuff much later.

* One purpose of this could be to verify the backup. rsync provides a
checksum, pg_basebackup does not. However, checksums are frequently
prohibitively expensive, so perhaps asking for that is impractical and
maybe only a secondary objective.

* If we don't want/have file checksums, then we don't need a profile
file and using just the LSN seems fine. I don't think we should
specify that manually - the correct LSN is written to the backup_label
file in a base backup and we should read it back from there. We should
also write a backup_label file to incremental base backups, then we
can have additional lines saying what the source backups were. So full
base backup backup_labels remain as they are now, but we add one
additional line per increment, so we have the full set of increments,
much like a history file.

Normal backup_label files look like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo

so we would have a file that looks like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo
INCREMENTAL 1
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo incremental 1
INCREMENTAL 2
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: 
LABEL: foo incremental 2
... etc ...

which we interpret as showing the original base backup, then the first
increment, then the second increment etc.. which allows us to recover
the backups in the correct sequence.
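
As an illustration of how a restore tool could consume that, here is a
minimal standalone C sketch (not part of the proposal itself) that walks a
chained backup_label and lists the sections in the order they must be
restored:

#include <stdio.h>
#include <string.h>

/*
 * Sketch only: section 0 is the full base backup; each "INCREMENTAL n"
 * line starts a new incremental section.
 */
int
main(int argc, char **argv)
{
	FILE	   *fp;
	char		line[1024];
	int			section = 0;

	if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL)
	{
		fprintf(stderr, "usage: %s backup_label\n", argv[0]);
		return 1;
	}

	printf("section 0: full base backup\n");
	while (fgets(line, sizeof(line), fp) != NULL)
	{
		if (strncmp(line, "INCREMENTAL", 11) == 0)
			printf("section %d: %s", ++section, line);
		else if (strncmp(line, "START WAL LOCATION:", 19) == 0)
			printf("  start LSN:%s", line + 19);
	}
	fclose(fp);
	return 0;
}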

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Robert Haas
On Sat, Aug 2, 2014 at 6:58 PM, Peter Geoghegan p...@heroku.com wrote:
 On Sat, Aug 2, 2014 at 2:45 AM, Peter Geoghegan p...@heroku.com wrote:
 I'll post something over the weekend.

 Attached is a cumulative pair of patches generated using
 git-format-patch. I refer anyone who wants to know how the two parts
 fit together to the commit messages of each patch. In passing, I have
 added a reference to the MIT License as outlined by Noah.

OK, I have taken a look at patch 1.  You write:

+ * As a general principle, operator classes with a cataloged sort support
+ * function are expected to provide sane sort support state, including a
+ * function pointer comparator.  Rather than revising that principle, just
+ * setup a shim for the WIN32 UTF-8 and non-C collation special case here.

...but I'm wondering what underlies that decision.   I would
understand the decision to go that way if it simplified things
elsewhere, but in fact it seems that's what underlies the addition of
ssup_operator to SortSupportData, which in turn requires a number of
changes elsewhere.  The upshot of those changes is that it makes it
possible to write bttext_inject_shim, but AFAICS that's just
recapitulating what get_sort_function_for_ordering_op and
PrepareSortSupportFromOrderingOp are already doing.  Any material
change to either of those functions will have to be reflected in
bttext_inject_shim; and if some opclass other than text wants to
provide a sortsupport shim that supplies a comparator only sometimes,
it will need its own copy of the logic.

So I think it's better to just change the sortsupport contract so that
filling in the comparator is optional.  Patch for that attached.
Objections?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
 src/backend/executor/nodeMergejoin.c |8 ++---
 src/backend/utils/cache/lsyscache.c  |   56 --
 src/backend/utils/sort/sortsupport.c |   42 ++---
 src/include/utils/lsyscache.h|2 --
 4 files changed, 35 insertions(+), 73 deletions(-)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote:
 ...but I'm wondering what underlies that decision.   I would
 understand the decision to go that way if it simplified things
 elsewhere, but in fact it seems that's what underlies the addition of
 ssup_operator to SortSupportData, which in turn requires a number of
 changes elsewhere.

The changes aren't too invasive. There is exactly one place where it
isn't a trivial matter of storing the operator that was already
available:

+ /* Original operator must be provided */
+ clause->ssup.ssup_operator = get_opfamily_member(opfamily,
+                                                  op_lefttype,
+                                                  op_righttype,
+                                                  opstrategy);

 So I think it's better to just change the sortsupport contract so that
 filling in the comparator is optional.  Patch for that attached.
 Objections?

I'd have preferred to maintain the obligation for some sane
sortsupport state to be provided. It's not as if I feel too strongly
about it, though.

You attached git diff --stat output, and not an actual patch. Please re-send.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote:
 and if some opclass other than text wants to
 provide a sortsupport shim that supplies a comparator only sometimes,
 it will need its own copy of the logic.

That's true, but my proposal to do things that way reflects the fact
that text is a type oddly tied to the platform. I don't think it will
come up again (note that in the 4 byte Datum case, we still use sort
support to some degree on other platforms with patch 2 applied). It
seemed logical to impose the obligation to deal with that on
varlena.c.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed changing the definition of decade for date_trunc and extract

2014-08-05 Thread Robert Haas
On Sun, Aug 3, 2014 at 3:29 PM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
 So it would probably be a good idea to mention in the relevant
 documentation, that there was no Year Zero, and that 1 AD follows directly
 after 1 BC.

Well, maybe.  By and large, the PostgreSQL documentation should
confine itself to documenting facts about PostgreSQL rather than, say,
facts about date reckoning, except to the extent that PostgreSQL does
something different from the universal norms.  I mean, we could
document the rules for leap years, too, or how many days there are in
each month, or that many people who used the Julian calendar
considered the first day of the year to be March 25th, but really,
those are things that mostly deserve to be mentioned in a treatise on
historical calendaring rather than our documentation, unless there is
some PostgreSQL-specific treatment that requires expounding on them.

On the original patch, like a few others who have spoken, I think
changing the current behavior is a bad idea.  I certainly respect
Mike's desire to get the behavior right, and to set things up in the
way that makes sense to him, but if we start changing things like this
on the basis of a small number of complaints, we will piss off a lot
of users.  There are many people out there who have thousands upon
thousands of lines of PostgreSQL code that they can't easily audit and
change, and the standard for doing things that might break that code
in subtle ways is, and should be, high.  If what we were doing today
was outright wrong, of course I'd be in favor of changing it.  But the
current behavior isn't at all unreasonable; there are two possible
definitions and we picked one.  In that kind of situation, backward
compatibility concerns carry a great deal of weight.  Even if, in a
vacuum, more users would have chosen to do it as Mike proposes vs.
what we actually picked, we'll make enough people unhappy by changing
it now to make it a poor decision for the project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Robert Haas
On Tue, Aug 5, 2014 at 3:15 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Aug 5, 2014 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote:
 and if some opclass other than text wants to
 provide a sortsupport shim that supplies a comparator only sometimes,
 it will need its own copy of the logic.

 That's true, but my proposal to do things that way reflects the fact
 that text is a type oddly tied to the platform. I don't think it will
 come up again (note that in the 4 byte Datum case, we still use sort
 support to some degree on other platforms with patch 2 applied). It
 seemed logical to impose the obligation to deal with that on
 varlena.c.

Per your other email, here's the patch again; hopefully I've got the
right stuff in the file this time.

On this point, I'm all for confining knowledge of things to a
particular module to that module.  However, in this particular case, I
don't believe that there's anything specific to varlena.c in
bttext_inject_shim(); it's a cut down version of the combination of
functions that appear in other modules, and there's nothing to
encourage someone who modifies one of those functions to also update
varlena.c.  Indeed, people developing on non-Windows platforms won't
even be compiling that function, so it would be easy for most of us to
miss the need for an update.  So I argue that my approach is keeping
this knowledge more localized.

I'm also not sure it won't come up again.  There are certainly other
text-like datatypes out there that might want to optimize sorts; e.g.
citext.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index bc036a3..fdf2f4c 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -230,19 +230,19 @@ MJExamineQuals(List *mergeclauses,
   qual->opno);
 
 		/* And get the matching support or comparison function */
+		Assert(clause->ssup.comparator == NULL);
 		sortfunc = get_opfamily_proc(opfamily,
 	 op_lefttype,
 	 op_righttype,
 	 BTSORTSUPPORT_PROC);
 		if (OidIsValid(sortfunc))
 		{
-			/* The sort support function should provide a comparator */
+			/* The sort support function can provide a comparator */
 			OidFunctionCall1(sortfunc, PointerGetDatum(&clause->ssup));
-			Assert(clause->ssup.comparator != NULL);
 		}
-		else
+		if (clause->ssup.comparator == NULL)
 		{
-			/* opfamily doesn't provide sort support, get comparison func */
+			/* support not available, get comparison func */
 			sortfunc = get_opfamily_proc(opfamily,
 		 op_lefttype,
 		 op_righttype,
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 4b5ef99..552e498 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -246,62 +246,6 @@ get_ordering_op_properties(Oid opno,
 }
 
 /*
- * get_sort_function_for_ordering_op
- *		Get the OID of the datatype-specific btree sort support function,
- *		or if there is none, the btree comparison function,
- *		associated with an ordering operator (a "<" or ">" operator).
- *
- * *sortfunc receives the support or comparison function OID.
- * *issupport is set TRUE if it's a support func, FALSE if a comparison func.
- * *reverse is set FALSE if the operator is "<", TRUE if it's ">"
- * (indicating that comparison results must be negated before use).
- *
- * Returns TRUE if successful, FALSE if no btree function can be found.
- * (This indicates that the operator is not a valid ordering operator.)
- */
-bool
-get_sort_function_for_ordering_op(Oid opno, Oid *sortfunc,
-  bool *issupport, bool *reverse)
-{
-	Oid			opfamily;
-	Oid			opcintype;
-	int16		strategy;
-
-	/* Find the operator in pg_amop */
-	if (get_ordering_op_properties(opno,
-								   &opfamily, &opcintype, &strategy))
-	{
-		/* Found a suitable opfamily, get matching support function */
-		*sortfunc = get_opfamily_proc(opfamily,
-	  opcintype,
-	  opcintype,
-	  BTSORTSUPPORT_PROC);
-		if (OidIsValid(*sortfunc))
-			*issupport = true;
-		else
-		{
-			/* opfamily doesn't provide sort support, get comparison func */
-			*sortfunc = get_opfamily_proc(opfamily,
-		  opcintype,
-		  opcintype,
-		  BTORDER_PROC);
-			if (!OidIsValid(*sortfunc)) /* should not happen */
-				elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
-	 BTORDER_PROC, opcintype, opcintype, opfamily);
-			*issupport = false;
-		}
-		*reverse = (strategy == BTGreaterStrategyNumber);
-		return true;
-	}
-
-	/* ensure outputs are set on failure */
-	*sortfunc = InvalidOid;
-	*issupport = false;
-	*reverse = false;
-	return false;
-}
-
-/*
  * get_equality_op_for_ordering_op
  *		Get the OID of the datatype-specific btree equality operator
  *		associated with an ordering operator (a "<" or ">" operator).
diff --git a/src/backend/utils/sort/sortsupport.c 

Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-05 Thread Robert Haas
On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 This patch is pretty trivial.
 Another slightly less trivial but more useful version.

 The issue is that there are 3 definitions of modulo, two of which are fine
 (Knuth floored division and Euclidean), and the last one much less useful.
 Alas, C (%) & SQL (MOD) choose the bad definition :-( I really need any of
 the other two. The attached patch adds all versions, with % and mod
 consistent with their C and SQL unfortunate counterparts, and fmod and
 emod the sane ones.

Three different modulo operators seems like a lot for a language that
doesn't even have a real expression syntax, but I'll yield to whatever
the consensus is on this one.
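
For anyone comparing the variants: the three definitions only differ when
an operand is negative.  A standalone C sketch (the helper names here are
mine, not the patch's):

#include <stdio.h>

/* C/SQL-style modulo: truncated division, result takes the dividend's sign */
static int mod_trunc(int a, int b) { return a % b; }

/* Knuth/floored modulo: result takes the divisor's sign */
static int
mod_floor(int a, int b)
{
	int			r = a % b;

	return (r != 0 && (r < 0) != (b < 0)) ? r + b : r;
}

/* Euclidean modulo: result is always non-negative */
static int
mod_euclid(int a, int b)
{
	int			r = a % b;

	return (r < 0) ? r + (b < 0 ? -b : b) : r;
}

int
main(void)
{
	/* -7 mod 3: truncated = -1, floored = 2, Euclidean = 2 */
	printf("%d %d %d\n", mod_trunc(-7, 3), mod_floor(-7, 3), mod_euclid(-7, 3));
	/* 7 mod -3: truncated = 1, floored = -2, Euclidean = 1 */
	printf("%d %d %d\n", mod_trunc(7, -3), mod_floor(7, -3), mod_euclid(7, -3));
	return 0;
}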

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL regression test suite

2014-08-05 Thread Robert Haas
On Mon, Aug 4, 2014 at 10:38 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Now that we use TAP for testing client tools, I think we can use that to
 test various SSL options too. I came up with the attached. Comments?

 It currently assumes that the client's and the server's hostnames are
 postgres-client.test and postgres-server.test, respectively. That makes
 it a bit tricky to run on a single system. The README includes instructions;
 basically you need to set up an additional loopback device, and add entries
 to /etc/hosts for that.

That seems so onerous that I think few people will do it, and not
regularly, resulting in the tests breaking and nobody noticing.
Reconfiguring the loopback interface like that requires root
privilege, and won't survive a reboot, and doing it in the system
configuration will require hackery specific to the particular flavor
of Linux you're running, and probably other hackery on non-Linux
systems (never mind Windows).  Why can't you make it work over
127.0.0.1?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing PG_RETURN_UINT16

2014-08-05 Thread Robert Haas
On Mon, Aug 4, 2014 at 11:35 AM, Manuel Kniep m.kn...@web.de wrote:
 I’m missing the PG_RETURN_UINT16 macro in fmgr.h
 Since we already have the PG_GETARG_UINT16 macro
 I guess it makes sense to have it.

 here is the trivial patch for it.

I see no reason not to add this.  Anyone else want to object?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Josh Berkus
On 08/05/2014 11:12 AM, Jerry Sievers wrote:
 shared_preload_libraries += auto_explain
 
 Would do the trick.
 
 I've  never heard this mentioned before so presume not many have
 contemplated this.

It's been discussed.

However, other than shared_preload_libraries, I can't think of a GUC
which would benefit from this.  Can you?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:33 PM, Robert Haas robertmh...@gmail.com wrote:
 Per your other email, here's the patch again; hopefully I've got the
 right stuff in the file this time.

Your patch looks fine to me. I recommend committing patch 1 with these
additions. I can't think of any reason to rehash the 2012 discussion.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
 On 08/05/2014 11:12 AM, Jerry Sievers wrote:
  shared_preload_libraries += auto_explain
  
  Would do the trick.
  
  I've  never heard this mentioned before so presume not many have
  contemplated this.
 
 It's been discussed.
 
 However, other than shared_preload_libraries, I can't think of a GUC
 which would benefit from this.  Can you?

search_path?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 3:23 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 4 August 2014 19:30, Claudio Freire klaussfre...@gmail.com wrote:
 On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
I really like the proposal of working on a block level incremental
 backup feature and the idea of considering LSN. However, I'd suggest
 to see block level as a second step and a goal to keep in mind while
 working on the first step. I believe that file-level incremental
 backup will bring a lot of benefits to our community and users anyway.

 Thing is, I don't see how the LSN method is that much harder than an
 on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

 Keeping a last-updated-LSN for each segment (or group of blocks) is
 just as easy as keeping a bitmap, and far more flexible and robust.

 The complexity and cost of safely keeping the map up-to-date is what's
 in question here, but as was pointed out before, there's no really safe
 alternative. Neither modification times nor checksums (nor in-memory
 bitmaps IMV) are really safe enough for backups, so you really want to
 use something like the LSN. It's extra work, but opens up a world of
 possibilities.

 OK, some comments on all of this.

 * Wikipedia thinks the style of backup envisaged should be called 
 Incremental
 https://en.wikipedia.org/wiki/Differential_backup

 * Base backups are worthless without WAL right up to the *last* LSN
 seen during the backup, which is why pg_stop_backup() returns an LSN.
 This is the LSN that is the effective viewpoint of the whole base
 backup. So if we wish to get all changes since the last backup, we
 must re-quote this LSN. (Or put another way - file level LSNs don't
 make sense - we just need one LSN for the whole backup).

File-level LSNs are an optimization. When you want to backup all files
modified since the last base or incremental backup (yes, you need the
previous backup label at least), you check the file-level LSN range.
That tells you which changesets touched that file, so you know
whether to process it or not.

Block-level LSNs (or, rather, block-segment-level) are just a
refinement of that.
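
To picture the file-level variant in code, something like the following
per-file metadata would be enough (the names and layout here are purely
illustrative, not a proposed on-disk format):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the backend's LSN type */

/* Range of LSNs that have touched a file (or block group) since creation. */
typedef struct LSNRange
{
	XLogRecPtr	min_lsn;		/* oldest recorded change */
	XLogRecPtr	max_lsn;		/* newest recorded change */
} LSNRange;

/* A file must be copied if anything in it changed after the reference LSN. */
static bool
needs_backup(const LSNRange *range, XLogRecPtr prior_backup_lsn)
{
	return range->max_lsn > prior_backup_lsn;
}

int
main(void)
{
	LSNRange	file = {100, 250};

	/* The previous backup's LSN was 200, so this file has newer changes. */
	printf("needs backup: %d\n", needs_backup(&file, 200));
	return 0;
}

The block-segment-level refinement is the same test applied to one such
range per group of blocks instead of one per file.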

 * When we take an incremental backup we need the WAL from the backup
 start LSN through to the backup stop LSN. We do not need the WAL
 between the last backup stop LSN and the new incremental start LSN.
 That is a huge amount of WAL in many cases and we'd like to avoid
 that, I would imagine. (So the space savings aren't just the delta
 from the main data files, we should also look at WAL savings).

Yes, probably something along the lines of removing redundant FPW and
stuff like that.

 * For me, file based incremental is a useful and robust feature.
 Block-level incremental is possible, but requires either significant
 persistent metadata (1 MB per GB file) or access to the original
 backup. One important objective here is to make sure we do NOT have to
 re-read the last backup when taking the next backup; this helps us to
 optimize the storage costs for backups. Plus, block-level recovery
 requires us to have a program that correctly re-writes data into the
 correct locations in a file, which seems likely to be a slow and bug
 ridden process to me. Nice, safe, solid file-level incremental backup
 first please. Fancy, bug prone, block-level stuff much later.

Ok. You could do incremental first without any kind of optimization,
then file-level optimization by keeping a file-level LSN range, and
then extend that to block-segment-level LSN ranges. That sounds like a
plan to me.

But, I don't see how you'd do the one without optimization without
reading the previous backup for comparing deltas. Remember checksums
are deemed not trustworthy, not just by me, so that (which was the
original proposition) doesn't work.

 * If we don't want/have file checksums, then we don't need a profile
 file and using just the LSN seems fine. I don't think we should
 specify that manually - the correct LSN is written to the backup_label
 file in a base backup and we should read it back from there.

Agreed


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-05 Thread Alvaro Herrera
Robert Haas wrote:
 On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
  This patch is pretty trivial.
  Another slightly less trivial but more useful version.
 
  The issue is that there are 3 definitions of modulo, two of which are fine
  (Knuth floored division and Euclidean), and the last one much less useful.
  Alas, C (%) & SQL (MOD) choose the bad definition :-( I really need any of
  the other two. The attached patch adds all versions, with % and mod
  consistent with their C and SQL unfortunate counterparts, and fmod and
  emod the sane ones.
 
 Three different modulo operators seems like a lot for a language that
 doesn't even have a real expression syntax, but I'll yield to whatever
 the consensus is on this one.

I wonder if it would be necessary to offer the division operator
semantics corresponding to whatever additional modulo operator we choose
to offer.  That is, if we add emod, do we need ediv as well?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
  On 08/05/2014 11:12 AM, Jerry Sievers wrote:
   shared_preload_libraries += auto_explain
   
   Would do the trick.
   
   I've  never heard this mentioned before so presume not many have
   contemplated this.
  
  It's been discussed.
  
  However, other than shared_preload_libraries, I can't think of a GUC
  which would benefit from this.  Can you?
 
 search_path?

Almost anything that's marked GUC_LIST_INPUT.  A quick grep shows some
variables for which it doesn't make sense, such as DateStyle, and others
for which it would, such as temp_tablespaces, synchronous_standby_names,
listen_addresses, and the various preload_libraries params.  Not sure
about log_destination.

I think this merits a new GUC flag, say GUC_LIST_ADDITIVE.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixed redundant i18n strings in json

2014-08-05 Thread Jeff Janes
On Tue, Aug 5, 2014 at 9:35 AM, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo
 daniele.varra...@gmail.com wrote:
  I'd definitely replace /arg/argument/. Furthermore I'd avoid the form
  argument 1: something is wrong: the string is likely to end up in
  sentences with other colons so I'd rather use something is wrong at
  argument 1.
 
  Is the patch attached better?

 Looks OK to me.  I thought someone else might comment, but since no
 one has, committed.

 (As a note for the future, you forgot to update the regression test
 outputs; I took care of that before committing.)


I think you missed one of the regression tests, see attached

Thanks,

Jeff


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 12:33 PM, Robert Haas robertmh...@gmail.com wrote:
 I'm also not sure it won't come up again.  There are certainly other
 text-like datatypes out there that might want to optimize sorts; e.g.
 citext.

Fair enough. Actually, come to think of it I find BpChar/character(n)
a far more likely candidate. I've personally never used the SQL
standard character(n) type, which is why I didn't think of it until
now. TPC-H does make use of character(n) though, and that might be a
good reason in and of itself to care about its sorting performance.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-08-05 Thread Bruce Momjian
On Thu, Jul 10, 2014 at 06:38:26PM -0400, Bruce Momjian wrote:
 On Thu, Jul 10, 2014 at 06:17:14PM -0400, Bruce Momjian wrote:
  Well, we are going to need to call internal C functions, often bypassing
  their typical call sites and the assumption about locking, etc.  Perhaps
  this could be done from a plpgsql function.  We could add and drop a
  dummy column to force TOAST table creation --- we would then only need a
  way to detect if a function _needs_ a TOAST table, which was skipped in
  binary upgrade mode previously.
  
  That might be a minimalistic approach.
 
 I have thought some more on this.  I thought I would need to open
 pg_class in C and do complex backend stuff, but I now realize I can do
 it from libpq, and just call ALTER TABLE and I think that always
 auto-checks if a TOAST table is needed.  All I have to do is query
 pg_class from libpq, then construct ALTER TABLE commands for each item,
 and it will optionally create the TOAST table if needed.  I just have to
 use a no-op ALTER TABLE command, like SET STATISTICS.

Attached is a completed patch which handles oid conflicts in pg_class
and pg_type for TOAST tables that were not needed in the old cluster but
are needed in the new one.  I was able to recreate a failure case and
this fixed it.

The patch needs to be backpatched because I am getting more-frequent bug
reports from users using pg_upgrade to leave now-end-of-life'ed 8.4. 
There is not a good work-around for pg_upgrade failures without this
fix, but at least pg_upgrade throws an error.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index b112f3a..642dd5d
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
***
*** 12,17 
--- 12,19 
  #include "pg_upgrade.h"
  
  #include <sys/types.h>
+ #include "catalog/binary_upgrade.h"
+ 
  
  void
  generate_old_dump(void)
*** generate_old_dump(void)
*** 67,69 
--- 69,139 
  	end_progress_output();
  	check_ok();
  }
+ 
+ 
+ /*
+  * It is possible for there to be a mismatch in the need for TOAST tables
+  * between the old and new servers, e.g. some pre-9.1 tables didn't need
+  * TOAST tables but will need them in 9.1+.  (There are also opposite cases,
+  * but these are handled by setting binary_upgrade_next_toast_pg_class_oid.)
+  *
+  * We can't allow the TOAST table to be created by pg_dump with a
+  * pg_dump-assigned oid because it might conflict with a later table that
+  * uses that oid, causing a "file exists" error for pg_class conflicts, and
+  * a "duplicate oid" error for pg_type conflicts.  (TOAST tables need pg_type
+  * entries.)
+  *
+  * Therefore, a backend in binary-upgrade mode will not create a TOAST
+  * table unless an OID is passed in via pg_upgrade_support functions.
+  * This function is called after the restore and uses ALTER TABLE to
+  * auto-create any needed TOAST tables which will not conflict with
+  * restored oids.
+  */
+ void
+ optionally_create_toast_tables(void)
+ {
+ 	int			dbnum;
+ 
+ 	prep_status("Creating newly-required TOAST tables");
+ 
+ 	for (dbnum = 0; dbnum  new_cluster.dbarr.ndbs; dbnum++)
+ 	{
+ 		PGresult   *res;
+ 		int			ntups;
+ 		int			rowno;
+ 		int			i_nspname,
+ 	i_relname;
+ 		DbInfo	   *active_db = new_cluster.dbarr.dbs[dbnum];
+ 		PGconn	   *conn = connectToServer(new_cluster, active_db-db_name);
+ 
+ 		res = executeQueryOrDie(conn,
+ 								"SELECT n.nspname, c.relname "
+ 								"FROM	pg_catalog.pg_class c, "
+ 								"		pg_catalog.pg_namespace n "
+ 								"WHERE	c.relnamespace = n.oid AND "
+ 								"		n.nspname NOT IN ('pg_catalog', 'information_schema') AND "
+ 								"		c.relkind IN ('r', 'm') AND "
+ 								"		c.reltoastrelid = 0");
+ 
+ 		ntups = PQntuples(res);
+ 		i_nspname = PQfnumber(res, "nspname");
+ 		i_relname = PQfnumber(res, "relname");
+ 		for (rowno = 0; rowno  ntups; rowno++)
+ 		{
+ 			/* enable auto-oid-numbered TOAST creation if needed */
+ 			PQclear(executeQueryOrDie(conn, "SELECT binary_upgrade.set_next_toast_pg_class_oid('%d'::pg_catalog.oid);",
+ 									  OPTIONALLY_CREATE_TOAST_OID));
+ 
+ 			/* dummy command that also triggers check for required TOAST table */
+ 			PQclear(executeQueryOrDie(conn, "ALTER TABLE %s.%s RESET (binary_upgrade_dummy_option);",
+ 									  quote_identifier(PQgetvalue(res, rowno, i_nspname)),
+ 									  quote_identifier(PQgetvalue(res, rowno, i_relname))));
+ 		}
+ 
+ 		PQclear(res);
+ 
+ 		PQfinish(conn);
+ 	}
+ 
+ 	check_ok();
+ }
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index 3e97b66..49ea873
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** create_new_objects(void)
*** 363,368 
--- 363,370 
  	if (GET_MAJOR_VERSION(old_cluster.major_version)  903)
  		set_frozenxids(true);
  
+ 	

Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread David G Johnston
Alvaro Herrera-9 wrote
 Bruce Momjian wrote:
 On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
  On 08/05/2014 11:12 AM, Jerry Sievers wrote:
   shared_preload_libraries += auto_explain
   
   Would do the trick.
   
   I've  never heard this mentioned before so presume not many have
   contemplated this.
  
  It's been discussed.
  
  However, other than shared_preload_libraries, I can't think of a GUC
  which would benefit from this.  Can you?
 
 search_path?
 
 Almost anything that's marked GUC_LIST_INPUT.  A quick grep shows some
 variables for which it doesn't make sense, such as DateStyle, and others
 for which it would, such as temp_tablespaces, synchronous_standby_names,
 listen_addresses, and the various preload_libraries params.  Not sure
 about log_destination.
 
 I think this merits a new GUC flag, say GUC_LIST_ADDITIVE.

Would that allow, without any special syntax, multiple declarations of, say,
shared_preload_libraries, to have their values appended instead of the most
(or least, I forget which) recent one overwrite (or be ignored by) the
existing value?

The idea of requiring += instead of just = is not particularly
appealing...

David J.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Append-to-a-GUC-parameter-tp5813811p5813842.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-05 Thread Josh Berkus
On 08/05/2014 04:41 PM, Alvaro Herrera wrote:
 I have chosen to keep the name minmax, even if the opclasses now let
 one implement completely different things on top of it such as geometry
 bounding boxes and bloom filters (aka bitmap indexes).  I don't see a
 need for a rename: essentially, in PR we can just say we have these
 neat minmax indexes that other databases also have, but instead of just
 being used for integer data, they can also be used for geometry, GIS and
 bitmap indexes, so as always we're more powerful than everyone else when
 implementing new database features.

Plus we haven't come up with a better name ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:

 * When we take an incremental backup we need the WAL from the backup
 start LSN through to the backup stop LSN. We do not need the WAL
 between the last backup stop LSN and the new incremental start LSN.
 That is a huge amount of WAL in many cases and we'd like to avoid
 that, I would imagine. (So the space savings aren't just the delta
 from the main data files, we should also look at WAL savings).

 Yes, probably something along the lines of removing redundant FPW and
 stuff like that.

Not what I mean at all, sorry for confusing.

Each backup has a start LSN and a stop LSN. You need all the WAL
between those two points (-X option)

But if you have an incremental backup (b2), it depends upon an earlier
backup (b1).

You don't need the WAL between b1.stop_lsn and b2.start_lsn.

In typical cases, start to stop will be a few hours or less, whereas
we'd be doing backups at most daily. Which would mean we'd only need
to store at most 10% of the WAL files because we don't need WAL
between backups.

 * For me, file based incremental is a useful and robust feature.
 Block-level incremental is possible, but requires either significant
 persistent metadata (1 MB per GB file) or access to the original
 backup. One important objective here is to make sure we do NOT have to
 re-read the last backup when taking the next backup; this helps us to
 optimize the storage costs for backups. Plus, block-level recovery
 requires us to have a program that correctly re-writes data into the
 correct locations in a file, which seems likely to be a slow and bug
 ridden process to me. Nice, safe, solid file-level incremental backup
 first please. Fancy, bug prone, block-level stuff much later.

 Ok. You could do incremental first without any kind of optimization,

Yes, that is what makes sense to me. Fast, simple, robust and most of
the benefit.

We should call this INCREMENTAL FILE LEVEL

 then file-level optimization by keeping a file-level LSN range, and
 then extend that to block-segment-level LSN ranges. That sounds like a
 plan to me.

Thinking some more, there seems like this whole store-multiple-LSNs
thing is too much. We can still do block-level incrementals just by
using a single LSN as the reference point. We'd still need a complex
file format and a complex file reconstruction program, so I think that
is still next release. We can call that INCREMENTAL BLOCK LEVEL

 But, I don't see how you'd do the one without optimization without
 reading the previous backup for comparing deltas. Remember checksums
 are deemed not trustworthy, not just by me, so that (which was the
 original proposition) doesn't work.

Every incremental backup refers to an earlier backup as a reference
point, which may then refer to an earlier one, in a chain.

Each backup has a single LSN associated with it, as stored in the
backup_label. (So we don't need the profile stage now, AFAICS)

To decide whether we need to re-copy the file, you read the file until
we find a block with a later LSN. If we read the whole file without
finding a later LSN then we don't need to re-copy. That means we read
each file twice, which is slower, but the file is at most 1GB in size,
which we can assume will be mostly in memory for the second read.

As Marco says, that can be optimized using filesystem timestamps instead.
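
Roughly, in standalone form, the whole-file test could look like this
(assuming 8kB pages whose header begins with the page LSN, as PostgreSQL
pages do; the backend's real page macros and error handling are omitted):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

/*
 * Sketch: return true if any page in the segment file has an LSN newer
 * than the reference LSN taken from the previous backup's backup_label.
 * The first 8 bytes of a page header hold its LSN (pd_lsn).
 */
static bool
file_needs_backup(const char *path, uint64_t prior_backup_lsn)
{
	FILE	   *fp = fopen(path, "rb");
	char		page[BLCKSZ];

	if (fp == NULL)
		return true;			/* if in doubt, copy it */

	while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
	{
		uint32_t	xlogid,
					xrecoff;

		memcpy(&xlogid, page, sizeof(xlogid));
		memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));

		if ((((uint64_t) xlogid << 32) | xrecoff) > prior_backup_lsn)
		{
			fclose(fp);
			return true;		/* stop at the first newer block */
		}
	}
	fclose(fp);
	return false;				/* nothing newer than the reference LSN */
}

int
main(int argc, char **argv)
{
	if (argc == 3)
		printf("%d\n", file_needs_backup(argv[1], strtoull(argv[2], NULL, 0)));
	return 0;
}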

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A worst case for qsort

2014-08-05 Thread Peter Geoghegan
There was some informal discussion of worst case performance of my
text sort support patch at the most recent developer's meeting in
Ottawa. There is no need to repeat the details here, which aren't
terribly interesting - I made a point about putting worst case
performance in context. I also made an argument around quicksort,
which despite being a widely used sorting algorithm is known to go
quadratic in the worst case, a really bad outcome. I've heard stories
about GUIs that hit the worst case and froze when a user clicked on a
column header to sort its contents. This kind of thing occurred until
surprisingly recently, and was one reason why Postgres eventually
adopted its own qsort() rather than using the OS-supplied
implementation. A lot of the measures proposed by Sedgewick in
particular made these outcomes occur very infrequently in practice.

Our quicksort routine is almost a verbatim copy of the implementation
that appeared in NetBSD, which is itself almost a verbatim copy of the
implementation that appears in the 1993 paper by J. L. Bentley and M.
D. McIlroy, ‘Engineering a sort function’. McIlroy
(with some input from Bentley) subsequently described a way of making
many high quality implementations perform very badly, including his
own:

http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

I don't want to belabor the point here. I do not mean to imply that
everything I've proposed to date wrt sort support for text is
self-evidently reasonable, in the same way that our use of Bentley and
McIlroy's qsort() seems reasonable on balance. I expect some
interesting discussion there.

Anyway, the paper says:

The adversarial method works for almost any polymorphic program
recognizable as quicksort. The subject quicksort may copy values at
will, or work with lists rather than arrays. It may even pick the
pivot at random. The quicksort will be vulnerable provided only that
it satisfies some mild assumptions that are met by every
implementation I have seen.

IMHO, while worst case performance is a very useful tool for analyzing
algorithms (particularly their worst case time complexity), a worst
case should be put in its practical context. For example, if we had
reason to be concerned about *adversarial* inputs, I think that there
is a good chance that our qsort() actually would be problematic to the
point of driving us to prefer some generally slower alternative.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Michael Paquier
On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
 Thinking some more, there seems like this whole store-multiple-LSNs
 thing is too much. We can still do block-level incrementals just by
 using a single LSN as the reference point. We'd still need a complex
 file format and a complex file reconstruction program, so I think that
 is still next release. We can call that INCREMENTAL BLOCK LEVEL.

Yes, that's the approach taken by pg_rman for its block-level
incremental backup. Btw, I don't think that the CPU cost to scan all
the relation files added to the one to rebuild the backups is worth
doing it on large instances. File-level backup would cover most of the
use cases that people face, and simplify footprint on core code. With
a single LSN as reference position, of course, to determine if a file
needs to be backed up: if it has at least one block that has
been modified with an LSN newer than the reference point.

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to debug into InitPostgres() and InitCatalogCache()?

2014-08-05 Thread 土卜皿
2014-08-05 22:08 GMT+08:00 土卜皿 pengcz.n...@gmail.com:

 hi, all
   I already can debug general postgres using ddd and select
 pg_backend_pid(); ,  now, I want to study the details of the system
 catalog cache and system cache management, so I need to debug the function
 InitPostgres() and InitCatalogCache(), and I tried the following steps:

 [refer: How to debug postgresql during initdb ]
 http://www.phacai.com/how-to-debug-postgresql-during-initdb

 (1) in first terminal:

 ddd initdb

 and set a breakpoint on the next executable line after “PG_CMD_OPEN” in
 function bootstrap_template1() ,

 Then within gdb, use “r $myfolder/mydb”,

 and found the cursor stop at the breakpoint

 (2) in the second terminal:

 ps aux | grep bin\/postgres

 after get the posgtres server pid, start a new ddd without args, and
 within gdb, using attach postgres.pid

 and set a breakpoint as InitCatalogCache()

 (3) back to the first ddd window, hit the cont button. I hope it can
 stop at the second ddd's breakpoint, but nothing happened?

 I must make mistake in some step, or my understanding is fully wrong,
 please give me some advice, thanks a lot



  I am sorry that this is a simple question.  I found that the above steps
make postgres reach boot_yyparse() in BootstrapModeMain(void), which
means postgres has already finished the initialization, including
InitCatalogCache(), and it was waiting for a query or other user command.

  To debug the initialization, including the system catalog cache, I add
some code like:

 bool forDebug = true;

while (forDebug){
  forDebug = true;
}

at the start of InitPostgres().








Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
 However, other than shared_preload_libraries, I can't think of a GUC
 which would benefit from this.  Can you?

 search_path?

Given the security implications of search_path, assembling its value
from independent parts sounds like a darn bad idea from here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Alvaro Herrera
David G Johnston wrote:
 Alvaro Herrera-9 wrote

  I think this merits a new GUC flag, say GUC_LIST_ADDITIVE.
 
 Would that allow, without any special syntax, multiple declarations of, say,
 shared_preload_libraries, to have their values appended instead of the most
 (or least, I forget which) recent one overwrite (or be ignored by) the
 existing value?
 
 The idea of requiring += instead of just = is not particularly
 appealing...

I'm not sold on += as the syntax to use.  It just needs to be something
different from =.  But I am thinking that whatever it is, it would be
required.  Otherwise, for example if you have search_path=foo in
postgresql.conf and search_path=bar in auto.conf via ALTER SYSTEM, you
would end up with search_path=foo,bar instead of just bar which I think
would be the expected outcome.  Of course, we would offer a new ALTER
SYSTEM option to indicate that you want to have auto.conf use += instead
of =.

The idea behing GUC_LIST_ADDITIVE is that += is only supported for
variables that have that flag set.  If you tried to use += with other
variables, an error would be raised.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 9:17 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
 Thinking some more, there seems like this whole store-multiple-LSNs
 thing is too much. We can still do block-level incrementals just by
 using a single LSN as the reference point. We'd still need a complex
 file format and a complex file reconstruction program, so I think that
 is still next release. We can call that INCREMENTAL BLOCK LEVEL.

 Yes, that's the approach taken by pg_rman for its block-level
 incremental backup. Btw, I don't think that the CPU cost to scan all
 the relation files added to the one to rebuild the backups is worth
 doing it on large instances. File-level backup would cover most of the
 use cases that people face, and simplify footprint on core code. With
 a single LSN as reference position, of course, to determine if a file
 needs to be backed up: if it has at least one block that has
 been modified with an LSN newer than the reference point.


It's the finding of that block that begs optimizing IMO.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Claudio Freire
On Tue, Aug 5, 2014 at 9:04 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:

 * When we take an incremental backup we need the WAL from the backup
 start LSN through to the backup stop LSN. We do not need the WAL
 between the last backup stop LSN and the new incremental start LSN.
 That is a huge amount of WAL in many cases and we'd like to avoid
 that, I would imagine. (So the space savings aren't just the delta
 from the main data files, we should also look at WAL savings).

 Yes, probably something along the lines of removing redundant FPW and
 stuff like that.

 Not what I mean at all, sorry for confusing.

 Each backup has a start LSN and a stop LSN. You need all the WAL
 between those two points (-X option)

 But if you have an incremental backup (b2), it depends upon an earlier
 backup (b1).

 You don't need the WAL between b1.stop_lsn and b2.start_lsn.

 In typical cases, start to stop will be a few hours or less, whereas
 we'd be doing backups at most daily. Which would mean we'd only need
 to store at most 10% of the WAL files because we don't need WAL
 between backups.

I was assuming you wouldn't store that WAL. You might not even have it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Josh Berkus
On 08/05/2014 06:25 PM, Alvaro Herrera wrote:

 I'm not sold on += as the syntax to use.  It just needs to be something
 different from =. 

Alternative is to use + in the value string:

shared_preload_libraries = '+auto_explain'

... not sure that's better.

The alternative is to add an append keyword to everything, which
strikes me as yucky.

 But I am thinking that whatever it is, it would be
 required.  

Yes, absolutely.

 The idea behing GUC_LIST_ADDITIVE is that += is only supported for
 variables that have that flag set.  If you tried to use += with other
 variables, an error would be raised.

Yes.

BTW, while there's unlikely to be a good reason to put search_path in
pg.conf with appends, there are a LOT of reasons to want to be able to
append to it during a session.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 BTW, while there's unlikely to be a good reason to put search_path in
 pg.conf with appends, there are a LOT of reasons to want to be able to
 append to it during a session.

[shrug...]  You can do that today with current_setting()/set_config().

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
 On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 
  On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
  Thinking some more, there seems like this whole store-multiple-LSNs
  thing is too much. We can still do block-level incrementals just by
  using a single LSN as the reference point. We'd still need a complex
  file format and a complex file reconstruction program, so I think that
  is still next release. We can call that INCREMENTAL BLOCK LEVEL.
 
 Yes, that's the approach taken by pg_rman for its block-level
 incremental backup. Btw, I don't think that the CPU cost to scan all
 the relation files added to the one to rebuild the backups is worth
 doing it on large instances. File-level backup would cover most of the

Well, if you scan the WAL files from the previous backup, that will tell
you which pages need incremental backup.

I am thinking we need a wiki page to outline all these options.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 06:36:01PM -0700, Josh Berkus wrote:
 On 08/05/2014 06:25 PM, Alvaro Herrera wrote:
 
  I'm not sold on += as the syntax to use.  It just needs to be something
  different from =. 
 
 Alternative is to use + in the value string:
 
 shared_preload_libraries = '+auto_explain'
 
 ... not sure that's better.
 
 The alternative is to add an append keyword to everything, which
 strikes me as yucky.

I think you need _prefix_ and _suffix_ operators.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Bruce Momjian
On Tue, Aug  5, 2014 at 07:22:13PM -0600, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Tue, Aug  5, 2014 at 12:52:51PM -0700, Josh Berkus wrote:
  However, other than shared_preload_libraries, I can't think of a GUC
  which would benefit from this.  Can you?
 
  search_path?
 
 Given the security implications of search_path, assembling its value
 from independent parts sounds like a darn bad idea from here.

I can imagine someone wanting to prefix/suffix to search_path for a user
via ALTER USER.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
I've looked at another (admittedly sympathetic) dataset that is
publicly available: the flickr tags dataset [1]. I end up with a
single table of tags; it's a large enough table, at 388 MB, but the
tuples are not very wide. There are 7,656,031 tags/tuples.

Predictably enough, this query is very fast when an internal sort is
used on a patched Postgres:

select * from (select * from tags order by tag offset 1) ii;

Git master takes about 25 seconds to execute the query. Patched takes
about 6.8 seconds. That seems very good, but this is not really new
information.

However, with work_mem set low enough to get an external sort, the
difference is more interesting. If I set work_mem to 10 MB, then the
query takes about 10.7 seconds to execute with a suitably patched
Postgres. Whereas on master, it consistently takes a full 69 seconds.
That's the largest improvement I've seen so far, for any case.

I must admit that this did surprise me, but then I don't grok tape
sort. What's particularly interesting here is that when work_mem is
cranked up to 512MB, which is a high setting, but still not high
enough to do an internal sort, the difference closes in a bit. Instead
of 41 runs, there are only 2. Patched now takes 16.3 seconds.
Meanwhile, master is somewhat improved, and consistently takes 65
seconds to complete the sort.

This probably has something to do with CPU cache effects. I believe
that all world class external sorting algorithms are cache sensitive.
I'm not sure what the outcome would have been had there not been a
huge amount of memory available for the OS cache to use, which there
was. I think there is probably something to learn about how to improve
tape sort here.

Does anyone recall hearing complaints around higher work_mem settings
regressing performance?

[1] 
http://www.isi.edu/integration/people/lerman/load.html?src=http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html
, bottom of page
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-05 Thread Alvaro Herrera
FWIW I think I haven't responded appropriately to the points raised by
Heikki.  Basically, as I see it there are three main items:

1. the revmap physical-to-logical mapping is too complex; let's use
something else.

We had revmap originally in a separate fork.  The current approach grew
out of the necessity of putting it in the main fork while ensuring that
fast access to individual pages is possible.  There are of course many
ways to skin this cat; Heikki's proposal is to have it always occupy the
first few physical pages, rather than require a logical-to-physical
mapping table.  To implement this he proposes to move other pages out of
the way as the index grows.  I don't really have much love for this
idea.  We can change how this is implemented later in the cycle, if we
find that a different approach is better than my proposal.  I don't want
to spend endless time meddling with this (and I definitely don't want to
have this delay the eventual commit of the patch.)


2. vacuuming is not optimal

Right now, to eliminate garbage index tuples we need to first scan
the revmap to figure out which tuples are unreferenced.  There is a
concern that if there's an excess of dead tuples, the index becomes
unvacuumable because palloc() fails due to request size.  This is
largely theoretical because in order for this to happen there need to be
several million dead index tuples.  As a minimal fix to alleviate this
problem without requiring a complete rework of vacuuming, we can cap
that palloc request to maintenance_work_mem and remove dead tuples in a
loop instead of trying to remove all of them in a single pass.

Another thing proposed was to store range numbers (or just heap page
numbers) within each index tuple.  I felt that this would add more bloat
unnecessarily.  However, there is some padding space in the index tuple
that we may be able to use to store range numbers.  I will think some more about
how we can use this to simplify vacuuming.


3. avoid MMTuple as it is just unnecessary extra complexity.

The main thing that MMTuple adds is not the fact that we save 2 bytes
by storing BlockNumber as is instead of within a TID field.  Instead,
it's that we can construct and deconstruct using our own design, which
means we can use however many Datum entries we want and however many
null flags.  In normal heap and index tuples, there are always the
same number of datum/nulls.  In minmax, the number of nulls is twice the
number of indexed columns; the number of datum values is determined by
how many datum values are stored per opclass (sortable opclasses
store 2 columns, but geometry would store only one).  If we were to use
regular IndexTuples, we would lose that flexibility, and I have no idea how it
would work.  In other words, MMTuples look fine to me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to debug into InitPostgres() and InitCatalogCache()?

2014-08-05 Thread Alvaro Herrera
土卜皿 wrote:
 2014-08-05 22:08 GMT+08:00 土卜皿 pengcz.n...@gmail.com:
 
  hi, all
   I can already debug a regular postgres backend using ddd and select
  pg_backend_pid(); now I want to study the details of the system
  catalog cache and system cache management, so I need to debug the functions
  InitPostgres() and InitCatalogCache(), and I tried the following steps:

   To debug initialization, including the system catalog cache, I added some
 code like:
 
  volatile bool forDebug = true;
 
 while (forDebug){
   forDebug = true;
 }
 
 at the start of InitPostgres().

There's also the -W switch, which makes the new backend wait for the given
number of seconds, precisely so that you can attach a debugger.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Append to a GUC parameter ?

2014-08-05 Thread Fabrízio de Royes Mello
On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Josh Berkus j...@agliodbs.com writes:
  BTW, while there's unlikely to be a good reason to put search_path in
  pg.conf with appends, there are a LOT of reasons to want to be able to
  append to it during a session.

 [shrug...]  You can do that today with current_setting()/set_config().


+1

With a very simple statement you can do that:

fabrizio=# SELECT current_setting('search_path');
 current_setting
-----------------
 $user,public
(1 row)

fabrizio=# SELECT set_config('search_path',
current_setting('search_path')||', another_schema', false);
   set_config
------------------------------
 $user,public, another_schema
(1 row)


Or you can create a simple SQL function to concatenate your configs:

fabrizio=# CREATE FUNCTION concat_config(name TEXT, value TEXT, is_local
BOOLEAN)
fabrizio-# RETURNS text
fabrizio-# LANGUAGE sql
fabrizio-# AS $$ SELECT set_config(name, current_setting(name)||','||value,
is_local); $$;
CREATE FUNCTION
fabrizio=# SELECT concat_config('search_path', 'another_schema', false);
 concat_config
-----------------------------
 $user,public,another_schema
(1 row)

fabrizio=# SELECT concat_config('search_path', 'schema2', false);
 concat_config
--------------------------------------
 $user,public,another_schema,schema2
(1 row)


Regards,

-- 
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching
 Timbira: http://www.timbira.com.br
 Blog about IT: http://fabriziomello.blogspot.com
 LinkedIn profile: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Noah Misch
On Tue, Aug 05, 2014 at 07:32:35PM -0700, Peter Geoghegan wrote:
 select * from (select * from tags order by tag offset 1) ii;
 
 Git master takes about 25 seconds to execute the query. Patched takes
 about 6.8 seconds. That seems very good, but this is not really new
 information.
 
 However, with work_mem set low enough to get an external sort, the
 difference is more interesting. If I set work_mem to 10 MB, then the
 query takes about 10.7 seconds to execute with a suitably patched
 Postgres. Whereas on master, it consistently takes a full 69 seconds.
 That's the largest improvement I've seen so far, for any case.

Comparator cost affects external sorts more than it affects internal sorts.
When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of
the internal sort profile and 10.26% of the external sort profile.
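
For context, a workload of roughly this shape is enough to exercise
both paths; this is only a sketch, not the exact script behind those
numbers:

create table ints as
  select (random() * 1000000)::int4 as n
  from generate_series(1, 5000000);

set work_mem = '4MB';   -- small enough that this sort goes external
explain analyze select * from ints order by n;

set work_mem = '1GB';   -- large enough here that the sort stays internal
explain analyze select * from ints order by n;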

 I must admit that this did surprise me, but then I don't grok tape
 sort. What's particularly interesting here is that when work_mem is
 cranked up to 512MB, which is a high setting, but still not high
 enough to do an internal sort, the difference closes in a bit. Instead
 of 41 runs, there are only 2. Patched now takes 16.3 seconds.
 Meanwhile, master is somewhat improved, and consistently takes 65
 seconds to complete the sort.

 Does anyone recall hearing complaints around higher work_mem settings
 regressing performance?

Jeff Janes has mentioned it:
http://www.postgresql.org/message-id/CAMkU=1zVD82voXw1vBG1kWcz5c2G=supgohpkm0thwmprk1...@mail.gmail.com

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Peter Geoghegan
On Tue, Aug 5, 2014 at 8:55 PM, Noah Misch n...@leadboat.com wrote:
 Comparator cost affects external sorts more than it affects internal sorts.
 When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of
 the internal sort profile and 10.26% of the external sort profile.

Did you attempt to characterize where wall time was being spent? With
a difference like that, it seems likely that it's largely down to the
fact that quicksort is cache oblivious, rather than, say, that there
were more comparisons required.

 I must admit that this did surprise me, but then I don't grok tape
 sort. What's particularly interesting here is that when work_mem is
 cranked up to 512MB, which is a high setting, but still not high
 enough to do an internal sort, the difference closes in a bit. Instead
 of 41 runs, there are only 2. Patched now takes 16.3 seconds.
 Meanwhile, master is somewhat improved, and consistently takes 65
 seconds to complete the sort.

 Does anyone recall hearing complaints around higher work_mem settings
 regressing performance?

 Jeff Janes has mentioned it:
 http://www.postgresql.org/message-id/CAMkU=1zVD82voXw1vBG1kWcz5c2G=supgohpkm0thwmprk1...@mail.gmail.com

I knew that I'd heard that at least once. Apparently some other
database systems have external sorts that tend to be faster than
equivalent internal sorts. I'd guess that that is an artifact of their
having a substandard internal sort, though.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-08-05 Thread Noah Misch
On Tue, Aug 05, 2014 at 09:32:59PM -0700, Peter Geoghegan wrote:
 On Tue, Aug 5, 2014 at 8:55 PM, Noah Misch n...@leadboat.com wrote:
  Comparator cost affects external sorts more than it affects internal sorts.
  When I profiled internal and external int4 sorting, btint4cmp() was 0.37% of
  the internal sort profile and 10.26% of the external sort profile.
 
 Did you attempt to characterize where wall time was being spent?

No.  I have the perf data files, if you'd like to see any particular report.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-08-05 Thread furuyao
  I have improved the patch by making the following changes:
 
  1. Since stream_stop() was redundant, the stream_stop() call at
 WAL file close was deleted.
 
  2. Changed the timing of the flush decision to improve source code
 readability: the flush check is now done before the feedback-message
 decision (before the continue statement), instead of right after a
 message is received.
 
  Thanks for the updated version of the patch!
 
  While reviewing the patch, I found that HandleCopyStream() is still
  long, which decreases the readability of the source code.
  So I feel inclined to refactor HandleCopyStream() further for better
  readability. What about the attached refactoring patch?
 
 Sorry, I forgot to attach the patch in the previous email. Attached now.

Thank you for the refactoring patch.
I did a review of the patch. 

-   break;  /* ignore the rest of this XLogData packet */
 
+   return true;  /* ignore the rest of this XLogData packet */

Regarding the break statement at WAL file close: it is now a return true.
That may amount to the behavior of a continue statement. Is that acceptable?

The walreceiver divides this work between XLogWalRcvProcessMsg() and
XLogWalRcvWrite(); isn't that kind of division necessary here too?

Regards,

--
Furuya Osamu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Incremental Backup

2014-08-05 Thread Simon Riggs
On 6 August 2014 03:16, Bruce Momjian br...@momjian.us wrote:
 On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
 On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 
  On 5 August 2014 22:38, Claudio Freire klaussfre...@gmail.com wrote:
  Thinking some more, it seems like this whole store-multiple-LSNs
  thing is too much. We can still do block-level incrementals just by
  using a single LSN as the reference point. We'd still need a complex
  file format and a complex file reconstruction program, so I think that
  is still next release. We can call that INCREMENTAL BLOCK LEVEL.

 Yes, that's the approach taken by pg_rman for its block-level
 incremental backup. Btw, I don't think that the CPU cost of scanning all
 the relation files, added to the cost of rebuilding the backups, is worth
 it on large instances. File-level backup would cover most of the

 Well, if you scan the WAL files from the previous backup, that will tell
 you which pages need incremental backup.

That would require you to store that WAL, which is something we hope
to avoid. Plus if you did store it, you'd need to retrieve it from
long term storage, which is what we hope to avoid.

 I am thinking we need a wiki page to outline all these options.

There is a Wiki page.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers