Re: [HACKERS] Passing tabular data around using python functions

2012-07-31 Thread Jan Urbański

On 30/07/12 14:33, Achim Domma wrote:

Hi,


Hi Achim,

this list is meant for discussing the development of PostgreSQL; in the 
future you might want to ask your question on pgsql-general.


However, to answer your question:


I call the function like this:

select * from vectormatch(array(select (docid,conceptid,rank)::fps from fps where docid = 4205591))

and get the following output:

NOTICE:  <type 'list'>
CONTEXT:  PL/Python function vectormatch
NOTICE:  ['(4205591,1,1)', '(4205591,1219,1)', ...]
CONTEXT:  PL/Python function vectormatch

I'm quite surprised that there are strings in the list and not tuples!? I
tried my best, but I have no idea what I might be doing wrong. The main
purpose of my sample/experiment is to pass the results of a query to a
function and process it there. Any hint would be much appreciated.


Yes, it's a missing feature of PL/Python, but in your case you could 
work around it by writing your function like this:


create or replace function vectormatch(docid integer[], conceptid integer[], rank float4[])
returns table(match_docid integer, weight float4)  -- output column names must not collide with the parameter names
as $$
data = zip(docid, conceptid, rank)
plpy.notice(data)
...
$$ language plpythonu;

and then calling it like this:

select vectormatch(array_agg(docid), array_agg(conceptid), array_agg(rank))
from fps where docid = 4205591;
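
For completeness, here is what a runnable version might look like; the loop
body is hypothetical (a stand-in for whatever scoring vectormatch is meant to
do), but it shows how the zipped tuples are consumed and returned as rows:

create or replace function vectormatch(docid integer[], conceptid integer[], rank float4[])
returns table(match_docid integer, weight float4)
as $$
    # the three parallel arrays arrive as ordinary Python lists
    data = zip(docid, conceptid, rank)
    for d, c, r in data:
        # hypothetical weighting: pass the rank through unchanged
        yield (d, r)
$$ language plpythonu;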


Cheers,
Jan



[HACKERS] [PATCH] Patch to compute Max LSN of Data Pages

2012-07-31 Thread Amit Kapila
 Based on the discussion and suggestions in this mail chain, the following 
 features can be implemented:

 1. Compute the value of the max LSN in data pages based on user input,
    whether he wants it for an individual file, a particular directory, or
    the whole database.

 2a. Search the available WAL files for the latest checkpoint record and
     print its value.
 2b. Search the available WAL files for the latest checkpoint record and
     recreate a pg_control file pointing at that checkpoint.

 I have kept both options to address different kinds of corruption scenarios.

 I think I can see all of those things being potentially useful.  There
 are a couple of pending patches that will revise the WAL format
 slightly; not sure how much those are likely to interfere with any
 development you might do on (2) in the meantime.

Based on the above conclusion, I have prepared a patch which implements Option 1:

to find the value of the max LSN in data pages based on user input, whether he
wants it for
- An individual file
- A particular directory
- The whole database

The corresponding pg_resetxlog options are as follows:
  -p {file | dir}   print the max LSN from the specified file or directory path
  -P                print the max LSN from the whole database

Note: with -p {file | dir}, the input path should be absolute or relative to
the data directory.

These options are useful when pg_control, WAL files, or data files are missing 
or corrupted. Using them, the user can find the max LSN and compute the next 
redo log sequence number.

Sample output:
postgres@linux: pg_resetxlog -P /home/postgres/installation/bin/data
Maximum LSN found is: 73325448, WAL segment file name (fileid, seg): 0004

Design:
Display the max LSN based on the user's option. (A minimal C sketch of the
per-file scan appears after this outline.)

1. Finding the max LSN in an individual file [pg_resetxlog option: -p file-name]
   A. Open the given file and check the number of blocks.
   B. Read each page header and validate it; if valid, track the max LSN; if
      invalid, log the page id and file name and continue to the next page.

2. Finding the max LSN in a folder, excluding subdirectories [pg_resetxlog
   option: -p folder-name]
   Note: subdirectories are not traversed here, since soft links can sometimes
   create recursive loops.
   A. Read all the files in the given folder using the ReadDir function.
   B. If a file or folder name starts with pgsql_tmp, ignore it and continue
      to the next one.
   C. Find the max LSN in each file (refer to 1. Finding the max LSN in an
      individual file).

3. Finding the max LSN for the whole database [pg_resetxlog option: -P]
   A. Read the base directory (format: pgDataDirectory/base/databaseid/*):
      1. Skip the folder if its name is "0" or "1" [skip template databases].
      2. Form the new folder name and call the function described in [2.
         Finding the max LSN in a folder].
   B. Read the global directory (pgDataDirectory/global).
      Note: files such as pg_control need to be excluded here; this is taken
      care of in the folder-reading function.
   C. Read all tablespaces.
      Folder structure: pg_tblspc/tablespace-id/CONSTANT-PG-VERSION-STRING/database-id/relfilenodes
      1. Read all tablespace names in pg_tblspc/*.
         1.1. For each folder, form the path
              pg_tblspc/tblspc-folder-name/CONSTANT-PG-VERSION-STRING/
         1.2. Read all the directories in
              pg_tblspc/tablespace-id/CONSTANT-PG-VERSION-STRING/*
              1.2.1. For each folder, form the path
                     pg_tblspc/tblspc-folder-name/CONSTANT-PG-VERSION-STRING/db-id-folder-name
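
To make step 1 concrete, a minimal sketch of the per-file scan could look like
this (illustrative only, not the actual patch code; it assumes the 9.2-era
two-part XLogRecPtr and the backend's page-header helpers):

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/bufpage.h"

#include <stdio.h>

/* Illustrative sketch of step 1; error handling abbreviated. */
static void
FindMaxLSNinFile(char *filename, XLogRecPtr *maxlsn)
{
	char		buf[BLCKSZ];
	BlockNumber blkno = 0;
	FILE	   *fp = fopen(filename, "rb");

	if (fp == NULL)
		return;					/* real code would report the error */

	while (fread(buf, 1, BLCKSZ, fp) == BLCKSZ)
	{
		/* validate the page header before trusting its LSN */
		if (PageHeaderIsValid((PageHeader) buf))
		{
			XLogRecPtr	lsn = PageGetLSN(buf);

			if (XLByteLT(*maxlsn, lsn))
				*maxlsn = lsn;
		}
		else
			fprintf(stderr, "invalid page header: %s, block %u\n",
					filename, blkno);
		blkno++;
	}
	fclose(fp);
}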

Comments/Objections?



With Regards,

Amit Kapila.
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c 
b/src/bin/pg_resetxlog/pg_resetxlog.c
index d5d89ec..dcb62d1 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -54,6 +54,20 @@
 #include "access/xlog_internal.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
+#include "catalog/catalog.h"
+#include "storage/bufpage.h"
+#include "storage/fd.h"
+
+
+/* Page header size */
+#define PAGEHDRSZ (sizeof(PageHeaderData))
+
+
+/*
+ * relfilenode name validation: allow only file names that start with a digit
+ */
+#define validateRelfilenodename(name) ((name[0] >= '0') && (name[0] <= '9'))
+
 
 extern int optind;
 extern char *optarg;
@@ -72,6 +86,9 @@ static void FindEndOfXLOG(void);
 static void KillExistingXLOG(void);
 static void KillExistingArchiveStatus(void);
 static void WriteEmptyXLOG(void);
+static void FindMaxLSNinFile(char *filename, XLogRecPtr *maxlsn);
+static void FindMaxLSNinDir(char *path, XLogRecPtr *maxlsn);
+static void FindMaxLSNinPgData(XLogRecPtr *maxlsn);
 static void usage(void);
 
 
@@ -92,6 +109,10 @@ main(int argc, char *argv[])
 	char	   *DataDir;
 	int			fd;
 	char		path[MAXPGPATH];
+	bool		print_max_lsn = false;
+	bool		print_pgdata_max_lsn = false;
+

Re: [HACKERS] pgsql_fdw in contrib

2012-07-31 Thread Etsuro Fujita
Hi KaiGai-san,

Sorry about the delay in answering.  I have been swamped with other things
lately.

 BTW, your patch does not make sense in my environment, which is just
 after initdb without any parameter customization. Could you give us
 the steps to reproduce the Nested-Loop plan instead of the Hash-Join?

I examined both the Nested-Loop and Hash-Join plans by using the enable_*
options, and found that their total costs are almost equal.  The result might
depend on the execution environment (my environment is Linux 2.6.18 on x86_64).
I think it is preferable to use the enable_* options for this regression
test, as src/test/regress/sql/join.sql does.
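
For reference, the join.sql pattern looks roughly like this (a sketch; ft1 and
ft2 stand in for whatever foreign tables the pgsql_fdw regression test defines):

-- pin the plan shape by disabling the competing join methods
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN (COSTS off)
SELECT * FROM ft1 JOIN ft2 ON ft1.c1 = ft2.c1;

RESET enable_hashjoin;
RESET enable_mergejoin;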

Thanks,

Best regards,
Etsuro Fujita





[HACKERS] JSON function reference in docs

2012-07-31 Thread Thom Brown
Hi,

Could we add a reference from the JSON data type page to the JSON
functions page akin to how we do for the XML data type?  Something
like the attached patch.

Cheers

Thom


json_docs_link.patch
Description: Binary data



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-31 Thread Merlin Moncure
On Mon, Jul 30, 2012 at 10:26 PM, Jan Wieck janwi...@yahoo.com wrote:
 On 7/30/2012 10:31 PM, Leon Smith wrote:

 This is not necessarily true, on multiple levels. I mean, some of
 the programs I write are highly concurrent, and this form of batching
 would have almost no risk of stalling the network buffer. And the
 possible use case would be when you are dealing with very small rows,
 when there would typically be several rows inside a single network
 packet or network buffer.


 By "highly concurrent" do you mean multi-threaded? Like one thread reads the
 rows in batches and pushes them into a queue while another thread processes
 them from that queue?

 If that is the case, then you just added a useless layer of buffering and
 the need for thread-to-thread context switches to PQsetSingleRowMode. Libpq's
 receiver thread is the kernel itself. Libpq already tries never to read
 partial kernel buffers. It always makes sure that there are at least 8K of
 free space in the inBuffer. In the case you describe above, where several
 rows fit into a single packet, libpq will receive them with a single
 read(2) system call, and the application can then get them as fast as
 possible, without causing any further context switches, because they are
 already in the inBuffer.

Yeah: with asynchronous query processing the query gets sent and
control returns immediately to your code: that's the whole point.
Even if some data races to the network buffer, libpq doesn't 'see' any
data until you tell it to by asking for a result (which can block) or
draining the buffers with PQconsumeInput.  So there is no race in the
traditional sense, and I'm OK with PQsetSingleRowMode as such.

Removing malloc/free on row iteration seems only to be possible via
one of two methods: either a) you introduce a non-PGresult based
method of data extraction or b) you preserve the PGresult across row
iterations.
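
For readers who haven't followed the thread, the consumption pattern under
discussion looks roughly like this (a sketch against the new 9.2 API; the
query and error handling are placeholders):

#include <stdio.h>
#include "libpq-fe.h"

/* Sketch of libpq single-row mode; not production code. */
static void
stream_rows(PGconn *conn)
{
	PGresult   *res;

	if (!PQsendQuery(conn, "SELECT generate_series(1, 1000000)"))
	{
		fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
		return;
	}
	/* must be called right after PQsendQuery, before any result arrives */
	PQsetSingleRowMode(conn);

	/* each PGRES_SINGLE_TUPLE result carries exactly one row */
	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
			printf("row: %s\n", PQgetvalue(res, 0, 0));
		/* the stream ends with an empty PGRES_TUPLES_OK result */
		PQclear(res);			/* the per-row malloc/free mentioned above */
	}
}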

merlin



Re: [HACKERS] several problems in pg_receivexlog

2012-07-31 Thread Alvaro Herrera

Excerpts from Magnus Hagander's message of Thu Jul 12 07:35:11 -0400 2012:
 On Tue, Jul 10, 2012 at 6:45 PM, Fujii Masao masao.fu...@gmail.com wrote:

  When an error happens after the replication connection has been established,
  pg_receivexlog doesn't close the open file descriptor or release the
  allocated memory area. This was harmless before
  16282ae688de2b320cf176e9be8a89e4dfc60698, because pg_receivexlog exited
  immediately when an error happened. But currently, in an error case,
  pg_receivexlog tries reconnecting to the server infinitely, so file
  descriptors and memory would leak. I think this is a problem and should
  be fixed. The patch which I submitted yesterday changes pg_receivexlog
  so that it closes the open file and frees the memory area before
  reconnecting to the server.
 
 Thanks. I get it now, and this explains why I didn't see it before - I
 didn't check properly after we added the loop mode. Patch applied with
 minor changes (e.g. there's no point in doing PQfinish(tmpconn) right
 after you've verified tmpconn is NULL).

For some reason, Magnus neglected to backpatch this to 9.2, so I just
did.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] several problems in pg_receivexlog

2012-07-31 Thread Alvaro Herrera

Excerpts from Fujii Masao's message of Tue Jul 17 13:58:38 -0400 2012:

  You're right. If the error is detected, that function always returns false
  and the error message is emitted (but I think that the current error message
  "pg_basebackup: child process exited with error 1" is confusing),
  so it's OK. But if the walsender in the server is terminated by SIGTERM,
  no error is detected, and the pg_basebackup background process gets out
  of the loop in ReceiveXlogStream() and returns true.
 
  Oh. Because the server does a graceful shutdown. D'uh, of course.
 
  Then yes, your suggested fix seems like a good one.
 
 Attached patch adds the fix.
 
 Also I found I had forgotten to set the file descriptor to -1 at the end of
 ReceiveXlogStream() in my previously-committed patch. The attached patch
 fixes this problem.

This hasn't been committed yet AFAICT, and it probably needs a refresh
now after my changes to pg_basebackup.  Please update the patch.  Also,
if this is not in the Open Items list, please put it there so that we
don't forget it before the 9.2 release.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] Fixing syslogger rotation logic for first-time case

2012-07-31 Thread Tom Lane
We've had a couple of complaints recently from people who were unhappy
because the syslogger's log_truncate_on_rotation logic does not fire
during the first log rotation after it's forked off from the postmaster.
The key reason for that was that to know whether to truncate or not,
the code has to know if the rotation actually changed to a new file
name, and it did not have that information inherited from the
postmaster.  The attached patch deals with that problem by passing down
the pg_time_t that the log file name is computed from, and then
reconstructing the file name.  This is kind of the hard way in Unix-oid
platforms: we could just let the malloc'd file name hang around through
the fork.  But on Windows it would be necessary to include the file name
in the BackendParameters struct that's built on every child process
launch, and that seemed pretty costly, considering the overwhelming
majority of postmaster children don't need it.  So I did it like this.

Any objections?

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index bbf725dfbc5aa58a5b43f15998e2c0424906ecfb..bfef707050a82f7842b1e2da550efcfa4882ec82 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** typedef struct
*** 441,446 
--- 441,447 
  	pid_t		PostmasterPid;
  	TimestampTz PgStartTime;
  	TimestampTz PgReloadTime;
+ 	pg_time_t	first_syslogger_file_time;
  	bool		redirection_done;
  	bool		IsBinaryUpgrade;
  	int			max_safe_fds;
*** MaxLivePostmasterChildren(void)
*** 4701,4707 
  
  /*
   * The following need to be available to the save/restore_backend_variables
!  * functions
   */
  extern slock_t *ShmemLock;
  extern LWLock *LWLockArray;
--- 4702,4708 
  
  /*
   * The following need to be available to the save/restore_backend_variables
!  * functions.  They are marked NON_EXEC_STATIC in their home modules.
   */
  extern slock_t *ShmemLock;
  extern LWLock *LWLockArray;
*** extern slock_t *ProcStructLock;
*** 4709,4714 
--- 4710,4716 
  extern PGPROC *AuxiliaryProcs;
  extern PMSignalData *PMSignalState;
  extern pgsocket pgStatSock;
+ extern pg_time_t first_syslogger_file_time;
  
  #ifndef WIN32
  #define write_inheritable_socket(dest, src, childpid) ((*(dest) = (src)), true)
*** save_backend_variables(BackendParameters
*** 4761,4766 
--- 4763,4769 
  	param->PostmasterPid = PostmasterPid;
  	param->PgStartTime = PgStartTime;
  	param->PgReloadTime = PgReloadTime;
+ 	param->first_syslogger_file_time = first_syslogger_file_time;
  
  	param->redirection_done = redirection_done;
  	param->IsBinaryUpgrade = IsBinaryUpgrade;
*** restore_backend_variables(BackendParamet
*** 4985,4990 
--- 4988,4994 
  	PostmasterPid = param->PostmasterPid;
  	PgStartTime = param->PgStartTime;
  	PgReloadTime = param->PgReloadTime;
+ 	first_syslogger_file_time = param->first_syslogger_file_time;
  
  	redirection_done = param->redirection_done;
  	IsBinaryUpgrade = param->IsBinaryUpgrade;
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 919cc49fa945f27642f8723fefec486cc6657478..0febf64d87f2b337b70e4c9e49d4399b6aa03c0e 100644
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***
*** 2,8 
   *
   * syslogger.c
   *
!  * The system logger (syslogger) is new in Postgres 8.0. It catches all
   * stderr output from the postmaster, backends, and other subprocesses
   * by redirecting to a pipe, and writes it to a set of logfiles.
   * It's possible to have size and age limits for the logfile configured
--- 2,8 
   *
   * syslogger.c
   *
!  * The system logger (syslogger) appeared in Postgres 8.0. It catches all
   * stderr output from the postmaster, backends, and other subprocesses
   * by redirecting to a pipe, and writes it to a set of logfiles.
   * It's possible to have size and age limits for the logfile configured
*** static bool pipe_eof_seen = false;
*** 91,96 
--- 91,97 
  static bool rotation_disabled = false;
  static FILE *syslogFile = NULL;
  static FILE *csvlogFile = NULL;
+ NON_EXEC_STATIC pg_time_t first_syslogger_file_time = 0;
  static char *last_file_name = NULL;
  static char *last_csv_file_name = NULL;
  static Latch sysLoggerLatch;
*** SysLoggerMain(int argc, char *argv[])
*** 291,296 
--- 292,304 
  		elog(FATAL, "could not create syslogger data transfer thread: %m");
  #endif   /* WIN32 */
  
+ 	/*
+ 	 * Remember active logfile's name.  We recompute this from the reference
+ 	 * time because passing down just the pg_time_t is a lot cheaper than
+ 	 * passing a whole file path in the EXEC_BACKEND case.
+ 	 */
+ 	last_file_name = logfile_getname(first_syslogger_file_time, NULL);
+ 
  	/* remember active logfile parameters */
  	currentLogDir = pstrdup(Log_directory);
  	currentLogFilename = 

Re: [HACKERS] several problems in pg_receivexlog

2012-07-31 Thread Fujii Masao
On Wed, Aug 1, 2012 at 12:09 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Fujii Masao's message of Tue Jul 17 13:58:38 -0400 2012:

  You're right. If the error is detected, that function always returns false
  and the error message is emitted (but I think that the current error message
  "pg_basebackup: child process exited with error 1" is confusing),
  so it's OK. But if the walsender in the server is terminated by SIGTERM,
  no error is detected, and the pg_basebackup background process gets out
  of the loop in ReceiveXlogStream() and returns true.
 
  Oh. Because the server does a graceful shutdown. D'uh, of course.
 
  Then yes, your suggested fix seems like a good one.

 Attached patch adds the fix.

 Also I found I had forgotten to set the file descriptor to -1 at the end of
 ReceiveXlogStream() in my previously-committed patch. The attached patch
 fixes this problem.

 This hasn't been committed yet AFAICT, and it probably needs a refresh
 now after my changes to pg_basebackup.  Please update the patch.

I attached the updated version.

  Also,
 if this is not in the Open Items list, please put it there so that we
 don't forget it before the 9.2 release.

Yep, done.

Regards,

-- 
Fujii Masao


pgreceivexlog_check_stoppoint_v2.patch
Description: Binary data



Re: [HACKERS] Covering Indexes

2012-07-31 Thread Jeff Davis
On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote: 
 So, do we want a TODO item about adding columns to a unique index that
 will not be used for uniqueness checks?

-1 from me, at least in its current form.

At its heart, this is about separating the constraint from the index
that enforces it -- you'd like the columns to be available for querying
(for index-only scans or otherwise), but not to take part in the
constraint.

And when you look at it from that perspective, this proposal is an
extremely limited form. You can't, for example, decide that an existing
index can be used for a new unique constraint. That's a lot more
plausible than the use cases mentioned in this thread, as far as I can
see, but this proposal can't do that.

I tried proposing a more general use case when developing exclusion
constraints:

http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis

(allow user to specify multiple constraints enforced by one existing
index). But, at least at the time, my proposal didn't pass the
usefulness test:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php

even though my proposal was strictly more powerful than this one is.

Also, this proposal extends the weird differences between CREATE UNIQUE
INDEX and the declaration of a UNIQUE constraint. For example, if you
want DEFERRABLE you need to declare the constraint, but if you want to
use an expression (rather than a simple column reference) you need to
create the index. This problem does not exist with exclusion
constraints.
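
To spell out that asymmetry (a sketch; the table and columns are made up):

-- DEFERRABLE is only available through the constraint syntax:
ALTER TABLE t ADD CONSTRAINT t_a_key UNIQUE (a) DEFERRABLE;

-- an expression is only available through the index syntax:
CREATE UNIQUE INDEX t_lower_b_key ON t (lower(b));

-- and there is no declaration that combines the two.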

In my opinion, new innovations in unique constraints would be better
served as part of exclusion constraints, and we should keep unique
constraints simple. If we make an improvement to UNIQUE, then we will
want to do similar things for exclusion constraints anyway, so it just
seems duplicative.

Regards,
Jeff Davis







Re: [HACKERS] PostgreSQLs Extension

2012-07-31 Thread Anderson C. Carniel

Hi!
Thanks for the quick response. 
 you can create new data types, new operators, new types of indexes if
 you want... the question is, what are you trying to do? 
I want to define a new type of geographic data, similar to PostGIS's data
types. My intention is also to define new topological operators (e.g.
intersection). For this, I wonder whether the User-Defined Types approach is
best. With this approach, can I change the query processing to handle the new
type the way I want? Does this approach directly affect performance in query
processing?
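
For reference, the usual extension route looks roughly like this (a sketch
with made-up names; the support functions would live in a C shared library,
as PostGIS's do):

-- shell type, completed once its I/O functions exist
CREATE TYPE mygeom;

CREATE FUNCTION mygeom_in(cstring) RETURNS mygeom
    AS 'mygeom', 'mygeom_in' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION mygeom_out(mygeom) RETURNS cstring
    AS 'mygeom', 'mygeom_out' LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE mygeom (INPUT = mygeom_in, OUTPUT = mygeom_out);

-- a topological operator on the new type
CREATE FUNCTION mygeom_intersects(mygeom, mygeom) RETURNS boolean
    AS 'mygeom', 'mygeom_intersects' LANGUAGE C IMMUTABLE STRICT;

CREATE OPERATOR && (
    LEFTARG = mygeom, RIGHTARG = mygeom,
    PROCEDURE = mygeom_intersects,
    COMMUTATOR = &&
);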

 what kind of changes?

Also, maybe I'll make changes to the SQL language and implement them in 
PostgreSQL. For example, changes in syntax, and new reserved words (for 
specific treatment when my data type is detected).
Thanks for the help.
Best regards,
Anderson Carniel[]s
 From: ja...@2ndquadrant.com
 Date: Fri, 27 Jul 2012 19:51:53 -0500
 Subject: Re: [HACKERS] PostgreSQLs Extension
 To: accarn...@gmail.com
 CC: pgsql-hackers@postgresql.org
 
 On Thu, Jul 26, 2012 at 2:56 PM, Anderson C. Carniel
 accarn...@gmail.com wrote:
 
  - I need to define a new specific data type, like PostGIS's. I saw that in
  PostgreSQL's documentation there are User-Defined Types. Is this the best
  way to define a new data type? Using this approach, can I define the way
  queries are processed and thus define new operators? Or would I define
  functions via pgsql for this? It was not clear to me.
 
 you can create new data types, new operators, new types of indexes if
 you want... the question is, what are you trying to do?
 
  - Also, how could I make changes to the SQL language in PostgreSQL?
 
 
 what kind of changes?
 
 -- 
 Jaime Casanova www.2ndQuadrant.com
 Professional PostgreSQL: Soporte 24x7 y capacitación
  

[HACKERS] Help me develop new commit_delay advice

2012-07-31 Thread Peter Geoghegan
Many of you will be aware that the behaviour of commit_delay was
recently changed. Now, the delay only occurs within the group commit
leader backend, and not within each and every backend committing a
transaction:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f11e8be3e812cdbbc139c1b4e49141378b118dee

For those of you that didn't follow this development, I should point
out that I wrote a blogpost that described the idea, which will serve
as a useful summary:

http://pgeoghegan.blogspot.com/2012/06/towards-14000-write-transactions-on-my.html

I made what may turn out to be a useful observation during the
development of the patch: for both the tpc-b.sql and insert.sql
pgbench-tools scripts, a commit_delay of half of my wal_sync_method's
reported raw sync time looked optimal. I use Linux, so my
wal_sync_method happened to be fdatasync. I measured this using
pg_test_fsync.

The devel docs still say of commit_delay and commit_siblings: "Good
values for these parameters are not yet clear; experimentation is
encouraged." This has been the case since Postgres 7.1 (i.e. it has
never been clear what good values were - the folk wisdom was actually
that commit_delay should always be set to 0). I hope to be able to
formulate some folk wisdom about setting commit_delay from 9.3 on,
which may go on to be accepted as an official recommendation within the
docs.

I am rather curious as to what experimentation shows optimal values
for commit_delay to be for a representative cross-section of hardware.
In particular, I'd like to see if setting commit_delay to half of raw
sync time appears to be optimal for both insert.sql and tpc-b.sql
workloads across different types of hardware with different sync
times. Now, it may be sort of questionable to take those workloads as
general proxies for performance, not least since they will literally
give Postgres as many *completely uniform* transactions as it can
handle. However, it is hard to think of another, better general proxy
for performance that is likely to be accepted as such, and that will
allow us to reason about setting commit_delay.

While I am not completely confident that we can formulate a widely
useful, simple piece of advice, I am encouraged by the fact that a
commit_delay of 4,000 worked very well for both tpc-b.sql and
insert.sql workloads on my laptop, beating out settings of 3,000 and
5,000 on each benchmark. I am also encouraged by the fact that in some
cases, including both the insert.sql and tpc-b.sql cases that I've
already described elsewhere, there is actually no downside to setting
commit_delay - transaction throughput naturally improves, but
transaction latency is actually improved a bit too (or at least the
average and worst-cases). This is presumably due to the amelioration
of resource contention (from greater commit batching) more than
compensating for the obvious downside of adding a delay.

It would be useful, for a start, if I had numbers for a battery-backed
write cache. I don't have access to one right now though, nor do I
have access to any more interesting hardware, which is one reason why
I'm asking for help with this.

I like to run sync prior to running pg_test_fsync, just in case.

[peter@peterlaptop pg_test_fsync]$ sync

I then interpret the following output:

[peter@peterlaptop pg_test_fsync]$ pg_test_fsync
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync 112.940 ops/sec
fdatasync 114.035 ops/sec
fsync 21.291 ops/sec
*** SNIP ***

So if I can perform 114.035 8KiB sync operations per second, that's an
average of about 1 per 8.77 milliseconds, or 8770 microseconds to put
it in the units that commit_delay speaks. It is my hope that we will
find that when this number is halved, we will arrive at a figure that
is worth recommending as a general useful setting for commit_delay for
the system. I guess I could gain some additional insight by simply
changing my wal_sync_method, but I'd find it more interesting to look
at organic setups with faster (not slower) sync times than my system's
fdatasync. For those who are able to help me here, I'd like to see
pgbench-tools workloads for both tpc-b.sql and insert.sql with
incrementing values of commit_delay (increments of, say, 1000
microseconds, perhaps with less granularity where it isn't needed),
from 0 to $(1.5 times raw sync speed) microseconds.
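
To make the arithmetic explicit (4385 rounds to roughly the 4,000 that tested
best on this laptop):

-- half of one raw fdatasync, in microseconds: 1e6 / (ops/sec) / 2
SELECT round(1000000 / 114.035 / 2) AS half_sync_usec;   -- 4385

-- which would then be tried as, e.g.:
--   commit_delay = 4385     (with commit_siblings at its default of 5)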

Thanks
-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] [NOVICE] Learning SQL: nested CTE and UNION

2012-07-31 Thread Tom Lane
Adam Mackler adammack...@gmail.com writes:
 ... But this does not work:

 WITH outmost AS (
   SELECT 1
   UNION (WITH innermost as (SELECT 2)
  SELECT * FROM innermost
  UNION SELECT 3)
 )
 SELECT * FROM outmost;

 Result:
 ERROR:  relation "innermost" does not exist
 LINE 4:  SELECT * FROM innermost

This is a bug :-(.  The parse analysis code seems to think that WITH can
only be attached to the top level or a leaf-level SELECT within a set
operation tree; but the grammar follows the SQL standard which says
no such thing.  The WITH gets accepted, and attached to the
intermediate-level UNION which is where syntactically it should go,
and then it's entirely ignored during parse analysis.  Will see about
fixing it.
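
In the meantime, the obvious workaround is to hoist the inner WITH up to the
top level, which parse analysis does handle:

WITH innermost AS (SELECT 2),
     outmost AS (
       SELECT 1
       UNION (SELECT * FROM innermost UNION SELECT 3)
     )
SELECT * FROM outmost;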

regards, tom lane



[HACKERS] build postgresql on Mac OS X mountain lion with ossp-uuid

2012-07-31 Thread Palle Girgensohn
Hi,

On the new Darwin 10.8 (aka mountain lion), I had to add

#ifdef __APPLE__
#if (__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 1080)
#define _XOPEN_SOURCE
#endif
#endif

to the very beginning of contrib/uuid-ossp/uuid-ossp.c to get the build
working with --with-ossp-uuid.

I suggested something similar to the macports guys, but I guess it would
be wise to get this upstream to the postgresql source?

Cheers,
Palle





Re: [HACKERS] New statistics for WAL buffer dirty writes

2012-07-31 Thread Robert Haas
On Sat, Jul 28, 2012 at 6:33 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 A concern I have is whether the XLogCtlWrite *Write pointer needs to
 be declared volatile, to prevent the compiler from pushing operations
 on it outside of the locks (and so memory barriers) that formally
 protect it.  However I see that the existing code with Insert also does
 not use volatile, so maybe my concern is baseless.  Perhaps the
 compiler guarantees to not move operations on pointers over the
 boundaries of function calls?  The pattern elsewhere in the code seems
 to be to use volatiles for things protected by spin-locks (implemented
 by macros) but not for things protected by LWLocks.

Yes, our code is only correct if we assume that the compiler performs
no global optimizations - i.e. no movement of code between functions.

IMHO, the way we have it now is kind of a mess.  SpinLockAcquire and
SpinLockRelease are required to be CPU barriers, but they are not
required to be compiler barriers.  If we changed that so that they
were required to act as barriers of both flavors, then (1) we wouldn't
need volatile in as many places, (2) we would be less prone to bugs
caused by the omission of not-obviously-necessary volatile markings,
and (3) we would remove one possible source of breakage that might be
induced by a globally optimizing compiler.  As things stand today,
making a previously-global function static could result in working
code breaking, because the static function might be inlined where the
global function wasn't.  Ouch.

Anyway, unless and until we make a definitional change of the sort
described above, any pointers used within a spinlock critical section
must be volatile; and pray that the compiler doesn't inline anything
you weren't expecting.
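
Concretely, the pattern under discussion looks like this (a schematic sketch
of the backend idiom, not a quote of any particular function):

#include "postgres.h"
#include "storage/spin.h"

/* schematic stand-in for a shared-memory struct such as XLogCtl's write state */
typedef struct
{
	slock_t		info_lck;		/* protects the field below */
	int			curridx;
} SharedState;

extern SharedState *Shared;

static int
read_curridx(void)
{
	/* use a volatile pointer to prevent code rearrangement */
	volatile SharedState *shared = Shared;
	int			result;

	SpinLockAcquire(&shared->info_lck);
	result = shared->curridx;	/* cannot be hoisted or cached across the lock */
	SpinLockRelease(&shared->info_lck);

	return result;
}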

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] build postgresql on Mac OS X mountain lion with ossp-uuid

2012-07-31 Thread Tom Lane
Palle Girgensohn gir...@pingpong.net writes:
 Hi,
 On the new Darwin 10.8 (aka mountain lion), I had to add

 #ifdef __APPLE__
 #if (__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 1080)
 #define _XOPEN_SOURCE
 #endif
 #endif

 to the very beginning of contrib/uuid-ossp/uuid-ossp.c to get the build
 working with --with-ossp-uuid.

This is not a postgres bug; this is an ossp bug, and that's the place
to fix it.  If you try, I suspect you'll find that uuid.h doesn't
compile on ML, quite independently of anything Postgres.

regards, tom lane



Re: [HACKERS] New statistics for WAL buffer dirty writes

2012-07-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 IMHO, the way we have it now is kind of a mess.  SpinLockAcquire and
 SpinLockRelease are required to be CPU barriers, but they are not
 required to be compiler barriers.  If we changed that so that they
 were required to act as barriers of both flavors,

Since they are macros, how do you propose to do that exactly?

I agree that volatile-izing everything in the vicinity is a sucky
solution, but the last time we looked at this there did not seem to
be a better one.

regards, tom lane



Re: [HACKERS] proposal - assign result of query to psql variable

2012-07-31 Thread David Fetter
On Sat, Jul 28, 2012 at 06:11:21PM +0200, Pavel Stehule wrote:
 Hello
 
 2012/7/27 Tom Lane t...@sss.pgh.pa.us:
  Pavel Stehule pavel.steh...@gmail.com writes:
  2012/7/26 David Fetter da...@fetter.org:
  How about
  \gset var1,,,var2,var3...
 
  I don't like this - you can use a fake variable - and ignoring some
  variables has no big effect on the client
 
  Why assign to a variable you'll never use?
 
  so why would you get data from the server when you are not going to use it?
 
  Yeah.  I don't see why you'd be likely to write a select that computes
  columns you don't actually want.
 
  Tom - does your proposal release the stored dataset just before the next
  statement, rather than at the end of the statement as now?
 
  Huh?  I think you'd assign the values to the variables and then PQclear
  the result right away.
 
 yes - I didn't understand the \g mechanism well.
 
 Here is the patch - it is not nice at this moment and it is a little bit
 longer than I expected - but it works
 
 It supports David's syntax
 
 postgres=# select 'Hello', 'World' \gset a,b
 postgres=# \echo :'a' :'b'
 'Hello' 'World'
 postgres=# select 'Hello', 'World';
  ?column? │ ?column?
 ──┼──
  Hello│ World
 (1 row)
 
 postgres=# \gset a
 to few target variables
 postgres=# \gset a,
 postgres=# \echo :'a'
 'Hello'
 
 Regards
 
 Pavel

Teensy code cleanup (trailing space) and SGML docs added.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
***************
*** 1489,1495 ****
  testdb=&gt;
        way. Use <command>\i</command> for that.) This means that
        if the query ends with (or contains) a semicolon, it is
        immediately executed. Otherwise it will merely wait in the
!       query buffer; type semicolon or <literal>\g</literal> to send it, or
        <literal>\r</literal> to cancel.
        </para>
  
--- 1489,1495 ----
  testdb=&gt;
        way. Use <command>\i</command> for that.) This means that
        if the query ends with (or contains) a semicolon, it is
        immediately executed. Otherwise it will merely wait in the
!       query buffer; type semicolon, <literal>\g</literal> or <literal>\gset</literal> to send it, or
        <literal>\r</literal> to cancel.
        </para>
  
***************
*** 1623,1628 ****
--- 1623,1640 ----
  Tue Oct 26 21:40:57 CEST 1999
      </varlistentry>
  
      <varlistentry>
+       <term><literal>\gset</literal> <replaceable class="parameter">variable</replaceable> [ ,<replaceable class="parameter">variable</replaceable> ... ] </term>
+ 
+       <listitem>
+       <para>
+       Sends the current query input buffer to the server and stores
+       the query's target list in a corresponding list of psql
+       variables.
+       </para>
+       </listitem>
+     </varlistentry>
+ 
+     <varlistentry>
        <term><literal>\h</literal> or <literal>\help</literal> <literal>[ <replaceable class="parameter">command</replaceable> ]</literal></term>
        <listitem>
        <para>
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*************** exec_command(const char *cmd,
*** 748,753 ****
--- 748,776 ----
  		status = PSQL_CMD_SEND;
  	}
  
+ 	/* \gset send query and store result */
+ 	else if (strcmp(cmd, "gset") == 0)
+ 	{
+ 		bool		error;
+ 
+ 		pset.gvars = psql_scan_slash_vars(scan_state, &error);
+ 
+ 		if (!pset.gvars)
+ 		{
+ 			psql_error("\\%s: missing required argument\n", cmd);
+ 			status = PSQL_CMD_NOSEND;
+ 		}
+ 		else if (error)
+ 		{
+ 			psql_error("\\%s: syntax error\n", cmd);
+ 			status = PSQL_CMD_NOSEND;
+ 			tglist_free(pset.gvars);
+ 			pset.gvars = NULL;
+ 		}
+ 		else
+ 			status = PSQL_CMD_SEND;
+ 	}
+ 
  	/* help */
  	else if (strcmp(cmd, "h") == 0 || strcmp(cmd, "help") == 0)
  	{
*** a/src/bin/psql/command.h
--- b/src/bin/psql/command.h
*************** typedef enum _backslashResult
*** 16,21 ****
--- 16,22 ----
  {
  	PSQL_CMD_UNKNOWN = 0,		/* not done parsing yet (internal only) */
  	PSQL_CMD_SEND,				/* query complete; send off */
+ 	PSQL_CMD_NOSEND,			/* query complete, don't send */
  	PSQL_CMD_SKIP_LINE,			/* keep building query */
  	PSQL_CMD_TERMINATE,			/* quit program */
  	PSQL_CMD_NEWEDIT,			/* query buffer was changed (e.g., via \e) */
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*************** PrintQueryResults(PGresult *results)
*** 826,831 ****

Re: [HACKERS] JSON function reference in docs

2012-07-31 Thread Tom Lane
Thom Brown t...@linux.com writes:
 Could we add a reference from the JSON data type page to the JSON
 functions page akin to how we do for the XML data type?  Something
 like the attached patch.

Seems reasonable; committed.

regards, tom lane
